Interactive Post Mining Association Rules using Cost Complexity Pruning and Ontologies KDD

Size: px
Start display at page:

Download "Interactive Post Mining Association Rules using Cost Complexity Pruning and Ontologies KDD"

Transcription

1 Interactive Post Mining Association Rules using Cost Complexity Pruning and Ontologies KDD Ch. Raja Ramesh, Scholar, K.L.University. Associate Professor in CSE, SVEC, Tadepalligudem. K V V Ramana Assistant Professor in CSE, SVEC Tadepalligudem. K. Raghava Rao, Professor in CSE, KL University, Guntur. C V Sastry Professor In CSE RIT, Yanam. ABSTRACT In data mining, association rule mining is very strong and limited by the huge amount of delivered rules, because of these so many problems facing to implementation. To overcome these drawbacks, several methods were proposed in the literature such as item sets concise, redundancy reduction, and post processing. Based on statistical information by using these methods the extracted rules may not be useful for the user. Thus, it is crucial to help the decision-controller with an efficient post processing step in order to reduce the number of rules. In this paper we have implemented a new interactive approach using cost complexity pruning and filter discovered rules, by using this approach we have to reduce the tree size with minimum number of error of validation set. Key Words: clustering, classification, and association rules, knowledge discover database 1. INTRODUCTION Association rule mining, introduced in [1], is one of the most important tasks Knowledge Discovery in large Databases [2]. It aims to discover the implicative tendencies between the frequent data items in the databases that can be valuable information for the decision-controller. An association rule is defined as the implication form of X -> Y, described by using two measures support and confidence, where X and Y are the sets of items and X Y = ǿ. Apriori [1] is the first algorithm proposed in the association rule mining field and many other algorithms were derived from it, it proposes to extract all association rules satisfying minimum thresholds of support and confidence. It is very well known that mining algorithms can discover a prohibitive amount of association rules; for instance, thousands of rules are extracted from a database of several dozens of attributes and several hundreds of transactions. Moreover, if we increase the support threshold, then the efficient algorithms are discovered rules are obviously less; these are interesting for the user. As a result, it is necessary to bring the support threshold low in order to extract valuable information. Unfortunately, the large volume of rules is generated to become almost impossible to use when the number of rules are overpasses 100. Thus, it is crucial to help the decision-controller with an efficient technique for reducing the number of rules. To overcome this drawback, several methods were proposed in the literature. On the one hand, different algorithms we reintroduced to reduce the number of item sets by generating closed [4], maximal [5] or optimal item sets [6], and several algorithms to reduce the number of rules, using non redundant rules [7], [8], or pruning techniques [9]. On the other hand, post processing methods can improve the selection of discovered rules. Different complementary post processing methods may be used, like pruning, summarizing, grouping, or visualization [10]. Pruning consists in removing uninteresting or redundant rules. In summarizing, concise sets of rules are generated. Groups of rules are produced in the grouping process; and the visualization improves the readability of a large number of rules by using adapted graphical representations Knowledge Representation Knowledge-based systems have a computational model of some domain of interest in which symbols serve as surrogates for real world domain artifacts, such as physical objects, events, relationships, etc. The domain of interest can cover any part of the real world or any hypothetical system about which one desires to represent knowledge for computational purposes. A knowledgebased system maintains a knowledge base which stores the symbols of the computational model in form of statements about the domain, and it performs reasoning by manipulating these symbols. Applications can base their decisions on domain-relevant questions posed to a knowledge base Association Rule Notations And Definitions The association rule mining task can be stated as follows: let I = {i 1,i 2,i 3,.,i n } be a set of literals, called items. Let D = {t 1,t 2,t 3,.,t n } be a set of transactions over I. A nonempty subset of I is called item set and is defined as X ={ i 1,i 2,i 3,.,i k } In short, item set X can also be denoted as X = i 1,i 2,i 3,.,i n. For an item set, the number of items 16

2 is called length of the item set and an item set of length k is referred to as k-itemset. Each transaction t i contains an item set i 1,i 2,i 3,.,i k with a variable k number of items for each t i Definition 1. Let X is subset of I and T subset of D. We define the set of all transactions that contain the item set X as: Similarly, we describe the item sets contained in all the transactions T by: Definition 2. An association rule is an implication X Y, where X and Y are two item sets and X \ Y ¼ ;. The former, X, is called the antecedent of the rule, and the latter, Y, is called the consequent. A rule X Y is described using two important statistical factors: The support of the rule, defined as supp (X Y) = supp(x U Y) = t(x U Y), is the ratio of the number of transactions containing X U Y. If supp (X Y) = s, s % of transactions contains the item set X U Y The confidence of the rule, defined as con f(x Y ) = supp( X Y)/ =supp(x) = supp(x U Y)/ =supp(x)=c, Is the ratio (c %) of the number of transactions that, containing X, also contains Y User-Driven Association Rule Mining Interestingness measures were proposed in order to discover only those association rules that are interesting according to these measures. They have been divided into objective measures and subjective measures. Objective measures depend only on data structure. Many survey papers summarize and compare the objective measure definitions and properties [13], [14]. Unfortunately, being restricted to data evaluation, the objective measures are not sufficient to reduce the number of extracted rules and to capture the interesting ones. Several approaches integrating user knowledge have been proposed. In addition, subjective measures were proposed to integrate explicitly the decision-maker knowledge and to offer a better selection of interesting association rules. Silbershatz and Tuzilin [3] proposed a classification of subjective measures in unexpectedness a pattern is interesting if it is surprising to the user and actionability a pattern is interesting if it can help the user take some actions. As early as 1994, in the KEFIR system [15], the key finding and deviation notions were suggested. Grouped in findings, deviations represent the difference between the actual and the expected values. KEFIR defines interestingness of a key finding in terms of the estimated benefits, and potential savings of taking corrective actions that restore the deviation back to its expected value. These corrective actions are specified in advance by the domain expert for various classes of deviations. Later, Klemettinen et al. [16] proposed templates to describe the form of interesting rules (inclusive templates) and not interesting rules (restrictive templates). The idea of using templates for association rule extraction was reused in [17]. Other approaches proposed to use a rule-like formalism to express user expectations [3], [10], [18], and the discovered association rules are pruned/summarized by comparing them to user expectations. Imielinski et al. [19] proposed a query language for association rule pruning based on SQL, called M-SQL. It allows imposing constraints on the condition and/or the consequent of the association rules. In the same domain of query-based association rule pruning, but more constraints driven, Ng et al. [20] proposed architecture for exploratory mining of rules. The authors suggested a set of solutions for several problems: the lack of user exploration and control, the rigid notion of relationship, and the lack of focus. In order to overcome these problems, Ng et al. proposed a new query language called Constrained Association Query and they pointed out the importance of user feedback and user flexibility in choosing interestingness metrics. Another related approach was proposed by An et al. in [21] where the authors introduced domain knowledge in order to prune and summarize discovered rules. The first algorithm uses data taxonomy, defined by user, in order to describe the semantic distance between rules, and in order to group the rules. The second algorithm allows grouping the discovered rules that share at least one item in the antecedent and the consequent. In 2007, a new methodology was proposed in [22] to prune and organize rules with the same consequent. The authors suggested transforming the database in an association rule base in order to extract second-level association rules. Called meta rules, the extracted rules r1! r2 express relations between the two association rules and help pruning/grouping discovered rules. 2. ONTOLOGIES IN DATA MINING Ontology describes basic concepts in a domain and defines relations among them. Basic building blocks of ontology design include: classes or concepts, properties of each concept describing various features and attributes of the concept, restrictions on slots (facets). In knowledge engineering and Semantic Web fields, ontologies have interested researchers since their first proposition in the philosophy branch by Aristotle. Ontologies have evolved over the years from controlled vocabularies to thesauri (glossaries), and later, to taxonomies [23]. In the early 1990s, ontology was defined by Gruber as a formal, explicit specification of a shared conceptualization [24]. By conceptualization, we understand here an abstract model of some phenomenon described by its important concepts. The formal notion denotes the idea that machines should be able to interpret ontology. Moreover, explicit refers to the transparent 17

3 definition of ontology elements. Finally, shared outlines that an ontology brings together some knowledge common to a certain group, and not individual knowledge. Several other definitions are proposed in the literature. For instance, in [25], an ontology is viewed as a logical theory accounting for the intended meaning of a formal vocabulary, and later, in 2001, Maedche and Staab proposed a more artificial-intelligence-oriented definition. Thus, ontologies are described as (Meta) data schemas, providing a controlled vocabulary of concepts, each with an explicitly defined and machine processable semantics [11]. Depending on the granularity, four types of ontologies are proposed in the literature: upper (or top level) ontologies, domain ontologies, task ontologies, and application ontologies [25]. Top-level ontologies deal with general concepts; while the other three types deal with domains specific concepts. Ontologies, introduced in data mining for the first time in early 2000, can be used in several ways [26]: Domain and Background Knowledge Ontologies, Ontologies for Data Mining Process, or Metadata Ontologies. Background Knowledge Ontologies organize domain knowledge and play important roles at several levels of the knowledge discovery process. Ontologies for Data Mining Process codify mining process description and choose the most appropriate task according to the given problem; while Metadata Ontologies describe the construction process of items. In this paper, we focus on Domain and Background Knowledge Ontologies. The first idea of using Domain Ontologies was introduced by Srikant and Agrawal with the concept of Generalized Association Rules (GAR) [26]. The authors proposed taxonomies of mined data (an is-a hierarchy) in order to generalize/specify rules. In [27], it is suggested that ontology of background knowledge can benefit all the phases of a KDD cycle described in CRISP- DM. The role of ontologies is based on the given mining task and method, and on data characteristics. From business understanding to deployment, the authors delivered a complete example of using ontologies in a cardiovascular risk domain. Related to Generalized Association Rules, the notion of rising was presented in [28]. Rising is the operation of generalizing rules (making rules more abstract) in order to increase support in keeping confidence high enough. This allows for strong rules to be discovered and also to obtain sufficient support for rules that, before raising, would not have minimum support due to the particular items they referred to. The difference with Generalized Association Rules is that this solution proposes to use a specific level for raising and mining. Another contribution, very close to [26], [27], uses taxonomies to generalize and prune association rules. The authors developed an algorithm, called GART [43], which, having several taxonomies over attributes, uses iteratively each taxonomy in order to generalize rules, and then, prunes redundant rules at each step. A very recent approach, [30], uses ontologies in a preprocessing step. Several domain-specific and user-defined constraints are introduced and grouped into two types: Pruning constraints, meant to filter uninteresting items, and abstraction constraints permitting the generalization of items toward ontology concepts. The data set is first preprocessed according to the constraints extracted from the ontology, and then, the data mining step takes place. The difference with our approach is that, first; they apply constraints in the preprocessing task, whereas we work in the post processing task. The advantage of the pruning constraints is that it permits to exclude from the start the information that the Fig 1: Description of the frame work User is not interested in, thus permitting to apply the Apriori algorithm to this new database. Let us consider that the user is not sure about which items he/she should prune. In this case, he/she should create several pruning tests, and for each test, he/she will have to apply the Apriori algorithm whose execution time is very high. Second, they use SeRQL in order to express user knowledge, and we propose a more expressive and flexible language for user expectation representation, i.e., Rule Schemas. The item-relatedness filter was proposed by Natarajan and Shekar [31]. Starting from the idea that the discovered rules are generally obvious, they introduced the idea of relatedness between the items measuring their similarity according to item taxonomies. This measure computes the relatedness of all the couples of rule items. We can notice that we can compute the relatedness for the items of the condition or/and the consequent, or between the condition and the consequent of the rule. While Natarajan and Shekar measure the itemrelatedness of an association rule, Garcia et al. developed in [32] and extended in [33] a novel technique called Knowledge Cohesion (KC). The proposed metric is composed of two new ones: Semantic Distance (SD) and Relevance assessment (RA). The SD one measures how close two items are semantically, using the ontology each type of relation being weighted differently. The numerical value RA expresses the interest of the user for certain pairs of items in order to encourage the selection of rules containing those pairs. In this paper, the ontology is used only for the SD computation, differing from our approach which uses ontologies for Rule Schemas definition. Moreover, the authors propose a metric-based approach for item set selection, while we propose 18

4 pruning/filtering schemas based method of association rules. Pruned subtree: For any tree T, T 2 is a pruned subtree of T if T 2 is a tree with the same root node as T and all nodes of T are also nodes of T. Denote T _T if T is a pruned subtree of T. Given a tree T and a real number α, the cost-complexity risk of T with respect to α is Where is the number of terminal nodes and R(T) is the re-substitution risk estimate of T To generate a sequence of pruned subtree in step 1, the cost complexity pruning technique developed by Breiman et. At is used. In generating the sequence of subtree, only the learning sample is used. Given any real value and an initial tree T 0, there exists a sequence of real values - < α 1 = α min < α 1 < α 2 < α 3..<+ and a sequence of pruned subtree T 0 > T 1 > T k such that the smallest optimally pruned subtree of T 0 for a given α is Where confidence of any of its simplifications. We are taken the example from [34]. Example. Let us consider the following three association rules: We can note that the last two rules are the simplifications of the first one. The theory of Bayardo et al. tells us that the first rule is interesting only if its confidence improves the confidence of all its simplifications. In our case, the first rule does not improve the confidence of 90 percent of its simplifications (the second rule), so it is not considered as an interesting rule, and it is not selected. The itemrelatedness filter (IRF) was proposed by Shekar and Natarajan [31]. Starting from the idea that the discovered rules are generally obvious, they introduced the idea of relatedness between items measuring their semantic distance in item taxonomies. This measure computes the relatedness of all the couples of rule items. We can notice that we can compute the relatedness for the items of the condition or/and the consequent, or between the condition and the consequent of the rule. In our approach we use the relatedness because users are interested to find association between itemset with different functionalities, coming from different domains. This measure is computed as the minimum distance between the condition items and the consequent items as presented hereafter. The distance between each pair of items from the condition and, respectively, the consequent is computed as the minimum path that connects the two items in the ontology, defined as d(a,b). Thus, the itemrelatedness (IR) for a rule is defined as the minimum of all the distance computed between the items in the condition and the consequent: Table 1Example of Questions and meaning is the branch of T k stemming from node t, and R(t) is the re-substitution risk estimate of node t based on the learning sample. 3. FILTERS In order to reduce the number of rules, three filters integrate the framework: operators applied over rule schemas, minimum improvement constraint filter [12], and item relatedness filter [31]. Minimum improvement constraint filter [12] ( MICF) selects only those rules whose confidence is greater with minimp than the Question number and description q1 q2 q3 q11 q44 q47 q48 q70 Convenient transport City center access Shopping facilities Is it safe to take a walk in the district at night Apartment sound proofing Apartment ventilation Apartment standards Clarity of the documents from Nantes Habitat 19

5 4. EXPERIMENTAL SETUP The study is based on a questionnaires data base provided by the Nantes Habitat, dealing with customer satisfaction concerning accommodation. Since 2003, customers samples are available in database, out of we have taken 1500 sample, 67 questions are available in the questionnaires with following four possible answers: very satisfaction, quite satisfied, rather not satisfied and dissatisfied coaded as {1,2,3,4}. The above table 1 introduces a sample questions with the meaning of each instance. The item q1=1 that describes customer is very satisfied by the transport in his district. The most interesting rule, we fixed a minimum support 2% and a maximum support of 30% and a minimum confidence of 80% for the association rule mining process, we have to use appiori algorithm in order to extent association rules.for example, the following association rule describes the relationship between questions q2, q3, q47, and the question q70. Thus, if the customers are very satisfied by the access to the city center (q2), the shopping facilities (q3), and the apartment ventilation (q47), then they can be satisfied by the documents received from Nantes Habitat Agency (q70) with a confidence of 90.6%: R1 : q2 =1 q3 =1 q47=1 ==> q70 =1, Support =13.8%, confidence = 90.6% 5. CONCLUSION By applying the a new interactive approach using cost complexity pruning and filter discovered rules we allowed to integration of domain expert knowledge in the post processing in order to reduce the number of rules to very less number, based on this the quality of the filtered rules was validated by the expert through the interactive process. 6. REFERENCES [1] R. Agrawal, T. Imielinski, and A. Swami, Mining Association Rules between Sets of Items in Large Databases, Proc. ACM SIGMOD, pp , [2] U.M. Fayyad, G. Piatetsky-Shapiro, P. Smyth, and R. Uthurusamy, Advances in Knowledge Discovery and Data Mining. AAAI/MITPress, [3] A. Silberschatz and A. Tuzhilin, What Makes Patterns Interesting in Knowledge Discovery Systems, IEEE Trans. Knowledge and Data Eng. vol. 8, no. 6, pp , Dec [4] M.J. Zaki and M. Ogihara, Theoretical Foundations of Association Rules, Proc. Workshop Research Issues in Data Mining and Knowledge Discovery (DMKD 98), pp. 1-8, June [5] D. Burdick, M. Calimlim, J. Flannick, J. Gehrke, and T. Yiu, Mafia: A Maximal Frequent Itemset Algorithm, IEEE Trans. Knowledge and Data Eng., vol. 17, no. 11, pp , Nov [6] J. Li, On Optimal Rule Discovery, IEEE Trans. Knowledge and Data Eng., vol. 18, no. 4, pp , Apr [7] M.J. Zaki, Generating Non-Redundant Association Rules, Proc. Int l Conf. Knowledge Discovery and Data Mining, pp , [8] N. Pasquier, Y. Bastide, R. Taouil, and L. Lakhal, Efficient Mining of Association Rules Using Closed Itemset Lattices, Information Systems, vol. 24, pp , [9] B. Baesens, S. Viaene, and J. Vanthienen, Post-Processing of Association Rules, Proc. Workshop Post-Processing in Machine Learning and Data Mining: Interpretation, Visualization, Integration,and Related Topics with Sixth ACM SIGKDD, pp , [10] B. Liu, W. Hsu, K. Wang, and S. Chen, Visually Aided Exploration of Interesting Association Rules, Proc. Pacific-Asia Conf. Knowledge iscovery and Data Mining (PAKDD), pp , [11] A. Maedche and S. Staab, Ontology Learning for the Semantic Web, IEEE Intelligent Systems, vol. 16, no. 2, pp , Mar [12] R.J. Bayardo, Jr., R. Agrawal, and D. Gunopulos, Constraint-Based Rule Mining in Large, Dense Databases, Proc. 15th Int lconf. Data Eng. (ICDE 99), pp , [13] F. Guillet and H. Hamilton, Quality Measures in Data Mining. Springer, [14] P.-N. Tan, V. Kumar, and J. Srivastava, Selecting the Right Objective Measure for Association Analysis, Information Systems, vol. 29, pp , [15] G. Piatetsky-Shapiro and C.J. Matheus, The Interestingness of Deviations, Proc. AAAI 94 Workshop Knowledge Discovery in Databases, pp , [16] M. Klemettinen, H. Mannila, P. Ronkainen, H. Toivonen, and A.I. Verkamo, Finding Interesting Rules from Large Sets of Discovered Association Rules, Proc. Int l Conf. Information and Knowledge Management (CIKM), [17] E. Baralis and G. Psaila, Designing Templates for Mining Association Rules, J. Intelligent Information Systems, vol. 9, pp. 7-32, [18] B. Padmanabhan and A. Tuzhuilin, Unexpectedness as a Measure of Interestingness in Knowledge Discovery, Proc. Workshop Information Technology and Systems (WITS), pp ,1997. [19] T. Imielinski, A. Virmani, and A. Abdulghani, Datamine: Application Programming Interface and Query Language for Database Mining, Proc. Int l Conf. Knowledge Discovery and Data Mining (KDD), pp , [20] R.T. Ng, L.V.S. Lakshmanan, J. Han, and A. Pang, Exploratory Mining and Pruning Optimizations of 20

6 Constrained Associations Rules, Proc. ACM SIGMOD Int l Conf. Management of Data, vol. 27,pp , [21] A. An, S. Khan, and X. Huang, Objective and Subjective Algorithms for Grouping Association Rules, Proc. Third IEEE Int l Conf. Data Mining (ICDM 03), pp , [22] A. Berrado and G.C. Runger, Using Meta rules to Organize and Group Discovered Association Rules, Data Mining and Knowledge Discovery, vol. 14, no. 3, pp , [23] M. Us hold and M. Gru ninger, Ontologies: Principles, Methods,and Applications, Knowledge Eng. Rev., vol. 11, pp , [24] T.R. Gruber, A Translation Approach to Portable Ontology Specifications, Knowledge Acquisition, vol. 5, pp , [25] N. Guarino, Formal Ontology in Information Systems, Proc. FirstInt l Conf. Formal Ontology in Information Systems, pp. 3-15, [26] H. Nigro, S.G. Cisaro, and D. Xodo, Data Mining with Ontologies: Implementations, Findings and Frameworks. Idea Group, Inc., [27] V. Svatek and M. Tomeckova, Roles of Medical Ontology inassociation Mining Crisp-dm Cycle, Proc. Workshop Knowledge Discovery and Ontologies in ECML/PKDD, [28] X. Zhou and J. Geller, Raising, to Enhance Rule Mining in Web Marketing with the Use of an Ontology, Data Mining with Ontologies: Implementations, Findings and Frameworks, pp ,Idea Group Reference, [29] M.A. Domingues and S.A. Rezende, Using Taxonomies to Facilitate the Analysis of the Association Rules, Proc. Second Int l Workshop Knowledge Discovery and Ontologies, held with ECML/PKDD, pp , [30] A. Bellandi, B. Furletti, V. Grossi, and A. Romei, Ontology- Driven Association Rule Extraction: A Case Study, Proc. Workshop Context and Ontologies: Representation and Reasoning, pp. 1-10,2007. [31] R. Natarajan and B. Shekar, A Relatedness-Based Data- Driven Approach to Determination of Interestingness of AssociationRules, Proc ACM Symp. Applied Computing (SAC), pp , [32] A.C.B. Garcia and A.S. Vivacqua, Does Ontology Help Make Sense of a Complex World or Does It Create a Biased Interpretation? Proc. Sensemaking Workshop in CHI 08 Conf. Human Factorsin Computing Systems, [33] A.C.B. Garcia, I. Ferraz, and A.S. Vivacqua, From Data to Knowledge Mining, Artificial Intelligence for Eng. Design, Analysis and Manufacturing, vol. 23, pp , [34] Claudia Marinica and Fabrice Guillet Knowledge-Based Interactive Postmining of Association Rules Using Ontologies evol. 22, NO. 6, JUN. 21

Post-Processing of Discovered Association Rules Using Ontologies

Post-Processing of Discovered Association Rules Using Ontologies Post-Processing of Discovered Association Rules Using Ontologies Claudia Marinica, Fabrice Guillet and Henri Briand LINA Ecole polytechnique de l'université de Nantes, France {claudia.marinica, fabrice.guillet,

More information

Ontology Based Data Analysing Approach for Actionable Knowledge Discovery

Ontology Based Data Analysing Approach for Actionable Knowledge Discovery IOSR Journal of Computer Engineering (IOSR-JCE) e-issn: 78-661,p-ISSN: 78-877, Volume 16, Issue 6, Ver. IV (Nov Dec. 14), PP 39-45 Ontology Based Data Analysing Approach for Actionable Knowledge Discovery

More information

Reduce convention for Large Data Base Using Mathematical Progression

Reduce convention for Large Data Base Using Mathematical Progression Global Journal of Pure and Applied Mathematics. ISSN 0973-1768 Volume 12, Number 4 (2016), pp. 3577-3584 Research India Publications http://www.ripublication.com/gjpam.htm Reduce convention for Large Data

More information

AN ASSOCIATION RULES SURVEY FOR REDUNDANCY REDUCTION AND DESIRED RULES WITH ONTOLOGY

AN ASSOCIATION RULES SURVEY FOR REDUNDANCY REDUCTION AND DESIRED RULES WITH ONTOLOGY AN ASSOCIATION RULES SURVEY FOR REDUNDANCY REDUCTION AND DESIRED RULES WITH ONTOLOGY Dr. V. Vijaya Kumar 1, Dr. P. Suresh Varma 2, G. Srinivas 3 1 Dean, Dept of CSE, GIET, Rajhamundry, E.G.Dist,A.P 2,

More information

Local Mining of Association Rules with Rule Schemas

Local Mining of Association Rules with Rule Schemas Local Mining of Association Rules with Rule Schemas Andrei Olaru Claudia Marinica Fabrice Guillet LINA, Ecole Polytechnique de l'universite de Nantes rue Christian Pauc BP 50609 44306 Nantes Cedex 3 E-mail:

More information

ETP-Mine: An Efficient Method for Mining Transitional Patterns

ETP-Mine: An Efficient Method for Mining Transitional Patterns ETP-Mine: An Efficient Method for Mining Transitional Patterns B. Kiran Kumar 1 and A. Bhaskar 2 1 Department of M.C.A., Kakatiya Institute of Technology & Science, A.P. INDIA. kirankumar.bejjanki@gmail.com

More information

Discovering interesting rules from financial data

Discovering interesting rules from financial data Discovering interesting rules from financial data Przemysław Sołdacki Institute of Computer Science Warsaw University of Technology Ul. Andersa 13, 00-159 Warszawa Tel: +48 609129896 email: psoldack@ii.pw.edu.pl

More information

Fast Discovery of Sequential Patterns Using Materialized Data Mining Views

Fast Discovery of Sequential Patterns Using Materialized Data Mining Views Fast Discovery of Sequential Patterns Using Materialized Data Mining Views Tadeusz Morzy, Marek Wojciechowski, Maciej Zakrzewicz Poznan University of Technology Institute of Computing Science ul. Piotrowo

More information

Materialized Data Mining Views *

Materialized Data Mining Views * Materialized Data Mining Views * Tadeusz Morzy, Marek Wojciechowski, Maciej Zakrzewicz Poznan University of Technology Institute of Computing Science ul. Piotrowo 3a, 60-965 Poznan, Poland tel. +48 61

More information

Reducing Redundancy in Characteristic Rule Discovery by Using IP-Techniques

Reducing Redundancy in Characteristic Rule Discovery by Using IP-Techniques Reducing Redundancy in Characteristic Rule Discovery by Using IP-Techniques Tom Brijs, Koen Vanhoof and Geert Wets Limburg University Centre, Faculty of Applied Economic Sciences, B-3590 Diepenbeek, Belgium

More information

An Efficient Algorithm for Finding the Support Count of Frequent 1-Itemsets in Frequent Pattern Mining

An Efficient Algorithm for Finding the Support Count of Frequent 1-Itemsets in Frequent Pattern Mining An Efficient Algorithm for Finding the Support Count of Frequent 1-Itemsets in Frequent Pattern Mining P.Subhashini 1, Dr.G.Gunasekaran 2 Research Scholar, Dept. of Information Technology, St.Peter s University,

More information

PTclose: A novel algorithm for generation of closed frequent itemsets from dense and sparse datasets

PTclose: A novel algorithm for generation of closed frequent itemsets from dense and sparse datasets : A novel algorithm for generation of closed frequent itemsets from dense and sparse datasets J. Tahmores Nezhad ℵ, M.H.Sadreddini Abstract In recent years, various algorithms for mining closed frequent

More information

Discovery of Actionable Patterns in Databases: The Action Hierarchy Approach

Discovery of Actionable Patterns in Databases: The Action Hierarchy Approach Discovery of Actionable Patterns in Databases: The Action Hierarchy Approach Gediminas Adomavicius Computer Science Department Alexander Tuzhilin Leonard N. Stern School of Business Workinq Paper Series

More information

Item Set Extraction of Mining Association Rule

Item Set Extraction of Mining Association Rule Item Set Extraction of Mining Association Rule Shabana Yasmeen, Prof. P.Pradeep Kumar, A.Ranjith Kumar Department CSE, Vivekananda Institute of Technology and Science, Karimnagar, A.P, India Abstract:

More information

Association Rules: Problems, solutions and new applications

Association Rules: Problems, solutions and new applications Association Rules: Problems, solutions and new applications María N. Moreno, Saddys Segrera and Vivian F. López Universidad de Salamanca, Plaza Merced S/N, 37008, Salamanca e-mail: mmg@usal.es Abstract

More information

arxiv: v1 [cs.db] 7 Dec 2011

arxiv: v1 [cs.db] 7 Dec 2011 Using Taxonomies to Facilitate the Analysis of the Association Rules Marcos Aurélio Domingues 1 and Solange Oliveira Rezende 2 arxiv:1112.1734v1 [cs.db] 7 Dec 2011 1 LIACC-NIAAD Universidade do Porto Rua

More information

To Enhance Projection Scalability of Item Transactions by Parallel and Partition Projection using Dynamic Data Set

To Enhance Projection Scalability of Item Transactions by Parallel and Partition Projection using Dynamic Data Set To Enhance Scalability of Item Transactions by Parallel and Partition using Dynamic Data Set Priyanka Soni, Research Scholar (CSE), MTRI, Bhopal, priyanka.soni379@gmail.com Dhirendra Kumar Jha, MTRI, Bhopal,

More information

Using Association Rules for Better Treatment of Missing Values

Using Association Rules for Better Treatment of Missing Values Using Association Rules for Better Treatment of Missing Values SHARIQ BASHIR, SAAD RAZZAQ, UMER MAQBOOL, SONYA TAHIR, A. RAUF BAIG Department of Computer Science (Machine Intelligence Group) National University

More information

APPLYING BIT-VECTOR PROJECTION APPROACH FOR EFFICIENT MINING OF N-MOST INTERESTING FREQUENT ITEMSETS

APPLYING BIT-VECTOR PROJECTION APPROACH FOR EFFICIENT MINING OF N-MOST INTERESTING FREQUENT ITEMSETS APPLYIG BIT-VECTOR PROJECTIO APPROACH FOR EFFICIET MIIG OF -MOST ITERESTIG FREQUET ITEMSETS Zahoor Jan, Shariq Bashir, A. Rauf Baig FAST-ational University of Computer and Emerging Sciences, Islamabad

More information

An Efficient Algorithm for Mining Association Rules using Confident Frequent Itemsets

An Efficient Algorithm for Mining Association Rules using Confident Frequent Itemsets 0 0 Third International Conference on Advanced Computing & Communication Technologies An Efficient Algorithm for Mining Association Rules using Confident Frequent Itemsets Basheer Mohamad Al-Maqaleh Faculty

More information

An Evolutionary Algorithm for Mining Association Rules Using Boolean Approach

An Evolutionary Algorithm for Mining Association Rules Using Boolean Approach An Evolutionary Algorithm for Mining Association Rules Using Boolean Approach ABSTRACT G.Ravi Kumar 1 Dr.G.A. Ramachandra 2 G.Sunitha 3 1. Research Scholar, Department of Computer Science &Technology,

More information

Temporal Weighted Association Rule Mining for Classification

Temporal Weighted Association Rule Mining for Classification Temporal Weighted Association Rule Mining for Classification Purushottam Sharma and Kanak Saxena Abstract There are so many important techniques towards finding the association rules. But, when we consider

More information

Pattern Mining. Knowledge Discovery and Data Mining 1. Roman Kern KTI, TU Graz. Roman Kern (KTI, TU Graz) Pattern Mining / 42

Pattern Mining. Knowledge Discovery and Data Mining 1. Roman Kern KTI, TU Graz. Roman Kern (KTI, TU Graz) Pattern Mining / 42 Pattern Mining Knowledge Discovery and Data Mining 1 Roman Kern KTI, TU Graz 2016-01-14 Roman Kern (KTI, TU Graz) Pattern Mining 2016-01-14 1 / 42 Outline 1 Introduction 2 Apriori Algorithm 3 FP-Growth

More information

Efficient Incremental Mining of Top-K Frequent Closed Itemsets

Efficient Incremental Mining of Top-K Frequent Closed Itemsets Efficient Incremental Mining of Top- Frequent Closed Itemsets Andrea Pietracaprina and Fabio Vandin Dipartimento di Ingegneria dell Informazione, Università di Padova, Via Gradenigo 6/B, 35131, Padova,

More information

Mining Frequent Patterns, Associations, and Correlations: Basic Concepts and Methods

Mining Frequent Patterns, Associations, and Correlations: Basic Concepts and Methods Chapter 6 Mining Frequent Patterns, Associations, and Correlations: Basic Concepts and Methods 6.1 Bibliographic Notes Association rule mining was first proposed by Agrawal, Imielinski, and Swami [AIS93].

More information

Mining Imperfectly Sporadic Rules with Two Thresholds

Mining Imperfectly Sporadic Rules with Two Thresholds Mining Imperfectly Sporadic Rules with Two Thresholds Cu Thu Thuy and Do Van Thanh Abstract A sporadic rule is an association rule which has low support but high confidence. In general, sporadic rules

More information

Approximation of Frequency Queries by Means of Free-Sets

Approximation of Frequency Queries by Means of Free-Sets Approximation of Frequency Queries by Means of Free-Sets Jean-François Boulicaut, Artur Bykowski, and Christophe Rigotti Laboratoire d Ingénierie des Systèmes d Information INSA Lyon, Bâtiment 501 F-69621

More information

A Review on Mining Top-K High Utility Itemsets without Generating Candidates

A Review on Mining Top-K High Utility Itemsets without Generating Candidates A Review on Mining Top-K High Utility Itemsets without Generating Candidates Lekha I. Surana, Professor Vijay B. More Lekha I. Surana, Dept of Computer Engineering, MET s Institute of Engineering Nashik,

More information

An Algorithm for Frequent Pattern Mining Based On Apriori

An Algorithm for Frequent Pattern Mining Based On Apriori An Algorithm for Frequent Pattern Mining Based On Goswami D.N.*, Chaturvedi Anshu. ** Raghuvanshi C.S.*** *SOS In Computer Science Jiwaji University Gwalior ** Computer Application Department MITS Gwalior

More information

A Literature Review of Modern Association Rule Mining Techniques

A Literature Review of Modern Association Rule Mining Techniques A Literature Review of Modern Association Rule Mining Techniques Rupa Rajoriya, Prof. Kailash Patidar Computer Science & engineering SSSIST Sehore, India rprajoriya21@gmail.com Abstract:-Data mining is

More information

An Improved Apriori Algorithm for Association Rules

An Improved Apriori Algorithm for Association Rules Research article An Improved Apriori Algorithm for Association Rules Hassan M. Najadat 1, Mohammed Al-Maolegi 2, Bassam Arkok 3 Computer Science, Jordan University of Science and Technology, Irbid, Jordan

More information

Association Rules Mining:References

Association Rules Mining:References Association Rules Mining:References Zhou Shuigeng March 26, 2006 AR Mining References 1 References: Frequent-pattern Mining Methods R. Agarwal, C. Aggarwal, and V. V. V. Prasad. A tree projection algorithm

More information

Knowledge discovery from XML Database

Knowledge discovery from XML Database Knowledge discovery from XML Database Pravin P. Chothe 1 Prof. S. V. Patil 2 Prof.S. H. Dinde 3 PG Scholar, ADCET, Professor, ADCET Ashta, Professor, SGI, Atigre, Maharashtra, India Maharashtra, India

More information

Graph Based Approach for Finding Frequent Itemsets to Discover Association Rules

Graph Based Approach for Finding Frequent Itemsets to Discover Association Rules Graph Based Approach for Finding Frequent Itemsets to Discover Association Rules Manju Department of Computer Engg. CDL Govt. Polytechnic Education Society Nathusari Chopta, Sirsa Abstract The discovery

More information

Mining Quantitative Maximal Hyperclique Patterns: A Summary of Results

Mining Quantitative Maximal Hyperclique Patterns: A Summary of Results Mining Quantitative Maximal Hyperclique Patterns: A Summary of Results Yaochun Huang, Hui Xiong, Weili Wu, and Sam Y. Sung 3 Computer Science Department, University of Texas - Dallas, USA, {yxh03800,wxw0000}@utdallas.edu

More information

Finding frequent closed itemsets with an extended version of the Eclat algorithm

Finding frequent closed itemsets with an extended version of the Eclat algorithm Annales Mathematicae et Informaticae 48 (2018) pp. 75 82 http://ami.uni-eszterhazy.hu Finding frequent closed itemsets with an extended version of the Eclat algorithm Laszlo Szathmary University of Debrecen,

More information

AC-Close: Efficiently Mining Approximate Closed Itemsets by Core Pattern Recovery

AC-Close: Efficiently Mining Approximate Closed Itemsets by Core Pattern Recovery : Efficiently Mining Approximate Closed Itemsets by Core Pattern Recovery Hong Cheng Philip S. Yu Jiawei Han University of Illinois at Urbana-Champaign IBM T. J. Watson Research Center {hcheng3, hanj}@cs.uiuc.edu,

More information

Monotone Constraints in Frequent Tree Mining

Monotone Constraints in Frequent Tree Mining Monotone Constraints in Frequent Tree Mining Jeroen De Knijf Ad Feelders Abstract Recent studies show that using constraints that can be pushed into the mining process, substantially improves the performance

More information

arxiv: v1 [cs.db] 6 Mar 2008

arxiv: v1 [cs.db] 6 Mar 2008 Selective Association Rule Generation Michael Hahsler and Christian Buchta and Kurt Hornik arxiv:0803.0954v1 [cs.db] 6 Mar 2008 September 26, 2018 Abstract Mining association rules is a popular and well

More information

DESIGN AND CONSTRUCTION OF A FREQUENT-PATTERN TREE

DESIGN AND CONSTRUCTION OF A FREQUENT-PATTERN TREE DESIGN AND CONSTRUCTION OF A FREQUENT-PATTERN TREE 1 P.SIVA 2 D.GEETHA 1 Research Scholar, Sree Saraswathi Thyagaraja College, Pollachi. 2 Head & Assistant Professor, Department of Computer Application,

More information

Improved Frequent Pattern Mining Algorithm with Indexing

Improved Frequent Pattern Mining Algorithm with Indexing IOSR Journal of Computer Engineering (IOSR-JCE) e-issn: 2278-0661,p-ISSN: 2278-8727, Volume 16, Issue 6, Ver. VII (Nov Dec. 2014), PP 73-78 Improved Frequent Pattern Mining Algorithm with Indexing Prof.

More information

EVALUATING GENERALIZED ASSOCIATION RULES THROUGH OBJECTIVE MEASURES

EVALUATING GENERALIZED ASSOCIATION RULES THROUGH OBJECTIVE MEASURES EVALUATING GENERALIZED ASSOCIATION RULES THROUGH OBJECTIVE MEASURES Veronica Oliveira de Carvalho Professor of Centro Universitário de Araraquara Araraquara, São Paulo, Brazil Student of São Paulo University

More information

Association Rule Selection in a Data Mining Environment

Association Rule Selection in a Data Mining Environment Association Rule Selection in a Data Mining Environment Mika Klemettinen 1, Heikki Mannila 2, and A. Inkeri Verkamo 1 1 University of Helsinki, Department of Computer Science P.O. Box 26, FIN 00014 University

More information

Data Structure for Association Rule Mining: T-Trees and P-Trees

Data Structure for Association Rule Mining: T-Trees and P-Trees IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, VOL. 16, NO. 6, JUNE 2004 1 Data Structure for Association Rule Mining: T-Trees and P-Trees Frans Coenen, Paul Leng, and Shakil Ahmed Abstract Two new

More information

Objective-Oriented Utility-Based Association Mining

Objective-Oriented Utility-Based Association Mining Objective-Oriented Utility-Based Association Mining Yi-Dong Shen Laboratory of Computer Science Institute of Software Chinese Academy of Sciences Beijing 00080 P. R. China ydshen@ios.ac.cn Qiang Yang Zhong

More information

A Quantified Approach for large Dataset Compression in Association Mining

A Quantified Approach for large Dataset Compression in Association Mining IOSR Journal of Computer Engineering (IOSR-JCE) e-issn: 2278-0661, p- ISSN: 2278-8727Volume 15, Issue 3 (Nov. - Dec. 2013), PP 79-84 A Quantified Approach for large Dataset Compression in Association Mining

More information

Grouping Association Rules Using Lift

Grouping Association Rules Using Lift Grouping Association Rules Using Lift Michael Hahsler Intelligent Data Analysis Group, Southern Methodist University 11th INFORMS Workshop on Data Mining and Decision Analytics November 12, 2016 M. Hahsler

More information

FastLMFI: An Efficient Approach for Local Maximal Patterns Propagation and Maximal Patterns Superset Checking

FastLMFI: An Efficient Approach for Local Maximal Patterns Propagation and Maximal Patterns Superset Checking FastLMFI: An Efficient Approach for Local Maximal Patterns Propagation and Maximal Patterns Superset Checking Shariq Bashir National University of Computer and Emerging Sciences, FAST House, Rohtas Road,

More information

Keywords : Data mining, Association Rule Mining, Closed itemset, Frequent Itemset, KDD,PEPP. GJCST Classification : H.2.8

Keywords : Data mining, Association Rule Mining, Closed itemset, Frequent Itemset, KDD,PEPP. GJCST Classification : H.2.8 Global Journal of Computer Science and Technology Volume 11 Issue 19 Version 1.0 2011 Type: Double Blind Peer Reviewed International Research Journal Publisher: Global Journals Inc. (USA) Online ISSN:

More information

A Data Mining Framework for Extracting Product Sales Patterns in Retail Store Transactions Using Association Rules: A Case Study

A Data Mining Framework for Extracting Product Sales Patterns in Retail Store Transactions Using Association Rules: A Case Study A Data Mining Framework for Extracting Product Sales Patterns in Retail Store Transactions Using Association Rules: A Case Study Mirzaei.Afshin 1, Sheikh.Reza 2 1 Department of Industrial Engineering and

More information

A Conflict-Based Confidence Measure for Associative Classification

A Conflict-Based Confidence Measure for Associative Classification A Conflict-Based Confidence Measure for Associative Classification Peerapon Vateekul and Mei-Ling Shyu Department of Electrical and Computer Engineering University of Miami Coral Gables, FL 33124, USA

More information

Log Information Mining Using Association Rules Technique: A Case Study Of Utusan Education Portal

Log Information Mining Using Association Rules Technique: A Case Study Of Utusan Education Portal Log Information Mining Using Association Rules Technique: A Case Study Of Utusan Education Portal Mohd Helmy Ab Wahab 1, Azizul Azhar Ramli 2, Nureize Arbaiy 3, Zurinah Suradi 4 1 Faculty of Electrical

More information

Memory issues in frequent itemset mining

Memory issues in frequent itemset mining Memory issues in frequent itemset mining Bart Goethals HIIT Basic Research Unit Department of Computer Science P.O. Box 26, Teollisuuskatu 2 FIN-00014 University of Helsinki, Finland bart.goethals@cs.helsinki.fi

More information

Data Mining Query Scheduling for Apriori Common Counting

Data Mining Query Scheduling for Apriori Common Counting Data Mining Query Scheduling for Apriori Common Counting Marek Wojciechowski, Maciej Zakrzewicz Poznan University of Technology Institute of Computing Science ul. Piotrowo 3a, 60-965 Poznan, Poland {marek,

More information

Concurrent Processing of Frequent Itemset Queries Using FP-Growth Algorithm

Concurrent Processing of Frequent Itemset Queries Using FP-Growth Algorithm Concurrent Processing of Frequent Itemset Queries Using FP-Growth Algorithm Marek Wojciechowski, Krzysztof Galecki, Krzysztof Gawronek Poznan University of Technology Institute of Computing Science ul.

More information

The Interestingness Tool for Search in the Web

The Interestingness Tool for Search in the Web The Interestingness Tool for Search in the Web e-print, Gilad Amar and Ran Shaltiel Software Engineering Department, The Jerusalem College of Engineering Azrieli POB 3566, Jerusalem, 91035, Israel iaakov@jce.ac.il,

More information

Optimizing subset queries: a step towards SQL-based inductive databases for itemsets

Optimizing subset queries: a step towards SQL-based inductive databases for itemsets 0 ACM Symposium on Applied Computing Optimizing subset queries: a step towards SQL-based inductive databases for itemsets Cyrille Masson INSA de Lyon-LIRIS F-696 Villeurbanne, France cyrille.masson@liris.cnrs.fr

More information

The Fuzzy Search for Association Rules with Interestingness Measure

The Fuzzy Search for Association Rules with Interestingness Measure The Fuzzy Search for Association Rules with Interestingness Measure Phaichayon Kongchai, Nittaya Kerdprasop, and Kittisak Kerdprasop Abstract Association rule are important to retailers as a source of

More information

Closed Non-Derivable Itemsets

Closed Non-Derivable Itemsets Closed Non-Derivable Itemsets Juho Muhonen and Hannu Toivonen Helsinki Institute for Information Technology Basic Research Unit Department of Computer Science University of Helsinki Finland Abstract. Itemset

More information

Hierarchical Online Mining for Associative Rules

Hierarchical Online Mining for Associative Rules Hierarchical Online Mining for Associative Rules Naresh Jotwani Dhirubhai Ambani Institute of Information & Communication Technology Gandhinagar 382009 INDIA naresh_jotwani@da-iict.org Abstract Mining

More information

Mining the optimal class association rule set

Mining the optimal class association rule set Knowledge-Based Systems 15 (2002) 399 405 www.elsevier.com/locate/knosys Mining the optimal class association rule set Jiuyong Li a, *, Hong Shen b, Rodney Topor c a Dept. of Mathematics and Computing,

More information

Multidimensional Data Mining to Determine Association Rules in an Assortment of Granularities

Multidimensional Data Mining to Determine Association Rules in an Assortment of Granularities RESEARCH ARTICLE OPEN ACCESS Multidimensional Data Mining to Determine Association Rules in an Assortment of Granularities C. Usha Rani 1, B. Rupa Devi 2 1, 2 Asst. Professor of Department of CSE ABSTRACT

More information

Chapter 1, Introduction

Chapter 1, Introduction CSI 4352, Introduction to Data Mining Chapter 1, Introduction Young-Rae Cho Associate Professor Department of Computer Science Baylor University What is Data Mining? Definition Knowledge Discovery from

More information

A Decremental Algorithm for Maintaining Frequent Itemsets in Dynamic Databases *

A Decremental Algorithm for Maintaining Frequent Itemsets in Dynamic Databases * A Decremental Algorithm for Maintaining Frequent Itemsets in Dynamic Databases * Shichao Zhang 1, Xindong Wu 2, Jilian Zhang 3, and Chengqi Zhang 1 1 Faculty of Information Technology, University of Technology

More information

Sensitive Rule Hiding and InFrequent Filtration through Binary Search Method

Sensitive Rule Hiding and InFrequent Filtration through Binary Search Method International Journal of Computational Intelligence Research ISSN 0973-1873 Volume 13, Number 5 (2017), pp. 833-840 Research India Publications http://www.ripublication.com Sensitive Rule Hiding and InFrequent

More information

Mixture models and frequent sets: combining global and local methods for 0 1 data

Mixture models and frequent sets: combining global and local methods for 0 1 data Mixture models and frequent sets: combining global and local methods for 1 data Jaakko Hollmén Jouni K. Seppänen Heikki Mannila Abstract We study the interaction between global and local techniques in

More information

DMSA TECHNIQUE FOR FINDING SIGNIFICANT PATTERNS IN LARGE DATABASE

DMSA TECHNIQUE FOR FINDING SIGNIFICANT PATTERNS IN LARGE DATABASE DMSA TECHNIQUE FOR FINDING SIGNIFICANT PATTERNS IN LARGE DATABASE Saravanan.Suba Assistant Professor of Computer Science Kamarajar Government Art & Science College Surandai, TN, India-627859 Email:saravanansuba@rediffmail.com

More information

Structure of Association Rule Classifiers: a Review

Structure of Association Rule Classifiers: a Review Structure of Association Rule Classifiers: a Review Koen Vanhoof Benoît Depaire Transportation Research Institute (IMOB), University Hasselt 3590 Diepenbeek, Belgium koen.vanhoof@uhasselt.be benoit.depaire@uhasselt.be

More information

Data Mining Support in Database Management Systems

Data Mining Support in Database Management Systems Data Mining Support in Database Management Systems Tadeusz Morzy, Marek Wojciechowski, Maciej Zakrzewicz Poznan University of Technology, Poland {morzy,marek,mzakrz}@cs.put.poznan.pl Abstract. The most

More information

KDD, SEMMA AND CRISP-DM: A PARALLEL OVERVIEW. Ana Azevedo and M.F. Santos

KDD, SEMMA AND CRISP-DM: A PARALLEL OVERVIEW. Ana Azevedo and M.F. Santos KDD, SEMMA AND CRISP-DM: A PARALLEL OVERVIEW Ana Azevedo and M.F. Santos ABSTRACT In the last years there has been a huge growth and consolidation of the Data Mining field. Some efforts are being done

More information

Data Access Paths for Frequent Itemsets Discovery

Data Access Paths for Frequent Itemsets Discovery Data Access Paths for Frequent Itemsets Discovery Marek Wojciechowski, Maciej Zakrzewicz Poznan University of Technology Institute of Computing Science {marekw, mzakrz}@cs.put.poznan.pl Abstract. A number

More information

Transforming Quantitative Transactional Databases into Binary Tables for Association Rule Mining Using the Apriori Algorithm

Transforming Quantitative Transactional Databases into Binary Tables for Association Rule Mining Using the Apriori Algorithm Transforming Quantitative Transactional Databases into Binary Tables for Association Rule Mining Using the Apriori Algorithm Expert Systems: Final (Research Paper) Project Daniel Josiah-Akintonde December

More information

SEQUENTIAL PATTERN MINING FROM WEB LOG DATA

SEQUENTIAL PATTERN MINING FROM WEB LOG DATA SEQUENTIAL PATTERN MINING FROM WEB LOG DATA Rajashree Shettar 1 1 Associate Professor, Department of Computer Science, R. V College of Engineering, Karnataka, India, rajashreeshettar@rvce.edu.in Abstract

More information

Parallel Mining of Maximal Frequent Itemsets in PC Clusters

Parallel Mining of Maximal Frequent Itemsets in PC Clusters Proceedings of the International MultiConference of Engineers and Computer Scientists 28 Vol I IMECS 28, 19-21 March, 28, Hong Kong Parallel Mining of Maximal Frequent Itemsets in PC Clusters Vong Chan

More information

Data Mining: Concepts and Techniques. Mining Frequent Patterns, Associations, and Correlations. Chapter 5.4 & 5.6

Data Mining: Concepts and Techniques. Mining Frequent Patterns, Associations, and Correlations. Chapter 5.4 & 5.6 Data Mining: Concepts and Techniques Mining Frequent Patterns, Associations, and Correlations Chapter 5.4 & 5.6 January 18, 2007 CSE-4412: Data Mining 1 Chapter 5: Mining Frequent Patterns, Association

More information

Appropriate Item Partition for Improving the Mining Performance

Appropriate Item Partition for Improving the Mining Performance Appropriate Item Partition for Improving the Mining Performance Tzung-Pei Hong 1,2, Jheng-Nan Huang 1, Kawuu W. Lin 3 and Wen-Yang Lin 1 1 Department of Computer Science and Information Engineering National

More information

Implementation of object oriented approach to Index Support for Item Set Mining (IMine)

Implementation of object oriented approach to Index Support for Item Set Mining (IMine) Implementation of object oriented approach to Index Support for Item Set Mining (IMine) R.SRIKANTH, 2/2 M.TECH CSE, DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING, ADITYA INSTITUTE OF TECHNOLOGY AND MANAGEMENT,

More information

Keshavamurthy B.N., Mitesh Sharma and Durga Toshniwal

Keshavamurthy B.N., Mitesh Sharma and Durga Toshniwal Keshavamurthy B.N., Mitesh Sharma and Durga Toshniwal Department of Electronics and Computer Engineering, Indian Institute of Technology, Roorkee, Uttarkhand, India. bnkeshav123@gmail.com, mitusuec@iitr.ernet.in,

More information

Dynamic Optimization of Generalized SQL Queries with Horizontal Aggregations Using K-Means Clustering

Dynamic Optimization of Generalized SQL Queries with Horizontal Aggregations Using K-Means Clustering Dynamic Optimization of Generalized SQL Queries with Horizontal Aggregations Using K-Means Clustering Abstract Mrs. C. Poongodi 1, Ms. R. Kalaivani 2 1 PG Student, 2 Assistant Professor, Department of

More information

Reducing redundancy in characteristic rule discovery by using integer programming techniques

Reducing redundancy in characteristic rule discovery by using integer programming techniques Intelligent Data Analysis 4 (2000) 229 240 229 IOS Press Reducing redundancy in characteristic rule discovery by using integer programming techniques Tom Brijs, Koen Vanhoof and Geert Wets Department of

More information

Performance Based Study of Association Rule Algorithms On Voter DB

Performance Based Study of Association Rule Algorithms On Voter DB Performance Based Study of Association Rule Algorithms On Voter DB K.Padmavathi 1, R.Aruna Kirithika 2 1 Department of BCA, St.Joseph s College, Thiruvalluvar University, Cuddalore, Tamil Nadu, India,

More information

Association Rule Learning

Association Rule Learning Association Rule Learning 16s1: COMP9417 Machine Learning and Data Mining School of Computer Science and Engineering, University of New South Wales March 15, 2016 COMP9417 ML & DM (CSE, UNSW) Association

More information

AN ONTOLOGY-BASED FRAMEWORK FOR MINING PATTERNS IN THE PRESENCE OF BACKGROUND KNOWLEDGE

AN ONTOLOGY-BASED FRAMEWORK FOR MINING PATTERNS IN THE PRESENCE OF BACKGROUND KNOWLEDGE AN ONTOLOGY-BASED FRAMEWORK FOR MINING PATTERNS IN THE PRESENCE OF BACKGROUND KNOWLEDGE C. Antunes* *Instituto Superior Técnico / Technical University of Lisbon, Portugal claudia.antunes@ist.utl.pt Keywords:

More information

DIVERSITY-BASED INTERESTINGNESS MEASURES FOR ASSOCIATION RULE MINING

DIVERSITY-BASED INTERESTINGNESS MEASURES FOR ASSOCIATION RULE MINING DIVERSITY-BASED INTERESTINGNESS MEASURES FOR ASSOCIATION RULE MINING Huebner, Richard A. Norwich University rhuebner@norwich.edu ABSTRACT Association rule interestingness measures are used to help select

More information

UML-Based Conceptual Modeling of Pattern-Bases

UML-Based Conceptual Modeling of Pattern-Bases UML-Based Conceptual Modeling of Pattern-Bases Stefano Rizzi DEIS - University of Bologna Viale Risorgimento, 2 40136 Bologna - Italy srizzi@deis.unibo.it Abstract. The concept of pattern, meant as an

More information

A NOVEL ALGORITHM FOR MINING CLOSED SEQUENTIAL PATTERNS

A NOVEL ALGORITHM FOR MINING CLOSED SEQUENTIAL PATTERNS A NOVEL ALGORITHM FOR MINING CLOSED SEQUENTIAL PATTERNS ABSTRACT V. Purushothama Raju 1 and G.P. Saradhi Varma 2 1 Research Scholar, Dept. of CSE, Acharya Nagarjuna University, Guntur, A.P., India 2 Department

More information

Generating Cross level Rules: An automated approach

Generating Cross level Rules: An automated approach Generating Cross level Rules: An automated approach Ashok 1, Sonika Dhingra 1 1HOD, Dept of Software Engg.,Bhiwani Institute of Technology, Bhiwani, India 1M.Tech Student, Dept of Software Engg.,Bhiwani

More information

Discovery of Multi-level Association Rules from Primitive Level Frequent Patterns Tree

Discovery of Multi-level Association Rules from Primitive Level Frequent Patterns Tree Discovery of Multi-level Association Rules from Primitive Level Frequent Patterns Tree Virendra Kumar Shrivastava 1, Parveen Kumar 2, K. R. Pardasani 3 1 Department of Computer Science & Engineering, Singhania

More information

Sequential Pattern Mining Methods: A Snap Shot

Sequential Pattern Mining Methods: A Snap Shot IOSR Journal of Computer Engineering (IOSR-JCE) e-issn: 2278-661, p- ISSN: 2278-8727Volume 1, Issue 4 (Mar. - Apr. 213), PP 12-2 Sequential Pattern Mining Methods: A Snap Shot Niti Desai 1, Amit Ganatra

More information

rule mining can be used to analyze the share price R 1 : When the prices of IBM and SUN go up, at 80% same day.

rule mining can be used to analyze the share price R 1 : When the prices of IBM and SUN go up, at 80% same day. Breaking the Barrier of Transactions: Mining Inter-Transaction Association Rules Anthony K. H. Tung 1 Hongjun Lu 2 Jiawei Han 1 Ling Feng 3 1 Simon Fraser University, British Columbia, Canada. fkhtung,hang@cs.sfu.ca

More information

Roadmap. PCY Algorithm

Roadmap. PCY Algorithm 1 Roadmap Frequent Patterns A-Priori Algorithm Improvements to A-Priori Park-Chen-Yu Algorithm Multistage Algorithm Approximate Algorithms Compacting Results Data Mining for Knowledge Management 50 PCY

More information

Mining Frequent Itemsets Along with Rare Itemsets Based on Categorical Multiple Minimum Support

Mining Frequent Itemsets Along with Rare Itemsets Based on Categorical Multiple Minimum Support IOSR Journal of Computer Engineering (IOSR-JCE) e-issn: 2278-0661,p-ISSN: 2278-8727, Volume 18, Issue 6, Ver. IV (Nov.-Dec. 2016), PP 109-114 www.iosrjournals.org Mining Frequent Itemsets Along with Rare

More information

Role of Association Rule Mining in DNA Microarray Data - A Research

Role of Association Rule Mining in DNA Microarray Data - A Research Role of Association Rule Mining in DNA Microarray Data - A Research T. Arundhathi Asst. Professor Department of CSIT MANUU, Hyderabad Research Scholar Osmania University, Hyderabad Prof. T. Adilakshmi

More information

Mining association rules using frequent closed itemsets

Mining association rules using frequent closed itemsets Mining association rules using frequent closed itemsets Nicolas Pasquier To cite this version: Nicolas Pasquier. Mining association rules using frequent closed itemsets. Encyclopedia of Data Warehousing

More information

A Hierarchical Document Clustering Approach with Frequent Itemsets

A Hierarchical Document Clustering Approach with Frequent Itemsets A Hierarchical Document Clustering Approach with Frequent Itemsets Cheng-Jhe Lee, Chiun-Chieh Hsu, and Da-Ren Chen Abstract In order to effectively retrieve required information from the large amount of

More information

Mining High Average-Utility Itemsets

Mining High Average-Utility Itemsets Proceedings of the 2009 IEEE International Conference on Systems, Man, and Cybernetics San Antonio, TX, USA - October 2009 Mining High Itemsets Tzung-Pei Hong Dept of Computer Science and Information Engineering

More information

CMRULES: An Efficient Algorithm for Mining Sequential Rules Common to Several Sequences

CMRULES: An Efficient Algorithm for Mining Sequential Rules Common to Several Sequences CMRULES: An Efficient Algorithm for Mining Sequential Rules Common to Several Sequences Philippe Fournier-Viger 1, Usef Faghihi 1, Roger Nkambou 1, Engelbert Mephu Nguifo 2 1 Department of Computer Sciences,

More information

EFFICIENT TRANSACTION REDUCTION IN ACTIONABLE PATTERN MINING FOR HIGH VOLUMINOUS DATASETS BASED ON BITMAP AND CLASS LABELS

EFFICIENT TRANSACTION REDUCTION IN ACTIONABLE PATTERN MINING FOR HIGH VOLUMINOUS DATASETS BASED ON BITMAP AND CLASS LABELS EFFICIENT TRANSACTION REDUCTION IN ACTIONABLE PATTERN MINING FOR HIGH VOLUMINOUS DATASETS BASED ON BITMAP AND CLASS LABELS K. Kavitha 1, Dr.E. Ramaraj 2 1 Assistant Professor, Department of Computer Science,

More information

SCENARIO BASED ADAPTIVE PREPROCESSING FOR STREAM DATA USING SVM CLASSIFIER

SCENARIO BASED ADAPTIVE PREPROCESSING FOR STREAM DATA USING SVM CLASSIFIER SCENARIO BASED ADAPTIVE PREPROCESSING FOR STREAM DATA USING SVM CLASSIFIER P.Radhabai Mrs.M.Priya Packialatha Dr.G.Geetha PG Student Assistant Professor Professor Dept of Computer Science and Engg Dept

More information

Induction of Association Rules: Apriori Implementation

Induction of Association Rules: Apriori Implementation 1 Induction of Association Rules: Apriori Implementation Christian Borgelt and Rudolf Kruse Department of Knowledge Processing and Language Engineering School of Computer Science Otto-von-Guericke-University

More information