Challenges and Interesting Research Directions in Associative Classification
Fadi Thabtah
Department of Management Information Systems, Philadelphia University, Amman, Jordan

Abstract

Utilising association rule discovery methods to construct classification systems in data mining is known as associative classification. In the last few years, associative classification algorithms such as CBA, CMAR and MMAC have shown experimentally that they generate more accurate classifiers than traditional classification approaches such as decision trees and rule induction. However, there is room to further improve the performance and/or the outcome quality of these algorithms. This paper highlights new research directions within the associative classification approach that could improve solution quality and performance and also minimise drawbacks and limitations. We discuss potential research areas such as incremental learning, noise in test data sets, the exponential growth of rules and many others.

1. Introduction

Since its introduction, association rule discovery has continued to be an active research area in data mining. Association rule discovery finds associations among items in a transactional database [1]. Classification is another important data mining task. The goal of classification is to build a set of rules (a classifier) from labelled examples, known as the training data set, in order to classify previously unseen examples, known as the test data set, as accurately as possible. The primary difference between classification and association rule discovery is that the former aims to predict the class attribute in the test data set, whereas the latter aims to discover correlations among items in a database. Associative classification (AC) employs association rule discovery methods to find the rules from classification benchmarks. In 1998, AC was successfully used to build classifiers by [7] and later attracted many researchers, e.g.
[6, 13], from the data mining and machine learning communities. Several studies [6, 7, 11, 13] provided evidence that AC algorithms are able to extract more accurate classifiers than traditional classification techniques such as decision trees [9], rule induction [3] and probabilistic [4] approaches. However, there are some challenges and issues in AC (described in Section 3) which, if addressed, would make this approach more widely used, especially for real-world classification problems. Examples of such challenges are incremental learning, noise in test data sets, and the extraction of multi-label rules. The goal of this paper is to discuss the drawbacks and limitations of the AC approach and to highlight some of its important future research directions. This could be useful for researchers who are interested in exploring this scientific field. The rest of the paper is organised as follows: AC and a simple example demonstrating its main phases are given in Section 2. Important issues and future trends in AC are raised in Section 3. Finally, Section 4 is devoted to conclusions.

2. Associative Classification Problem

In associative classification, the training data set T has m distinct attributes A1, A2, ..., Am, and C is a list of class labels. The number of rows in T is denoted |T|. Attributes can be categorical (taking a value from a finite set of possible values) or continuous (taking real or integer values). In the case of categorical attributes, all possible values are mapped to a set of positive integers. For continuous attributes, a discretisation method is first used to transform them into categorical ones.
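As a sketch of this pre-processing step, the mapping of categorical values to positive integers and a simple equal-width discretisation of a continuous attribute might look as follows (an illustrative example; the paper does not prescribe a particular discretisation method, and the function names are our own):

```python
def encode_categorical(column):
    """Map each distinct categorical value to a positive integer."""
    mapping = {}
    encoded = []
    for v in column:
        if v not in mapping:
            mapping[v] = len(mapping) + 1  # positive integers 1, 2, ...
        encoded.append(mapping[v])
    return encoded, mapping

def discretise(column, bins=3):
    """Equal-width discretisation of a continuous attribute into `bins` intervals."""
    lo, hi = min(column), max(column)
    width = (hi - lo) / bins or 1.0  # guard against a constant column
    return [min(int((x - lo) / width), bins - 1) + 1 for x in column]

print(encode_categorical(["senior", "youth", "senior"])[0])  # [1, 2, 1]
print(discretise([18, 35, 70], bins=2))                      # [1, 1, 2]
```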
Definition 1: An item is an attribute name Ai together with one of its values ai, denoted (Ai, ai).
Definition 2: The jth row (training object) in T is a list of items (Aj1, aj1), ..., (Ajk, ajk), plus a class denoted cj.
Definition 3: An itemset is a set of disjoint attribute values contained in a training object, denoted <(Ai1, ai1), ..., (Aik, aik)>.
Definition 4: A ruleitem r is of the form <cond, c>, where the condition cond is an itemset and c ∈ C is a class.
Definition 5: The actual occurrence (actoccr) of a ruleitem r in T is the number of rows in T that match r's itemset.
Definition 6: The support count (suppcount) of a ruleitem r = <cond, c> is the number of rows in T that match r's itemset and belong to class c.
Definition 7: The occurrence (occitm) of an itemset I in T is the number of rows in T that match I.
Definition 8: An itemset i passes the minimum support (minsupp) threshold if occitm(i)/|T| ≥ minsupp. Such an itemset is called a frequent itemset.
Definition 9: A ruleitem r passes the minsupp threshold if suppcount(r)/|T| ≥ minsupp. Such a ruleitem is said to be a frequent ruleitem.
Definition 10: A ruleitem r passes the minimum confidence (minconf) threshold if suppcount(r)/actoccr(r) ≥ minconf.
Definition 11: A rule is of the form cond → cj, where the left-hand side of the rule (antecedent) is an itemset and the right-hand side (consequent) is a class label.
A classifier is a mapping H : A → Y, where A is the set of itemsets and Y is the set of class labels. The main task of AC is to construct a classifier that predicts the classes of previously unseen data as accurately as possible; in other words, the goal is to find a classifier h ∈ H that maximises the probability that h(a) = y for each test data object. Consider the training data set shown in Table 1, which represents whether or not a person is likely to buy a new car.
Assume that minsupp = 2 (i.e. 2/7 of the rows) and minconf = 50%. The frequent ruleitems discovered in the learning step (phase 1), along with their support and confidence values, are shown in Table 2.

Table 1: Car sales training data

  Age     Income   Has a car   Buy (class)
  senior  middle   n           yes
  youth   low      y           no
  junior  high     y           yes
  youth   middle   y           yes
  senior  high     n           yes
  junior  low      n           no
  senior  middle   n           no

Table 2: Frequent ruleitems derived from Table 1

  Itemset       Class   Support   Confidence
  {low}         no      2/7       2/2
  {high}        yes     2/7       2/2
  {middle}      yes     2/7       2/3
  {senior}      yes     2/7       2/3
  {y}           yes     2/7       2/3
  {n}           yes     2/7       2/4
  {n}           no      2/7       2/4
  {senior, n}   yes     2/7       2/3

3. Associative Classification Challenges and Interesting Research Directions

3.1 Multi-label Rule Classifiers

Existing AC techniques generate only the most obvious class correlated with a rule and simply ignore the other classes, even though such classes, when associated with these rules, may be significant and useful. For example, assume that an itemset a is stored in a database and is associated with three potential classes f1, f2 and f3, occurring 35, 34 and 31 times, respectively. Assume that a holds enough support and confidence when associated with each of the three classes. Typically, existing AC techniques generate only one rule for itemset a, namely a → f1, since f1 is the most frequent class associated with a. The other two potential rules, a → f2 and a → f3, are simply discarded. However, these two rules may play a useful role in the prediction step, because they are highly representative and hold useful information; the difference in frequency between the chosen rule and the two ignored ones is quite small. For itemset a, a rule like a → f1 ∨ f2 ∨ f3, which holds all potential classes that survive the support and confidence thresholds, is more appropriate for decision makers in many applications. A recently proposed multiple-label algorithm called MMAC [11] could be seen as a starting point for research on multi-label AC.
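The counts behind Table 2 can be reproduced with a short script. The following sketch (illustrative, not code from the paper) enumerates every ruleitem in the Table 1 data and keeps those that pass minsupp = 2/7 and minconf = 50% as per Definitions 5-10; note that it naturally surfaces all classes associated with an itemset, e.g. both <{n}, yes> and <{n}, no>:

```python
from itertools import combinations
from collections import Counter

# Training data from Table 1: (age, income, has_car) -> class
rows = [
    (("senior", "middle", "n"), "yes"),
    (("youth",  "low",    "y"), "no"),
    (("junior", "high",   "y"), "yes"),
    (("youth",  "middle", "y"), "yes"),
    (("senior", "high",   "n"), "yes"),
    (("junior", "low",    "n"), "no"),
    (("senior", "middle", "n"), "no"),
]

n = len(rows)
minsupp, minconf = 2 / n, 0.5

occ = Counter()        # actoccr: rows matching the itemset (Definition 5/7)
suppcount = Counter()  # rows matching the itemset AND the class (Definition 6)
for items, cls in rows:
    for k in (1, 2, 3):
        for itemset in combinations(items, k):
            occ[itemset] += 1
            suppcount[(itemset, cls)] += 1

frequent = {}
for (itemset, cls), cnt in suppcount.items():
    support, confidence = cnt / n, cnt / occ[itemset]
    if support >= minsupp and confidence >= minconf:
        frequent[(itemset, cls)] = (support, confidence)

# The ruleitem <{n}, yes> has support 2/7 and confidence 2/4, as in Table 2
print(frequent[(("n",), "yes")])  # (0.2857142857142857, 0.5)
```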
The MMAC algorithm generates classifiers that contain rules with multiple labels from multi-class and multi-label data, extracting important knowledge that would be discarded by existing techniques. A rule in the MMAC classifier takes the form cond → c1 ∨ c2 ∨ ... ∨ cn, where cond is an itemset and the consequent is a list of ranked class labels, each of which is assigned a weight during the training step. The multiple classes in the consequent provide useful knowledge from which end-users and decision makers may benefit. The MMAC approach employs a recursive learning phase that searches for
the 1st, 2nd, ..., nth class associated with each itemset in the training data, rather than just the dominant class. Empirical studies [11] on various known multi-class benchmark problems, as well as a real-world multi-label optimisation problem, show that MMAC outperformed popular AC algorithms such as CBA and traditional techniques such as C4.5 and RIPPER with reference to error rate. For applications such as medical diagnosis, it is more appropriate to produce the list of all classes associated with the symptoms, based on their distribution frequencies in the training data. As a result, there is a need to develop algorithms for real-world multi-class and multi-label classification data that consider, for each itemset, all available classes that pass certain user thresholds.

3.2 Rule Ranking

Sorting rules according to certain criteria plays an important role in the classification process, since the majority of AC algorithms, such as [6, 7, 13], use rule ranking procedures as the basis for selecting the classifier during pruning. In particular, the CBA and CMAR algorithms use database coverage pruning [7] to build their classifiers, in which rules are tested according to their ranks. The ranking of rules also plays an important role in the prediction step, as the top-ranked rules are used more frequently than others in classifying test objects. The precedence of the rules is usually determined by several parameters, such as the support, confidence and length (cardinality) of a rule. In AC, a very small support is normally used, and since most classification data sets are dense, the expected number of rules with identical support, confidence and cardinality is high.
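The precedence criteria discussed above can be sketched as a composite sort key. This illustrative snippet is ours, not from the paper; the exact tie-breaking order varies between algorithms, and the class-frequency tie-breaker follows the idea in [10]:

```python
from dataclasses import dataclass

@dataclass
class Rule:
    antecedent: tuple   # itemset on the rule's left-hand side
    cls: str            # predicted class label
    support: float
    confidence: float

def rank(rules, class_freq):
    """Sort rules by confidence, then support (both descending), then
    cardinality (shorter, i.e. more general, first), then the class's
    distribution frequency in the training data as a final tie-breaker."""
    return sorted(
        rules,
        key=lambda r: (-r.confidence, -r.support,
                       len(r.antecedent),
                       -class_freq[r.cls]),
    )

rules = [
    Rule(("a",), "c1", 0.30, 0.80),
    Rule(("a", "b"), "c2", 0.30, 0.80),  # ties with the rule above
    Rule(("b",), "c1", 0.40, 0.90),
]
ranked = rank(rules, {"c1": 0.6, "c2": 0.4})
print([r.cls for r in ranked])  # ['c1', 'c1', 'c2']
```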
For example, mining the tic-tac data set, downloaded from [14], with a minsupp of 2% and a minconf of 50% using the CBA algorithm [7] and without any pruning produces a large number of rules with identical support and confidence values. Specifically, the confidence, support and rule length for more than 16 rules are identical, and thus CBA has to discriminate between them using random selection. There have been a few attempts to consider parameters beside support and confidence in rule ranking, such as the distribution frequency of class labels in the training data [10]. Experimental results [10] on 12 classification data sets revealed that the class distribution parameter is frequently used by the proposed algorithm and improves the accuracy of the generated classifiers. In particular, when using the class distribution after considering confidence, support and rule length, the accuracy of the derived classifiers improved on average by +0.6% and +0.40% over the (support, confidence) and (support, confidence, rule length) rule sorting approaches, respectively. This provides evidence that adding appropriate constraints to break ties slightly improves the predictive power of the classifiers.

3.3 Noise in Test Data

Roughly speaking, a classifier is constructed from labelled data records and is later used to forecast the classes of previously unseen data records. Training and test data sets may contain noise, including missing or incorrect values inside records. One has to think carefully about the importance of missing or incorrect values in training or test data sets; usually only human experts in the application domains used to generate the data sets can make an implicit assumption about the significance of missing or invalid values. Several classification algorithms proposed in data mining produce classifiers with an acceptable error rate.
However, most of these algorithms assume that all records in the test data set are complete and that no missing data are present. When test data sets suffer from missing attribute values or incomplete records, classification algorithms may produce classifiers with poor prediction accuracy. This is because these algorithms tend to fit the training data set too closely [9]. In real-world applications, it is common for a training or test data set to contain attributes with missing values. For instance, the labor and hepatitis data sets published in the UCI data repository [8] contain missing records. Thus, it is imperative to build classifiers that can accurately predict the classes of test data sets with missing attribute values. Such classifiers are normally called robust classifiers [5]. Unlike traditional classifiers, which assume that the test data are complete, robust classifiers deal with both existing and non-existing values in test data sets. There have been some solutions for handling noise in training data sets. Naive Bayes [4], for instance, ignores missing values during the computation of probabilities, so missing values have no effect on prediction. Omitting missing values may not be the ideal solution, however, since these unknown values may provide a good deal of information. Other classification techniques, such as CBA, assume that missing values may be of some importance, and therefore treat them like any other known value in the training data set. However, if this assumption does not hold, missing values should be treated in a special way rather than simply being considered as another possible value the attribute might take. Decision tree algorithms [9] deal with missing values using probabilities, which are calculated from the frequencies of the different
values for an attribute at a particular node in the decision tree. The problem of dealing with unknown values inside test data sets has not yet been well explored in the AC approach. One possible simple solution is to select the most common value of the attribute that contains missing values from the training data set. The common value could be selected from the attribute's objects that occur with the same class to which the missing value belongs; each missing value for that attribute and its corresponding class is then substituted with the common value, i.e. the value that has the largest frequency for that attribute in the training data set. Common values from the test data set could be used in the same way to substitute attributes with missing values. Another possible solution for missing values in test data sets is to use weights or probabilities, similar to the C4.5 algorithm.

3.4 Incremental Learning

Existing AC algorithms mine the training data set as a whole in order to produce the outcome. When data operations (adding, deleting and editing) occur on the training data set, current algorithms have to scan the complete training data set one more time in order to reflect the changes. Furthermore, since data are collected in most application domains on a daily, weekly or monthly basis, training data sets can grow rapidly. As a result, the repetitive scan performed each time the training data set is modified, in order to update the set of rules, is costly in terms of I/O and CPU time. Incremental AC algorithms, which keep the previous mining results and consider only the data records that have been updated, are a more efficient approach and can lead to a huge saving in computational time. To explain the incremental mining problem in AC more precisely, consider a training data set T.
The following operations may occur on T:
- adding: the original training data T is incremented by a set of new records T+;
- deleting: a set of records T- is removed from the original training data T;
- updating: T+ records are added to T and T- records are removed from T.

The result of any of these operations on T is an updated training data set T'. The question is how the outcome (rules) of the original data set T can be updated to reflect the changes without having to perform extensive computations. This problem can be divided further into sub-problems according to the possible ruleitems contained in T' after performing a data manipulation operation. For example, the ruleitems in T' can be divided into the following groups after inserting new records (T+):
a. ruleitems that are frequent in both T and T+;
b. ruleitems that are frequent in T but not frequent in T+;
c. ruleitems that are frequent in T+ but not frequent in T;
d. ruleitems that are frequent in neither T nor T+.

The ruleitems in groups (a) and (b) can be identified in a straightforward manner. For instance, if a ruleitem Y is frequent in T, then its support count in the updated training data T' is count_T'(Y) = count_T(Y) + count_T+(Y), where count_T(Y) is already known and count_T+(Y) can be obtained by scanning T+ alone. The challenge lies with group (c): ruleitems that are not frequent in T but frequent in T+, since their support counts cannot be determined from the stored results and a scan of T+ alone. There has been some research work on incremental association rule discovery, e.g. [12], which can be considered a starting point for research on incremental AC.

3.5 Rule Overlapping

Classic rule-based classification approaches, such as rule induction and covering, build the classifier in a heuristic way: once a rule is evaluated during the learning step, all training objects covered by it are discarded, so a training instance is covered by only a single rule.
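The straightforward count update for groups (a) and (b) in Section 3.4 reduces to adding support counts. A minimal sketch, assuming the itemset counts from the previous mining run over T have been kept:

```python
from collections import Counter

def updated_counts(counts_T: Counter, counts_Tplus: Counter) -> Counter:
    """Support counts over T' = T plus T+: count_T'(Y) = count_T(Y) + count_T+(Y)."""
    merged = Counter(counts_T)
    merged.update(counts_Tplus)  # Counter.update adds counts rather than replacing
    return merged

# Hypothetical stored counts from mining T, and counts from scanning only T+
counts_T = Counter({("senior",): 3, ("low",): 2})
counts_Tplus = Counter({("senior",): 1, ("high",): 2})
counts_new = updated_counts(counts_T, counts_Tplus)
print(counts_new[("senior",)])  # 4 -- obtained without rescanning T

# The hard case (group c): ("high",) was infrequent in T, so no count for it
# was stored; its true count in T' cannot be derived from the kept results
# and requires a scan of T itself.
```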
Association rule discovery, on the other hand, considers the correlations between all possible items in a database, and therefore rules overlap in their training objects; in other words, multiple rules can be generated from a single database transaction. Since AC employs association rule methods to discover the rules, the rules it creates share training objects as well. In most existing AC techniques [2, 6, 7], when a rule is evaluated during the construction of the classifier, all its related training data objects are removed from the training data set using pruning heuristics. However, these training objects may also be needed by other potential rules during the training phase. Consider, for instance, two rules r1: a ∧ b → c1 and r2: b → c1, and assume that r1 precedes r2 in the ranking. Assume that r1 covers rows (1, 2, 3), which are associated with class c1 in the training data, whereas r2 covers rows (1, 2, 3, 4, 5), of which rows (4, 5) are associated with class c2. Once r1 is evaluated and inserted into the classifier by an AC technique such as CBA or CMAR, all training objects associated with r1, i.e. rows (1, 2, 3), are removed by the database coverage pruning. Removing the training objects of the evaluated rule r1 may influence other potential rules that share training objects with it, i.e. r2. Consequently, after inserting r1 into the classifier, the statistically fittest class c1 of rule r2 would not be the fittest class any more;
rather, a new class, c2, becomes the fittest class, because it has the largest representation among the remaining rows of r2, i.e. (4, 5), in the training data. The effect of removing the training data objects of each evaluated rule should therefore be considered for all other candidate rules that use these objects. If the removal is not taken into account, the classifier may contain rules that predict class labels with a low representation, and sometimes no representation at all, in the training data. If, on the other hand, the effect is propagated to the other potential rules during the training phase, a more realistic classifier that assigns the true class fitness to each rule will result.

4. Conclusions

Associative classification is becoming a common approach in classification since it extracts very competitive classifiers with regard to prediction accuracy when compared with rule induction, probabilistic and decision tree approaches. However, challenges such as the efficiency of rule discovery methods, the exponential growth of rules, rule ranking and noise in test data sets need more consideration. Furthermore, there are new research directions in associative classification that have not yet been explored, such as incremental learning, multi-label classifiers and rule overlapping. This paper has highlighted and discussed these challenges and potential research directions.

References

[1] Agrawal, R., and Srikant, R. (1994) Fast algorithms for mining association rules. Proceedings of the 20th International Conference on Very Large Data Bases.
[2] Baralis, E., and Torino, P. (2002) A lazy approach to pruning classification rules. Proceedings of the 2002 IEEE ICDM'02, (pp. 35).
[3] Cohen, W. (1995) Fast effective rule induction. Proceedings of the 12th International Conference on Machine Learning. Morgan Kaufmann, CA.
[4] Duda, R., and Hart, P. (1973) Pattern Classification and Scene Analysis. John Wiley & Sons.
[5] Hu, H., and Li, J. (2005) Using association rules to make rule-based classifiers robust. Proceedings of the Sixteenth Australasian Database Conference. Newcastle, Australia.
[6] Li, W., Han, J., and Pei, J. (2001) CMAR: Accurate and efficient classification based on multiple class-association rules. Proceedings of ICDM'01. San Jose, CA.
[7] Liu, B., Hsu, W., and Ma, Y. (1998) Integrating classification and association rule mining. Proceedings of KDD'98. New York, NY.
[8] Merz, C., and Murphy, P. (1996) UCI repository of machine learning databases. Irvine, CA: University of California, Department of Information and Computer Science.
[9] Quinlan, J. (1993) C4.5: Programs for Machine Learning. San Mateo, CA: Morgan Kaufmann.
[10] Thabtah, F. (2006) Rule preference effect in associative classification mining. Journal of Information and Knowledge Management, 5(1), 1-7.
[11] Thabtah, F., Cowling, P., and Peng, Y. (2004) MMAC: A new multi-class, multi-label associative classification approach. Proceedings of the Fourth IEEE International Conference on Data Mining (ICDM'04), (pp. 217-224). Brighton, UK. (Nominated for the best paper award.)
[12] Tsai, P., Lee, C., and Chen, A. (1999) An efficient approach for incremental association rule mining. Proceedings of the Third Pacific-Asia Conference on Methodologies for Knowledge Discovery and Data Mining. London, UK.
[13] Yin, X., and Han, J. (2003) CPAR: Classification based on predictive association rules. Proceedings of SDM'03. San Francisco, CA.
[14] WEKA (2000): Data Mining Software in Java.
More informationImproved Frequent Pattern Mining Algorithm with Indexing
IOSR Journal of Computer Engineering (IOSR-JCE) e-issn: 2278-0661,p-ISSN: 2278-8727, Volume 16, Issue 6, Ver. VII (Nov Dec. 2014), PP 73-78 Improved Frequent Pattern Mining Algorithm with Indexing Prof.
More informationData Mining Part 5. Prediction
Data Mining Part 5. Prediction 5.4. Spring 2010 Instructor: Dr. Masoud Yaghini Outline Using IF-THEN Rules for Classification Rule Extraction from a Decision Tree 1R Algorithm Sequential Covering Algorithms
More informationProduct presentations can be more intelligently planned
Association Rules Lecture /DMBI/IKI8303T/MTI/UI Yudho Giri Sucahyo, Ph.D, CISA (yudho@cs.ui.ac.id) Faculty of Computer Science, Objectives Introduction What is Association Mining? Mining Association Rules
More informationA Novel Rule Ordering Approach in Classification Association Rule Mining
A Novel Rule Ordering Approach in Classification Association Rule Mining Yanbo J. Wang 1, Qin Xin 2, and Frans Coenen 1 1 Department of Computer Science, The University of Liverpool, Ashton Building, Ashton
More informationCloNI: clustering of JN -interval discretization
CloNI: clustering of JN -interval discretization C. Ratanamahatana Department of Computer Science, University of California, Riverside, USA Abstract It is known that the naive Bayesian classifier typically
More informationSemi-Supervised Clustering with Partial Background Information
Semi-Supervised Clustering with Partial Background Information Jing Gao Pang-Ning Tan Haibin Cheng Abstract Incorporating background knowledge into unsupervised clustering algorithms has been the subject
More informationThe Comparison of CBA Algorithm and CBS Algorithm for Meteorological Data Classification Mohammad Iqbal, Imam Mukhlash, Hanim Maria Astuti
Information Systems International Conference (ISICO), 2 4 December 2013 The Comparison of CBA Algorithm and CBS Algorithm for Meteorological Data Classification Mohammad Iqbal, Imam Mukhlash, Hanim Maria
More informationA Two Stage Zone Regression Method for Global Characterization of a Project Database
A Two Stage Zone Regression Method for Global Characterization 1 Chapter I A Two Stage Zone Regression Method for Global Characterization of a Project Database J. J. Dolado, University of the Basque Country,
More informationA Decremental Algorithm for Maintaining Frequent Itemsets in Dynamic Databases *
A Decremental Algorithm for Maintaining Frequent Itemsets in Dynamic Databases * Shichao Zhang 1, Xindong Wu 2, Jilian Zhang 3, and Chengqi Zhang 1 1 Faculty of Information Technology, University of Technology
More informationSEQUENTIAL PATTERN MINING FROM WEB LOG DATA
SEQUENTIAL PATTERN MINING FROM WEB LOG DATA Rajashree Shettar 1 1 Associate Professor, Department of Computer Science, R. V College of Engineering, Karnataka, India, rajashreeshettar@rvce.edu.in Abstract
More informationA Novel Rule Weighting Approach in Classification Association Rule Mining
A Novel Rule Weighting Approach in Classification Association Rule Mining (An Extended Version of 2007 IEEE ICDM Workshop Paper) Yanbo J. Wang 1, Qin Xin 2, and Frans Coenen 1 1 Department of Computer
More informationUncertain Data Classification Using Decision Tree Classification Tool With Probability Density Function Modeling Technique
Research Paper Uncertain Data Classification Using Decision Tree Classification Tool With Probability Density Function Modeling Technique C. Sudarsana Reddy 1 S. Aquter Babu 2 Dr. V. Vasu 3 Department
More informationOptimization using Ant Colony Algorithm
Optimization using Ant Colony Algorithm Er. Priya Batta 1, Er. Geetika Sharmai 2, Er. Deepshikha 3 1Faculty, Department of Computer Science, Chandigarh University,Gharaun,Mohali,Punjab 2Faculty, Department
More informationA Technical Analysis of Market Basket by using Association Rule Mining and Apriori Algorithm
A Technical Analysis of Market Basket by using Association Rule Mining and Apriori Algorithm S.Pradeepkumar*, Mrs.C.Grace Padma** M.Phil Research Scholar, Department of Computer Science, RVS College of
More informationWEIGHTED K NEAREST NEIGHBOR CLASSIFICATION ON FEATURE PROJECTIONS 1
WEIGHTED K NEAREST NEIGHBOR CLASSIFICATION ON FEATURE PROJECTIONS 1 H. Altay Güvenir and Aynur Akkuş Department of Computer Engineering and Information Science Bilkent University, 06533, Ankara, Turkey
More informationChapter 3: Supervised Learning
Chapter 3: Supervised Learning Road Map Basic concepts Evaluation of classifiers Classification using association rules Naïve Bayesian classification Naïve Bayes for text classification Summary 2 An example
More informationA Literature Review of Modern Association Rule Mining Techniques
A Literature Review of Modern Association Rule Mining Techniques Rupa Rajoriya, Prof. Kailash Patidar Computer Science & engineering SSSIST Sehore, India rprajoriya21@gmail.com Abstract:-Data mining is
More informationCOMPACT WEIGHTED CLASS ASSOCIATION RULE MINING USING INFORMATION GAIN
COMPACT WEIGHTED CLASS ASSOCIATION RULE MINING USING INFORMATION GAIN S.P.Syed Ibrahim 1 and K.R.Chandran 2 1 Assistant Professor, Department of Computer Science and Engineering, PSG College of Technology,
More informationHierarchical Online Mining for Associative Rules
Hierarchical Online Mining for Associative Rules Naresh Jotwani Dhirubhai Ambani Institute of Information & Communication Technology Gandhinagar 382009 INDIA naresh_jotwani@da-iict.org Abstract Mining
More informationUAPRIORI: AN ALGORITHM FOR FINDING SEQUENTIAL PATTERNS IN PROBABILISTIC DATA
UAPRIORI: AN ALGORITHM FOR FINDING SEQUENTIAL PATTERNS IN PROBABILISTIC DATA METANAT HOOSHSADAT, SAMANEH BAYAT, PARISA NAEIMI, MAHDIEH S. MIRIAN, OSMAR R. ZAÏANE Computing Science Department, University
More informationData Mining. Part 2. Data Understanding and Preparation. 2.4 Data Transformation. Spring Instructor: Dr. Masoud Yaghini. Data Transformation
Data Mining Part 2. Data Understanding and Preparation 2.4 Spring 2010 Instructor: Dr. Masoud Yaghini Outline Introduction Normalization Attribute Construction Aggregation Attribute Subset Selection Discretization
More informationAssociating Terms with Text Categories
Associating Terms with Text Categories Osmar R. Zaïane Department of Computing Science University of Alberta Edmonton, AB, Canada zaiane@cs.ualberta.ca Maria-Luiza Antonie Department of Computing Science
More informationData Mining. Chapter 1: Introduction. Adapted from materials by Jiawei Han, Micheline Kamber, and Jian Pei
Data Mining Chapter 1: Introduction Adapted from materials by Jiawei Han, Micheline Kamber, and Jian Pei 1 Any Question? Just Ask 3 Chapter 1. Introduction Why Data Mining? What Is Data Mining? A Multi-Dimensional
More informationAssociation Rule Mining. Introduction 46. Study core 46
Learning Unit 7 Association Rule Mining Introduction 46 Study core 46 1 Association Rule Mining: Motivation and Main Concepts 46 2 Apriori Algorithm 47 3 FP-Growth Algorithm 47 4 Assignment Bundle: Frequent
More informationEvolving SQL Queries for Data Mining
Evolving SQL Queries for Data Mining Majid Salim and Xin Yao School of Computer Science, The University of Birmingham Edgbaston, Birmingham B15 2TT, UK {msc30mms,x.yao}@cs.bham.ac.uk Abstract. This paper
More information2. Department of Electronic Engineering and Computer Science, Case Western Reserve University
Chapter MINING HIGH-DIMENSIONAL DATA Wei Wang 1 and Jiong Yang 2 1. Department of Computer Science, University of North Carolina at Chapel Hill 2. Department of Electronic Engineering and Computer Science,
More informationA Comparative Study of Selected Classification Algorithms of Data Mining
Available Online at www.ijcsmc.com International Journal of Computer Science and Mobile Computing A Monthly Journal of Computer Science and Information Technology IJCSMC, Vol. 4, Issue. 6, June 2015, pg.220
More informationPredicting Missing Items in Shopping Carts
Predicting Missing Items in Shopping Carts Mrs. Anagha Patil, Mrs. Thirumahal Rajkumar, Assistant Professor, Dept. of IT, Assistant Professor, Dept of IT, V.C.E.T, Vasai T.S.E.C, Bandra Mumbai University,
More informationDiscovery of Multi-level Association Rules from Primitive Level Frequent Patterns Tree
Discovery of Multi-level Association Rules from Primitive Level Frequent Patterns Tree Virendra Kumar Shrivastava 1, Parveen Kumar 2, K. R. Pardasani 3 1 Department of Computer Science & Engineering, Singhania
More informationAn Empirical Study on feature selection for Data Classification
An Empirical Study on feature selection for Data Classification S.Rajarajeswari 1, K.Somasundaram 2 Department of Computer Science, M.S.Ramaiah Institute of Technology, Bangalore, India 1 Department of
More informationC-NBC: Neighborhood-Based Clustering with Constraints
C-NBC: Neighborhood-Based Clustering with Constraints Piotr Lasek Chair of Computer Science, University of Rzeszów ul. Prof. St. Pigonia 1, 35-310 Rzeszów, Poland lasek@ur.edu.pl Abstract. Clustering is
More informationAssociation Technique in Data Mining and Its Applications
Association Technique in Data Mining and Its Applications Harveen Buttar *, Rajneet Kaur ** * (Research Scholar, Department of Computer Science Engineering, SGGSWU, Fatehgarh Sahib, Punjab, India.) **(Assistant
More informationStudy on Classifiers using Genetic Algorithm and Class based Rules Generation
2012 International Conference on Software and Computer Applications (ICSCA 2012) IPCSIT vol. 41 (2012) (2012) IACSIT Press, Singapore Study on Classifiers using Genetic Algorithm and Class based Rules
More informationAssociation Rule Mining from XML Data
144 Conference on Data Mining DMIN'06 Association Rule Mining from XML Data Qin Ding and Gnanasekaran Sundarraj Computer Science Program The Pennsylvania State University at Harrisburg Middletown, PA 17057,
More informationHandling Missing Values via Decomposition of the Conditioned Set
Handling Missing Values via Decomposition of the Conditioned Set Mei-Ling Shyu, Indika Priyantha Kuruppu-Appuhamilage Department of Electrical and Computer Engineering, University of Miami Coral Gables,
More informationA Survey on Moving Towards Frequent Pattern Growth for Infrequent Weighted Itemset Mining
A Survey on Moving Towards Frequent Pattern Growth for Infrequent Weighted Itemset Mining Miss. Rituja M. Zagade Computer Engineering Department,JSPM,NTC RSSOER,Savitribai Phule Pune University Pune,India
More informationHybrid Feature Selection for Modeling Intrusion Detection Systems
Hybrid Feature Selection for Modeling Intrusion Detection Systems Srilatha Chebrolu, Ajith Abraham and Johnson P Thomas Department of Computer Science, Oklahoma State University, USA ajith.abraham@ieee.org,
More informationChapter 2. Related Work
Chapter 2 Related Work There are three areas of research highly related to our exploration in this dissertation, namely sequential pattern mining, multiple alignment, and approximate frequent pattern mining.
More informationAnalysis of a Population of Diabetic Patients Databases in Weka Tool P.Yasodha, M. Kannan
International Journal of Scientific & Engineering Research Volume 2, Issue 5, May-2011 1 Analysis of a Population of Diabetic Patients Databases in Weka Tool P.Yasodha, M. Kannan Abstract - Data mining
More informationAn Effective Performance of Feature Selection with Classification of Data Mining Using SVM Algorithm
Proceedings of the National Conference on Recent Trends in Mathematical Computing NCRTMC 13 427 An Effective Performance of Feature Selection with Classification of Data Mining Using SVM Algorithm A.Veeraswamy
More informationWeb Page Classification using FP Growth Algorithm Akansha Garg,Computer Science Department Swami Vivekanad Subharti University,Meerut, India
Web Page Classification using FP Growth Algorithm Akansha Garg,Computer Science Department Swami Vivekanad Subharti University,Meerut, India Abstract - The primary goal of the web site is to provide the
More informationJournal of Emerging Trends in Computing and Information Sciences
An Associative Classification Data Mining Approach for Detecting Phishing Websites 1 Suzan Wedyan, 2 Fadi Wedyan 1 Faculty of Computer Sciences and Informatics, Amman Arab University, Amman, Jordan 2 Department
More informationEnhancing Forecasting Performance of Naïve-Bayes Classifiers with Discretization Techniques
24 Enhancing Forecasting Performance of Naïve-Bayes Classifiers with Discretization Techniques Enhancing Forecasting Performance of Naïve-Bayes Classifiers with Discretization Techniques Ruxandra PETRE
More informationInternational Journal of Software and Web Sciences (IJSWS)
International Association of Scientific Innovation and Research (IASIR) (An Association Unifying the Sciences, Engineering, and Applied Research) ISSN (Print): 2279-0063 ISSN (Online): 2279-0071 International
More informationUpper bound tighter Item caps for fast frequent itemsets mining for uncertain data Implemented using splay trees. Shashikiran V 1, Murali S 2
Volume 117 No. 7 2017, 39-46 ISSN: 1311-8080 (printed version); ISSN: 1314-3395 (on-line version) url: http://www.ijpam.eu ijpam.eu Upper bound tighter Item caps for fast frequent itemsets mining for uncertain
More informationData Mining Part 3. Associations Rules
Data Mining Part 3. Associations Rules 3.2 Efficient Frequent Itemset Mining Methods Fall 2009 Instructor: Dr. Masoud Yaghini Outline Apriori Algorithm Generating Association Rules from Frequent Itemsets
More informationDATA MINING II - 1DL460
Uppsala University Department of Information Technology Kjell Orsborn DATA MINING II - 1DL460 Assignment 2 - Implementation of algorithm for frequent itemset and association rule mining 1 Algorithms for
More informationReal World Performance of Association Rule Algorithms
To appear in KDD 2001 Real World Performance of Association Rule Algorithms Zijian Zheng Blue Martini Software 2600 Campus Drive San Mateo, CA 94403, USA +1 650 356 4223 zijian@bluemartini.com Ron Kohavi
More informationThe Fuzzy Search for Association Rules with Interestingness Measure
The Fuzzy Search for Association Rules with Interestingness Measure Phaichayon Kongchai, Nittaya Kerdprasop, and Kittisak Kerdprasop Abstract Association rule are important to retailers as a source of
More informationSalah Alghyaline, Jun-Wei Hsieh, and Jim Z. C. Lai
EFFICIENTLY MINING FREQUENT ITEMSETS IN TRANSACTIONAL DATABASES This article has been peer reviewed and accepted for publication in JMST but has not yet been copyediting, typesetting, pagination and proofreading
More informationAdaptive Metric Nearest Neighbor Classification
Adaptive Metric Nearest Neighbor Classification Carlotta Domeniconi Jing Peng Dimitrios Gunopulos Computer Science Department Computer Science Department Computer Science Department University of California
More informationOptimized Class Association Rule Mining using Genetic Network Programming with Automatic Termination
Optimized Class Association Rule Mining using Genetic Network Programming with Automatic Termination Eloy Gonzales, Bun Theang Ong, Koji Zettsu Information Services Platform Laboratory Universal Communication
More informationA Comparative Study of Association Rules Mining Algorithms
A Comparative Study of Association Rules Mining Algorithms Cornelia Győrödi *, Robert Győrödi *, prof. dr. ing. Stefan Holban ** * Department of Computer Science, University of Oradea, Str. Armatei Romane
More informationPerformance Analysis of Apriori Algorithm with Progressive Approach for Mining Data
Performance Analysis of Apriori Algorithm with Progressive Approach for Mining Data Shilpa Department of Computer Science & Engineering Haryana College of Technology & Management, Kaithal, Haryana, India
More informationA Graph-Based Approach for Mining Closed Large Itemsets
A Graph-Based Approach for Mining Closed Large Itemsets Lee-Wen Huang Dept. of Computer Science and Engineering National Sun Yat-Sen University huanglw@gmail.com Ye-In Chang Dept. of Computer Science and
More informationINFREQUENT WEIGHTED ITEM SET MINING USING NODE SET BASED ALGORITHM
INFREQUENT WEIGHTED ITEM SET MINING USING NODE SET BASED ALGORITHM G.Amlu #1 S.Chandralekha #2 and PraveenKumar *1 # B.Tech, Information Technology, Anand Institute of Higher Technology, Chennai, India
More informationPerformance Analysis of Data Mining Algorithms
! Performance Analysis of Data Mining Algorithms Poonam Punia Ph.D Research Scholar Deptt. of Computer Applications Singhania University, Jhunjunu (Raj.) poonamgill25@gmail.com Surender Jangra Deptt. of
More informationTutorial on Assignment 3 in Data Mining 2009 Frequent Itemset and Association Rule Mining. Gyozo Gidofalvi Uppsala Database Laboratory
Tutorial on Assignment 3 in Data Mining 2009 Frequent Itemset and Association Rule Mining Gyozo Gidofalvi Uppsala Database Laboratory Announcements Updated material for assignment 3 on the lab course home
More informationOrdering attributes for missing values prediction and data classification
Ordering attributes for missing values prediction and data classification E. R. Hruschka Jr., N. F. F. Ebecken COPPE /Federal University of Rio de Janeiro, Brazil. Abstract This work shows the application
More informationDATA ANALYSIS I. Types of Attributes Sparse, Incomplete, Inaccurate Data
DATA ANALYSIS I Types of Attributes Sparse, Incomplete, Inaccurate Data Sources Bramer, M. (2013). Principles of data mining. Springer. [12-21] Witten, I. H., Frank, E. (2011). Data Mining: Practical machine
More informationMining of Web Server Logs using Extended Apriori Algorithm
International Association of Scientific Innovation and Research (IASIR) (An Association Unifying the Sciences, Engineering, and Applied Research) International Journal of Emerging Technologies in Computational
More information