A dynamic rule-induction method for classification in data mining


Journal of Management Analytics, 2015, Vol. 2, No. 3

A dynamic rule-induction method for classification in data mining

Issa Qabajeh a*, Fadi Thabtah b and Francisco Chiclana c

a E-Business Department, Canadian University of Dubai, Dubai, UAE; b Computing and Informatics Department, De Montfort University, Leicester, UK; c Centre for Computational Intelligence, De Montfort University, Leicester, UK

(Received 22 March 2015; revised 25 August 2015; accepted 31 August 2015)

Rule induction (RI) produces classifiers containing simple yet effective If-Then rules for decision makers. RI algorithms based on PRISM suffer from a few drawbacks, mainly related to rule pruning and to items (attribute values) shared among rules in the training data instances. In response to these two issues, a new dynamic rule induction (DRI) method is proposed. Whenever a rule is produced and its related training data instances are discarded, DRI updates the frequencies of the attribute values used to make the next in-line rule, so that they reflect the data deletion. The attribute value frequencies are therefore adjusted dynamically each time a rule is generated, rather than statically as in PRISM. This enables DRI to generate near-perfect rules and realistic classifiers. Experimental results using different University of California Irvine data sets show competitive performance of DRI with respect to error rate and classifier size when compared to other RI algorithms.

Keywords: data mining; classification rules; rule induction; expected accuracy

1. Introduction

With the rapid development of computer hardware and networks, companies have been able to capture massive amounts of data offline and online. These data usually hold crucial information about clients and the performance of operating units, and management can therefore use them to improve various business processes. Extracting useful information from such massive amounts of data manually, or by traditional means, is often a challenging task that requires time, domain experts and care. This has created a demand for intelligent tools that automatically discover useful information from scattered data and present it to decision makers in practical ways, in order to increase their confidence when making key decisions. Normally, these decisions serve to develop and sustain a business's competitive advantages (Coulter, 2012).

Generally, the intelligent tools used by decision makers are automated computer software that utilise a certain learning methodology based on data mining. Data mining is a multidisciplinary field combining artificial intelligence (AI; search methods), databases and mathematics (statistics and probability; Abdelhamid & Thabtah, 2014).

*Corresponding author. fadi@cud.ac.ae

© 2015 Antai College of Economics and Management, Shanghai Jiao Tong University

Data mining can be defined as the process of intelligently discerning new patterns from large data to guide key corporate managers (Thabtah, Hammoud, & Adbeljaber, 2015). We define data mining as a learning methodology concerned with revealing hidden knowledge, in a specific format, from data sets for a particular usage.

One popular data mining task, which involves predicting an unseen target attribute (the class) by learning from labelled historical data (the training data set), is classification. Classification involves learning a model, often named the classifier, from a training data set consisting of a set of features (attributes), one of which is labelled as the class. The main goal of a classification technique is to accurately guess the value of the class for an unseen set of data, normally called the test data. The learning performed on the training data set is guided by the value of the class attribute, and classification therefore falls under the category of supervised learning. Common applications of classification are medical diagnosis (Rameshkumar, Sambath, & Ravi, 2013) and website phishing detection (Abdelhamid, Ayesh, & Thabtah, 2014).

Several different classification approaches have been developed in data mining, including decision trees (Quinlan, 1993), neural networks (NN; Mohammad, Thabtah, & McCluskey, 2013), support vector machines (SVM; Cortes & Vapnik, 1995), associative classification (AC; Thabtah, Cowling, & Peng, 2004), rule induction (RI; Cohen, 1995) and others. The latter two approaches, AC and RI, extract classifiers which contain human-interpretable rules in If-Then form, and this explains their widespread applicability. However, there are differences between AC and RI, especially in the way rules are induced. In particular, AC utilises association rule discovery techniques to induce the rules based on two user thresholds, named the minimum confidence and the minimum support. An AC algorithm normally discovers the rules at once from the input data set based on the above thresholds, whereas in RI rules are discovered one by one, in a greedy fashion and per class label. In other words, the classifier in RI is learnt from parts of the training data set, since the data are initially partitioned into parts based on the available class labels, whereas the classifier in AC is learnt from the complete training data without splitting it. This article falls under the umbrella of RI research.

PRISM is an RI technique which was developed by Cendrowska (1987) and slightly enhanced by others (e.g. Stahl & Bramer, 2008). This learning algorithm follows a separate-and-conquer strategy in building the classifier (Witten & Frank, 2005). For each class label available in the training data set, the algorithm builds a set of rules and then combines them to make the classifier. Normally, for a certain class label, PRISM starts with an empty rule and keeps adding attribute values (Definition 1 in section 2.1) to the rule's body until the rule reaches a certain expected accuracy (Definition 8 in section 2.1). Often, PRISM generates only perfect rules (rules that have 100% accuracy). When this happens, the rule gets generated and all training data connected with it are discarded. PRISM continues building other rules in the same way until no more data associated with the current class can be found. At this point, PRISM moves to a new class label and repeats the same steps described earlier until the training data set becomes empty.
During the building of a rule, the attribute value with the largest expected accuracy is the one typically added to the candidate rule. More details on PRISM are given in section 3.
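As a point of reference, the separate-and-conquer loop described above can be sketched as follows. This is a simplified, illustrative reconstruction rather than the original implementation; the list-of-dicts data layout and the function names are assumptions.

```python
def prism(instances, attributes, class_key="class"):
    """Rough sketch of PRISM-style separate-and-conquer rule induction."""
    rules = []
    for cls in {row[class_key] for row in instances}:
        data = list(instances)
        # keep building rules for this class while uncovered instances remain
        while any(row[class_key] == cls for row in data):
            body, covered = {}, list(data)
            while True:
                # pick the attribute value with the highest expected accuracy
                best, best_acc = None, -1.0
                for attr in attributes:
                    if attr in body:
                        continue
                    for val in {row[attr] for row in covered}:
                        matches = [r for r in covered if r[attr] == val]
                        acc = sum(r[class_key] == cls for r in matches) / len(matches)
                        if acc > best_acc:
                            best, best_acc = (attr, val), acc
                if best is None:
                    break
                body[best[0]] = best[1]
                covered = [r for r in covered if r[best[0]] == best[1]]
                if best_acc == 1.0:          # PRISM keeps only perfect rules
                    break
            rules.append((body, cls))
            # discard the training instances covered by the new rule
            data = [r for r in data if not all(r[a] == v for a, v in body.items())]
    return rules
```

Note that the attribute-value statistics are recomputed from scratch on every pass; the frequencies themselves are never maintained dynamically, which is the behaviour this paper sets out to change.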

This paper investigates shortcomings associated with PRISM. Specifically, we look into three main issues:

(1) Reducing the possible number of attribute values used to create rules by using an attribute value frequency threshold that we call freq.

(2) Generating not only rules with 100% accuracy but also other high-accuracy rules. We utilise a predefined user threshold that we call rule strength (Rule_Strength) to separate acceptable from unacceptable rules. Acceptable rules hold accuracy above the Rule_Strength threshold; unacceptable rules hold accuracy below Rule_Strength and are pruned. More details are given in section 2.1. The acceptable rules that are not perfect (< 100% accuracy) are stored in a secondary classifier that is used in the prediction step only when rules in the primary classifier (100% accuracy) fail to classify a test case.

(3) Dynamically updating the frequencies of candidate attribute values, initially computed from the training data set, whenever a rule is produced. When a rule is generated its associated training data are removed, but some of these data may contain items that also appear in other candidate rules. The frequencies of the attribute values appearing in the removed training instances must therefore be decremented, simply because the training data set has shrunk after deleting the generated rule's data instances. This problem is discussed further in section 3 (Issue 2).

In the dynamic rule induction algorithm, the domain expert controls the number of attribute values which can be utilised to make rules by setting the freq threshold. This threshold separates strong attribute values (attribute values with data representation above the freq threshold) from weak attribute values (attribute values with data representation below the freq threshold). Discarding weak attribute values early minimises the search space and reduces computation costs such as training time. Moreover, we check the attribute value frequencies against the freq threshold every time a rule is produced. Section 4 further elaborates this step.

In response to the above-raised issues, we develop a new dynamic learning method based on RI that we name dynamic rule induction (DRI). DRI discovers the rules one by one per class, and primarily uses a minimum frequency threshold called freq to limit the search space for attribute values by discarding weak attribute values. Further, whenever a rule is induced, DRI decrements the frequencies of the strong attribute values that appear inside the deleted training instances of the induced rule. This may result in discarding some attribute values, since they become weak (their frequency drops below the freq threshold), and therefore in a lower number of rules in the classifier, especially rules with 100% accuracy. More details on the distinguishing features of the proposed algorithm are given in section 4.3. Lastly, DRI allows the generation of rules with high but not necessarily perfect accuracy, which may limit the use of the default class rule in classifying test data. The default class rule is formed from the uncovered training data after inducing all rules, and is normally associated with the most frequent class in the uncovered training data.

Often these high-accuracy rules are ignored by the PRISM algorithm since they do not hold 100% accuracy. In DRI, such rules are used in place of the default class rule when no primary rule is able to classify a test datum.

This paper is structured as follows: section 2 presents the definitions related to the classification problem, surveys common RI algorithms and highlights the investigated research issues. Section 3 discusses the proposed algorithm and its related phases, alongside a comprehensive example that reveals DRI's insight and its main features. Section 4 is devoted to the data and the analysis of the experimental results, and finally conclusions are provided in section 5.

2. Literature review and research issues raised in RI

In this section, we define terms related to the research problem, review common RI algorithms in the literature and shed light on the research issues this article investigates. We focus on the PRISM algorithm and its successors, since we tackle research problems associated with it. Another reason for including this section is that some of the algorithms described herein are used in the experimental section for comparison purposes with the proposed algorithm.

2.1. Related definitions

Given an input training data set T with n distinct attributes A1, A2, ..., An, one of which is called the class, i.e. l, each attribute contains a list of values. The size of T is denoted |T|. An attribute may be categorical, meaning it takes a value from a known set of possible values, or continuous (numeric). The values of categorical attributes are mapped to a set of positive integers, whereas continuous attributes are discretised. The ultimate aim is to build a classification model (classifier) from T, e.g. C: A -> l, which guesses the class value of test data, where A is a disjoint set of attribute values and l is a class.

The proposed algorithm depends on a predefined user threshold called freq. This threshold is utilised to differentiate between strong and non-strong (weak) ruleitems <attribute value, class> based on their frequency in the training data set. Any ruleitem that survives the freq threshold is known as a strong ruleitem, and when a strong ruleitem involves a single attribute we call it a strong 1-ruleitem. The main related terms and definitions are given below.

Definition 1: An attribute value is an attribute plus one of its values, denoted (Ai, ai).

Definition 2: A training instance in T is a row combining a list of attribute values (Aj1, aj1), ..., (Ajv, ajv), plus a class denoted by cj.

Definition 3: A ruleitem r has the format <body, c>, where body is a set of disjoint attribute values and c is a class value.

Definition 4: The frequency threshold (freq) is a predefined threshold given by the end user.

Definition 5: The body frequency (body_freq) of a ruleitem r in T is the number of instances in T that match r's body.

Definition 6: The frequency of a ruleitem r in T (ruleitem_freq) is the number of instances in T that match r.

Definition 7: A ruleitem r passes the freq threshold if r's body_freq / |T| >= freq. Such a ruleitem is said to be a strong ruleitem.

Definition 8: The expected accuracy of a ruleitem r is defined as ruleitem_freq / body_freq.

Definition 9: A rule in our classifier is represented as body -> l, where the left-hand side (body) is a set of disjoint attribute values and the right-hand side (l) is a class value. The format of a rule is: a1 ∧ a2 ∧ ... ∧ an -> l.

2.2. Literature review

PRISM is one of the known RI algorithms that derive rules in a greedy manner: it splits the training data set into subsets with respect to the class values. Then, for each subset, the algorithm forms an empty rule, searches for the attribute value that has the highest expected accuracy, appends it to the rule body, and continues adding attribute values until the current candidate rule achieves maximum expected accuracy (often 100%). Once this happens, the algorithm generates the rule and removes all of its positive instances (the data in the subset covered by the rule). The same process is repeated to produce the rest of the rules from the remaining uncovered data in the subset, until the subset becomes empty or no rule with acceptable expected accuracy can be derived. At that point, the algorithm moves on to the next class subset and repeats the same process until all rules in all class data subsets have been generated and merged to form the classifier. One notable problem with this classification approach is that the effort required to find the best attribute value to append to a rule at any stage of the learning phase is exhaustive for high-dimensional training data sets. Moreover, there is no clear pruning mechanism in PRISM, which often results in a classifier with a very large number of rules, each covering a low number of instances.

A parallel PRISM (P-PRISM) method has been developed (Stahl & Bramer, 2008) to overcome PRISM's computationally expensive process of testing all attribute values when computing the expected accuracies while building a rule. The authors pre-sort the items based on their occurrences in the training data set and their class values; holding this information rather than the complete input data minimises memory use. The data are then distributed to different processors (central processing units, CPUs), where rules are produced locally and then combined globally, with no synchronisation mechanism defined. Only limited experiments have been conducted to measure the scalability and efficiency of P-PRISM.

To cut down the classifier size in RI, the Repeated Incremental Pruning to Produce Error Reduction (RIPPER) algorithm (Cohen, 1995) was developed. It divides the training data set with respect to class labels and then, starting with the least frequent class set, builds a rule by adding items to its body until the rule is perfect (i.e. the number of negative examples covered by the rule is zero). For each candidate empty rule, the algorithm looks for the best attribute value in the data set using information gain (IG; defined in Equations 1 and 2; Quinlan, 1993) and appends it to the rule's body. The IG basically evaluates how well an attribute splits the data with respect to the class labels. The algorithm keeps adding attribute values until the rule becomes perfect, at which point the rule is generated.

6 238 I. Qabajeh et al. phase is called rule growing. At the same time as rules are built, RIPPER uses extensive pruning, using both the positive and negative examples associated with the candidate rules, to eliminate unnecessary attribute values. The algorithm stops building the rules when any rule found has 50% error, or in a new implementation of RIPPER when the minimum description length (MDL) of the rules set after adding a candidate rule is larger than the one obtained before adding the candidate rule. Another pruning in RIPPER occurs while building the final classifier. For each candidate rule generated, two substitute rules are made: its replacement and its revision. The first one is made by growing an empty rule r i and filtering it to minimise the error on the overall rule set. The revision rule is built in similar fashion except that the algorithm just inserts an additional item to the rule s body, and examines the original and the revised rule against the data to choose the rule with the lowest error rate. This extensive pruning in RIPPER explains the small-sized classifiers generated by this type of algorithm. Experiments on a number of University of California Irvine data sets (Merz & Murphy, 1996) showed that RIPPER scales well in accuracy rate when compared to decision trees (Cohen, 1995). Gain ( D, A)= Entropy( D) (( ) ) D a / D Entropy ( Da ) (1) where Entropy (D) = P c log 2 P c (2) where P v = the probability that D belongs to class c;. D a = the subset of D for which A has value a; D a = the number of examples in D a, and D = size of D. Ahybridclassification algorithm that uses decision tree and RI approaches together to produce classifiers in one phase rather than two phases, called PART, was proposed by Frank and Witten (1998). PART employs RI to generate the candidate rule set, and then filters this set out using pruning methods adopted from decision trees. PART builds a rule as RI algorithms, but rather than constructing the rule directly from the data, it derives a sub-tree from the training data and then it converts the path leading to the leaf with the largest coverage into a rule, and the sub-tree gets discarded along with its positive instances from the data set. The same process is repeated until all instances in the data set are removed. OneRule is a simple rule-based algorithm that was proposed by Holte (1993). This algorithm makes a one-level tree and produces rules that are connected with the most frequent class in the training data set (having the largest data coverage). For all attribute values in the training data set, OneRule iterates over the training data examples and computes the frequency of each attribute value with respect to available class labels. The algorithm selects the most frequent attribute and class and generates them as a rule if they pass an error rate check. Finally, the algorithm repeats the same step to generate the subsequent rules until it finds a rule with unacceptable error; at that stage, the rule-discovery process terminates.
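To make Equations 1 and 2 concrete, a small sketch of the information gain computation is given below. It is an illustration only, assuming categorical attributes and a list-of-dicts data layout; it is not code from the paper.

```python
import math
from collections import Counter

def entropy(rows, class_key="class"):
    # Entropy(D) = -sum_c P_c * log2(P_c)  (Equation 2)
    counts = Counter(r[class_key] for r in rows)
    total = len(rows)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

def information_gain(rows, attribute, class_key="class"):
    # Gain(D, A) = Entropy(D) - sum_a (|D_a| / |D|) * Entropy(D_a)  (Equation 1)
    total = len(rows)
    gain = entropy(rows, class_key)
    for value in {r[attribute] for r in rows}:
        subset = [r for r in rows if r[attribute] == value]
        gain -= (len(subset) / total) * entropy(subset, class_key)
    return gain

# Toy check: "outlook" splits the class cleanly, so it gets the higher gain.
toy = [
    {"outlook": "sunny", "windy": "yes", "class": "play"},
    {"outlook": "sunny", "windy": "no", "class": "play"},
    {"outlook": "rainy", "windy": "yes", "class": "stay"},
    {"outlook": "rainy", "windy": "no", "class": "stay"},
]
print(information_gain(toy, "outlook"))  # 1.0
print(information_gain(toy, "windy"))    # 0.0
```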

2.3. Research issues raised

2.3.1. Issue 1

One of the main problems associated with RI approaches such as PRISM is the large dimensionality of the search space of attribute values (i.e. the large number of candidate attribute values). When constructing a rule for a particular class, PRISM has to evaluate the expected accuracy of all available attribute values linked with that class in order to select the best one to add to the rule's body. This requires heavy computation when the training data have many attribute values, and can be a burden especially when many unnecessary computations are made for attribute values that have low data representation (weak attribute values). To address this issue, we allow the end user to input, early on, a minimum frequency threshold that determines whether an attribute value qualifies to be part of a rule body before its expected accuracy is computed. This minimises the search space by reducing the number of attribute value frequency computations.

2.3.2. Issue 2

Another serious problem, which has not previously been reported in RI research, arises when the instances of a generated rule are discarded by PRISM from the training data set. This usually impacts other attribute values that share these instances with that rule. For example, when a rule R1: IF x1 and y2 THEN C1 is generated, assume that six data instances linked with R1 have been discarded. All candidate attribute values inside the six deleted training instances, other than items x1 and y2, are affected by this removal, and their frequencies should be updated to reflect the change. Some of these candidate attribute values may no longer have a high enough frequency and should therefore be pruned before building the next rule. Decrementing the frequencies of the affected candidate attribute values yields three distinct advantages:

(1) a natural pruning method that discards infrequent attribute values, thereby further reducing the search space;

(2) the number of rules is minimised, since fewer possible attribute values can be added to the next in-line rule, thereby resolving one of the major problems associated with PRISM (the typical PRISM has no pruning method);

(3) the majority of the derived rules are now composed of items whose class associations and frequencies are computed incrementally, following the rule-generation process, rather than statically at once from the training data set. We believe that these dynamic attribute value frequencies are fairer than those computed once by traditional RI algorithms.

More details on the second research issue are given in the detailed example in section 3.

2.3.3. Issue 3

One of the problems associated with PRISM is its insistence on deriving perfect rules regardless of whether the produced rule has sufficient data representation, which may lead to the generation of a massive number of rules that have low frequency despite being perfect in terms of expected accuracy.

So when a rule has an expected accuracy of 90% and a large data representation, PRISM unfortunately does not generate it, and instead prefers to break its instances down to produce multiple low-coverage rules. We believe that such high-coverage rules, when they have a good expected accuracy, can be advantageous, especially in predicting the class of test data when perfect rules fail to do so. This leads us to propose a threshold that separates acceptable from unacceptable rules, which we call Rule_Strength. We also use rule sorting, where top-ranked rules normally have 100% accuracy and lower-ranked rules are acceptable rules with expected accuracy of less than 100%.

3. A new dynamic rule induction algorithm

Our algorithm uses an RI learning strategy to discover and extract the rules. It consists of two main phases: rule discovery and class prediction. In phase 1, the algorithm logically splits the training data per class, and for each class it builds rules with expected accuracy equal to 100% or at least Rule_Strength, until no more rules can be extracted or the class label's data are covered by the produced rules. By considering the Rule_Strength threshold, our algorithm induces rules that are usually ignored by PRISM. The same process is repeated for the rest of the classes in the training data set until the complete data set becomes empty, at which point all rules are merged together to make the classifier.

In order to minimise the number of candidate attribute values, DRI employs a frequency threshold that only allows items with a sufficient number of occurrences, above the freq threshold, to be part of rules. All items that belong to a particular class and have frequencies below the freq threshold are discarded during the rule-discovery phase. During phase 1, once a rule is derived and its associated instances are removed, the frequencies of any candidate attribute values appearing in the deleted instances are updated. Often, this update involves decrementing the frequencies of the impacted candidate attribute values, which may in turn remove candidate attribute values whose frequencies drop below the freq threshold. The consequence is a further reduction of the search space by pruning these candidate attribute values, and therefore the number of rules ending up in the classifier is also minimised. Phase 2 involves using the classifier to forecast the class of unseen data and computing the error rate. The general steps of the proposed algorithm are depicted in Figure 1. Details about each phase are elaborated in the subsequent sections.

The attributes inside the training data set are assumed to be categorical or continuous. For continuous attributes, the entropy-based discretisation method is applied before the rule-discovery phase. In discretising a continuous attribute, the attribute's values are sorted in ascending order and the class linked with each value is noted. A point where the class value changes is considered a breaking point, and the gain of splitting the data for that attribute at each breaking point is computed based on IG (Quinlan, 1993). The breaking point that maximises the IG over all possible points is selected. The same process is repeated for the remaining unselected breaking-point partitions. Further information about discretisation can be found in Witten and Frank (2005). Missing values are treated like any other value in the training data set.
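As an illustration of this pre-processing step, the following is a simplified, single-split sketch. It assumes one numeric attribute and uses the entropy function of Equation 2; the recursion over the resulting partitions is omitted, and the helper names are illustrative.

```python
import math
from collections import Counter

def entropy(labels):
    counts = Counter(labels)
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

def best_split_point(values, labels):
    """Pick the boundary between class changes that maximises information gain."""
    pairs = sorted(zip(values, labels))
    total_entropy = entropy([c for _, c in pairs])
    best_gain, best_cut = -1.0, None
    for i in range(1, len(pairs)):
        # candidate breaking point: the class label changes between i-1 and i
        if pairs[i - 1][1] == pairs[i][1]:
            continue
        cut = (pairs[i - 1][0] + pairs[i][0]) / 2.0
        left = [c for v, c in pairs if v <= cut]
        right = [c for v, c in pairs if v > cut]
        gain = total_entropy - (len(left) / len(pairs)) * entropy(left) \
                             - (len(right) / len(pairs)) * entropy(right)
        if gain > best_gain:
            best_gain, best_cut = gain, cut
    return best_cut, best_gain

# e.g. temperatures with classes: the split lands between 70 and 80 -> (75.0, 1.0)
print(best_split_point([64, 65, 70, 80, 85, 90], ["yes", "yes", "yes", "no", "no", "no"]))
```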

Figure 1. Dynamic rule induction (DRI) algorithm.
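The pseudocode of Figure 1 is not reproduced here. The Python-style sketch below reconstructs the rule-discovery loop from the textual description in this section; the data layout, helper names and default threshold values are illustrative assumptions, not the authors' code. For clarity the sketch recomputes the attribute value frequencies from the remaining data before every rule, which has the same effect as the incremental decrements DRI performs after each rule's instances are deleted.

```python
def dri_discover_rules(instances, attributes, freq=3, rule_strength=0.5, class_key="class"):
    """Sketch of DRI phase 1: per-class rule discovery with dynamic frequencies."""
    primary, secondary = [], []          # perfect rules / acceptable near-perfect rules
    data = list(instances)

    def expected_accuracy(body, cls, rows):
        matches = [r for r in rows if all(r[a] == v for a, v in body.items())]
        if not matches:
            return 0.0
        return sum(r[class_key] == cls for r in matches) / len(matches)

    for cls in sorted({r[class_key] for r in instances}):
        while True:
            # strong 1-ruleitems for this class, measured on the *current* data,
            # so they already reflect every earlier rule's deleted instances
            strong = [(a, v) for a in attributes for v in {r[a] for r in data}
                      if sum(r[a] == v and r[class_key] == cls for r in data) >= freq]
            if not strong:
                break
            # grow a rule greedily from the strong attribute values
            body, acc = {}, 0.0
            candidates = list(strong)
            while candidates and acc < 1.0:
                best = max(candidates,
                           key=lambda av: expected_accuracy({**body, av[0]: av[1]}, cls, data))
                body[best[0]] = best[1]
                acc = expected_accuracy(body, cls, data)
                candidates = [(a, v) for a, v in candidates if a not in body]
            if acc >= 1.0:
                primary.append((body, cls, acc))
            elif acc >= rule_strength:
                secondary.append((body, cls, acc))
            else:
                break                     # no acceptable rule left for this class
            # delete the covered instances; the next pass over `data` therefore
            # sees the decremented frequencies automatically
            data = [r for r in data if not all(r[a] == v for a, v in body.items())]
    return primary, secondary
```

The essential differences from the PRISM sketch given earlier are that acceptable but imperfect rules are kept in a secondary set, that only strong attribute values are considered, and that frequencies always reflect the data remaining after the instances of earlier rules have been deleted.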

3.1. Rule production and classifier building

Before DRI starts the mining process, the training data are transformed into a data structure that holds <Item, class, Line#s / row IDs>. The item and the class are represented by <ColumnID, RowID> pairs, where the column number and the first row number in which the item/class occurs in the training data set denote the item/class. This data representation has been adopted from Thabtah and Hammoud (2013). The main advantage of using this data format is that there is no attribute value frequency counting after iteration 1, because our algorithm stores the locations of both the attribute values and the class in the training data set in a data structure called the TID. The TID of a ruleitem r is used to obtain the frequency of r by simply taking the size of r's TID. This simple mechanism normally reduces the number of passes over the training data set to one (Abdelhamid, Ayesh, Thabtah, Ahmadi, & Hadi, 2012). Further details on the advantages of the data representation used can be found in Thabtah and Hammoud (2013).

DRI starts learning by passing over the input data set and building a data structure that corresponds to all strong 1-ruleitems and their frequencies (TIDs). All candidate 1-ruleitems that are weak (their frequency is below the freq threshold) are discarded. Then, for each class, say L1, we start with an empty rule ri, i.e. "If Empty then L1", and add the attribute value with the largest expected accuracy to ri's body until ri becomes perfect or reaches an acceptable error rate. In other words, our algorithm can generate a rule for a class even if it is not perfect, as long as it passes the Rule_Strength threshold. These non-perfect rules are then ranked and stored in a secondary classifier. Once a rule is produced, all instances connected with it in the training data are deleted and we move on to build the next rule for the current class (L1). The deletion of ri's training instances may impact other candidate attribute values that appear in those instances, and therefore the DRI algorithm updates the frequencies of all other strong attribute values that appeared in the removed instances to reflect the changes made. This guarantees a live, dynamic frequency for all remaining strong attribute values, where some may become more statistically fit and others may become weak. It is a natural pruning process in which weak attribute values are identified without having to look them up in the training data set, which improves the efficiency of the training process and reduces the number of candidate strong items used to generate the next rule. We believe that DRI is the only RI algorithm that addresses this problem. Once the first rule (ri) is devised, the algorithm continues building rules for the current class until:

(1) no more strong attribute values are linked with class L1; or
(2) the expected accuracy of the remaining attribute values is unacceptable.

At this point, DRI picks another class and repeats the same process until the training data set becomes empty or no more strong attribute values are found. Section 3 gives a detailed example of the rule-discovery phase of the proposed algorithm. A minimal sketch of the TID-based frequency bookkeeping described above follows.
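The exact layout used by the authors follows Thabtah and Hammoud (2013) and is not reproduced here; the sketch below only illustrates the general idea that a ruleitem's frequency is the size of its row-ID set, and that deleting a rule's instances decrements the frequencies of every other ruleitem that shared those rows. All names are illustrative.

```python
from collections import defaultdict

def build_tids(instances, attributes, class_key="class"):
    """One pass over the data: each <attribute value, class> ruleitem maps to the
    set of row IDs in which it occurs, so its frequency is simply len(tid)."""
    tids = defaultdict(set)
    for row_id, row in enumerate(instances):
        for attr in attributes:
            tids[(attr, row[attr], row[class_key])].add(row_id)
    return tids

def remove_covered(tids, covered_row_ids, freq=3):
    """Delete a generated rule's instances: dropping their row IDs from every TID
    is exactly the dynamic frequency decrement DRI relies on."""
    for key in list(tids):
        tids[key] -= covered_row_ids
        if len(tids[key]) < freq:        # the ruleitem has become weak: prune it
            del tids[key]
    return tids
```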
Often, there should be a way to distinguish among rules in classification, in order to choose which rule should be fired when classifying the test data during the class allocation process. In the classic PRISM algorithm there is no rule ranking, since all rules generated and stored in the classifier have 100% expected accuracy.

Nevertheless, PRISM and its successors ignore near-perfect rules and rules that are not perfect. We solve this problem by considering not only perfect rules but also rules that pass a user-defined threshold called the rule strength (Rule_Strength). These rules are kept in a secondary classifier that is utilised only when no rules in the primary classifier can cover a test datum. This means the DRI algorithm has two classifiers:

- Primary: stores only perfect rules that have 100% accuracy;
- Secondary: stores rules that are not perfect but passed the Rule_Strength threshold (i.e. rules with an acceptable error rate).

The sorting procedure (Figure 2) is fully applied to the secondary rule set and partially applied (Line 2 onward) to the primary rule set, since rules in the primary classifier have the same expected accuracy.

Figure 2. Rule sorting of the dynamic rule induction (DRI) algorithm.

3.2. Test data class allocation step

Once the rules are derived and sorted in the classifier (primary and secondary), they are ready to be utilised to allocate the right class to unlabelled data (test data). It should be noted that there is a single classifier, whose top part we name the primary classifier part and whose lower part we name the secondary classifier part. The basic idea behind our class allocation procedure is to limit the use of the default class rule, which normally has an unacceptable error rate. This is the main reason for building a secondary classifier part, which normally contains highly predictive rules that are ignored by RI algorithms based on PRISM. Rules in the secondary classifier are only used when rules in the primary classifier are unable to classify a test datum. For a test datum ti, the DRI algorithm goes over the rules in the classifier's primary part, and the first rule whose body items are all contained in ti classifies it. If no such rule is found in the primary part, DRI moves on to the secondary part and applies the same procedure. If no fully matching rule is found in either set, our algorithm takes the first partially matching rule, where partially means that any item of the rule's body matches any item in ti. The DRI class allocation procedure reduces the use of the default class to almost none, which should positively affect the overall classification accuracy of the classifier. Figure 3 displays the class allocation procedure which we propose.

Figure 3. Class allocation procedure of the dynamic rule induction (DRI) algorithm.
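The flowchart of Figure 3 is not reproduced here. A hedged sketch of the allocation procedure as described above follows, with rules and test data represented as dictionaries; the names are illustrative.

```python
def allocate_class(test_datum, primary, secondary, default_class):
    """Return the predicted class for one test datum.

    `primary` and `secondary` are sorted lists of (body, class) pairs, where
    body is a dict of attribute values; `test_datum` is a dict as well.
    """
    def full_match(body):
        return all(test_datum.get(attr) == val for attr, val in body.items())

    def partial_match(body):
        return any(test_datum.get(attr) == val for attr, val in body.items())

    for rules in (primary, secondary):          # 1) first fully matching rule wins
        for body, cls in rules:
            if full_match(body):
                return cls
    for body, cls in primary + secondary:       # 2) otherwise first partial match
        if partial_match(body):
            return cls
    return default_class                        # 3) default class as a last resort
```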

3.3. DRI vs other RI algorithms

A limited number of PRISM-based methods have been developed in data mining to improve PRISM's output quality and its efficiency in finding the rules, such as P-PRISM. This section highlights the primary distinctions between our method and those that are PRISM-based:

- The PRISM algorithm utilises the expected accuracy as a measure of a rule's goodness, and only generates a rule when its expected accuracy is 100%. This unfortunately results in many rules with low data coverage. By contrast, our algorithm generates perfect rules as well as high-coverage, near-perfect rules (low-error rules). This results in a lower number of perfect rules in the primary classifier and allows other good rules to play a role in the classification of test data, which eventually reduces the use of the default class rule.

- There is no rule sorting in PRISM and its successors, whereas DRI discriminates amongst rules based on three new criteria in RI. This allows, in some cases, lower-ranked rules to classify test data.

- The DRI algorithm uses two new thresholds, named freq and Rule_Strength, to minimise the search space of attribute values while constructing the rules. This makes the process of rule discovery more efficient. By contrast, PRISM has to evaluate the expected accuracy of all attribute values each time it builds a rule, which can be problematic when the training data set's dimensionality is large.

- PRISM uses a static expected accuracy and frequency for each attribute value associated with the class, computed once from the training data set during the first scan. By contrast, DRI gives each attribute value a dynamic expected accuracy and frequency that change whenever a rule is derived. This ensures that each attribute value has its true data representation while the classifier is being built.

3.4. Example of the proposed algorithm

In this section, we go through a detailed example to illustrate how the DRI algorithm finds the rules and produces the classifier.

Assume the minimum freq threshold and Rule_Strength are set to 3 and 50%, respectively, and suppose the data set in Table 1 is given.

Table 1. Sample data set (Witten and Frank, 2005).

Inst. ID  Age             Spectacle-prescrip  Astigmatism  Tear-prod-rate  Class
1         Young           Myope               No           Reduced         None
2         Young           Myope               No           Normal          Soft
3         Young           Myope               Yes          Reduced         None
4         Young           Myope               Yes          Normal          Hard
5         Young           Hypermetrope        No           Reduced         None
6         Young           Hypermetrope        No           Normal          Soft
7         Young           Hypermetrope        Yes          Reduced         None
8         Young           Hypermetrope        Yes          Normal          Hard
9         Pre-presbyopic  Myope               No           Reduced         None
10        Pre-presbyopic  Myope               No           Normal          Soft
11        Pre-presbyopic  Myope               Yes          Reduced         None
12        Pre-presbyopic  Myope               Yes          Normal          Hard
13        Pre-presbyopic  Hypermetrope        No           Reduced         None
14        Pre-presbyopic  Hypermetrope        No           Normal          Soft
15        Pre-presbyopic  Hypermetrope        Yes          Reduced         None
16        Pre-presbyopic  Hypermetrope        Yes          Normal          None
17        Presbyopic      Myope               No           Reduced         None
18        Presbyopic      Myope               No           Normal          None
19        Presbyopic      Myope               Yes          Reduced         None
20        Presbyopic      Myope               Yes          Normal          Hard
21        Presbyopic      Hypermetrope        No           Reduced         None
22        Presbyopic      Hypermetrope        No           Normal          Soft
23        Presbyopic      Hypermetrope        Yes          Reduced         None
24        Presbyopic      Hypermetrope        Yes          Normal          None

DRI starts with class "None" and computes the candidate ruleitems shown in Table 2.

Table 2. The frequency and expected accuracy of attribute values connected with class "None".

Candidate ruleitem                        Frequency  Expected accuracy
Age = young, None                         4          4/8
Age = presbyopic, None                    6          6/8
Age = pre-presbyopic, None                5          5/8
Spectacle-prescrip = myope, None          7          7/12
Spectacle-prescrip = hypermetrope, None   8          8/12
Astigmatism = No, None                    7          7/12
Astigmatism = Yes, None                   8          8/12
Tear-prod-rate = reduced, None            12         12/12
Tear-prod-rate = normal, None             3          3/12

The highest-accuracy attribute value is Tear-prod-rate = reduced; all of its occurrences are associated with class "None", and therefore we generate the first rule as follows:

RULE (1): If Tear-prod-rate = reduced then None (12/12).
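The frequencies and expected accuracies of Table 2 can be recomputed directly from Table 1. The following small sketch does so; it is illustrative and not part of the original paper.

```python
from collections import Counter

# Table 1 rows as (Age, Spectacle-prescrip, Astigmatism, Tear-prod-rate, Class).
rows = [
    ("Young", "Myope", "No", "Reduced", "None"),          ("Young", "Myope", "No", "Normal", "Soft"),
    ("Young", "Myope", "Yes", "Reduced", "None"),         ("Young", "Myope", "Yes", "Normal", "Hard"),
    ("Young", "Hypermetrope", "No", "Reduced", "None"),   ("Young", "Hypermetrope", "No", "Normal", "Soft"),
    ("Young", "Hypermetrope", "Yes", "Reduced", "None"),  ("Young", "Hypermetrope", "Yes", "Normal", "Hard"),
    ("Pre-presbyopic", "Myope", "No", "Reduced", "None"),         ("Pre-presbyopic", "Myope", "No", "Normal", "Soft"),
    ("Pre-presbyopic", "Myope", "Yes", "Reduced", "None"),        ("Pre-presbyopic", "Myope", "Yes", "Normal", "Hard"),
    ("Pre-presbyopic", "Hypermetrope", "No", "Reduced", "None"),  ("Pre-presbyopic", "Hypermetrope", "No", "Normal", "Soft"),
    ("Pre-presbyopic", "Hypermetrope", "Yes", "Reduced", "None"), ("Pre-presbyopic", "Hypermetrope", "Yes", "Normal", "None"),
    ("Presbyopic", "Myope", "No", "Reduced", "None"),             ("Presbyopic", "Myope", "No", "Normal", "None"),
    ("Presbyopic", "Myope", "Yes", "Reduced", "None"),            ("Presbyopic", "Myope", "Yes", "Normal", "Hard"),
    ("Presbyopic", "Hypermetrope", "No", "Reduced", "None"),      ("Presbyopic", "Hypermetrope", "No", "Normal", "Soft"),
    ("Presbyopic", "Hypermetrope", "Yes", "Reduced", "None"),     ("Presbyopic", "Hypermetrope", "Yes", "Normal", "None"),
]

attrs = ["Age", "Spectacle-prescrip", "Astigmatism", "Tear-prod-rate"]
body_freq, ruleitem_freq = Counter(), Counter()
for *values, cls in rows:
    for attr, val in zip(attrs, values):
        body_freq[(attr, val)] += 1                 # Definition 5
        ruleitem_freq[(attr, val, cls)] += 1        # Definition 6

for (attr, val, cls), f in ruleitem_freq.items():
    if cls == "None":                               # reproduces Table 2
        print(f"{attr} = {val}, None: frequency {f}, "
              f"expected accuracy {f}/{body_freq[(attr, val)]}")
```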

Then, we remove all data instances covered by rule 1 and update the frequencies of all attribute values that appeared in the removed instances, as shown in Table 3.

Table 3. New frequencies of attribute values linked with class "None" after generating rule 1 (frequencies computed before and after removing R1's instances).

Candidate ruleitem                        Original frequency  New frequency  Status
Age = young, None                         4                   0              Deleted after R1 is derived
Age = presbyopic, None                    6                   2              Deleted after R1 is derived
Age = pre-presbyopic, None                5                   1              Deleted after R1 is derived
Spectacle-prescrip = myope, None          7                   1              Deleted after R1 is derived
Spectacle-prescrip = hypermetrope, None   8                   2              Deleted after R1 is derived
Astigmatism = No, None                    7                   1              Deleted after R1 is derived
Astigmatism = Yes, None                   8                   2              Deleted after R1 is derived
Tear-prod-rate = normal, None             3                   3              Keep for possible secondary classifier

We stop generating rules for class "None", since the remaining attribute value, Tear-prod-rate = normal, has a high error rate besides being the only attribute value left. We keep it for the secondary classifier, in case it later passes the Rule_Strength threshold input by the end user.

We move on to class "Hard". Table 4 displays the expected accuracies computed from the training data set for the attribute values linked with this class.

Table 4. The frequency and expected accuracy of attribute values connected with class "Hard".

Candidate ruleitem                        Frequency  Expected accuracy  Frequency status
Age = young, Hard                         2          2/8                Remove
Age = presbyopic, Hard                    1          1/8                Remove
Age = pre-presbyopic, Hard                1          1/8                Remove
Spectacle-prescrip = myope, Hard          3          3/12
Spectacle-prescrip = hypermetrope, Hard   1          1/12               Remove
Astigmatism = Yes, Hard                   4          4/12
Tear-prod-rate = normal, Hard             4          4/12

In Table 4, we notice only three strong attribute values, so all other weak items are removed, as shown in the last column of the table. It should be noted that other RI algorithms keep them. Based on the computations shown in Table 4, there are two attribute values with similar expected accuracies (4/12), so we select one randomly, i.e. Astigmatism = Yes, and add it to the empty rule as follows:

RULE (2): If Astigmatism = Yes Then Hard (4/12).

We then separate the data instances associated with the current rule, as shown in Table 5, and compute the expected accuracies again, as depicted in Table 6.

Table 5. Training instances linked with Astigmatism = Yes.

Age             Spectacle-prescrip  Astigmatism  Tear-prod-rate  Class
Young           Myope               Yes          Normal          Hard
Young           Hypermetrope        Yes          Normal          Hard
Pre-presbyopic  Myope               Yes          Normal          Hard
Pre-presbyopic  Hypermetrope        Yes          Normal          None
Presbyopic      Myope               Yes          Normal          Hard
Presbyopic      Hypermetrope        Yes          Normal          None

Table 6. Updated frequency and expected accuracy of attribute values computed from Table 5.

Candidate ruleitem                Frequency  Expected accuracy
Tear-prod-rate = normal, Hard     4          4/6
Spectacle-prescrip = myope, Hard  3          3/3

The best and only attribute value left is Spectacle-prescrip = myope, with 3/3 accuracy, so we add it to the current rule as follows:

RULE (2): If Astigmatism = Yes and Spectacle-prescrip = myope Then Hard (3/3).

Only one instance is left uncovered for class "Hard" in the training data, so we stop generating rules for this class, since this attribute value fails the frequency requirement to make a new rule.

We move on to class "Soft". Table 7 displays the expected accuracies computed from the training data set for the attribute values linked with this class.

Table 7. The frequency and expected accuracy of attribute values connected with class "Soft".

Candidate ruleitem                        Frequency  Expected accuracy  Frequency status
Age = young, Soft                         2          2/3                Remove
Age = presbyopic, Soft                    1          1/3                Remove
Age = pre-presbyopic, Soft                2          2/3                Remove
Spectacle-prescrip = myope, Soft          3          2/3
Spectacle-prescrip = hypermetrope, Soft   1          3/6                Remove
Astigmatism = Yes, Soft                   0          0                  Remove
Astigmatism = No, Soft                    5          5/6
Tear-prod-rate = normal, Soft             5          5/8

We notice that there are five weak attribute values, so we remove them, as shown in the last column of the table. We select the attribute value Astigmatism = No, since it has the largest expected accuracy, i.e. 5/6, and build the following rule:

RULE (3): If Astigmatism = No Then Soft (5/6).

Then we separate the data instances for this rule, as shown in Table 8, and recompute the expected accuracies from Table 8, as shown in Table 9.

Table 8. Training instances linked with Astigmatism = No.

Inst. ID  Age             Spectacle-prescrip  Astigmatism  Tear-prod-rate  Class
2         Young           Myope               No           Normal          Soft
6         Young           Hypermetrope        No           Normal          Soft
10        Pre-presbyopic  Myope               No           Normal          Soft
14        Pre-presbyopic  Hypermetrope        No           Normal          Soft
18        Presbyopic      Myope               No           Normal          None
22        Presbyopic      Hypermetrope        No           Normal          Soft

Table 9. Updated frequency and expected accuracy of attribute values computed from Table 8.

Candidate ruleitem                Frequency  Expected accuracy
Spectacle-prescrip = myope, Soft  3          2/3
Tear-prod-rate = normal, Soft     5          5/6

According to Table 9, the expected accuracy remains the same as that of the current rule, i.e. 5/6, so we generate the current rule and remove all instances associated with it. We have generated rule 3 despite it not being perfect, since it passed the Rule_Strength threshold; nevertheless, this rule will be added below the perfect rules. Table 10 shows the remaining unclassified instances.

Table 10. Remaining unclassified instances after generating rule 3.

Inst. ID  Age             Spectacle-prescrip  Astigmatism  Tear-prod-rate  Class
8         Young           Hypermetrope        Yes          Normal          Hard
16        Pre-presbyopic  Hypermetrope        Yes          Normal          None
18        Presbyopic      Myope               No           Normal          None
24        Presbyopic      Hypermetrope        Yes          Normal          None

All candidate attribute values that have become weak are shown in Table 11, along with their updated frequencies; these are eventually removed.

Table 11. The frequency and expected accuracy of the uncovered attribute values in the training data set (frequency and, where given, expected accuracy per class).

Candidate ruleitem                 None      Hard      Status
Age = young                        0         1         Remove
Age = presbyopic                   2         0         Remove
Age = pre-presbyopic               1         0         Remove
Spectacle-prescrip = myope         1                   Remove
Spectacle-prescrip = hypermetrope  2 (2/3)   1 (1/3)   Remove
Astigmatism = Yes                  2 (2/3)   1 (1/3)   Remove
Astigmatism = No                   1                   Remove
Tear-prod-rate = normal            3 (3/4)   1 (1/4)

Based on Table 11, there is one attribute value with an acceptable frequency and an expected accuracy above Rule_Strength, namely Tear-prod-rate = normal. We therefore build a rule for it as follows:

RULE (4): If Tear-prod-rate = normal Then None (3/4)

and delete all of its training instances. At this point we are left with only one instance in the training data set, associated with class "Hard"; this forms our default class rule. In this example, four rules have been devised from the original training data set, two of which are primary and two of which are secondary.

4. Data and experimental results

In this section, we test the proposed algorithm on different data sets from the University of California Irvine data collection (Merz & Murphy, 1996). Our choice of the University of California Irvine data sets is based on different features such as the number of attributes, the data set size, the number of classes and the types of the available attributes. For fair comparison, data sets of different sizes have been chosen. Table 12 displays the details of each data set used in the experiments. Different evaluation criteria are used to conduct the experiments and the analysis of results, mainly:

- classification accuracy (%);
- number of rules, particularly between the DRI and PRISM algorithms.

Different classification algorithms in data mining have been chosen to evaluate the general performance of the DRI algorithm with respect to the classifiers' predictive accuracy and rules. The majority of the chosen algorithms fall under the category of RI; these are RIPPER and PRISM. In addition, we have selected a well-known decision tree algorithm, C4.5, to further evaluate DRI. The reason for picking these algorithms is that most of them employ a learning methodology similar to DRI's, with the exception of C4.5, which uses an information-theoretic measure based on entropy to build a decision tree classifier.

Table 12. Data set characteristics.

Data set        No. of classes  No. of attributes  No. of instances
Contact lenses
Vote
Weather
Labour
Glass
Iris
Diabetes
Segment
Zoo
Sonar
Tic-Tac

The experiments with the proposed algorithm have been conducted using a Java prototype, whereas all remaining algorithms have been tested in WEKA. WEKA is an open-source Java-based platform that was developed at the University of Waikato, New Zealand. It contains implementations and evaluation measures of data mining and machine learning methods for tasks including classification, clustering, regression, association rules and feature selection. All experiments were conducted on a computing machine with a 1.7-GHz processor.

The average accuracy produced by the considered algorithms on the 10 University of California Irvine data sets is displayed in Figure 4.

Figure 4. Average classification accuracy (%) for the considered algorithms on the University of California Irvine data sets.

The figure shows that the DRI algorithm performed, on average, extremely well when compared to the RIPPER and PRISM RI algorithms. In fact, on average, DRI gained higher classification accuracy than the RIPPER and PRISM algorithms by 1.51% and 4.58%, respectively. This gain results from the dynamic rules generated by the algorithm, which keeps the fittest rules besides the perfect rules, improving the predictive power of DRI. On the other hand, the decision tree algorithm C4.5 has a slightly higher classification accuracy on average than our algorithm: to be more precise, C4.5 is on average 0.67% more accurate than DRI on the University of California Irvine data sets used in the experiments. This is explained by the high predictive power of C4.5 and by the extensive pruning used by this algorithm during the construction of the classifier. The fact that DRI is competitive in accuracy with C4.5, and derives classifiers that are on average more predictive than those of its own kind, is an achievement.

We further evaluated the proposed algorithm per University of California Irvine data set and compared its predictive accuracy with the three other classification algorithms. Figure 5 shows the classification accuracy of all algorithms used in the experiments. In the figure, DRI outperforms most of the considered classification algorithms on the University of California Irvine data sets used in the experiments. In particular, the won-lost-tie records of DRI against RIPPER, PRISM and C4.5 are 6-4-0, 7-1-2 and 3-1-6, respectively.

Figure 5. The classification accuracy (%) for the considered algorithms on the 10 University of California Irvine data sets.

The new rule evaluation method of DRI has a positive impact on the classification performance of this algorithm, by only allowing rules that are statistically fit to participate in the classifier. These rules are the ones utilised during the class prediction step.

The number of rules in the classifiers produced by PRISM and by our algorithm is depicted in Figure 6.

Figure 6. The classifier size of the PRISM and dynamic rule induction (DRI) algorithms on the data sets.

It is clear from the figure that PRISM generates, on average, larger classifiers than the DRI algorithm does, due to the fact that PRISM has no pruning strategy at all. The dynamic update of candidate items when rules are generated has a good impact on reducing the search space of items, and therefore a lower number of candidate strong items is presented. In other words, the removal of the overlap among rules in the training instances when each rule is generated also has a positive impact on the classifier size. In particular, DRI ensures that the expected accuracies and frequencies of all candidate strong items are amended on the fly whenever a rule is produced, which minimises the number of candidate strong items available for the next rule.

5. Conclusions

Rule induction (RI) is one of the well-known classification approaches in data mining; it has attracted researchers due to its simple output and its applicability in several domains. However, RI, and especially the PRISM algorithm, has a few substantial issues, including ignoring rules that have high training data coverage but not 100% accuracy. Furthermore, PRISM has no defined rule-pruning strategy, which may lead to the generation of large numbers of low-data-coverage rules. Another serious problem in RI, especially for greedy algorithms like PRISM, is that whenever a rule is produced and all of its covered data are removed from the training data set, these algorithms do not take into account the impact of the removed data on the remaining candidate rules. This may result in the generation of many redundant rules and can also increase the search space of attribute values.

In response to the above issues, we proposed in this article a dynamic rule induction (DRI) strategy that utilises two thresholds to reduce the search space, and guarantees the production not only of perfect rules but of other high-quality rules as well. Moreover, DRI discards all covered data instances when a rule is generated, and amends the frequencies of the attribute values of all remaining candidate rules that appeared in the removed instances. This yields fairer rules, since the actual frequencies of the rule body attribute values are updated incrementally rather than computed once from the original training data set. Experiments were conducted using 10 University of California Irvine data sets and different RI algorithms. The results revealed that the DRI algorithm is highly competitive in classification accuracy with the PRISM, RIPPER and C4.5 algorithms. Moreover, DRI consistently produced a lower number of rules than PRISM on the data sets we considered. In the near future, we intend to extend DRI to deal with unstructured data sets in order to handle the challenging problem of multi-label classification.

References

Abdelhamid, N., Ayesh, A., & Thabtah, F. (2014). Phishing detection-based associative classification data mining. Expert Systems with Applications, 41(13).
Abdelhamid, N., Ayesh, A., Thabtah, F., Ahmadi, S., & Hadi, W. (2012). MAC: A multiclass associative classification algorithm. Journal of Information and Knowledge Management (JIKM), 11(2).
Abdelhamid, N., & Thabtah, F. (2014). Associative classification approaches: Review and comparison. Journal of Information and Knowledge Management (JIKM), 13.
Cendrowska, J. (1987). PRISM: An algorithm for inducing modular rules. International Journal of Man-Machine Studies, 27(4).
Cohen, W. (1995). Fast effective rule induction. In Proceedings of the 12th International Conference on Machine Learning. Tahoe City, CA: Morgan Kaufmann.
Cortes, C., & Vapnik, V. (1995). Support-vector networks. Machine Learning, 20(3).
Coulter, M. (2012). Strategic management in action (6th ed.). Pearson Education.
Frank, E., & Witten, I. (1998). Generating accurate rule sets without global optimisation. In Proceedings of the Fifteenth International Conference on Machine Learning. San Francisco, CA: Morgan Kaufmann.


More information

Association Rule Mining and Clustering

Association Rule Mining and Clustering Association Rule Mining and Clustering Lecture Outline: Classification vs. Association Rule Mining vs. Clustering Association Rule Mining Clustering Types of Clusters Clustering Algorithms Hierarchical:

More information

An Effective Performance of Feature Selection with Classification of Data Mining Using SVM Algorithm

An Effective Performance of Feature Selection with Classification of Data Mining Using SVM Algorithm Proceedings of the National Conference on Recent Trends in Mathematical Computing NCRTMC 13 427 An Effective Performance of Feature Selection with Classification of Data Mining Using SVM Algorithm A.Veeraswamy

More information

International Journal of Computer Science Trends and Technology (IJCST) Volume 5 Issue 4, Jul Aug 2017

International Journal of Computer Science Trends and Technology (IJCST) Volume 5 Issue 4, Jul Aug 2017 International Journal of Computer Science Trends and Technology (IJCST) Volume 5 Issue 4, Jul Aug 17 RESEARCH ARTICLE OPEN ACCESS Classifying Brain Dataset Using Classification Based Association Rules

More information

Temporal Weighted Association Rule Mining for Classification

Temporal Weighted Association Rule Mining for Classification Temporal Weighted Association Rule Mining for Classification Purushottam Sharma and Kanak Saxena Abstract There are so many important techniques towards finding the association rules. But, when we consider

More information

An Empirical Study of Lazy Multilabel Classification Algorithms

An Empirical Study of Lazy Multilabel Classification Algorithms An Empirical Study of Lazy Multilabel Classification Algorithms E. Spyromitros and G. Tsoumakas and I. Vlahavas Department of Informatics, Aristotle University of Thessaloniki, 54124 Thessaloniki, Greece

More information

Unsupervised Learning

Unsupervised Learning Unsupervised Learning Unsupervised learning Until now, we have assumed our training samples are labeled by their category membership. Methods that use labeled samples are said to be supervised. However,

More information

Data Mining and Knowledge Discovery Practice notes 2

Data Mining and Knowledge Discovery Practice notes 2 Keywords Data Mining and Knowledge Discovery: Practice Notes Petra Kralj Novak Petra.Kralj.Novak@ijs.si Data Attribute, example, attribute-value data, target variable, class, discretization Algorithms

More information

Improving the Random Forest Algorithm by Randomly Varying the Size of the Bootstrap Samples for Low Dimensional Data Sets

Improving the Random Forest Algorithm by Randomly Varying the Size of the Bootstrap Samples for Low Dimensional Data Sets Improving the Random Forest Algorithm by Randomly Varying the Size of the Bootstrap Samples for Low Dimensional Data Sets Md Nasim Adnan and Md Zahidul Islam Centre for Research in Complex Systems (CRiCS)

More information

Input: Concepts, Instances, Attributes

Input: Concepts, Instances, Attributes Input: Concepts, Instances, Attributes 1 Terminology Components of the input: Concepts: kinds of things that can be learned aim: intelligible and operational concept description Instances: the individual,

More information

International Journal of Scientific Research & Engineering Trends Volume 4, Issue 6, Nov-Dec-2018, ISSN (Online): X

International Journal of Scientific Research & Engineering Trends Volume 4, Issue 6, Nov-Dec-2018, ISSN (Online): X Analysis about Classification Techniques on Categorical Data in Data Mining Assistant Professor P. Meena Department of Computer Science Adhiyaman Arts and Science College for Women Uthangarai, Krishnagiri,

More information

Data Mining and Knowledge Discovery: Practice Notes

Data Mining and Knowledge Discovery: Practice Notes Data Mining and Knowledge Discovery: Practice Notes Petra Kralj Novak Petra.Kralj.Novak@ijs.si 8.11.2017 1 Keywords Data Attribute, example, attribute-value data, target variable, class, discretization

More information

A review of associative classification mining

A review of associative classification mining The Knowledge Engineering Review, Vol. 22:1, 37 65. Ó 2007, Cambridge University Press doi:10.1017/s0269888907001026 Printed in the United Kingdom A review of associative classification mining FADI THABTAH

More information

Review and Comparison of Associative Classification Data Mining Approaches

Review and Comparison of Associative Classification Data Mining Approaches Review and Comparison of Associative Classification Data Mining Approaches Suzan Wedyan Abstract Associative classification (AC) is a data mining approach that combines association rule and classification

More information

Improving Tree-Based Classification Rules Using a Particle Swarm Optimization

Improving Tree-Based Classification Rules Using a Particle Swarm Optimization Improving Tree-Based Classification Rules Using a Particle Swarm Optimization Chi-Hyuck Jun *, Yun-Ju Cho, and Hyeseon Lee Department of Industrial and Management Engineering Pohang University of Science

More information

The digital copy of this thesis is protected by the Copyright Act 1994 (New Zealand).

The digital copy of this thesis is protected by the Copyright Act 1994 (New Zealand). http://waikato.researchgateway.ac.nz/ Research Commons at the University of Waikato Copyright Statement: The digital copy of this thesis is protected by the Copyright Act 1994 (New Zealand). The thesis

More information

Recent Progress on RAIL: Automating Clustering and Comparison of Different Road Classification Techniques on High Resolution Remotely Sensed Imagery

Recent Progress on RAIL: Automating Clustering and Comparison of Different Road Classification Techniques on High Resolution Remotely Sensed Imagery Recent Progress on RAIL: Automating Clustering and Comparison of Different Road Classification Techniques on High Resolution Remotely Sensed Imagery Annie Chen ANNIEC@CSE.UNSW.EDU.AU Gary Donovan GARYD@CSE.UNSW.EDU.AU

More information

COMP 465: Data Mining Classification Basics

COMP 465: Data Mining Classification Basics Supervised vs. Unsupervised Learning COMP 465: Data Mining Classification Basics Slides Adapted From : Jiawei Han, Micheline Kamber & Jian Pei Data Mining: Concepts and Techniques, 3 rd ed. Supervised

More information

A Two Stage Zone Regression Method for Global Characterization of a Project Database

A Two Stage Zone Regression Method for Global Characterization of a Project Database A Two Stage Zone Regression Method for Global Characterization 1 Chapter I A Two Stage Zone Regression Method for Global Characterization of a Project Database J. J. Dolado, University of the Basque Country,

More information

CHAPTER 3 A FAST K-MODES CLUSTERING ALGORITHM TO WAREHOUSE VERY LARGE HETEROGENEOUS MEDICAL DATABASES

CHAPTER 3 A FAST K-MODES CLUSTERING ALGORITHM TO WAREHOUSE VERY LARGE HETEROGENEOUS MEDICAL DATABASES 70 CHAPTER 3 A FAST K-MODES CLUSTERING ALGORITHM TO WAREHOUSE VERY LARGE HETEROGENEOUS MEDICAL DATABASES 3.1 INTRODUCTION In medical science, effective tools are essential to categorize and systematically

More information

Summary. Machine Learning: Introduction. Marcin Sydow

Summary. Machine Learning: Introduction. Marcin Sydow Outline of this Lecture Data Motivation for Data Mining and Learning Idea of Learning Decision Table: Cases and Attributes Supervised and Unsupervised Learning Classication and Regression Examples Data:

More information

ISSN: (Online) Volume 3, Issue 9, September 2015 International Journal of Advance Research in Computer Science and Management Studies

ISSN: (Online) Volume 3, Issue 9, September 2015 International Journal of Advance Research in Computer Science and Management Studies ISSN: 2321-7782 (Online) Volume 3, Issue 9, September 2015 International Journal of Advance Research in Computer Science and Management Studies Research Article / Survey Paper / Case Study Available online

More information

Fuzzy Partitioning with FID3.1

Fuzzy Partitioning with FID3.1 Fuzzy Partitioning with FID3.1 Cezary Z. Janikow Dept. of Mathematics and Computer Science University of Missouri St. Louis St. Louis, Missouri 63121 janikow@umsl.edu Maciej Fajfer Institute of Computing

More information

An Intelligent Clustering Algorithm for High Dimensional and Highly Overlapped Photo-Thermal Infrared Imaging Data

An Intelligent Clustering Algorithm for High Dimensional and Highly Overlapped Photo-Thermal Infrared Imaging Data An Intelligent Clustering Algorithm for High Dimensional and Highly Overlapped Photo-Thermal Infrared Imaging Data Nian Zhang and Lara Thompson Department of Electrical and Computer Engineering, University

More information

Estimating Missing Attribute Values Using Dynamically-Ordered Attribute Trees

Estimating Missing Attribute Values Using Dynamically-Ordered Attribute Trees Estimating Missing Attribute Values Using Dynamically-Ordered Attribute Trees Jing Wang Computer Science Department, The University of Iowa jing-wang-1@uiowa.edu W. Nick Street Management Sciences Department,

More information

ASSOCIATIVE CLASSIFICATION WITH KNN

ASSOCIATIVE CLASSIFICATION WITH KNN ASSOCIATIVE CLASSIFICATION WITH ZAIXIANG HUANG, ZHONGMEI ZHOU, TIANZHONG HE Department of Computer Science and Engineering, Zhangzhou Normal University, Zhangzhou 363000, China E-mail: huangzaixiang@126.com

More information

Image Mining: frameworks and techniques

Image Mining: frameworks and techniques Image Mining: frameworks and techniques Madhumathi.k 1, Dr.Antony Selvadoss Thanamani 2 M.Phil, Department of computer science, NGM College, Pollachi, Coimbatore, India 1 HOD Department of Computer Science,

More information

Leveraging Set Relations in Exact Set Similarity Join

Leveraging Set Relations in Exact Set Similarity Join Leveraging Set Relations in Exact Set Similarity Join Xubo Wang, Lu Qin, Xuemin Lin, Ying Zhang, and Lijun Chang University of New South Wales, Australia University of Technology Sydney, Australia {xwang,lxue,ljchang}@cse.unsw.edu.au,

More information

An Improved Apriori Algorithm for Association Rules

An Improved Apriori Algorithm for Association Rules Research article An Improved Apriori Algorithm for Association Rules Hassan M. Najadat 1, Mohammed Al-Maolegi 2, Bassam Arkok 3 Computer Science, Jordan University of Science and Technology, Irbid, Jordan

More information

AMOL MUKUND LONDHE, DR.CHELPA LINGAM

AMOL MUKUND LONDHE, DR.CHELPA LINGAM International Journal of Advances in Applied Science and Engineering (IJAEAS) ISSN (P): 2348-1811; ISSN (E): 2348-182X Vol. 2, Issue 4, Dec 2015, 53-58 IIST COMPARATIVE ANALYSIS OF ANN WITH TRADITIONAL

More information

Improving Quality of Products in Hard Drive Manufacturing by Decision Tree Technique

Improving Quality of Products in Hard Drive Manufacturing by Decision Tree Technique Improving Quality of Products in Hard Drive Manufacturing by Decision Tree Technique Anotai Siltepavet 1, Sukree Sinthupinyo 2 and Prabhas Chongstitvatana 3 1 Computer Engineering, Chulalongkorn University,

More information

Comparative Study of Clustering Algorithms using R

Comparative Study of Clustering Algorithms using R Comparative Study of Clustering Algorithms using R Debayan Das 1 and D. Peter Augustine 2 1 ( M.Sc Computer Science Student, Christ University, Bangalore, India) 2 (Associate Professor, Department of Computer

More information

Multi-label classification using rule-based classifier systems

Multi-label classification using rule-based classifier systems Multi-label classification using rule-based classifier systems Shabnam Nazmi (PhD candidate) Department of electrical and computer engineering North Carolina A&T state university Advisor: Dr. A. Homaifar

More information

Hybrid Feature Selection for Modeling Intrusion Detection Systems

Hybrid Feature Selection for Modeling Intrusion Detection Systems Hybrid Feature Selection for Modeling Intrusion Detection Systems Srilatha Chebrolu, Ajith Abraham and Johnson P Thomas Department of Computer Science, Oklahoma State University, USA ajith.abraham@ieee.org,

More information

Improving Quality of Products in Hard Drive Manufacturing by Decision Tree Technique

Improving Quality of Products in Hard Drive Manufacturing by Decision Tree Technique www.ijcsi.org 29 Improving Quality of Products in Hard Drive Manufacturing by Decision Tree Technique Anotai Siltepavet 1, Sukree Sinthupinyo 2 and Prabhas Chongstitvatana 3 1 Computer Engineering, Chulalongkorn

More information

Data Mining: Concepts and Techniques Classification and Prediction Chapter 6.1-3

Data Mining: Concepts and Techniques Classification and Prediction Chapter 6.1-3 Data Mining: Concepts and Techniques Classification and Prediction Chapter 6.1-3 January 25, 2007 CSE-4412: Data Mining 1 Chapter 6 Classification and Prediction 1. What is classification? What is prediction?

More information

Data Mining with Oracle 10g using Clustering and Classification Algorithms Nhamo Mdzingwa September 25, 2005

Data Mining with Oracle 10g using Clustering and Classification Algorithms Nhamo Mdzingwa September 25, 2005 Data Mining with Oracle 10g using Clustering and Classification Algorithms Nhamo Mdzingwa September 25, 2005 Abstract Deciding on which algorithm to use, in terms of which is the most effective and accurate

More information

Random Forest A. Fornaser

Random Forest A. Fornaser Random Forest A. Fornaser alberto.fornaser@unitn.it Sources Lecture 15: decision trees, information theory and random forests, Dr. Richard E. Turner Trees and Random Forests, Adele Cutler, Utah State University

More information

Business Club. Decision Trees

Business Club. Decision Trees Business Club Decision Trees Business Club Analytics Team December 2017 Index 1. Motivation- A Case Study 2. The Trees a. What is a decision tree b. Representation 3. Regression v/s Classification 4. Building

More information

Data Mining Practical Machine Learning Tools and Techniques. Slides for Chapter 6 of Data Mining by I. H. Witten and E. Frank

Data Mining Practical Machine Learning Tools and Techniques. Slides for Chapter 6 of Data Mining by I. H. Witten and E. Frank Data Mining Practical Machine Learning Tools and Techniques Slides for Chapter 6 of Data Mining by I. H. Witten and E. Frank Implementation: Real machine learning schemes Decision trees Classification

More information

K-Nearest-Neighbours with a Novel Similarity Measure for Intrusion Detection

K-Nearest-Neighbours with a Novel Similarity Measure for Intrusion Detection K-Nearest-Neighbours with a Novel Similarity Measure for Intrusion Detection Zhenghui Ma School of Computer Science The University of Birmingham Edgbaston, B15 2TT Birmingham, UK Ata Kaban School of Computer

More information

Data Cleaning and Prototyping Using K-Means to Enhance Classification Accuracy

Data Cleaning and Prototyping Using K-Means to Enhance Classification Accuracy Data Cleaning and Prototyping Using K-Means to Enhance Classification Accuracy Lutfi Fanani 1 and Nurizal Dwi Priandani 2 1 Department of Computer Science, Brawijaya University, Malang, Indonesia. 2 Department

More information

Evolving SQL Queries for Data Mining

Evolving SQL Queries for Data Mining Evolving SQL Queries for Data Mining Majid Salim and Xin Yao School of Computer Science, The University of Birmingham Edgbaston, Birmingham B15 2TT, UK {msc30mms,x.yao}@cs.bham.ac.uk Abstract. This paper

More information

AN IMPROVED GRAPH BASED METHOD FOR EXTRACTING ASSOCIATION RULES

AN IMPROVED GRAPH BASED METHOD FOR EXTRACTING ASSOCIATION RULES AN IMPROVED GRAPH BASED METHOD FOR EXTRACTING ASSOCIATION RULES ABSTRACT Wael AlZoubi Ajloun University College, Balqa Applied University PO Box: Al-Salt 19117, Jordan This paper proposes an improved approach

More information

ORT EP R RCH A ESE R P A IDI! " #$$% &' (# $!"

ORT EP R RCH A ESE R P A IDI!  #$$% &' (# $! R E S E A R C H R E P O R T IDIAP A Parallel Mixture of SVMs for Very Large Scale Problems Ronan Collobert a b Yoshua Bengio b IDIAP RR 01-12 April 26, 2002 Samy Bengio a published in Neural Computation,

More information

AN IMPROVISED FREQUENT PATTERN TREE BASED ASSOCIATION RULE MINING TECHNIQUE WITH MINING FREQUENT ITEM SETS ALGORITHM AND A MODIFIED HEADER TABLE

AN IMPROVISED FREQUENT PATTERN TREE BASED ASSOCIATION RULE MINING TECHNIQUE WITH MINING FREQUENT ITEM SETS ALGORITHM AND A MODIFIED HEADER TABLE AN IMPROVISED FREQUENT PATTERN TREE BASED ASSOCIATION RULE MINING TECHNIQUE WITH MINING FREQUENT ITEM SETS ALGORITHM AND A MODIFIED HEADER TABLE Vandit Agarwal 1, Mandhani Kushal 2 and Preetham Kumar 3

More information

KEYWORDS: Clustering, RFPCM Algorithm, Ranking Method, Query Redirection Method.

KEYWORDS: Clustering, RFPCM Algorithm, Ranking Method, Query Redirection Method. IJESRT INTERNATIONAL JOURNAL OF ENGINEERING SCIENCES & RESEARCH TECHNOLOGY IMPROVED ROUGH FUZZY POSSIBILISTIC C-MEANS (RFPCM) CLUSTERING ALGORITHM FOR MARKET DATA T.Buvana*, Dr.P.krishnakumari *Research

More information

The Comparison of CBA Algorithm and CBS Algorithm for Meteorological Data Classification Mohammad Iqbal, Imam Mukhlash, Hanim Maria Astuti

The Comparison of CBA Algorithm and CBS Algorithm for Meteorological Data Classification Mohammad Iqbal, Imam Mukhlash, Hanim Maria Astuti Information Systems International Conference (ISICO), 2 4 December 2013 The Comparison of CBA Algorithm and CBS Algorithm for Meteorological Data Classification Mohammad Iqbal, Imam Mukhlash, Hanim Maria

More information

MIT 801. Machine Learning I. [Presented by Anna Bosman] 16 February 2018

MIT 801. Machine Learning I. [Presented by Anna Bosman] 16 February 2018 MIT 801 [Presented by Anna Bosman] 16 February 2018 Machine Learning What is machine learning? Artificial Intelligence? Yes as we know it. What is intelligence? The ability to acquire and apply knowledge

More information

Introduction to Data Mining and Data Analytics

Introduction to Data Mining and Data Analytics 1/28/2016 MIST.7060 Data Analytics 1 Introduction to Data Mining and Data Analytics What Are Data Mining and Data Analytics? Data mining is the process of discovering hidden patterns in data, where Patterns

More information

Clustering of Data with Mixed Attributes based on Unified Similarity Metric

Clustering of Data with Mixed Attributes based on Unified Similarity Metric Clustering of Data with Mixed Attributes based on Unified Similarity Metric M.Soundaryadevi 1, Dr.L.S.Jayashree 2 Dept of CSE, RVS College of Engineering and Technology, Coimbatore, Tamilnadu, India 1

More information

DATA ANALYSIS I. Types of Attributes Sparse, Incomplete, Inaccurate Data

DATA ANALYSIS I. Types of Attributes Sparse, Incomplete, Inaccurate Data DATA ANALYSIS I Types of Attributes Sparse, Incomplete, Inaccurate Data Sources Bramer, M. (2013). Principles of data mining. Springer. [12-21] Witten, I. H., Frank, E. (2011). Data Mining: Practical machine

More information

Cse634 DATA MINING TEST REVIEW. Professor Anita Wasilewska Computer Science Department Stony Brook University

Cse634 DATA MINING TEST REVIEW. Professor Anita Wasilewska Computer Science Department Stony Brook University Cse634 DATA MINING TEST REVIEW Professor Anita Wasilewska Computer Science Department Stony Brook University Preprocessing stage Preprocessing: includes all the operations that have to be performed before

More information

CSE4334/5334 DATA MINING

CSE4334/5334 DATA MINING CSE4334/5334 DATA MINING Lecture 4: Classification (1) CSE4334/5334 Data Mining, Fall 2014 Department of Computer Science and Engineering, University of Texas at Arlington Chengkai Li (Slides courtesy

More information

Mining of Web Server Logs using Extended Apriori Algorithm

Mining of Web Server Logs using Extended Apriori Algorithm International Association of Scientific Innovation and Research (IASIR) (An Association Unifying the Sciences, Engineering, and Applied Research) International Journal of Emerging Technologies in Computational

More information

Data Mining. Part 2. Data Understanding and Preparation. 2.4 Data Transformation. Spring Instructor: Dr. Masoud Yaghini. Data Transformation

Data Mining. Part 2. Data Understanding and Preparation. 2.4 Data Transformation. Spring Instructor: Dr. Masoud Yaghini. Data Transformation Data Mining Part 2. Data Understanding and Preparation 2.4 Spring 2010 Instructor: Dr. Masoud Yaghini Outline Introduction Normalization Attribute Construction Aggregation Attribute Subset Selection Discretization

More information

Part I. Instructor: Wei Ding

Part I. Instructor: Wei Ding Classification Part I Instructor: Wei Ding Tan,Steinbach, Kumar Introduction to Data Mining 4/18/2004 1 Classification: Definition Given a collection of records (training set ) Each record contains a set

More information

Dynamic Clustering of Data with Modified K-Means Algorithm

Dynamic Clustering of Data with Modified K-Means Algorithm 2012 International Conference on Information and Computer Networks (ICICN 2012) IPCSIT vol. 27 (2012) (2012) IACSIT Press, Singapore Dynamic Clustering of Data with Modified K-Means Algorithm Ahamed Shafeeq

More information

Iteration Reduction K Means Clustering Algorithm

Iteration Reduction K Means Clustering Algorithm Iteration Reduction K Means Clustering Algorithm Kedar Sawant 1 and Snehal Bhogan 2 1 Department of Computer Engineering, Agnel Institute of Technology and Design, Assagao, Goa 403507, India 2 Department

More information

Classification. Instructor: Wei Ding

Classification. Instructor: Wei Ding Classification Decision Tree Instructor: Wei Ding Tan,Steinbach, Kumar Introduction to Data Mining 4/18/2004 1 Preliminaries Each data record is characterized by a tuple (x, y), where x is the attribute

More information

Associating Terms with Text Categories

Associating Terms with Text Categories Associating Terms with Text Categories Osmar R. Zaïane Department of Computing Science University of Alberta Edmonton, AB, Canada zaiane@cs.ualberta.ca Maria-Luiza Antonie Department of Computing Science

More information

6. Dicretization methods 6.1 The purpose of discretization

6. Dicretization methods 6.1 The purpose of discretization 6. Dicretization methods 6.1 The purpose of discretization Often data are given in the form of continuous values. If their number is huge, model building for such data can be difficult. Moreover, many

More information

CHAPTER V ADAPTIVE ASSOCIATION RULE MINING ALGORITHM. Please purchase PDF Split-Merge on to remove this watermark.

CHAPTER V ADAPTIVE ASSOCIATION RULE MINING ALGORITHM. Please purchase PDF Split-Merge on   to remove this watermark. 119 CHAPTER V ADAPTIVE ASSOCIATION RULE MINING ALGORITHM 120 CHAPTER V ADAPTIVE ASSOCIATION RULE MINING ALGORITHM 5.1. INTRODUCTION Association rule mining, one of the most important and well researched

More information

A Survey on Algorithms for Market Basket Analysis

A Survey on Algorithms for Market Basket Analysis ISSN: 2321-7782 (Online) Special Issue, December 2013 International Journal of Advance Research in Computer Science and Management Studies Research Paper Available online at: www.ijarcsms.com A Survey

More information

Unsupervised Learning

Unsupervised Learning Outline Unsupervised Learning Basic concepts K-means algorithm Representation of clusters Hierarchical clustering Distance functions Which clustering algorithm to use? NN Supervised learning vs. unsupervised

More information

Artificial Intelligence. Programming Styles

Artificial Intelligence. Programming Styles Artificial Intelligence Intro to Machine Learning Programming Styles Standard CS: Explicitly program computer to do something Early AI: Derive a problem description (state) and use general algorithms to

More information

Extra readings beyond the lecture slides are important:

Extra readings beyond the lecture slides are important: 1 Notes To preview next lecture: Check the lecture notes, if slides are not available: http://web.cse.ohio-state.edu/~sun.397/courses/au2017/cse5243-new.html Check UIUC course on the same topic. All their

More information

Noise-based Feature Perturbation as a Selection Method for Microarray Data

Noise-based Feature Perturbation as a Selection Method for Microarray Data Noise-based Feature Perturbation as a Selection Method for Microarray Data Li Chen 1, Dmitry B. Goldgof 1, Lawrence O. Hall 1, and Steven A. Eschrich 2 1 Department of Computer Science and Engineering

More information

Web Page Classification using FP Growth Algorithm Akansha Garg,Computer Science Department Swami Vivekanad Subharti University,Meerut, India

Web Page Classification using FP Growth Algorithm Akansha Garg,Computer Science Department Swami Vivekanad Subharti University,Meerut, India Web Page Classification using FP Growth Algorithm Akansha Garg,Computer Science Department Swami Vivekanad Subharti University,Meerut, India Abstract - The primary goal of the web site is to provide the

More information

WEIGHTED K NEAREST NEIGHBOR CLASSIFICATION ON FEATURE PROJECTIONS 1

WEIGHTED K NEAREST NEIGHBOR CLASSIFICATION ON FEATURE PROJECTIONS 1 WEIGHTED K NEAREST NEIGHBOR CLASSIFICATION ON FEATURE PROJECTIONS 1 H. Altay Güvenir and Aynur Akkuş Department of Computer Engineering and Information Science Bilkent University, 06533, Ankara, Turkey

More information

Semi-Supervised Clustering with Partial Background Information

Semi-Supervised Clustering with Partial Background Information Semi-Supervised Clustering with Partial Background Information Jing Gao Pang-Ning Tan Haibin Cheng Abstract Incorporating background knowledge into unsupervised clustering algorithms has been the subject

More information

APRIORI ALGORITHM FOR MINING FREQUENT ITEMSETS A REVIEW

APRIORI ALGORITHM FOR MINING FREQUENT ITEMSETS A REVIEW International Journal of Computer Application and Engineering Technology Volume 3-Issue 3, July 2014. Pp. 232-236 www.ijcaet.net APRIORI ALGORITHM FOR MINING FREQUENT ITEMSETS A REVIEW Priyanka 1 *, Er.

More information

Decision Tree CE-717 : Machine Learning Sharif University of Technology

Decision Tree CE-717 : Machine Learning Sharif University of Technology Decision Tree CE-717 : Machine Learning Sharif University of Technology M. Soleymani Fall 2012 Some slides have been adapted from: Prof. Tom Mitchell Decision tree Approximating functions of usually discrete

More information

Dr. Prof. El-Bahlul Emhemed Fgee Supervisor, Computer Department, Libyan Academy, Libya

Dr. Prof. El-Bahlul Emhemed Fgee Supervisor, Computer Department, Libyan Academy, Libya Volume 5, Issue 1, January 2015 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com Performance

More information

A Novel Algorithm for Associative Classification

A Novel Algorithm for Associative Classification A Novel Algorithm for Associative Classification Gourab Kundu 1, Sirajum Munir 1, Md. Faizul Bari 1, Md. Monirul Islam 1, and K. Murase 2 1 Department of Computer Science and Engineering Bangladesh University

More information

K-Means Clustering With Initial Centroids Based On Difference Operator

K-Means Clustering With Initial Centroids Based On Difference Operator K-Means Clustering With Initial Centroids Based On Difference Operator Satish Chaurasiya 1, Dr.Ratish Agrawal 2 M.Tech Student, School of Information and Technology, R.G.P.V, Bhopal, India Assistant Professor,

More information

Frequent Item Set using Apriori and Map Reduce algorithm: An Application in Inventory Management

Frequent Item Set using Apriori and Map Reduce algorithm: An Application in Inventory Management Frequent Item Set using Apriori and Map Reduce algorithm: An Application in Inventory Management Kranti Patil 1, Jayashree Fegade 2, Diksha Chiramade 3, Srujan Patil 4, Pradnya A. Vikhar 5 1,2,3,4,5 KCES

More information

Lecture 7: Decision Trees

Lecture 7: Decision Trees Lecture 7: Decision Trees Instructor: Outline 1 Geometric Perspective of Classification 2 Decision Trees Geometric Perspective of Classification Perspective of Classification Algorithmic Geometric Probabilistic...

More information

Sandeep Kharidhi and WenSui Liu ChoicePoint Precision Marketing

Sandeep Kharidhi and WenSui Liu ChoicePoint Precision Marketing Generalized Additive Model and Applications in Direct Marketing Sandeep Kharidhi and WenSui Liu ChoicePoint Precision Marketing Abstract Logistic regression 1 has been widely used in direct marketing applications

More information