Parameter Tuning for Induction-Algorithm-Oriented Feature Elimination

Ying Yang and Xindong Wu, University of Vermont

Induction-algorithm-oriented feature elimination, with particular parameter configurations, can achieve higher predictive accuracy than existing popular feature selection approaches. The authors propose two sets of well-tuned parameters based on empirical analysis.

Feature selection has long been an active research topic in machine learning. Beginning with an empty set of features, it selects the features most necessary for learning a target concept. Feature elimination, a newer technique, starts out with a full set of features and eliminates those most unnecessary for learning the target concept. At first glance, these two approaches seem to be minor variants on the same theme. However, feature elimination tends to be more effective,1 can capture interacting features more easily, and suffers less from feature interaction than feature selection.2 Because the most unnecessary features are eliminated from the beginning, they will not mislead the induction process in terms of efficiency or accuracy.3

Traditional feature elimination involves measuring each feature's relevance to the target concept.4,5 The process uses general characteristics of the training data to exclude some features and include others; it doesn't care what induction algorithm will use those output features. Because different algorithms have different biases, this blindness can have severe implications for inductive learning.6 The problem is exacerbated by the fact that the goal of most inductive learning is to maximize predictive accuracy, not just discover relevant features.

To address these issues, George John, Ron Kohavi, and Karl Pfleger proposed induction-algorithm-oriented feature elimination (IAOFE).6 Kohavi and John augmented the method by proposing the wrapper approach, which searches for unnecessary features using the induction algorithm of interest.2 Because of this research and others' efforts, IAOFE can improve predictive accuracy for induction algorithm families such as decision trees, naïve Bayes, Bayesian networks, and nearest neighbors. However, because of the way it works, abundant tunable parameters control IAOFE. To understand how to achieve the best performance possible from IAOFE, we conducted a comprehensive analysis of IAOFE parameter tuning.

The IAOFE approach

For each data set, IAOFE eliminates the features that don't enhance an induction algorithm's learning performance for a particular data domain. Figure 1 presents an algorithm that delineates IAOFE's generic framework in the context of classification learning. You can couple IAOFE with any induction algorithm for a data domain as long as you use the same algorithm for induction later. So, you can tailor the elimination to a particular algorithm and domain, and the outcome can range from eliminating no features to eliminating them all. If IAOFE decides to eliminate all features, using the prior distribution of the classes is sufficient for classification.

Input: training data T containing a set of features F and class labels C, and an induction algorithm IA
Output: a subset of F that can optimize the induction performance of IA
{
  eliminated_features = empty_set;
  WHILE (|eliminated_features| < |F|)
    candidates = empty_set;
    improve = 0;
    remaining = F - eliminated_features;
    benchmark_performance = IA(T(remaining, C));
    FOREACH feature i in remaining
      performance = IA(T(remaining - {i}, C));
      if compare(performance, benchmark_performance) == decrease
        continue;
      elseif compare(performance, benchmark_performance) == nochange
        push(candidates, i);
        improve = 1;
      else
        clear(candidates);
        push(candidates, i);
        benchmark_performance = performance;
        improve = 1;
    END of FOREACH
    if (improve == 0) break;
    eliminated_features = eliminated_features + candidates;
  END of WHILE
  F = F - eliminated_features;
  return (F);
}

Figure 1. An algorithm delineating the generic framework of the induction-algorithm-oriented feature elimination method, in the context of classification learning.
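Read literally, the Figure 1 loop is a hill-climbing backward search: in each pass it finds the features whose removal does not decrease (or best improves) the estimated performance, eliminates them, and stops when no removal helps. The following is a minimal runnable Python sketch of that loop, ours rather than the authors' implementation; the names iaofe and estimate_accuracy are assumptions, the accuracy-estimation strategy is supplied as a callable, and ties are broken first-best for brevity.

from typing import Callable, Hashable, Iterable, Set

def iaofe(features: Iterable[Hashable],
          estimate_accuracy: Callable[[Set[Hashable]], float],
          tolerance: float = 0.0) -> Set[Hashable]:
    """Backward feature elimination driven by an induction algorithm.

    estimate_accuracy(subset) should train the induction algorithm of interest
    on the given feature subset and return an accuracy estimate (training-data
    accuracy, cross-validation, or bootstrap). A feature is a candidate for
    elimination when accuracy >= (1 - tolerance) * benchmark.
    """
    remaining = set(features)
    while remaining:
        benchmark = estimate_accuracy(remaining)
        best_feature, best_accuracy = None, None
        for f in remaining:
            accuracy = estimate_accuracy(remaining - {f})
            if accuracy < (1.0 - tolerance) * benchmark:
                continue  # removing f degrades performance beyond the tolerance
            if best_accuracy is None or accuracy > best_accuracy:
                best_feature, best_accuracy = f, accuracy  # first-best tie breaking
        if best_feature is None:
            break  # no feature can be eliminated without hurting performance
        remaining.discard(best_feature)
    return remaining

The estimate_accuracy closure is where the coupling with the induction algorithm happens: it can wrap any classifier, which is what lets IAOFE tailor the elimination to the algorithm that will later be used for induction.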
Parameter tuning

To the best of our knowledge, researchers have studied only one issue about IAOFE parameter tuning: the search strategy that searches the feature space for candidates to be eliminated. Kohavi and John compared a hill-climbing search and a best-first search.2 They also briefly discussed strategies proposed by other researchers, such as beam, bidirectional, simulated-annealing, and genetic search. These alternative strategies produced no significant differences. Most algorithms from the IAOFE family use hill-climbing search.

We propose four IAOFE parameters: accuracy estimation, tolerance, tie breaker, and voting. Previous implementations of IAOFE usually explicitly use cross-validation for accuracy estimation and implicitly use the meticulous approach for tolerance and a random or first-best approach for tie breaking. They don't use voting at all.

Accuracy estimation

Feature elimination involves finding a set of unnecessary features under some objective function. Common objective functions include classification accuracy, classifier structure, and minimization of the features retained.6 The first of these is used most often because a typical goal of inductive learning is to maximize classification accuracy on previously unseen data (predictive accuracy). So, for IAOFE we focus on using classification accuracy to measure whether to eliminate a feature. This parameter, which we call accuracy estimation, can take three values.

Training data accuracy. This is possibly the most natural approach to estimating the accuracy of a classifier that's been induced from a set of training data.4 It involves inducing the classifier from the entire set of training data and then estimating the learned classifier's accuracy on that same data set. However, this doesn't show how well the classifier will perform when it makes new predictions for data it hasn't already seen. So, the resulting IAOFE might be susceptible to overfitting. On the other hand, using training-data accuracy is efficient. It can ameliorate IAOFE's major disadvantage, its computational overhead when repeatedly sampling training data to evaluate each feature.

Cross-validation. Cross-validation is one way to reduce overfitting. Instead of estimating training-data accuracy, it estimates generalization accuracy on the part of the training data that is held out when training the classifier. In k-fold cross-validation,13,14 you randomly split the training-data set D into k mutually exclusive subsets (the folds) of approximately equal size, D_1, D_2, ..., D_k, and then train and test the classifier k times. Each time t in {1, 2, ..., k}, you train on D - D_t and test on D_t. The cross-validation estimate of accuracy is the overall number of correct classifications divided by the number of instances in the data set. Cross-validation can be either stratified or unstratified. In stratified cross-validation, the folds contain approximately the same proportion of classes as the original data set. This is good if you know that the training data embodies the correct proportion of classes, where stratified cross-validation can reduce the variance among the estimates. However, for many real-world data sets, you don't know whether the training data are representative in the class proportion, so imposing that proportion on the cross-validation folds might introduce a bias. In these cases, unstratified cross-validation might be appropriate: you create the folds randomly without considering class proportion.

Bootstrap. Bootstrap was introduced by Bradley Efron15 and later fully described by Efron and Robert Tibshirani.16 It avoids overfitting by estimating the generalization accuracy based on resampling. In the simplest form of bootstrap, instead of repeatedly analyzing data subsets as cross-validation does, you repeatedly analyze data subsamples. Each subsample is a random sample, with replacement, from the full sample.
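To make the resampling-based settings concrete, here is a small illustrative Python sketch, ours and not the authors' code, that builds the index partitions both estimators rest on: k folds (stratified or unstratified) for cross-validation, and repeated samples drawn with replacement for bootstrap. An estimate_accuracy callable for the elimination sketch above could be assembled from either.

import random
from collections import defaultdict

def kfold_indices(labels, k=10, stratified=True, seed=0):
    """Split instance indices into k folds, optionally preserving class proportions."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    if stratified:
        by_class = defaultdict(list)
        for i, y in enumerate(labels):
            by_class[y].append(i)
        groups = list(by_class.values())
    else:
        groups = [list(range(len(labels)))]
    for group in groups:
        rng.shuffle(group)
        for j, i in enumerate(group):
            folds[j % k].append(i)  # deal shuffled indices round-robin into the folds
    return folds

def bootstrap_indices(n, repetitions=10, seed=0):
    """Draw `repetitions` resamples of size n, each sampled with replacement."""
    rng = random.Random(seed)
    return [[rng.randrange(n) for _ in range(n)] for _ in range(repetitions)]

For cross-validation, each fold is held out in turn while the classifier is trained on the rest, and the accuracy estimate is the total number of correct held-out classifications divided by the number of instances, exactly as described above.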

Tolerance

For the convenience of expression, we abbreviate the classification performance resulting from eliminating some feature as current performance, or CP, and the best performance up to this stage as benchmark performance, or BP. IAOFE decides to eliminate a feature when doing so doesn't degrade the induced classifier's performance, that is, when CP >= (1 - e) * BP. The parameter e, tolerance, controls how tolerant IAOFE can be. There are three possible settings:

Meticulous. e = 0. IAOFE compares CP with the exact value of BP to see whether performance decreases at all.

Conservative. e < 0. IAOFE eliminates a feature only when doing so increases classification performance above a certain degree. The elimination is conservative because it picks only malicious features that can damage the induction; it retains the features that don't affect (neither damage nor contribute to) induction performance. However, if the absolute value of e is set too high, IAOFE might not eliminate any features at all, and the whole procedure will quickly terminate.

Aggressive. e > 0. As long as eliminating a feature doesn't decrease performance beyond its tolerance, IAOFE will eliminate it. By this means, IAOFE tends to maximize the number of eliminated features at the cost of insignificantly decreasing classification accuracy. This can be useful in situations where the compactness of the induced concept is a major concern.

Tie breaker

Eliminating different attributes often results in identical predictive accuracies of an induction algorithm. We call these attributes candidates. We need a tie breaker parameter to decide which ones to actually eliminate. It can have various settings:

Random. The most intuitive way is to randomly pick a feature from the pool of candidates.

First best. Follow the practice of ID3-like induction algorithms; that is, choose the first encountered candidate that produces the highest improvement in accuracy.

Classifier structure. All else being equal, eliminate a feature if doing so will produce the most compact structure of the resulting classifier (if there is a structure at all, such as a decision tree). This can be desirable if judged by Occam's razor.

Wholesale. Eliminate all candidates at once and begin the next loop to look for new candidates. This risks eliminating a necessary feature if it has a redundant copy. But it can significantly speed up IAOFE by freeing it from loops of reevaluating and eliminating one candidate at a time. This boost in efficiency can sometimes overshadow the risk of wrongly eliminating candidates, especially when efficiency is a big issue.

Voting

Although it's effective, IAOFE tends to be computationally expensive when there are large amounts of training data or many features. This problem is compounded by the fact that large data sets are routinely involved in modern research. So, we have a fourth parameter, voting, to address this scenario. The idea is to randomly partition a large data set into k parts, apply IAOFE to each part, and let the returned results from each part vote for the final decision on what features to eliminate. In this way, IAOFE can avoid dealing directly with large amounts of data, instead adopting a divide-and-conquer approach. Another important use of this mechanism is when you apply IAOFE to distributed data, which is a common practice in data mining. In this scenario, instead of fusing all the data together, which is normally impossible, IAOFE can work on each data portion in parallel. This is both more feasible and more efficient. The voting parameter has two settings (a sketch of the mechanism follows below):

Unanimous. Eliminate a feature only when each part of the data supports the elimination.

Majority. Eliminate a feature as long as the majority of the data parts agree to do so.
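The voting mechanism composes naturally with the per-part elimination. Below is an illustrative Python sketch, ours rather than the authors' code: should_eliminate encodes the tolerance rule (eliminate when CP >= (1 - e) * BP), and vote_elimination runs a caller-supplied elimination routine on each data part and then combines the per-part decisions unanimously or by majority.

from typing import Callable, Hashable, Iterable, List, Sequence, Set

def should_eliminate(cp: float, bp: float, tolerance: float = 0.0) -> bool:
    """Tolerance rule: eliminate when CP >= (1 - tolerance) * BP.
    tolerance = 0 is meticulous, tolerance < 0 conservative, tolerance > 0 aggressive."""
    return cp >= (1.0 - tolerance) * bp

def vote_elimination(parts: Sequence[object],
                     eliminate_on_part: Callable[[object], Set[Hashable]],
                     all_features: Iterable[Hashable],
                     unanimous: bool = True) -> Set[Hashable]:
    """Run feature elimination on each data part and vote on the final decision."""
    per_part: List[Set[Hashable]] = [eliminate_on_part(part) for part in parts]
    eliminated = set()
    for f in all_features:
        votes = sum(1 for chosen in per_part if f in chosen)
        if unanimous and votes == len(per_part):
            eliminated.add(f)      # every part supports eliminating f
        elif not unanimous and votes > len(per_part) / 2:
            eliminated.add(f)      # a majority of parts supports eliminating f
    return eliminated

Here eliminate_on_part would typically wrap the earlier iaofe sketch applied to one random partition of a large data set or to one portion of a distributed data set.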
Experiments

We experimented with tuning parameters for IAOFE to find effective configurations of parameter settings. We then compared IAOFE with those configurations against existing popular feature selection approaches.

Design

We used the C4.5 induction algorithm as our example because it's one of the most extensively used. For each data set, we conducted a threefold unstratified cross-validation. In each fold, we withheld the test data, conducted IAOFE on the training data, and obtained the retained feature subset. Using the retained features, we then induced a decision tree classifier from the training data and applied it to the test data. We took various measurements, such as classification accuracy, the number of eliminated features, and the decision tree size, and averaged them over the three folds. When tuning a parameter, we kept the other parameters' settings invariant so that only this parameter's settings would produce the performance differences.

These three folds belonged to an outer loop, which differs from the inner loop that IAOFE might conduct on the training data; the latter is where we tuned the parameters. We used unstratified cross-validation for the outer loop because it represents many real-world scenarios, where training data are only a sample of the whole population and the class distributions don't necessarily correspond with those of the whole population. In this way, we expected to have a better understanding of IAOFE's pros and cons as it's used in real-world applications.

Data

We weren't picky with the experimental data. We grabbed as many data sets as we could (36) from the University of California, Irvine's machine learning repository17 because we believed that IAOFE could tune itself to different domains. However, because we planned to conduct cross-validation for evaluation, we excluded data sets with fewer than 100 instances: if a data set is extremely small, cross-validation can incur a very high variance and thus not be indicative of an approach's true performance. Table 1 lists the data sets, including the number of features and instances, in increasing order of the number of instances.

[Table 1. Experimental data sets: the number of features and the number of instances for each of the 36 data sets, listed in increasing order of the number of instances.]

We used two statistics to evaluate the experimental results. The arithmetic mean of a particular measurement (such as classification accuracy) across all data sets provides a gross indication of competing methods' relative performance. It's debatable whether values in different data sets are commensurable and hence whether averaging across data sets is meaningful. Nonetheless, a low mean value indicates a tendency toward low values for individual data sets. The second statistic, the win/lose/tie record, gives the number of data sets for which the decision tree classifier trained with IAOFE obtains a higher, lower, or equal classification accuracy compared with the classifier trained with alternative methods. If we apply a one-tailed sign test to each record and the test result is significantly low (at the 0.05 critical level), the outcome is unlikely to be obtained by chance. Thus, the record of wins to losses represents a systematic underlying advantage of IAOFE with respect to the type of data sets studied.
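The one-tailed sign test used for the win/lose/tie records ignores the ties and asks how probable at least the observed number of wins would be if wins and losses were equally likely. A small illustrative Python sketch, ours rather than the authors' code:

from math import comb

def sign_test_p(wins: int, losses: int) -> float:
    """One-tailed sign test: probability of at least `wins` successes in
    wins + losses trials under a fair coin. Ties are excluded from the test."""
    n = wins + losses
    return sum(comb(n, k) for k in range(wins, n + 1)) / 2 ** n

# Example: a 24/9/3 win/lose/tie record ignores the 3 ties and tests 24 wins in 33 trials.
print(sign_test_p(24, 9))  # well below the 0.05 critical level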

Rivals

One goal of this research is to identify effective parameter configurations that can make IAOFE outperform alternative methods. Considering the large amount of literature existing for feature selection methods, evaluating every one of them is impractical. However, several methods are most commonly cited in the literature and most commonly used in practice.

Chi square, stemming from statistics,18 has a long history in feature selection. The Chi square value can measure the association strength between a categorical class and a categorical feature. We calculate it from the contingency table of the class and the feature and then compare it with a threshold value according to a confidence level. A value higher than the threshold can lead to a judgment that dependency exists between the class and the feature, and thus the feature will be selected. If a feature is numeric, it will normally be discretized before the Chi square approach is employed.

Information gain and gain ratio have been popularized with the extensive use of decision tree induction.19 Given entropy as a measure of data impurity with regard to class, information gain measures the expected reduction of entropy caused by partitioning the data according to a feature's different values. The higher the information gain a feature achieves, the more likely it will be chosen for the classification. However, information gain has a disadvantage in that it prefers features with a large number of values that partition the data into many small, pure subsets. This can lead to choosing features such as birthdate, name, ID number, and so on that have poor generalization strength and thus aren't predictive for future unseen data. Gain ratio can overcome this problem by introducing split information, a term taking into account how the feature partitions the data. The more uniformly data are distributed among a feature's values, the higher its split information. Gain ratio equals information gain divided by split information. Although commonly used, information gain and gain ratio assume that features are independent of each other, which is often not true in real-world applications.
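As a concrete illustration of these two entropy-based measures, here is a small Python sketch, ours rather than anything from the article, that scores a single categorical feature against a categorical class:

from collections import Counter
from math import log2

def entropy(labels):
    """Class impurity: negative sum of p * log2(p) over the class proportions."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(feature_values, labels):
    """Expected reduction of class entropy from partitioning on the feature."""
    n = len(labels)
    partitions = {}
    for v, y in zip(feature_values, labels):
        partitions.setdefault(v, []).append(y)
    remainder = sum(len(part) / n * entropy(part) for part in partitions.values())
    return entropy(labels) - remainder

def gain_ratio(feature_values, labels):
    """Information gain divided by the feature's split information."""
    split_info = entropy(feature_values)  # entropy of the feature's own value distribution
    gain = information_gain(feature_values, labels)
    return gain / split_info if split_info > 0 else 0.0

# Example: a two-valued feature that separates the two classes fairly well.
print(gain_ratio(["a", "a", "b", "b", "b"], ["+", "+", "-", "-", "+"]))

A many-valued, identifier-like feature drives split information up, which is exactly how gain ratio penalizes the birthdate-and-ID-number features that information gain tends to favor.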
Relief, another commonly cited approach in the literature, estimates features according to how well their values distinguish among instances that are close to each other.20 For that purpose, Relief, for a given instance, searches for its two nearest neighbors: one from the same class and the other from a different class. It selects the features that can differentiate between instances from different classes and at the same time have the same value for instances from the same class. The original Relief could only deal with two-class problems. Igor Kononenko further extended Relief to deal with multiclass problems.21

Another common approach is Focus, which exhaustively examines all feature subsets and selects the minimal subset that is sufficient to determine the class.22 However, this exhaustive search strategy is prohibitively expensive when the training data size is large. Because our experiments routinely included data sets involving many instances and features, Focus wasn't feasible, so we didn't evaluate it.

Results and analysis

There are four areas of experimental results, one for tuning each parameter.

Accuracy estimation. For each data set, Table 2 presents the average classification accuracy (with the standard deviation resulting from the outer-loop threefold cross-validation) and the average number of eliminated features, corresponding to different settings of accuracy estimation. The table also presents as benchmarks those measurements on the original data, where no feature elimination was applied. Training accuracy represents training data accuracy, CV_stratified represents stratified cross-validation, and CV_unstratified represents unstratified cross-validation. We used tenfold cross-validation for both CV_stratified and CV_unstratified because it offers a decent evaluation for a classifier.13 To make a fair comparison, we also used 10-time resampling for Bootstrap.

The data showed that all settings can eliminate features while maintaining accuracy similar to the original data. CV_stratified, CV_unstratified, and Bootstrap each obtained a similar mean accuracy (in terms of resampling data to estimate accuracy). CV_stratified slightly beats CV_unstratified and Bootstrap, with its win/lose/tie records equal to 15/10/11 and 12/10/14, respectively. When we estimated accuracy using the training data (which didn't resample data at all), IAOFE achieved the highest mean accuracy. It also competed with the alternatives without any statistically significant loss (win/lose/tie records against CV_stratified, CV_unstratified, and Bootstrap are 12/17/7, 19/11/6, and 13/15/8, respectively).

[Table 2. Parameter tuning for accuracy estimation. For each data set, the table reports classification accuracy (with standard deviation) and the number of eliminated features under the Original, Training accuracy, CV_stratified, CV_unstratified, and Bootstrap settings.]

Tolerance. Table 3 shows that although conservative settings obtained the highest mean accuracy, they seldom eliminated any features. (We didn't round up the decimal for the average number of eliminated features because we wanted to show that elimination does happen, but very rarely.) Even with a seemingly marginal value such as e = -2 percent, elimination occurred rarely. This suggests that most unnecessary features don't affect classification accuracy. The malicious features that can actually damage classification accuracy aren't routine cases.

[Table 3. Parameter tuning for tolerance. For each data set, the table reports classification accuracy (with standard deviation) and the number of eliminated features under the meticulous setting, the conservative settings (-5 percent and -2 percent), and the aggressive settings (+2 percent and +5 percent).]

On the contrary, aggressive settings can be much more effective in reducing the number of features, though at the cost of (sometimes greatly) decreasing accuracy. Fortunately, the meticulous style achieved a happy middle between the aggressive setting and the conservative setting. Its mean accuracy approximately matched that of the conservative ones, and it eliminated almost as many features as the aggressive ones did.

[Table 4. Parameter tuning for tie breaker. For each data set, the table reports classification accuracy (with standard deviation) and the number of eliminated features under the random, first best, classifier structure, and wholesale settings.]

Tie breaker. According to Table 4, as we predicted, the wholesale setting produces the lowest mean accuracy. This poor performance stems from the fact that wholesale tends to eliminate necessary features as well as their redundant copies. The settings of random, first best, and classifier structure (the decision tree size in our experiments) seemed to end in a tie. However, a closer look reveals that classifier structure still had an advantage from using more heuristics. Compared with random, first best, and wholesale, it was more accurate more often than not, with win/lose/tie records of 12/10/14, 14/8/14, and 18/3/15, respectively.

Voting. As we mentioned earlier, we're particularly curious about the voting strategy's effect on IAOFE when the data are large or distributed, or both. We tested the five largest data sets (each with more than 3,000 instances) from our experimental suite (see Table 5). Unanimous voting always produced better accuracy than majority voting. Also, compared with the original data, voting effectively eliminated features without any significant loss in accuracy. This suggests that voting is effective and feasible for IAOFE to deal with large and distributed data.

[Table 5. Parameter tuning for voting. For the five largest data sets (Splice, Krvskp, Sick, Mushroom, and Nursery), the table reports classification accuracy (with standard deviation) on the original data, and the accuracy and number of eliminated features under unanimous and majority voting.]

Comparison with existing methods

Based on empirical evidence, we suggest two configurations that can best explore IAOFE's power (see Table 6). Configuration 1 is most desirable overall, and Configuration 2 is the best configuration among the resampling approaches.

Table 6. Suggested configurations for parameter settings.

Configuration              | Accuracy estimation         | Tolerance  | Tie breaker          | Voting
1: IAOFE_trainingAccuracy  | Training data accuracy      | Meticulous | Classifier structure | Unanimous
2: IAOFE_CVstratified      | Stratified cross-validation | Meticulous | Classifier structure | Unanimous

Thus, using training data accuracy for accuracy estimation, meticulous for tolerance, classifier structure for tie breaking, and unanimous for voting, IAOFE is more accurate than other popular feature selection approaches.

We named IAOFE with these two configurations IAOFE_trainingAccuracy and IAOFE_CVstratified, respectively, and ran them against other popular feature selection approaches. Table 7 shows the results. This time, however, instead of reporting the average number of eliminated features as we did earlier, we report the average decision tree size induced by IAOFE for each data set.

Both configurations can reduce the number of features without significantly degrading classification accuracy. Their win/lose/tie records against the original data are 14/14/8 and 12/17/7, respectively. IAOFE_trainingAccuracy (Configuration 1) achieves the highest mean accuracy among all the competing methods. As for win/lose/tie results, it obtained 24/9/3 against Chi square, 22/10/4 against information gain, 22/10/4 against gain ratio, and 23/9/4 against Relief, each resulting in a sign test result below the 0.05 critical level. That is, the frequency of wins to losses for each comparison is statistically significant.

This surprisingly good performance of IAOFE_trainingAccuracy is of special interest because it suggests we can use IAOFE more extensively than we'd thought for classifying unseen data. We had worried about IAOFE's computational overhead if it employed a resampling methodology for evaluating each feature. Apparently, we can avoid this problem by estimating training data accuracy. However, this method doesn't perform so well in reducing the decision tree size. In some data sets, such as Bupa and Cylinder-bands, it produced a tree size even larger than that of the original data. Because of the way we implemented IAOFE, the fold with the biggest contribution to the tree size increase always tended to have the biggest contribution to the accuracy increase as well. As we explained earlier, we chose classification accuracy as the objective function.
In other words, IAOFE maximizes classification accuracy but doesn't optimize the classifier structure. You can make corresponding changes to the objective function if the classifier structure is a major concern.

IAOFE_CVstratified (Configuration 2) also achieved a higher mean accuracy than the existing methods. Its win/lose/tie records, although not as statistically significant as IAOFE_trainingAccuracy's, also favored IAOFE (23/12/1 against Chi square, 21/14/1 against information gain, 18/16/2 against gain ratio, and 20/12/4 against Relief). However, IAOFE_CVstratified was stronger than IAOFE_trainingAccuracy in terms of improving the classifier structure, so it could be more useful when the classifier structure is as important as accuracy.
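For readers following the illustrative sketches above, the two suggested configurations in Table 6 correspond to parameter choices along the following lines; the dictionary keys and values are assumptions of ours for illustration, not an interface defined by the article.

# Hypothetical parameter sets mirroring Table 6 (names are ours, for illustration only).
IAOFE_TRAINING_ACCURACY = {
    "accuracy_estimation": "training_data",         # no resampling
    "tolerance": 0.0,                               # meticulous
    "tie_breaker": "classifier_structure",          # prefer the most compact classifier
    "voting": "unanimous",
}

IAOFE_CV_STRATIFIED = {
    "accuracy_estimation": "stratified_cross_validation",
    "tolerance": 0.0,
    "tie_breaker": "classifier_structure",
    "voting": "unanimous",
}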

[Table 7. Comparison with existing methods. For each data set, the table reports classification accuracy (with standard deviation) and the decision tree size for the original data, IAOFE_CVstratified, IAOFE_trainingAccuracy, Chi square, information gain, gain ratio, and Relief.]

Selecting values for important parameters has been crucial in heuristic learning.23 For inductive learning, the question of whether a feature is relevant to the target concept is less useful than the question of whether a feature is necessary for learning the target concept given an induction algorithm. These two issues are not equivalent.2 However, traditional feature selection and elimination approaches are blind to the induction algorithm that will finally use their output features. Thus, we expect that IAOFE, an approach that considers the biases of an induction algorithm and takes the algorithm itself as part of the evaluation function to eliminate features, will be very useful. However, with lots of parameters offering omnifarious performances, IAOFE can't be ideally effective without parameter tuning.

Our experimental data was particularly comprehensive: 36 data sets, from different real-world and artificial domains, containing categorical as well as numeric features, with 150 to 10,000+ instances and 5 to 61 features. Although no parameter settings can be universally optimal, the results reported here might have some degree of generality and thus might be useful in practice.

Acknowledgments

We are grateful to Hill Zhu for suggesting useful references for this article.

References

1. D. Koller and M. Sahami, "Toward Optimal Feature Selection," Proc. 13th Int'l Conf. Machine Learning (ICML 96), Morgan Kaufmann, 1996.
2. R. Kohavi and G.H. John, "Wrappers for Feature Subset Selection," Artificial Intelligence, special issue on relevance, vol. 97, nos. 1-2, 1997.
3. X. Wu and D. Urpani, "Induction by Attribute Elimination," IEEE Trans. Knowledge and Data Eng., vol. 11, no. 5, 1999.
4. A. Blum and P. Langley, "Selection of Relevant Features and Examples in Machine Learning," Artificial Intelligence, vol. 97, nos. 1-2, 1997.
5. M. Dash and H. Liu, "Feature Selection for Classification," Intelligent Data Analysis, vol. 1, no. 3, 1997.
6. G.H. John, R. Kohavi, and K. Pfleger, "Irrelevant Features and the Subset Selection Problem," Proc. 11th Int'l Conf. Machine Learning, Morgan Kaufmann, 1994.
7. P. Langley and S. Sage, "Induction of Selective Bayesian Classifiers," Proc. 10th Conf. Uncertainty in Artificial Intelligence, Morgan Kaufmann, 1994.
8. P. Langley and S. Sage, "Oblivious Decision Trees and Abstract Cases," Working Notes, AAAI-94 Workshop on Case-Based Reasoning, AAAI Press, 1994.
9. M.J. Pazzani, "Searching for Dependencies in Bayesian Classifiers," Proc. 5th Int'l Workshop Artificial Intelligence and Statistics, Springer-Verlag, 1996.
10. G.M. Provan and M. Singh, "Learning Bayesian Networks Using Feature Selection," Proc. 5th Int'l Workshop Artificial Intelligence and Statistics, Springer-Verlag, 1995.
11. M. Singh and G.M. Provan, "A Comparison of Induction Algorithms for Selective and Non-Selective Bayesian Classifiers," Proc. 12th Int'l Conf. Machine Learning, Morgan Kaufmann, 1995.
12. W.N. Street, O.L. Mangasarian, and W.H. Wolberg, "An Inductive Learning Approach to Prognostic Prediction," Proc. 12th Int'l Conf. Machine Learning, Morgan Kaufmann, 1995.
13. R. Kohavi, "A Study of Cross-Validation and Bootstrap for Accuracy Estimation and Model Selection," Proc. 14th Int'l Joint Conf. Artificial Intelligence (IJCAI), Morgan Kaufmann, 1995.
14. R. Kohavi and F. Provost, "Glossary of Terms," Machine Learning, special issue on applications of machine learning and the knowledge discovery process, vol. 30, 1998.
15. B. Efron, "Estimating the Error Rate of a Prediction Rule: Improvement on Cross-Validation," J. Am. Statistical Assoc., vol. 78, no. 382, 1983.
16. B. Efron and R. Tibshirani, An Introduction to the Bootstrap, Chapman and Hall.
17. C.L. Blake and C.J. Merz, UCI Repository of Machine Learning Databases, Dept. of Information and Computer Science, Univ. of California, Irvine, 1998; ~mlearn/mlrepository.html.
18. G.W. Snedecor and W. Cochran, Statistical Methods, Iowa State Univ. Press.
19. J.R. Quinlan, "Induction of Decision Trees," Machine Learning, vol. 1, 1986.
20. K. Kira and L. Rendell, "The Feature Selection Problem: Traditional Methods and a New Algorithm," Proc. 10th Nat'l Conf. Artificial Intelligence, AAAI Press, 1992.
21. I. Kononenko, "Estimating Attributes: Analysis and Extensions of RELIEF," Proc. European Conf. Machine Learning, Springer-Verlag, 1994.
22. C. Cardie, "Using Decision Trees to Improve Case-Based Learning," Proc. 10th Int'l Conf. Machine Learning, Morgan Kaufmann, 1993.
23. Z. Michalewicz and D.B. Fogel, How to Solve It: Modern Heuristics, Springer-Verlag.

The Authors

Ying Yang is a postdoctoral research associate in the Department of Computer Science at the University of Vermont. Her research interests are in machine learning and data mining. She received her PhD in computer science from Monash University, Australia. Contact her at the Dept. of Computer Science, 343 Votey Bldg., Univ. of Vermont, Burlington, VT 05405; yyang@emba.uvm.edu.

Xindong Wu is a professor in and the chair of the Department of Computer Science at the University of Vermont. He received his PhD in artificial intelligence from the University of Edinburgh. He is the executive editor of Knowledge and Information Systems, chair of the Steering Committee of the IEEE International Conference on Data Mining, a series editor of the Springer book series on advanced information and knowledge processing, and chair of the IEEE Computer Society Technical Committee on Computational Intelligence. Contact him at the Dept. of Computer Science, 351 Votey Bldg., Univ. of Vermont, Burlington, VT 05405; xwu@emba.uvm.edu.


More information

Feature Selection. CE-725: Statistical Pattern Recognition Sharif University of Technology Spring Soleymani

Feature Selection. CE-725: Statistical Pattern Recognition Sharif University of Technology Spring Soleymani Feature Selection CE-725: Statistical Pattern Recognition Sharif University of Technology Spring 2013 Soleymani Outline Dimensionality reduction Feature selection vs. feature extraction Filter univariate

More information

FEATURE SELECTION BASED ON INFORMATION THEORY, CONSISTENCY AND SEPARABILITY INDICES.

FEATURE SELECTION BASED ON INFORMATION THEORY, CONSISTENCY AND SEPARABILITY INDICES. FEATURE SELECTION BASED ON INFORMATION THEORY, CONSISTENCY AND SEPARABILITY INDICES. Włodzisław Duch 1, Krzysztof Grąbczewski 1, Tomasz Winiarski 1, Jacek Biesiada 2, Adam Kachel 2 1 Dept. of Informatics,

More information

Evaluating Classifiers

Evaluating Classifiers Evaluating Classifiers Charles Elkan elkan@cs.ucsd.edu January 18, 2011 In a real-world application of supervised learning, we have a training set of examples with labels, and a test set of examples with

More information

Regularization and model selection

Regularization and model selection CS229 Lecture notes Andrew Ng Part VI Regularization and model selection Suppose we are trying select among several different models for a learning problem. For instance, we might be using a polynomial

More information

Analytical model A structure and process for analyzing a dataset. For example, a decision tree is a model for the classification of a dataset.

Analytical model A structure and process for analyzing a dataset. For example, a decision tree is a model for the classification of a dataset. Glossary of data mining terms: Accuracy Accuracy is an important factor in assessing the success of data mining. When applied to data, accuracy refers to the rate of correct values in the data. When applied

More information

Speeding up Logistic Model Tree Induction

Speeding up Logistic Model Tree Induction Speeding up Logistic Model Tree Induction Marc Sumner 1,2,EibeFrank 2,andMarkHall 2 Institute for Computer Science University of Freiburg Freiburg, Germany sumner@informatik.uni-freiburg.de Department

More information

Chapter 12 Feature Selection

Chapter 12 Feature Selection Chapter 12 Feature Selection Xiaogang Su Department of Statistics University of Central Florida - 1 - Outline Why Feature Selection? Categorization of Feature Selection Methods Filter Methods Wrapper Methods

More information

OPTIMIZATION OF BAGGING CLASSIFIERS BASED ON SBCB ALGORITHM

OPTIMIZATION OF BAGGING CLASSIFIERS BASED ON SBCB ALGORITHM OPTIMIZATION OF BAGGING CLASSIFIERS BASED ON SBCB ALGORITHM XIAO-DONG ZENG, SAM CHAO, FAI WONG Faculty of Science and Technology, University of Macau, Macau, China E-MAIL: ma96506@umac.mo, lidiasc@umac.mo,

More information

Genetic Programming for Data Classification: Partitioning the Search Space

Genetic Programming for Data Classification: Partitioning the Search Space Genetic Programming for Data Classification: Partitioning the Search Space Jeroen Eggermont jeggermo@liacs.nl Joost N. Kok joost@liacs.nl Walter A. Kosters kosters@liacs.nl ABSTRACT When Genetic Programming

More information

Dynamic Ensemble Construction via Heuristic Optimization

Dynamic Ensemble Construction via Heuristic Optimization Dynamic Ensemble Construction via Heuristic Optimization Şenay Yaşar Sağlam and W. Nick Street Department of Management Sciences The University of Iowa Abstract Classifier ensembles, in which multiple

More information

Using Decision Trees and Soft Labeling to Filter Mislabeled Data. Abstract

Using Decision Trees and Soft Labeling to Filter Mislabeled Data. Abstract Using Decision Trees and Soft Labeling to Filter Mislabeled Data Xinchuan Zeng and Tony Martinez Department of Computer Science Brigham Young University, Provo, UT 84602 E-Mail: zengx@axon.cs.byu.edu,

More information

Competence-guided Editing Methods for Lazy Learning

Competence-guided Editing Methods for Lazy Learning Competence-guided Editing Methods for Lazy Learning Elizabeth McKenna and Barry Smyth Abstract. Lazy learning algorithms retain their raw training examples and defer all example-processing until problem

More information

MIT 801. Machine Learning I. [Presented by Anna Bosman] 16 February 2018

MIT 801. Machine Learning I. [Presented by Anna Bosman] 16 February 2018 MIT 801 [Presented by Anna Bosman] 16 February 2018 Machine Learning What is machine learning? Artificial Intelligence? Yes as we know it. What is intelligence? The ability to acquire and apply knowledge

More information

How do we obtain reliable estimates of performance measures?

How do we obtain reliable estimates of performance measures? How do we obtain reliable estimates of performance measures? 1 Estimating Model Performance How do we estimate performance measures? Error on training data? Also called resubstitution error. Not a good

More information

Univariate Margin Tree

Univariate Margin Tree Univariate Margin Tree Olcay Taner Yıldız Department of Computer Engineering, Işık University, TR-34980, Şile, Istanbul, Turkey, olcaytaner@isikun.edu.tr Abstract. In many pattern recognition applications,

More information

Trimmed bagging a DEPARTMENT OF DECISION SCIENCES AND INFORMATION MANAGEMENT (KBI) Christophe Croux, Kristel Joossens and Aurélie Lemmens

Trimmed bagging a DEPARTMENT OF DECISION SCIENCES AND INFORMATION MANAGEMENT (KBI) Christophe Croux, Kristel Joossens and Aurélie Lemmens Faculty of Economics and Applied Economics Trimmed bagging a Christophe Croux, Kristel Joossens and Aurélie Lemmens DEPARTMENT OF DECISION SCIENCES AND INFORMATION MANAGEMENT (KBI) KBI 0721 Trimmed Bagging

More information

Handling Missing Values via Decomposition of the Conditioned Set

Handling Missing Values via Decomposition of the Conditioned Set Handling Missing Values via Decomposition of the Conditioned Set Mei-Ling Shyu, Indika Priyantha Kuruppu-Appuhamilage Department of Electrical and Computer Engineering, University of Miami Coral Gables,

More information

Weighting and selection of features.

Weighting and selection of features. Intelligent Information Systems VIII Proceedings of the Workshop held in Ustroń, Poland, June 14-18, 1999 Weighting and selection of features. Włodzisław Duch and Karol Grudziński Department of Computer

More information

Feature Selection Algorithm with Discretization and PSO Search Methods for Continuous Attributes

Feature Selection Algorithm with Discretization and PSO Search Methods for Continuous Attributes Feature Selection Algorithm with Discretization and PSO Search Methods for Continuous Attributes Madhu.G 1, Rajinikanth.T.V 2, Govardhan.A 3 1 Dept of Information Technology, VNRVJIET, Hyderabad-90, INDIA,

More information

Cover Page. The handle holds various files of this Leiden University dissertation.

Cover Page. The handle   holds various files of this Leiden University dissertation. Cover Page The handle http://hdl.handle.net/1887/22055 holds various files of this Leiden University dissertation. Author: Koch, Patrick Title: Efficient tuning in supervised machine learning Issue Date:

More information

Fuzzy Partitioning with FID3.1

Fuzzy Partitioning with FID3.1 Fuzzy Partitioning with FID3.1 Cezary Z. Janikow Dept. of Mathematics and Computer Science University of Missouri St. Louis St. Louis, Missouri 63121 janikow@umsl.edu Maciej Fajfer Institute of Computing

More information

What is Learning? CS 343: Artificial Intelligence Machine Learning. Raymond J. Mooney. Problem Solving / Planning / Control.

What is Learning? CS 343: Artificial Intelligence Machine Learning. Raymond J. Mooney. Problem Solving / Planning / Control. What is Learning? CS 343: Artificial Intelligence Machine Learning Herbert Simon: Learning is any process by which a system improves performance from experience. What is the task? Classification Problem

More information

Efficiently Handling Feature Redundancy in High-Dimensional Data

Efficiently Handling Feature Redundancy in High-Dimensional Data Efficiently Handling Feature Redundancy in High-Dimensional Data Lei Yu Department of Computer Science & Engineering Arizona State University Tempe, AZ 85287-5406 leiyu@asu.edu Huan Liu Department of Computer

More information

Logistic Model Tree With Modified AIC

Logistic Model Tree With Modified AIC Logistic Model Tree With Modified AIC Mitesh J. Thakkar Neha J. Thakkar Dr. J.S.Shah Student of M.E.I.T. Asst.Prof.Computer Dept. Prof.&Head Computer Dept. S.S.Engineering College, Indus Engineering College

More information

An Effective Performance of Feature Selection with Classification of Data Mining Using SVM Algorithm

An Effective Performance of Feature Selection with Classification of Data Mining Using SVM Algorithm Proceedings of the National Conference on Recent Trends in Mathematical Computing NCRTMC 13 427 An Effective Performance of Feature Selection with Classification of Data Mining Using SVM Algorithm A.Veeraswamy

More information

Comparing Case-Based Bayesian Network and Recursive Bayesian Multi-net Classifiers

Comparing Case-Based Bayesian Network and Recursive Bayesian Multi-net Classifiers Comparing Case-Based Bayesian Network and Recursive Bayesian Multi-net Classifiers Eugene Santos Dept. of Computer Science and Engineering University of Connecticut Storrs, CT 06268, USA FAX:(860)486-1273

More information

A Selective Sampling Approach to Active Feature Selection

A Selective Sampling Approach to Active Feature Selection A Selective Sampling Approach to Active Feature Selection Huan Liu 1, Hiroshi Motoda 2, Lei Yu 1 1 Department of Computer Science & Engineering Arizona State University, Tempe, AZ 85287-8809, USA {hliu,leiyu}@asu.edu

More information

Metrics for Performance Evaluation How to evaluate the performance of a model? Methods for Performance Evaluation How to obtain reliable estimates?

Metrics for Performance Evaluation How to evaluate the performance of a model? Methods for Performance Evaluation How to obtain reliable estimates? Model Evaluation Metrics for Performance Evaluation How to evaluate the performance of a model? Methods for Performance Evaluation How to obtain reliable estimates? Methods for Model Comparison How to

More information

Big Data Methods. Chapter 5: Machine learning. Big Data Methods, Chapter 5, Slide 1

Big Data Methods. Chapter 5: Machine learning. Big Data Methods, Chapter 5, Slide 1 Big Data Methods Chapter 5: Machine learning Big Data Methods, Chapter 5, Slide 1 5.1 Introduction to machine learning What is machine learning? Concerned with the study and development of algorithms that

More information

Impact of Boolean factorization as preprocessing methods for classification of Boolean data

Impact of Boolean factorization as preprocessing methods for classification of Boolean data Impact of Boolean factorization as preprocessing methods for classification of Boolean data Radim Belohlavek, Jan Outrata, Martin Trnecka Data Analysis and Modeling Lab (DAMOL) Dept. Computer Science,

More information

Using Pairs of Data-Points to Define Splits for Decision Trees

Using Pairs of Data-Points to Define Splits for Decision Trees Using Pairs of Data-Points to Define Splits for Decision Trees Geoffrey E. Hinton Department of Computer Science University of Toronto Toronto, Ontario, M5S la4, Canada hinton@cs.toronto.edu Michael Revow

More information

Filter methods for feature selection. A comparative study

Filter methods for feature selection. A comparative study Filter methods for feature selection. A comparative study Noelia Sánchez-Maroño, Amparo Alonso-Betanzos, and María Tombilla-Sanromán University of A Coruña, Department of Computer Science, 15071 A Coruña,

More information

Improving the Random Forest Algorithm by Randomly Varying the Size of the Bootstrap Samples for Low Dimensional Data Sets

Improving the Random Forest Algorithm by Randomly Varying the Size of the Bootstrap Samples for Low Dimensional Data Sets Improving the Random Forest Algorithm by Randomly Varying the Size of the Bootstrap Samples for Low Dimensional Data Sets Md Nasim Adnan and Md Zahidul Islam Centre for Research in Complex Systems (CRiCS)

More information

DESIGN AND EVALUATION OF MACHINE LEARNING MODELS WITH STATISTICAL FEATURES

DESIGN AND EVALUATION OF MACHINE LEARNING MODELS WITH STATISTICAL FEATURES EXPERIMENTAL WORK PART I CHAPTER 6 DESIGN AND EVALUATION OF MACHINE LEARNING MODELS WITH STATISTICAL FEATURES The evaluation of models built using statistical in conjunction with various feature subset

More information

ASSOCIATIVE CLASSIFICATION WITH KNN

ASSOCIATIVE CLASSIFICATION WITH KNN ASSOCIATIVE CLASSIFICATION WITH ZAIXIANG HUANG, ZHONGMEI ZHOU, TIANZHONG HE Department of Computer Science and Engineering, Zhangzhou Normal University, Zhangzhou 363000, China E-mail: huangzaixiang@126.com

More information

Rank Measures for Ordering

Rank Measures for Ordering Rank Measures for Ordering Jin Huang and Charles X. Ling Department of Computer Science The University of Western Ontario London, Ontario, Canada N6A 5B7 email: fjhuang33, clingg@csd.uwo.ca Abstract. Many

More information

Contribution of Boosting in Wrapper Models

Contribution of Boosting in Wrapper Models Marc Sebban, Richard Nock TRIVIA, West Indies and Guiana University Campus de Fouillole, 95159 - Pointe à Pitre (France) {msebban,rnock}@univ-ag.fr Abstract. We describe a new way to deal with feature

More information

An Analysis of Applicability of Genetic Algorithms for Selecting Attributes and Examples for the Nearest Neighbour Classifier

An Analysis of Applicability of Genetic Algorithms for Selecting Attributes and Examples for the Nearest Neighbour Classifier BULGARIAN ACADEMY OF SCIENCES CYBERNETICS AND INFORMATION TECHNOLOGIES Volume 7, No 2 Sofia 2007 An Analysis of Applicability of Genetic Algorithms for Selecting Attributes and Examples for the Nearest

More information

Semi-Supervised Clustering with Partial Background Information

Semi-Supervised Clustering with Partial Background Information Semi-Supervised Clustering with Partial Background Information Jing Gao Pang-Ning Tan Haibin Cheng Abstract Incorporating background knowledge into unsupervised clustering algorithms has been the subject

More information

Bing Liu, Minqing Hu and Wynne Hsu

Bing Liu, Minqing Hu and Wynne Hsu From: AAAI- Proceedings. Copyright, AAAI (www.aaai.org). All rights reserved. Intuitive Representation of Decision Trees Using General Rules and Exceptions Bing Liu, Minqing Hu and Wynne Hsu School of

More information

Ensemble Learning: An Introduction. Adapted from Slides by Tan, Steinbach, Kumar

Ensemble Learning: An Introduction. Adapted from Slides by Tan, Steinbach, Kumar Ensemble Learning: An Introduction Adapted from Slides by Tan, Steinbach, Kumar 1 General Idea D Original Training data Step 1: Create Multiple Data Sets... D 1 D 2 D t-1 D t Step 2: Build Multiple Classifiers

More information

COMP 465: Data Mining Classification Basics

COMP 465: Data Mining Classification Basics Supervised vs. Unsupervised Learning COMP 465: Data Mining Classification Basics Slides Adapted From : Jiawei Han, Micheline Kamber & Jian Pei Data Mining: Concepts and Techniques, 3 rd ed. Supervised

More information

Data Mining. 3.2 Decision Tree Classifier. Fall Instructor: Dr. Masoud Yaghini. Chapter 5: Decision Tree Classifier

Data Mining. 3.2 Decision Tree Classifier. Fall Instructor: Dr. Masoud Yaghini. Chapter 5: Decision Tree Classifier Data Mining 3.2 Decision Tree Classifier Fall 2008 Instructor: Dr. Masoud Yaghini Outline Introduction Basic Algorithm for Decision Tree Induction Attribute Selection Measures Information Gain Gain Ratio

More information

Best First and Greedy Search Based CFS and Naïve Bayes Algorithms for Hepatitis Diagnosis

Best First and Greedy Search Based CFS and Naïve Bayes Algorithms for Hepatitis Diagnosis Best First and Greedy Search Based CFS and Naïve Bayes Algorithms for Hepatitis Diagnosis CHAPTER 3 BEST FIRST AND GREEDY SEARCH BASED CFS AND NAÏVE BAYES ALGORITHMS FOR HEPATITIS DIAGNOSIS 3.1 Introduction

More information

Feature Selection Filters Based on the Permutation Test

Feature Selection Filters Based on the Permutation Test Feature Selection Filters Based on the Permutation Test Predrag Radivoac, Zoran Obradovic 2, A. Keith Dunker, Slobodan Vucetic 2 Center for Computational Biology and Bioinformatics, Indiana University,

More information

C2FS: An Algorithm for Feature Selection in Cascade Neural Networks

C2FS: An Algorithm for Feature Selection in Cascade Neural Networks C2FS: An Algorithm for Feature Selection in Cascade Neural Networks Lars Backstrom Computer Science Cornell University lb87@cornell.edu Rich Caruana Computer Science Cornell University caruana@cs.cornell.edu

More information