DESIGN AND EVALUATION OF MACHINE LEARNING MODELS WITH STATISTICAL FEATURES


EXPERIMENTAL WORK PART I

CHAPTER 6

DESIGN AND EVALUATION OF MACHINE LEARNING MODELS WITH STATISTICAL FEATURES

The evaluation of models built using statistical features in conjunction with various feature subset selection algorithms, data discretisers and data transforms, along with various classifiers, is reported. The result obtained at the end of the evaluation is an expert system or machine learning model that uses the most preferred feature subset selection algorithm, data discretiser or data transform along with the best classifier, built on statistical features.

CHAPTER 6

DESIGN AND EVALUATION OF MACHINE LEARNING MODELS WITH STATISTICAL FEATURES

6.1 INTRODUCTION

Misfire detection using the signals acquired from the engine under different operating conditions has to undergo a few processes before relevant information can be extracted for classifier training. The fault descriptors basically contain a set of parameters computed from the signal, called features, which are used for building the machine learning model capable of fault identification. A wide variety of features can be extracted from the acquired signals, but the suitability of the feature set has to be evaluated thoroughly before building the model. Since the signature or signal from each device is unique, a comprehensive decision on the type of feature to be used is often not possible without evaluating a few possibilities. The simplest and computationally cheapest feature set is often chosen as the first choice. In this section, the use of statistical features as the basic building block for the machine learning model is evaluated in combination with the various data pre-processing techniques discussed in Chapter 3 and the different learning algorithms presented in Chapter 4. The main focus is to build a misfire detection system by the synthesis of the best performing: a) discretisation technique, b) feature subset selection and/or feature transform and c) machine learning algorithm. This combination has to be evaluated in detail at each and every stage before crystallising the final machine learning model, with an aim to develop a robust system capable of consistently high performance and inherent tolerance to variations in noise and signal conditions. The statistical analysis of the acquired signal yields a set of features which are used for fault diagnosis as described earlier. The next logical step after feature extraction (refer to Figure 3.6 of Chapter 3) is discretisation of the data, followed by data transformation and/or feature subset selection. Building and evaluating the model after each process, with an aim to identify the possibility of achieving good model performance by eliminating any one or all of the data pre-processing steps, is attempted. Finally, combinations of a) data

discretisation and feature subset selection, and b) data discretisation and data transformation techniques are also evaluated. These probable combinations are built into the various machine learning models discussed in Section 3.4 and evaluated.

6.2 STATISTICAL FEATURES

The eleven statistical features extracted from the engine block vibration signals were selected as the basis for the study. The features are mean, standard error, median, standard deviation, sample variance, kurtosis, skewness, range, minimum, maximum, and sum. The important logic behind data preprocessing, apart from noise and outlier detection, is that all the features may not contain exclusive information required for machine learning. It is generally observed that some features may yield more information than others, and a few can be highly correlated, containing similar information. The correlated data do not add information for building the classifier and are considered an overload due to the repetition of the same information in more than one feature. The process of identifying and selecting good features which reveal more information for classification is called feature subset selection. This process is usually preceded by dimensionality reduction, where the volume of information is reduced by granulating or aggregating data for ease of computing, as explained in Section 3.4. This process is analogous to palletisation or the bundling of components for improving data processing efficiency.

6.3 ESTIMATION OF THE PREFERRED FSS TECHNIQUE

In this work, decision tree based FSS and CFS based FSS are chosen for building the classifier model. A detailed evaluation to identify the feature subset with the minimum number of features has to be performed to validate the FSS procedure and to ensure a classification model with minimum complexity.

6.3.1 Effect of number of features

A decision tree based FSS and a CFS based FSS are inducted as data preprocessing techniques in building the classifier model. The effect of the number of features on classification efficiency is investigated using these techniques. The feature subset

identified through this process will be used for developing and evaluating the various algorithm based models. The significant part of the decision tree in Figure 6.1, obtained by using all the features, shows that the root node is sample variance, which is the feature with the maximum discriminating capacity, closely followed by standard deviation and standard error. Percolating down the tree branches, it is evident that the list continues with skewness, range, kurtosis, minimum, mean and median. The statistical features sum and maximum do not find a place in the tree, indicating that they do not have any additional discriminating power to augment the classifier. The decision tree is capable of listing the features in the order of importance for use in a classifier, but it is not capable of suggesting a crisp subset of features or the minimum number of features that would be most suitable for building the classifier. Hence optimising the number of discriminant features (features which carry information for classification) is essential. This process is accomplished by evaluating the classifier performance starting with the root node, cumulatively adding features in their order of importance and evaluating the classifier at each stage of feature addition. A similar exercise has to be followed for CFS also. The application of the CFS evaluator recommends the following seven statistical features: standard error, standard deviation, kurtosis, skewness, range, minimum and maximum. CFS is capable of identifying features based on correlation avoidance, but is not capable of listing the features based on their importance. One proposed method for validating the recommendation of CFS is to build a classification tree using the decision tree algorithm and list the features from the root node in descending order of importance, as described in the previous paragraph; the significant part of this tree is represented in Figure 6.2. It is noted that CFS validation requires the assistance of an additional algorithm if the evaluation has to be done thoroughly. In this work, the decision tree algorithm is used for feature ranking. The features, listed in descending order of their importance, are standard deviation at the root node, followed by standard error, skewness, range, kurtosis, minimum and maximum.
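As a rough illustration of this ranking step, the sketch below (not the thesis code; the feature names and the feature table X with label vector y are assumed) computes the eleven statistical features for a signal segment and ranks them with a decision tree, using scikit-learn's impurity based importances as a stand-in for the decision tree evaluator described above:

import numpy as np
import pandas as pd
from scipy import stats
from sklearn.tree import DecisionTreeClassifier

FEATURE_NAMES = ["mean", "standard_error", "median", "standard_deviation",
                 "sample_variance", "kurtosis", "skewness", "range",
                 "minimum", "maximum", "sum"]

def statistical_features(segment):
    # Eleven statistical features of one vibration signal segment (1-D array).
    n = len(segment)
    sd = segment.std(ddof=1)
    return {
        "mean": segment.mean(),
        "standard_error": sd / np.sqrt(n),
        "median": np.median(segment),
        "standard_deviation": sd,
        "sample_variance": segment.var(ddof=1),
        "kurtosis": stats.kurtosis(segment),
        "skewness": stats.skew(segment),
        "range": segment.max() - segment.min(),
        "minimum": segment.min(),
        "maximum": segment.max(),
        "sum": segment.sum(),
    }

def rank_features(X, y):
    # Fit a decision tree on the feature table X (pandas DataFrame) and rank the
    # features by impurity based importance; the root-node feature comes first.
    tree = DecisionTreeClassifier(random_state=0).fit(X[FEATURE_NAMES], y)
    importances = pd.Series(tree.feature_importances_, index=FEATURE_NAMES)
    return importances.sort_values(ascending=False)

CFS itself is not part of scikit-learn, so the correlation based subset reported above would in practice come from a separate tool.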

Figure 6.1 Decision tree with all features considered

Figure 6.2 Decision tree with CFS identified features

6.3.2 Effect of number of features on classification efficiency

The CFS evaluation suggests a subset of seven features, while the decision tree identified nine features, as reported in Section 6.3.1. An analysis of the effect of the number of features on classification efficiency was performed and is reported here. The features are taken in the order of importance, from one feature to the maximum number of features, and their cumulative classification efficiencies using decision tree and random forest are calculated. The decision tree is a benchmark classification algorithm generally used to compare results, as demonstrated by Hall (2000). Random forest is an extension of decision trees and hence is considered for the evaluation of the FSS methodology. All the eleven statistical features are given as input for identifying the feature subset containing the minimum number of features without appreciable loss in classification efficiency. CFS subset evaluation and the decision tree give an orderly representation of the features showing their relative importance in classification. The learning algorithm was evaluated for classification efficiency using the most prominent single feature and the result was recorded. Additional features, in the order of importance, were added to the set one by one, and at every stage the cumulative classification efficiency of the selected features was recorded. The number of features in the subset can be decided based on two alternatives:
a. the feature subset that offers the maximum classification efficiency
b. the feature subset that achieves very close to the maximum classification efficiency with a minimal number of features and minimum computational complexity.
The second option is a good alternative where serious deviation in performance is avoided and the system will be more robust, since over fitting the model to all the available data is avoided due to the use of a minimum number of features. An added advantage is that the model has a reduced computational load, thereby saving on system resource requirements during implementation. Variations in classification efficiency with the number of features in a subset, using decision tree based FSS and CFS based FSS, are presented in Figures 6.3 a and 6.3 b respectively.
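A minimal sketch of this cumulative feature-addition study is given below (the feature table X, labels y and the ranking from the earlier sketch are assumptions, not the thesis code); the 10-fold stratified cross-validation mirrors the evaluation setup used throughout this chapter:

import pandas as pd
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

def cumulative_accuracy(X, y, ranked_features, classifier):
    # Cross-validated accuracy (%) for the top-1, top-2, ... feature subsets.
    cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
    results = {}
    for k in range(1, len(ranked_features) + 1):
        subset = list(ranked_features[:k])
        scores = cross_val_score(classifier, X[subset], y, cv=cv)
        results[k] = 100.0 * scores.mean()
    return pd.Series(results, name="accuracy_%")

# Example usage with the two classifiers used for the FSS study (hypothetical X, y):
# ranked = rank_features(X, y).index   # decision tree ranking from the earlier sketch
# dt_curve = cumulative_accuracy(X, y, ranked, DecisionTreeClassifier(random_state=0))
# rf_curve = cumulative_accuracy(X, y, ranked, RandomForestClassifier(n_estimators=10, random_state=0))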

Figure 6.3 Classification efficiency of the statistical feature subsets using decision tree (classification accuracy % versus number of features considered, cumulative): a) FSS using decision tree (features added in the order SV, SD, SE, SKE, RAN, KUR, MIN, MED, MEN, ALL), b) FSS using CFS (features added in the order SD, SE, SKE, RAN, KUR, MIN, MAX)

Evaluating the effectiveness of FSS is done using the decision tree as the classifier in the first phase. Figure 6.3a depicts FSS done using the decision tree for both identification of the best feature subset and classification. Analysis of the graph shows that the use of the first five features (as ranked by the decision tree) gives the maximum classification efficiency of 89.1%, and increasing or decreasing the number of features in the subset becomes progressively counterproductive. The classifier is never able to achieve the peak performance of 89.1% when more or fewer than five features are used. A similar observation on Figure 6.3b, representing the FSS using CFS with features ranked by the decision tree, shows that the classifier performance peaks at 89.2% when the first four features identified by CFS and ranked by the

decision tree are used. Any decrease in the number of features drastically reduces the classifier performance, while an increase in features marginally reduces the classifier performance from 89.2% to 88.4%. The use of all the features yields only 88.5%. Random forest, a tree based ensemble, is used to check for consistency in the predicted results. Figure 6.4a represents the random forest evaluation of FSS using the decision tree algorithm. The figure shows that considering the first four or five features returns a maximum classification accuracy of 87.9%, and any increase in the number of features induces oscillations in the classification accuracy but never reaches the peak performance. Any reduction below four features is marked by a sharp decrease in classification accuracy from 87.9% to a flat 78%. The overall results tally well with the findings obtained from the FSS evaluation using decision tree presented in Figure 6.3a. A similar observation can be made on FSS done using CFS and evaluated using random forest. The results presented in Figure 6.4b clearly show the FSS with the first four features to have a maximum performance of 88.2%, oscillating between 86.5% and 87.8% when the subset is increased beyond four. Any reduction is equally unfavorable, with the classifier returning 87.1% for three features and a flat 78% for two and below. Comparing the peak classification accuracies obtained from both the classifiers, it is clearly evident that FSS using CFS is the better tool. The first four features, namely standard deviation, standard error, skewness and range, form the best possible feature subset for use in any classifier.

Figure 6.4 Classification efficiency of the statistical feature subsets using random forest (classification accuracy % versus number of features considered, cumulative): a) FSS using decision tree, b) FSS using CFS

6.4 MODEL BUILDING AND EVALUATION

The machine learning model development involves two phases, called training and testing. Training is the process where the classifier learns to classify the faults based on the supplied training samples. In event identification like misfire detection, the supervised form of training is followed, where the features along with the class they represent are provided to the learning algorithm. The important point to be considered here is that the model has to be evaluated with each and every data preprocessing technique and their possible combinations. The process of testing is to check how well the classifier has learnt to label unseen samples. The summarized testing results of all the classifiers used here are presented in the form of a square matrix called the confusion matrix. All the confusion matrices depicted are for results with CFS and Konenenko discretisation. The interpretation of the confusion matrix is as follows: referring to Table 6.5, attention is focused on the last row for explanation, since it has a higher spread of misclassifications into other conditions. The last row of the confusion matrix represents misfire in cylinder four. The first element in the last row, i.e. location (5,1), with value 0, represents the number of data points that belong to the misfire in cylinder four condition but have been misclassified as good. The second element in the last row, i.e. location (5,2), with value 11, depicts how many of the misfire in cylinder four condition data points have been misclassified as misfire in cylinder one. The third element represents the number of

data points that have been misclassified as misfire in cylinder two. The fourth element in the last row, i.e. location (5,4), with value 32, depicts how many of the misfire in cylinder four condition data points have been misclassified as misfire in cylinder three. The last element in the last row, i.e. location (5,5), represents the number of data points that have been correctly classified as misfire in cylinder four. Similarly, the second row represents the misfire in cylinder one condition. The second element in the second row represents the correctly classified instances for the misfire in cylinder one condition, and the rest of them are misclassifications as explained earlier. A similar interpretation can be given for the other elements as well. To summarize, the diagonal elements shown in the confusion matrix represent the correctly classified points and the non-diagonal elements are the misclassified ones. The evaluation results of the various classifiers with a diverse set of data pre-processing techniques are presented in the following sub sections. The process of detecting misfire is of greater importance than identifying exactly in which cylinder it has occurred. Hence all the classifiers were also evaluated in a two class mode, where misfire in any cylinder is considered as one class and no misfire as another class. However, the overall focus is to design a system which is capable of identifying misfire and accurately determining the cylinder in which it has occurred. This will enable faster fault detection and rectification for the automobile.

6.4.1 Decision tree algorithm

The decision tree algorithm is a versatile classifier and is also capable of identifying features for FSS. The classifier parameters and classification results of the decision tree algorithm, using various data preprocessing techniques and FSS options, are presented in Tables 6.1 to 6.5. The effectiveness of applying MDL correction to the rules generated by the decision tree is also evaluated under the above mentioned conditions. Comparing Tables 6.2 and 6.3, it is evident that large variations in classification accuracy are not induced by MDL, but the benefit of achieving the same classification accuracy with rules pruned to the minimum possible size is advantageous. MDL helps in formulating more generalized rules, thus making the model robust against misclassifications due to the

effects of minor variations in the engine vibration signature. The main advantage of MDL is that the rules are shortened to the minimum level possible and hence over fitting of the model to the data is avoided. The evaluation in two class mode is done using the MDL enabled decision tree algorithm and the results are presented in Table 6.4. The confusion matrix showing the misclassification details is presented in Table 6.5, which shows that misclassification between good and misfire is nil, since all entries in the first row and column except (1,1) are zero.

Table 6.1 Classifier parameters for decision tree
Model performance evaluation: 10-fold stratified cross-validation
Model building time: 0.5 s
Total number of instances: 1000
Correctly classified instances: 892
Incorrectly classified instances: 108
Mean absolute error: 0.06
Root mean squared error:
MDL correction: incorporated
Number of leaves: 98
Size of the tree: 107

Table 6.2 Decision tree results in multiclass mode without MDL (classification accuracy of statistical features with CFS based FSS and decision tree based FSS, under no preprocessing, discretisation using 10 bins and entropy based discretisation)

Table 6.3 Decision tree results in multiclass mode with MDL (same FSS and preprocessing options as Table 6.2)

Table 6.4 Decision tree results in two class mode with MDL (same FSS and preprocessing options as Table 6.2)

Table 6.5 Decision tree confusion matrix (rows and columns: Good, C1m, C2m, C3m, C4m)
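For readers reproducing these results, the sketch below (assumed data and label names, not the thesis code) shows how a confusion matrix such as Table 6.5 can be generated from cross-validated predictions and how its rows and columns are read:

from sklearn.model_selection import cross_val_predict, StratifiedKFold
from sklearn.metrics import confusion_matrix
from sklearn.tree import DecisionTreeClassifier

CLASSES = ["Good", "C1m", "C2m", "C3m", "C4m"]   # good condition + misfire in cylinders 1 to 4

def cv_confusion_matrix(X, y):
    # Confusion matrix from 10-fold cross-validated predictions:
    # rows are the true conditions, columns the predicted conditions.
    cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
    y_pred = cross_val_predict(DecisionTreeClassifier(random_state=0), X, y, cv=cv)
    cm = confusion_matrix(y, y_pred, labels=CLASSES)
    # Example reading: cm[4, 3] is the number of cylinder-four-misfire (C4m)
    # samples predicted as cylinder-three misfire (C3m); cm[4, 4] is the number
    # of C4m samples classified correctly.
    return cm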

From the values tabulated in Table 6.4, it is inferred that none of the data transforms could produce 100% misfire detection, hence their use in misfire detection with the decision tree is not favorable. It is observed that EWD and entropy based discretisation achieve 100% classification accuracy in two class mode. From the values tabulated in Table 6.3, it is observed that entropy based discretisation achieves the maximum multi-class classification accuracy, and from among these, the Konenenko based discretiser with CFS has the maximum classification accuracy of 89.2%. Additionally, the classification results do not change with or without MDL, indicating that the system is capable of learning a reduced rule set even without MDL implementation.

6.4.2 Random forest

The random forest algorithm uses multiple decision trees with a voting system to build the classification model. The classifier parameters and classification results using various data preprocessing techniques and FSS options are presented in Tables 6.6 to 6.9. The optimum number of trees to be used for model building is presented in Figure 6.5. The optimum number of trees for building the tree ensemble based classifier has to be evaluated to achieve the maximum possible results. The number of trees is varied from 1 to 15 and the results are recorded. The Konenenko discretised data set is used for the analysis. From the results presented in Figure 6.5, it is evident that a maximum classification accuracy of 89.6% is achieved when the number of trees used is 10.

Table 6.6 Classifier parameters for random forest
Model performance evaluation: 10-fold stratified cross-validation
Model building time: 1.1 s
Total number of instances: 1000
Correctly classified instances: 896
Incorrectly classified instances: 104
Mean absolute error:
Root mean squared error:
Number of trees used: 10
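The tree-count sweep described above can be sketched as follows (assumed data X, y; scikit-learn's RandomForestClassifier is used as a stand-in for the random forest implementation evaluated here):

from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

def sweep_forest_size(X, y, max_trees=15):
    # Vary the number of trees from 1 to max_trees and record the 10-fold
    # cross-validated accuracy (%), then report the best-performing size.
    cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
    accuracies = {}
    for n_trees in range(1, max_trees + 1):
        rf = RandomForestClassifier(n_estimators=n_trees, random_state=0)
        accuracies[n_trees] = 100.0 * cross_val_score(rf, X, y, cv=cv).mean()
    best = max(accuracies, key=accuracies.get)   # 10 trees in the study reported above
    return accuracies, best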

Figure 6.5 Number of trees vs classification accuracy (classification accuracy %)

Table 6.7 Random forest results in multiclass mode (same FSS and preprocessing options as Table 6.2)

Table 6.8 Random forest results in two class mode (same FSS and preprocessing options as Table 6.2)

From the values tabulated in Table 6.8, it is inferred that none of the data transforms could produce 100% misfire detection, hence their use in misfire detection with random forest is not favorable. It is observed that only entropy based discretisation achieves 100% classification accuracy in two class mode. From the values tabulated in Table 6.7, it is observed that using all the features with discretisation achieves a maximum of 90.6%, followed by CFS with the Konenenko based discretiser with a maximum classification accuracy of 89.6%.

Table 6.9 Random forest confusion matrix (rows and columns: Good, C1m, C2m, C3m, C4m)

The use of FSS and a discretiser enhances the robustness of the model by avoiding over fitting of the data, which is mandatory for ensuring the future operability of the model under slightly varying signal conditions. Hence a model using CFS with the Konenenko based discretiser, performing at 89.6%, is preferred. The misclassification details are comparable to those obtained with the decision tree and are presented in Table 6.9.

6.4.3 Fuzzy classifiers (FURIA and FRRC)

The classifier parameters and classification results of the fuzzy classifiers are presented in Tables 6.10 to 6.17. The effectiveness of using various data preprocessing techniques is evaluated with two different fuzzy formulations, FURIA and FRRC. The misclassification details are presented in Table 6.13. The FURIA classification results presented in Tables 6.11 and 6.12 clearly show that the use of FURIA with any data model is not a suitable choice, since the classification accuracy in two class mode never reaches 100%, which is mandatory for the system to be of any practical use. Hence further analysis of the FURIA results is not done.

Table 6.10 Classifier parameters for FURIA
Model performance evaluation: 10-fold stratified cross-validation
Model building time: 16 s
Total number of instances: 1000
Correctly classified instances: 885
Incorrectly classified instances: 115
Mean absolute error:
Root mean squared error:
Number of rules generated: 18

Table 6.11 FURIA results in multiclass mode (same FSS and preprocessing options as Table 6.2)

Table 6.12 FURIA results in two class mode (same FSS and preprocessing options as Table 6.2)

Table 6.13 FURIA confusion matrix with CFS + transform (rows and columns: Good, C1m, C2m, C3m, C4m)

The evaluation of FRRC is presented in Tables 6.14 to 6.17. The misclassification details are presented in Table 6.17. As observed earlier, any higher performance obtained with all the features is consciously avoided to negate over-fitting of the model to the data.

Table 6.14 Classifier parameters for FRRC
Model performance evaluation: 10-fold stratified cross-validation
Model building time: 7.5 s
Total number of instances: 1000
Correctly classified instances: 890
Incorrectly classified instances: 110
Mean absolute error:
Root mean squared error:
Number of rules: 83

Table 6.15 FRRC results in multi class mode (same FSS and preprocessing options as Table 6.2)

Table 6.16 FRRC results in two class mode (same FSS and preprocessing options as Table 6.2)

Table 6.17 FRRC confusion matrix (rows and columns: Good, C1m, C2m, C3m, C4m)

From the values tabulated in Tables 6.12 and 6.16, it is inferred that none of the data transforms could produce 100% misfire detection in two class mode, hence their use in misfire detection with either of the fuzzy systems is not favorable. The model performance with FRRC presented in Table 6.15 is similar to that of FURIA in multi-class mode, reaching a maximum of 89%, but in two class mode it achieves the mandatory 100% when CFS with entropy based discretisation is used for data preprocessing.

6.4.4 Naïve Bayes classifier

The Naïve Bayes classifier makes use of conditional probability for its classification. Its classification performance with statistical features is presented in Tables 6.18 to 6.21. Table 6.18 shows the test parameters and classification efficiency, while the details of misclassifications are presented in Table 6.21 as a confusion matrix.
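A minimal sketch of a comparable Naïve Bayes evaluation is given below (assumed data X, y; not the thesis code). Since the Konenenko and Fayyad-Irani discretisers are not available in scikit-learn, an equal-width KBinsDiscretizer with one-hot output feeding a multinomial Naïve Bayes is used as a stand-in for the discretised case:

from sklearn.naive_bayes import GaussianNB, MultinomialNB
from sklearn.preprocessing import KBinsDiscretizer
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import StratifiedKFold, cross_val_score

def naive_bayes_accuracies(X, y):
    # 10-fold stratified cross-validated accuracy (%) with and without discretisation.
    cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
    raw = GaussianNB()
    binned = make_pipeline(
        KBinsDiscretizer(n_bins=10, encode="onehot-dense", strategy="uniform"),
        MultinomialNB(),
    )
    return {
        "no preprocessing": 100.0 * cross_val_score(raw, X, y, cv=cv).mean(),
        "discretised (10 equal-width bins)": 100.0 * cross_val_score(binned, X, y, cv=cv).mean(),
    }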

Table 6.18 Classifier parameters for Naïve Bayes classifier
Model performance evaluation: 10-fold stratified cross-validation
Model building time: 0.2 s
Total number of instances: 1000
Correctly classified instances: 846
Incorrectly classified instances: 154
Mean absolute error:
Root mean squared error:

Table 6.19 Naïve Bayes results in multi class mode (same FSS and preprocessing options as Table 6.2)

Table 6.20 Naïve Bayes results in two class mode (same FSS and preprocessing options as Table 6.2)

Table 6.21 Naïve Bayes confusion matrix (rows and columns: Good, C1m, C2m, C3m, C4m)

The Naïve Bayes is a simple yet efficient classifier, achieving 100% classification efficiency in two class mode with entropy based discretisation, as observed from Table 6.20. From the values tabulated in Table 6.20, it is inferred that none of the data transforms could produce 100% misfire detection in two class mode, hence their use in misfire detection with Naïve Bayes is not favorable. In multi-class mode, both the entropy based discretisation methods achieve the same classification accuracy of 84.6% with FSS, as noted from Table 6.19.

6.4.5 Bayes net

Bayes net, or the Bayesian belief network classifier, is a slightly improved model as compared to the Naïve Bayes model. Its classification performance is presented in Tables 6.22 to 6.25. Table 6.22 shows the test parameters and classification efficiency, while the details of misclassifications are presented in Table 6.25 as a confusion matrix.

Table 6.22 Classifier parameters for Bayes net classifier
Model performance evaluation: 10-fold stratified cross-validation
Model building time: 0.2 s
Total number of instances: 1000
Correctly classified instances: 846
Incorrectly classified instances: 154
Mean absolute error:
Root mean squared error:
Estimator: Simple Bayes
Optimisation algorithm used: Hill climber

Table 6.23 Bayes net results in multi class mode (same FSS and preprocessing options as Table 6.2)

From the values tabulated in Table 6.24, it is inferred that none of the data transforms except the random transform with FSS could produce 100% misfire detection in two class mode, hence their use in misfire detection with Bayes net is not favorable. The Bayes net classifier achieves a classification efficiency of 100% in two class mode with both the entropy based discretisation methods, as seen from Table 6.24. In multi-class mode, the performance is 84.6% with FSS, as noted from Table 6.23. It is observed that both Naïve Bayes and Bayes net accomplish the same classification results in both two class and multi class mode.

Table 6.24 Bayes net results in two class mode (same FSS and preprocessing options as Table 6.2)

Table 6.25 Bayes net confusion matrix (rows and columns: Good, C1m, C2m, C3m, C4m)

6.4.6 Support Vector Machines

The SVM is a complex and computationally taxing algorithm which is capable of returning considerably higher classification accuracies. There are various kernels that can be used to build the classifier, as mentioned earlier. The evaluation results are summarized in Tables 6.26 to 6.31. The SVM classifier parameters for c-SVM with the transform, using a linear kernel, are presented in Table 6.26, followed by the confusion matrix for the same in Table 6.29.

Table 6.26 Classifier parameters for c-SVM (linear kernel) with the transform
Model performance evaluation: 10-fold stratified cross-validation
Model building time: 1.1 s
Total number of instances: 1000
Correctly classified instances: 865
Incorrectly classified instances: 135
Mean absolute error:
Root mean squared error:
Kernel used: Linear
SVM type: c-SVM

Analysing the results presented in Tables 6.27 and 6.28, it is observed that the overall performance of SVM is very good in many instances in two class mode, but the major setback is the processing time required, which is considerably high. Making an initial choice based on two class performance and time, only the transform remains as the

option, since PCA is comparable in time but fails to achieve 100% in two class mode.

Table 6.27 SVM results in multi class mode (c-SVM, linear kernel) (classification accuracy and time taken in seconds for the same FSS and preprocessing options as Table 6.2, with an additional entry for the RBF kernel)

Table 6.28 SVM results in two class mode (c-SVM, linear kernel) (same FSS and preprocessing options as Table 6.2)

The evaluation of various kernels under the c-SVM and nu-SVM formulations is carried out using the transform, and the results are presented in Tables 6.30 and 6.31.

Table 6.29 c-SVM confusion matrix with CFS and the transform (rows and columns: Good, C1m, C2m, C3m, C4m)

Table 6.30 c-SVM with CFS and the transform (multi class mode and two class mode accuracy for the RBF, polynomial, sigmoid and linear kernels)

Table 6.31 nu-SVM with CFS and the transform (multi class mode and two class mode accuracy for the RBF, polynomial, sigmoid and linear kernels)

The results of c-SVM with CFS and the transform presented in Table 6.30 imply that the linear kernel performs with the highest multi-class and two class classification accuracies of 86.5% and 100% respectively, followed by the RBF kernel with 82.5%. The results in Table 6.31 clearly indicate that nu-SVM cannot be a choice, since it is not able to reach 100% classification accuracy in two class mode with any of the combinations considered.
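The kernel comparison of Tables 6.30 and 6.31 can be sketched with scikit-learn's SVC (C-SVM) and NuSVC (nu-SVM) as follows (assumed data X and multi class labels y_multi; the feature scaling step is an added assumption, not something stated in this chapter):

import numpy as np
from sklearn.svm import SVC, NuSVC
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import StratifiedKFold, cross_val_score

KERNELS = ["rbf", "poly", "sigmoid", "linear"]

def kernel_comparison(X, y_multi):
    # Accuracy (%) per SVM formulation and kernel, in multi class mode and in
    # two class mode (misfire in any cylinder versus good).
    cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
    y_two = np.where(np.asarray(y_multi) == "Good", "no_misfire", "misfire")
    results = {}
    for name, svm_cls in [("c-SVM", SVC), ("nu-SVM", NuSVC)]:
        for kernel in KERNELS:
            model = make_pipeline(StandardScaler(), svm_cls(kernel=kernel))
            results[(name, kernel, "multi class")] = 100.0 * cross_val_score(model, X, y_multi, cv=cv).mean()
            results[(name, kernel, "two class")] = 100.0 * cross_val_score(model, X, y_two, cv=cv).mean()
    return results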

6.5 SUMMARY

The detailed analyses of statistical features have led to the formulation of the following conclusions:

Feature subset selection (FSS): The effects of FSS using CFS and the decision tree were analysed, and the resulting feature reduction was implemented for further analysis. FSS using CFS proved to be marginally better than decision tree based FSS.

Effect of feature transforms: The effect of feature transforms such as PCA and the random transform on classification accuracy was analysed by evaluating the classification accuracy using all the algorithms considered. The results show that the use of features with data transforms lagged in performance with almost all algorithms except SVM.

Effect of discretisation: The effects of EWD and EFD were not very prominent compared to entropy based discretisation, which gave good results with almost all classifiers except SVM, where it increased the processing time required for arriving at a decision. Among the entropy based discretisers, the Konenenko based discretiser is found to perform better than the Fayyad and Irani model.

Analysis of the best feature-classifier combination: Diverse families of classifiers were evaluated, and at each stage the best classifier-feature combination was chosen based on the classification accuracy and the time taken for building the model. All the selected combinations were evaluated and the best options for building the model are analysed and presented below.

Table 6.32 Summary of classification efficiencies using statistical features (multi class mode accuracy, two class mode accuracy and time taken in seconds for the decision tree, random forest, fuzzy (FRRC), Naïve Bayes, Bayes net and c-SVM with linear kernel classifiers)

A compilation of the multi-class and two class classification results, along with the computation time taken using statistical features, is presented in Table 6.32. Analysing the results, it is evident that all the classifiers achieve 100% classification accuracy in two class mode, but only random forest and decision tree share the highest and the next highest multi class accuracies of 89.6% and 89.2% respectively. The choice between random forest and decision tree can be decided based on a compromise between classification accuracy and the time taken for classification. Here the decision tree model (DT-CFS-KD) requires only 0.1 seconds, whereas the random forest takes 0.3 seconds for arriving at the same decisions but performs better. Since the time saving under consideration is not very large, the random forest with CFS feature selection followed by Konenenko discretisation of the data is the model of choice (RF-CFS-KD).
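As a closing illustration, the preferred RF-CFS-KD configuration can be sketched as follows (assumed data: a pandas DataFrame X of the statistical features and multi class labels y_multi; KBinsDiscretizer is only a stand-in for the Konenenko MDL discretiser, which scikit-learn does not provide):

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import KBinsDiscretizer
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import StratifiedKFold, cross_val_score

SELECTED = ["standard_deviation", "standard_error", "skewness", "range"]   # CFS-selected subset

def build_and_score(X, y_multi):
    # Discretise the four selected features and classify with a 10-tree forest,
    # scoring both the multi class and the two class (misfire vs good) tasks.
    cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
    model = make_pipeline(
        KBinsDiscretizer(n_bins=10, encode="ordinal", strategy="quantile"),
        RandomForestClassifier(n_estimators=10, random_state=0),
    )
    multi = 100.0 * cross_val_score(model, X[SELECTED], y_multi, cv=cv).mean()
    y_two = np.where(np.asarray(y_multi) == "Good", "no_misfire", "misfire")
    two = 100.0 * cross_val_score(model, X[SELECTED], y_two, cv=cv).mean()
    return {"multi class %": multi, "two class %": two}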


More information

Assignment 1: CS Machine Learning

Assignment 1: CS Machine Learning Assignment 1: CS7641 - Machine Learning Saad Khan September 18, 2015 1 Introduction I intend to apply supervised learning algorithms to classify the quality of wine samples as being of high or low quality

More information

Information Fusion Dr. B. K. Panigrahi

Information Fusion Dr. B. K. Panigrahi Information Fusion By Dr. B. K. Panigrahi Asst. Professor Department of Electrical Engineering IIT Delhi, New Delhi-110016 01/12/2007 1 Introduction Classification OUTLINE K-fold cross Validation Feature

More information

Contents. Foreword to Second Edition. Acknowledgments About the Authors

Contents. Foreword to Second Edition. Acknowledgments About the Authors Contents Foreword xix Foreword to Second Edition xxi Preface xxiii Acknowledgments About the Authors xxxi xxxv Chapter 1 Introduction 1 1.1 Why Data Mining? 1 1.1.1 Moving toward the Information Age 1

More information

Chapter 4. The Classification of Species and Colors of Finished Wooden Parts Using RBFNs

Chapter 4. The Classification of Species and Colors of Finished Wooden Parts Using RBFNs Chapter 4. The Classification of Species and Colors of Finished Wooden Parts Using RBFNs 4.1 Introduction In Chapter 1, an introduction was given to the species and color classification problem of kitchen

More information

Data Mining: Data. Lecture Notes for Chapter 2. Introduction to Data Mining

Data Mining: Data. Lecture Notes for Chapter 2. Introduction to Data Mining Data Mining: Data Lecture Notes for Chapter 2 Introduction to Data Mining by Tan, Steinbach, Kumar Data Preprocessing Aggregation Sampling Dimensionality Reduction Feature subset selection Feature creation

More information

Study on Classifiers using Genetic Algorithm and Class based Rules Generation

Study on Classifiers using Genetic Algorithm and Class based Rules Generation 2012 International Conference on Software and Computer Applications (ICSCA 2012) IPCSIT vol. 41 (2012) (2012) IACSIT Press, Singapore Study on Classifiers using Genetic Algorithm and Class based Rules

More information

7. Decision or classification trees

7. Decision or classification trees 7. Decision or classification trees Next we are going to consider a rather different approach from those presented so far to machine learning that use one of the most common and important data structure,

More information

Feature Selection for Image Retrieval and Object Recognition

Feature Selection for Image Retrieval and Object Recognition Feature Selection for Image Retrieval and Object Recognition Nuno Vasconcelos et al. Statistical Visual Computing Lab ECE, UCSD Presented by Dashan Gao Scalable Discriminant Feature Selection for Image

More information

STA 4273H: Statistical Machine Learning

STA 4273H: Statistical Machine Learning STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Statistics! rsalakhu@utstat.toronto.edu! http://www.utstat.utoronto.ca/~rsalakhu/ Sidney Smith Hall, Room 6002 Lecture 12 Combining

More information

University of Florida CISE department Gator Engineering. Data Preprocessing. Dr. Sanjay Ranka

University of Florida CISE department Gator Engineering. Data Preprocessing. Dr. Sanjay Ranka Data Preprocessing Dr. Sanjay Ranka Professor Computer and Information Science and Engineering University of Florida, Gainesville ranka@cise.ufl.edu Data Preprocessing What preprocessing step can or should

More information

Topics In Feature Selection

Topics In Feature Selection Topics In Feature Selection CSI 5388 Theme Presentation Joe Burpee 2005/2/16 Feature Selection (FS) aka Attribute Selection Witten and Frank book Section 7.1 Liu site http://athena.csee.umbc.edu/idm02/

More information

Feature Selection. CE-725: Statistical Pattern Recognition Sharif University of Technology Spring Soleymani

Feature Selection. CE-725: Statistical Pattern Recognition Sharif University of Technology Spring Soleymani Feature Selection CE-725: Statistical Pattern Recognition Sharif University of Technology Spring 2013 Soleymani Outline Dimensionality reduction Feature selection vs. feature extraction Filter univariate

More information

An Unsupervised Approach for Combining Scores of Outlier Detection Techniques, Based on Similarity Measures

An Unsupervised Approach for Combining Scores of Outlier Detection Techniques, Based on Similarity Measures An Unsupervised Approach for Combining Scores of Outlier Detection Techniques, Based on Similarity Measures José Ramón Pasillas-Díaz, Sylvie Ratté Presenter: Christoforos Leventis 1 Basic concepts Outlier

More information

Response to API 1163 and Its Impact on Pipeline Integrity Management

Response to API 1163 and Its Impact on Pipeline Integrity Management ECNDT 2 - Tu.2.7.1 Response to API 3 and Its Impact on Pipeline Integrity Management Munendra S TOMAR, Martin FINGERHUT; RTD Quality Services, USA Abstract. Knowing the accuracy and reliability of ILI

More information

Mapping of Hierarchical Activation in the Visual Cortex Suman Chakravartula, Denise Jones, Guillaume Leseur CS229 Final Project Report. Autumn 2008.

Mapping of Hierarchical Activation in the Visual Cortex Suman Chakravartula, Denise Jones, Guillaume Leseur CS229 Final Project Report. Autumn 2008. Mapping of Hierarchical Activation in the Visual Cortex Suman Chakravartula, Denise Jones, Guillaume Leseur CS229 Final Project Report. Autumn 2008. Introduction There is much that is unknown regarding

More information

4. Feedforward neural networks. 4.1 Feedforward neural network structure

4. Feedforward neural networks. 4.1 Feedforward neural network structure 4. Feedforward neural networks 4.1 Feedforward neural network structure Feedforward neural network is one of the most common network architectures. Its structure and some basic preprocessing issues required

More information

10/14/2017. Dejan Sarka. Anomaly Detection. Sponsors

10/14/2017. Dejan Sarka. Anomaly Detection. Sponsors Dejan Sarka Anomaly Detection Sponsors About me SQL Server MVP (17 years) and MCT (20 years) 25 years working with SQL Server Authoring 16 th book Authoring many courses, articles Agenda Introduction Simple

More information