
Proc. 6th International Conference on Hydroinformatics - Liong, Phoon & Babovic (eds), © 2004 World Scientific Publishing Company, ISBN

FLEXIBLE AND OPTIMAL M5 MODEL TREES WITH APPLICATIONS TO FLOW PREDICTIONS

DIMITRI P. SOLOMATINE
UNESCO-IHE Institute for Water Education, P.O. Box 3015, Delft, The Netherlands

MICHAEL BASKARA L. A. SIEK
UNESCO-IHE Institute for Water Education, P.O. Box 3015, Delft, The Netherlands

M5 is a method developed by Quinlan [10] for inducing trees of linear regression models (model trees). This paper addresses flexibility and optimality in M5 model trees by proposing two new algorithms, M5flex and M5opt. The M5flex algorithm brings in domain knowledge by enabling the user to choose the split attributes and split values for the important nodes of a model tree, so that the resulting model is more accurate, reliable and appropriate for practical applications. M5opt is a semi-non-greedy algorithm with a number of improvements over M5. Six hydrological data sets and five benchmark data sets were used in the experiments; for comparison, the M5' and ANN algorithms were employed as well. Overall, M5flex was the most accurate, followed by M5opt, M5' and ANN.

INTRODUCTION

Data-driven modelling (Solomatine [14]), based on the advances of machine learning and computational intelligence, has proved to be a powerful approach to a number of problems in the hydroinformatics context. One of the most frequently and successfully used techniques in this respect is the artificial neural network (ANN). It has been demonstrated, however, that there is a whole set of other methods that can be at least as accurate and have additional advantages (Solomatine & Dulal [15]). One such numerical prediction (regression) method, which we found to be practically unknown to practitioners, is the so-called M5 model tree of Quinlan [10]. It is based on the ideas of a popular classification method, the decision tree, which recursively partitions the input space using entropy-based measures and finally assigns class labels to the resulting subsets. In a regression context, if a leaf is associated with the average output value of the instances sorted down to it (a zero-order model), the overall approach is called a regression tree, introduced by Breiman et al. [4]. If the tree has more complex regression functions of the input variables in its leaves, the overall approach is called a model tree. Two notable approaches are M5 model trees (Quinlan [11]; Wang & Witten [18]) and multivariate adaptive regression splines (MARS) by Friedman [8].
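To make this contrast concrete, the following minimal sketch (ours, not from the paper; all data and names are illustrative) shows the two kinds of leaf model: a regression-tree leaf predicts the average output of the instances sorted down to it, while a model-tree leaf fits a linear regression to them, so that the full tree becomes a piecewise-constant or piecewise-linear function respectively.

```python
import numpy as np

# Instances sorted down to one leaf: inputs X (n x d) and outputs y (n,)
X = np.array([[1.0, 0.2], [2.0, 0.1], [3.0, 0.4], [4.0, 0.3]])
y = np.array([1.1, 2.0, 3.2, 3.9])

# Regression-tree leaf (zero-order model): the average output value.
mean_prediction = y.mean()

# Model-tree leaf: least-squares linear model a0 + a1*x1 + a2*x2.
A = np.hstack([np.ones((len(X), 1)), X])      # prepend an intercept column
coef, *_ = np.linalg.lstsq(A, y, rcond=None)

x_new = np.array([2.5, 0.25])
print("regression-tree leaf:", mean_prediction)        # constant prediction
print("model-tree leaf:", coef[0] + coef[1:] @ x_new)  # linear prediction
```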

The advantages of M5 model trees (Solomatine & Dulal [15]; Solomatine [16]) are that they are more accurate than regression trees, more understandable than, for example, ANNs, easy to use and to train, robust when dealing with missing data, and able to handle large numbers of attributes and high dimensions. This paper describes new implementations of the M5 model tree method, namely the M5flex and M5opt algorithms, together with their applications.

M5 MODEL TREES

M5 splits the input space progressively. The set T of examples is either associated with a leaf, or some test is chosen that splits T into subsets corresponding to the test outcomes, and the same process is applied recursively to the subsets. Splits are chosen to minimize the intra-subset variation in the output values down each branch. In each node, the standard deviation of the output values of the examples reaching that node is taken as a measure of the error of this node, and the expected reduction in error is calculated for a test on each attribute at every possible split value. The attribute that maximizes the expected error reduction is chosen. The standard deviation reduction (SDR) is calculated as

$$\mathrm{SDR} = \mathrm{sd}(T) - \sum_i \frac{|T_i|}{|T|}\,\mathrm{sd}(T_i) \qquad (1)$$

where T is the set of examples that reach the node and T_1, T_2, ... are the sets that result from splitting the node according to the chosen attribute (in the case of a multiple split). The splitting process terminates when the output values of all the instances that reach a node vary only slightly, or when only a few instances remain. Figure 1 presents an example. Tree-like regression models are built on the assumption that the functional dependency varies across the domain and should therefore be approximated by a number of local models (linear ones, in the case of M5 trees), which makes an M5 model tree a piecewise-linear function. After the initial tree has been grown, several further steps are taken: calculation of error estimates, generation of linear models, simplification of linear models, pruning and smoothing.

M5flex MODEL TREE ALGORITHM: INCLUSION OF DOMAIN EXPERT

Some approaches give the user the opportunity to choose the split attribute and value for each node. Ankerst et al. [1] introduced a visual approach to decision tree construction (based on the CART, C4, CLOUDS and SPRINT algorithms) that visualizes multi-dimensional data with a class label so that the degree of impurity with respect to class membership can be easily perceived by the user. Ware et al. [19] introduced visual decision tree (C4.5) construction using 2D polygons. Techniques for building model trees interactively seem to be missing.

The challenge is to integrate background knowledge into a machine learning algorithm by allowing the user to determine some important structural properties of the model on the basis of physical insight, leaving the more tedious tasks to machine learning. The proposed M5flex method enables the user to determine the split attributes and values for some of the important (top-most) nodes; the M5 machine learning algorithm then takes care of building the remainder of the model tree. Typically the domain expert would define the split parameters for the nodes of the top two levels of the tree. The splits in these nodes are important since they affect all splitting below them and influence the performance of the whole model. User-defined splits at subsequent levels are possible as well; however, our experience shows that this becomes more complex for the user and is often less accurate than automatic splitting by M5. In the context of flood prediction, for example, the expert user can instruct M5flex to separate the low-flow and high-flow conditions so that they are modelled separately. Hence, M5flex model trees can be more suitable for hydrological applications and operating strategies than ANNs or standard M5 model trees.

M5opt MODEL TREE ALGORITHM: OPTIMIZATION

A number of researchers have aimed at improving the predictive accuracy of tree-based models; however, they dealt mostly with decision trees: Utgoff et al. [17], Caruana & Freitag [5], Freund & Mason [7], Pfahringer et al. [9], Frank & Witten [6] (multi-way splits), Sikonja & Kononenko [13] (pruning regression trees using the minimum description length principle), and Quinlan [11] (combining regression and model trees and ANNs with instance-based learning). A notable non-greedy approach using iterative linear programming to construct globally optimal decision trees was proposed by Bennett [2]. However, we have not found publications on optimal model trees for regression.

Standard M5 adopts a greedy algorithm which constructs a model tree with a non-fixed structure using a certain stopping criterion. M5 minimizes the error at each interior node, one node at a time. This process starts at the root and is repeated recursively until all or almost all of the instances are correctly classified. In constructing this initial tree M5 is greedy, and this can be improved. In principle, it is possible to build a fully non-greedy algorithm; however, the computational cost of such an approach would be too high. In M5opt, a compromise combining greedy and non-greedy approaches was adopted. M5opt enables the user to define the tree level down to which, starting from the root, the non-greedy algorithm is applied. If full exhaustive search is employed at this stage, all tree structures and all possible attributes and split values are tried; an alternative is to employ randomized search, for example a genetic algorithm. The levels below are constructed by the greedy M5 algorithm. This principle still complies with the way that the terms of the linear models at the leaves of a model tree are obtained from the split attributes of the interior nodes below those leaves before the pruning process.
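As an illustration only (a sketch under our own assumptions, not the authors' implementation), the fragment below computes the SDR criterion of equation (1) for binary splits and chooses a split either automatically, by exhaustive enumeration of all attributes and candidate thresholds as in the greedy M5 step that M5opt applies below the user-chosen level, or from a user-supplied (attribute, value) pair in the spirit of M5flex. All function names and the midpoint candidate set are assumptions.

```python
import numpy as np

def sdr(y, left_mask):
    """Standard deviation reduction (eq. 1) for a binary split of outputs y."""
    yl, yr = y[left_mask], y[~left_mask]
    if len(yl) == 0 or len(yr) == 0:
        return -np.inf                      # degenerate split: reject
    return y.std() - (len(yl) * yl.std() + len(yr) * yr.std()) / len(y)

def best_split(X, y):
    """Greedy M5-style step: enumerate every attribute and midpoint threshold."""
    best = (None, None, -np.inf)
    for a in range(X.shape[1]):
        vals = np.unique(X[:, a])
        for v in (vals[:-1] + vals[1:]) / 2.0:   # midpoints of sorted values
            gain = sdr(y, X[:, a] <= v)
            if gain > best[2]:
                best = (a, v, gain)
    return best                                  # (attribute, value, SDR)

def choose_split(X, y, forced=None):
    """M5flex-style hook: a domain expert may force (attribute, value) at an
    important node; otherwise fall back to the automatic search."""
    if forced is not None:
        a, v = forced
        return a, v, sdr(y, X[:, a] <= v)
    return best_split(X, y)

# E.g. force the root split to separate low and high flows at Q = 300 m3/s
# (the index of the discharge attribute is hypothetical):
# attr, val, gain = choose_split(X, y, forced=(Q_INDEX, 300.0))
```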

M5opt has a number of other attractive additional features: initial approximation (M5 builds the initial model tree in a way similar to regression trees, where the split is based on the averaged output values of the instances that reach a node; the M5opt algorithm builds linear models directly in the initial model tree) and compacting the tree (an improvement to the pruning method of M5).

EXPERIMENTS

Three hydrological data sets from the Sieve catchment (Italy) with hourly rainfall and runoff (Solomatine and Dulal [15]), three hydrological data sets from the Bagmati catchment (Nepal) with daily rainfall and runoff, and five widely used benchmark data sets (Autompg, Bodyfat, CPU, Friedman and Housing; Blake & Mertz [3]) were employed. Four methods were used: M5', M5flex, M5opt and ANN (MLP); M5flex was applied only to the six hydrological data sets. The problem associated with the hydrological data sets is to predict runoff (Q_{t+i}) several hours ahead from previous runoffs (Q_{t-τ}) and effective rainfalls (RE_{t-τ}). Before building a prediction model it was necessary to analyze the physical characteristics of the catchment and then to select the input and output variables by analyzing the interdependencies between variables and the lags using correlation analysis. Finally, the following three models were built for the Sieve case:

Q_{t+1} = f(RE_t, RE_{t-1}, RE_{t-2}, RE_{t-3}, RE_{t-4}, RE_{t-5}, Q_t, Q_{t-1}, Q_{t-2})
Q_{t+3} = f(RE_t, RE_{t-1}, RE_{t-2}, RE_{t-3}, Q_t, Q_{t-1})
Q_{t+6} = f(RE_t, Q_t)

The model for the Bagmati case was set to be Q_{t+1} = f(RE_t, RE_{t-1}, RE_{t-2}, Q_t, Q_{t-1}). In the Bagmati case study the data set was also separated into high flows and low flows, with 300 m³/s as the division point, and two additional models were built.

M5' model trees were built with default parameter values (pruning factor 2.0 and the smoothing option); the same settings were also used in the M5flex and M5opt experiments.

M5flex model trees. The user could modify the split attributes and values only in the nodes of the first and second levels of the model tree; this limitation was introduced simply to reduce the complexity that the domain expert would face. The split values used in the experiments were points around the extreme values (minimum and maximum) and the mean; some trials were needed to find the best model tree.

M5opt model trees. A large number of parameter combinations can be set in M5opt; we used twelve of them (Siek [12]).
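As a sketch of this data-preparation step (the layout and names are our assumptions; the authors' actual preprocessing is not published here), lagged input-output pairs for such forecasting models can be assembled as follows:

```python
import numpy as np

def make_lagged_set(re, q, re_lags, q_lags, lead):
    """Assemble (inputs, target) pairs, e.g. Q_{t+1} = f(RE_t..RE_{t-5}, Q_t..Q_{t-2}).

    re, q   : 1-D arrays of effective rainfall and runoff
    re_lags : rainfall lags, e.g. range(6) for RE_t .. RE_{t-5}
    q_lags  : runoff lags, e.g. range(3) for Q_t .. Q_{t-2}
    lead    : forecast horizon, e.g. 1 for Q_{t+1}
    """
    max_lag = max(max(re_lags), max(q_lags))
    rows, targets = [], []
    for t in range(max_lag, len(q) - lead):
        rows.append([re[t - k] for k in re_lags] + [q[t - k] for k in q_lags])
        targets.append(q[t + lead])
    return np.array(rows), np.array(targets)

# Sieve Q_{t+1} model: X, y = make_lagged_set(re, q, range(6), range(3), lead=1)
# Sieve Q_{t+6} model: X, y = make_lagged_set(re, q, range(1), range(1), lead=6)
```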

RESULTS AND DISCUSSION

The overall experimental results are summarized in Tables 1 and 2. M5opt model trees were the most accurate on the majority of the eleven data sets; for the Bagmati-High, Bagmati-Low, Autompg and Friedman data sets the best accuracy was given by ANN models.

The experiments involving all algorithms on the six Sieve and Bagmati data sets (Table 2) showed that M5flex model trees had the best accuracy on most of these data sets, except the Sieve Q_{t+6} data set, where the M5opt model was better. To compare the algorithms' performance, a scoring matrix proposed by D.L. Shrestha was used; it is a square matrix whose diagonal elements are zero and whose other elements are the averages of the relative performance of one algorithm compared to another over all data sets used. The element SM_{i,j} of the scoring matrix should be read as the average performance of the i-th algorithm over the j-th algorithm and is calculated as follows:

$$SM_{i,j} = \begin{cases} \dfrac{1}{N} \sum_{k=1}^{N} \dfrac{\mathrm{RMSE}_{k,j} - \mathrm{RMSE}_{k,i}}{\max(\mathrm{RMSE}_{k,j},\,\mathrm{RMSE}_{k,i})}, & i \neq j \\ 0, & i = j \end{cases} \qquad (2)$$

where N is the number of data sets. By summing up all element values column-wise one can determine the overall score of each algorithm.

Table 1. The best performance of M5', M5opt and ANN for each data set (training and verification RMSE). Rows: Sieve Q_{t+1}, Q_{t+3}, Q_{t+6}; Bagmati All, High, Low; and the benchmark sets Autompg, Bodyfat, CPU, Friedman, Housing. [The numeric RMSE values were lost in extraction.]

Table 2. The best performance of M5', M5flex, M5opt and ANN for the hydrological data sets (training and verification RMSE). Rows: Sieve Q_{t+1}, Q_{t+3}, Q_{t+6}; Bagmati All, High, Low. [The numeric RMSE values were lost in extraction.]
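For illustration, equation (2) can be computed as below, assuming the verification RMSEs are stored in an N x M array (rows = data sets, columns = algorithms); the array layout and names are our assumptions.

```python
import numpy as np

def scoring_matrix(rmse):
    """Scoring matrix of eq. (2), in %.

    rmse : array of shape (N, M); entry (k, i) is the verification RMSE of
           algorithm i on data set k.  SM[i, j] is the average relative
           performance of algorithm i over algorithm j; the diagonal is zero.
    """
    n, m = rmse.shape
    sm = np.zeros((m, m))
    for i in range(m):
        for j in range(m):
            if i != j:
                rel = (rmse[:, j] - rmse[:, i]) / np.maximum(rmse[:, j], rmse[:, i])
                sm[i, j] = 100.0 * rel.mean()
    return sm

# Overall score per algorithm (the paper sums its printed table column-wise;
# with the orientation used here, the aggregate advantage of algorithm i is
# the row sum):
# totals = scoring_matrix(rmse).sum(axis=1)
```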

Across all eleven data sets, comparing three algorithms (M5', M5opt and ANN), M5opt has the highest score, 27.8 (Table 3). For eight of the eleven data sets the best models were obtained using exhaustive search.

Table 3. Scoring matrix for all 11 verification data sets (in %); rows and columns: ANN, M5', M5opt. [The numeric values were lost in extraction; the totals give M5opt the highest overall score, 27.8.]

Table 4. Scoring matrix for the 6 verification hydrological data sets (in %); rows and columns: ANN, M5', M5flex, M5opt. [The numeric values were lost in extraction; the totals of M5flex and M5opt are 37.4 and 16.0 respectively.]

The comparison among the four algorithms (M5', M5flex, M5opt and ANN) on the hydrological problems (Sieve and Bagmati) can be seen in Table 4. The score of M5opt (16.0) is lower than that of M5flex (37.4). The reason for the high performance of M5flex is that it uses additional domain knowledge to determine the best split attributes and values. Also, M5flex and M5opt could predict the peak values where the other algorithms could not.

The use of compacting in M5opt makes the resulting model tree simpler (as simple as the user wants) and more balanced; this is desirable for practical applications. To see the effect of optimization and compacting, compare the model trees built for one of the case studies (Sieve Q_{t+6}): the M5' tree (Figure 1) has 7 rules, whereas the M5opt tree (Figure 2) has only 2 rules (the printed RMSE values of the two trees were lost in extraction).

Qt <= 37 :
    REt <= [value lost] : LM1 (879/3.51%)
    REt > [value lost] : LM2 (221/41.3%)
Qt > 37 :
    REt <= [value lost] :
        Qt <= 70.2 : LM3 (356/24.4%)
        Qt > 70.2 : LM4 (225/33.7%)
    REt > [value lost] :
        REt <= 2.04 : LM5 (135/160%)
        REt > 2.04 :
            Qt <= 342 : LM6 (30/392%)
            Qt > 342 : LM7 (8/144%)

Models at the leaves: LM1 to LM7, each a linear model of the form Qt+6 = a·REt + b·Qt + c (the numeric coefficients were lost in extraction; only the Qt coefficient of LM7, 0.4, survived).

Number of Rules: 7. Root mean squared error: [value lost].

Figure 1. Model tree (M5'), Q_{t+6} data set

Qt <= 37 : LM1 (1100/19.3%)
Qt > 37 : LM2 (754/116%)

Models at the leaves: LM1 and LM2, each of the form Qt+6 = a·REt + b·Qt + c (numeric coefficients lost in extraction).

Number of Rules: 2. Root mean squared error: [value lost].

Figure 2. Model tree (M5opt) with compacting of the tree up to level 1, Q_{t+6} data set

CONCLUSION

The following can be concluded. The M5 family of model trees is an accurate data-driven modelling approach leading to transparent models that can be easily understood by decision makers. The approach taken in the M5opt algorithm makes it possible to construct models that are more accurate than standard M5 (or M5') ones. The additional computational costs can be high, but they can be controlled by the user through the choice of the tree level down to which the exhaustive search is executed. The M5flex algorithm allows domain knowledge to be brought into the process of data-driven modelling, and in a number of cases it outperforms M5opt and ANN. It requires, however, the involvement of a domain expert, but we see this more as a strength. Further research will be oriented towards reducing the computational time of M5opt and streamlining the procedure of building the M5 trees. The immediate plan of the authors is to improve the M5opt algorithm by introducing a choice of optimization approaches, and to try to combine M5opt with M5flex.

REFERENCES

[1] Ankerst, M., Elsen, C., Ester, M. and Kriegel, H., Visual classification: an interactive approach to decision tree construction, Proceedings of the 5th International Conference on Knowledge Discovery and Data Mining, ACM Press, (1999).

[2] Bennett, K.P., Global tree optimization: a non-greedy decision tree algorithm, Journal of Computing Science and Statistics, Vol. 26, (1994).
[3] Blake, C.L. and Mertz, C.J., UCI Repository of machine learning databases, Univ. of California, (1998).
[4] Breiman, L., Friedman, J.H., Olshen, R.A. and Stone, C.J., Classification and regression trees, Wadsworth, Belmont, CA, (1984).
[5] Caruana, R. and Freitag, D., Greedy attribute selection, International Conference on Machine Learning, (1994).
[6] Frank, E. and Witten, I.H., Selecting multiway splits in decision trees, Working paper 96/31, Dept. of Computer Science, University of Waikato, (1996).
[7] Freund, Y. and Mason, L., The alternating decision tree learning algorithm, Proc. 16th International Conf. on Machine Learning, Morgan Kaufmann, San Francisco, (1999).
[8] Friedman, J.H., Multivariate adaptive regression splines, Annals of Statistics, Vol. 19, (1991).
[9] Pfahringer, B., Geoffrey, H. and Kirkby, R., Optimizing the induction of alternating decision trees, Proceedings of the Fifth Pacific-Asia Conf. on Advances in Knowledge Discovery and Data Mining, (2001).
[10] Quinlan, J.R., Learning with continuous classes, Proc. AI'92, 5th Australian Joint Conference on Artificial Intelligence, Adams & Sterling (eds.), World Scientific, Singapore, (1992).
[11] Quinlan, J.R., Combining instance-based and model-based learning, Proceedings ML'93 (Utgoff, ed.), Morgan Kaufmann, (1993).
[12] Siek, M.B.L.A., Flexibility and optimality in model tree learning with application to water-related problems, MSc Thesis Report, IHE Delft, (2003).
[13] Sikonja, M.R. and Kononenko, I., Pruning regression trees with MDL, 13th European Conference on Artificial Intelligence (ECAI'98), (1998).
[14] Solomatine, D.P., Data-driven modelling: paradigm, methods, experiences, Proc. 5th International Conference on Hydroinformatics, Cardiff, UK, (2002).
[15] Solomatine, D.P. and Dulal, K.N., Model tree as an alternative to neural network in rainfall-runoff modelling, Hydrological Sciences Journal, Vol. 48(3), (2003).
[16] Solomatine, D.P., Mixtures of simple models vs ANNs in hydrological modelling, Proc. Int. Conference on Hybrid Intelligent Systems (HIS'03), Melbourne, (2003).
[17] Utgoff, P.E., Berkman, N.C. and Clouse, J.A., Decision tree induction based on efficient tree restructuring, Machine Learning, Vol. 29(1), (1997).
[18] Wang, Y. and Witten, I.H., Induction of model trees for predicting continuous classes, Proc. European Conf. on Machine Learning, Prague, (1997).
[19] Ware, M., Frank, E., Holmes, G., Hall, M. and Witten, I.H., Interactive machine learning: letting users build classifiers, Int. J. on Human-Computer Studies, (2000).
