Empirical Comparison of Bagging-based Ensemble Classifiers


Ren Ye, P.N. Suganthan
School of Electrical and Electronic Engineering, Nanyang Technological University, Singapore
{re0003ye,

Abstract: This paper empirically compares four bagging-based ensemble classifiers, namely the ensemble adaptive neuro-fuzzy inference system (ANFIS), the ensemble support vector machine (SVM), the ensemble extreme learning machine (ELM) and the random forest. The comparison of these four ensemble classifiers is novel because it has not been reported in the existing literature. The classifiers are evaluated on thirteen binary-class datasets, and the empirical results show that the ensemble methods employed in the four ensemble classifiers boost the testing accuracy by 1-5% on average over their base classifiers. In addition, the testing accuracy can be improved by increasing the number of base classifiers. The empirical results also show that the bagging SVM is the most favorable ensemble classifier among them.

I. INTRODUCTION

Classification is a common problem in machine learning: new observations (samples) are assigned to groups based on a training set whose identifications (labels) are provided (supervised learning) [1] or not provided (unsupervised learning, not discussed in this paper) [2]. Classification is applied in various areas such as medicine, manufacturing, bioinformatics and finance. There are several well-developed tools for classification, such as Bayesian classifiers, K-nearest neighbors, decision trees, fuzzy neural networks (FNN) and support vector machines (SVM). Designing a classifier involves two phases: (i) supervised learning and (ii) evaluation of the generalization performance. However, due to the variance in the training data, the bias of the attributes, etc., different accuracies may be obtained with different classifiers, or with the same classifier under different parameters. In other words, there is no universal classifier for all types of data distributions [1].

In order to improve the performance of classifiers, ensemble learning can be used. Ensemble learning uses multiple models to achieve a better predictive performance than could be obtained from any of the constituent models [3]. There are two categories of ensemble learning: competitive ensemble learning and cooperative ensemble learning [4]. In competitive ensemble learning, the base classifiers work individually on one instance of a problem and vote for the final decision, whereas in cooperative ensemble learning the base classifiers work associatively with each other and aggregate their decisions into the final decision. The advantage of ensemble learning is the improvement of accuracy [5]. The key determinant of better accuracy and generalization is ensemble diversity. There are mainly three strategies for achieving ensemble diversity: (i) kernel diversity, (ii) parameter diversity and (iii) data diversity [4].

Kernel diversity: A kernel function maps the data space to a non-linear space so that a linear decision boundary in that space can classify the data. Kernel functions play an important role in SVM and ELM, so varying the kernel functions across ensemble members diversifies the base classifiers.

Parameter diversity: Different parameters vary the performance of classifiers. For example, the parameters that affect the performance of a fuzzy neural network (FNN) are the membership functions, the number of nodes, the rules, the aggregation methods, etc.;
for SVM, the parameters that affect performance are the margin parameter and the parameters associated with the kernel functions.

Data diversity: For a finite dataset, different data perturbations may lead to significantly different classification configurations. Data diversity reduces the bias and variance and increases the generalization capability of the classifiers. Cross-validation (CV) is a commonly used method in machine learning to avoid overfitting [6]. Other advanced data diversity algorithms such as bootstrap aggregation (bagging) [7] and boosting [8] are widely used in ensemble methods.

An ensemble classifier can make use of one or more of the strategies mentioned above in its actual implementation. This paper focuses on data diversity, especially on bagging-based ensemble classifiers. Bagging is a method that generates multiple datasets for a classifier and uses the classification results on each dataset to obtain an aggregated classifier. Bagging-based ensemble learning can therefore be categorized as a competitive learning and data diversity strategy.

In the literature, a few bagging classifiers have been reported. The original bagging algorithm was applied to classification trees [7]; the bagged classification tree improved the performance of a single classification tree significantly in terms of error reduction on eight datasets. In [9], the bagging algorithm was applied to a Bayesian predictor, and in [10] to a tree-based ranking predictor.

The empirical results showed that, with bagging, both predictors outperformed the original predictors on most of the datasets.

An improved bagging-based classification tree, called the random forest, is reported in [2]. Random forest introduces an additional mode of randomness to bagging, thereby increasing the data diversity. In addition to bootstrapping the training data, which is the standard procedure in bagging, random forest changes the construction of the classification trees: the usual way of constructing a classification tree is to split each node at the best split among all attributes, whereas random forest splits each node using the best split among a randomly chosen subset of attributes [11]. The authors claimed that random forest yields good accuracy and robustness.

Bagging can also be applied to neural networks and FNNs [12], [13]. In [13], an ensemble fuzzy classifier was created with bagging and adaptive neuro-fuzzy inference systems (ANFIS). Twenty datasets were evaluated and the bagging ANFIS outperformed the original ANFIS on most of the datasets in terms of accuracy.

The support vector machine (SVM) is a machine learning tool for classification as well as regression [14]. The advantage of SVM is that it can operate in a high-dimensional linear attribute space while yielding a non-linear classifier in the input space, by appropriately choosing a kernel function. The kernel function enables SVM to perform well on high-dimensional datasets. However, choosing a kernel function and its parameters usually requires an exhaustive search [5], and improperly chosen kernel functions and parameters reduce the accuracy of the SVM. Two variations of bagging-based ensemble SVMs were reported in [15], [16]. A low-bias bagged SVM was reported in [16]: the authors argued that the bagging algorithm in the learning phase should aim to minimize bias, even at the cost of increased variance, and therefore modified bagging into the lobag algorithm, which first estimates the SVM parameters that minimize the bias and then, based on the chosen parameters, creates bags for the SVM classifiers. In [15], a double SVM bagging was presented. The authors modified the conventional bagging SVM into a two-phase classification: the first phase is an SVM classification and the second phase is a decision tree classification. Unlike conventional bagging SVM classifiers, the reported double SVM bagging algorithm used the out-of-bag samples for training and combined the bagged samples into a probability matrix. In the testing phase, a decision tree was employed and the final decision was voted by the confidence levels. Both ensemble SVMs were reported to improve performance. An ensemble SVM based on cooperative ensemble learning was presented in [1]. The data diversity was realized by iteratively learning from the training data and adding the samples misclassified in the previous iteration as support vectors in the next iteration. The accuracy was determined by aggregating the decisions of all iterations. However, the authors noticed that for certain datasets there was no convergence but oscillation, and this oscillation problem remains unsolved.

The extreme learning machine (ELM) is an effective machine learning algorithm based on the single-hidden-layer feedforward neural network (SLFN) architecture [17]. It is a fast algorithm and can provide good generalization. Several ensemble ELMs have been developed in the literature.
A parameter-diversity-based ensemble ELM was reported in [18]. The ELM classifiers were created by k-fold CV: for each fold there were a few training iterations in which the parameters w and b of the output-neuron feature space H were tuned for better performance, and the best-performing parameters were saved for each ELM classifier. The testing results were decided by a majority vote. Another parameter-diversity-based ensemble ELM was presented in [19], where the ensembles were created by assigning different initial weight values to different ELM classifiers, each then trained with ELM, and the testing result was aggregated by majority voting. Both ensemble ELMs [18], [19] outperformed the original ELM in classification. A bagging-based ensemble ELM regressor was developed in [20]: the bagging algorithm bootstrapped the training dataset into several sub-training datasets, an ELM regressor was applied to each sub-training dataset, and the error rates were calculated and stored as voting weights; the testing results were decided by a weighted-vote combination method. This paper uses a similar algorithm, but with an ELM classifier, for an empirical study on binary classification problems.

The above-mentioned classifiers belong to the FNN, kernel machine and decision tree families. This paper studies four typical classifiers from these categories: ANFIS (FNN), SVM (kernel machine), ELM (FNN and kernel machine) and random forest (decision trees). The objective of this paper is to evaluate four bagging-based ensemble classifiers: the bagging ANFIS, the bagging SVM, the bagging ELM and the random forest. The performance of these four bagging ensemble methods is compared. The evaluation is based on thirteen UCI binary-class datasets. To the authors' knowledge, the comparison of these four ensemble classifiers has not been studied in the existing literature.

The remainder of the paper is organized as follows: Section II elaborates the details of the bagging algorithm and the four classifiers; Section III details the ensemble SVM, the ensemble ANFIS, the ensemble ELM and the random forest; Section IV discusses and compares the empirical results; and finally Section V concludes the paper.

II. CLASSIFIERS

A. Preliminary Definition

Let $D = \{x_i\}_{i=1}^{n}$ be a dataset with $n$ samples, where each sample has $d$ attributes. Let $T$ be a training set and $S$ a testing set, where $D = T + S$. There is an array of identification labels $y \in \{-1, 1\}^{n}$ mapped to $D$, and therefore $y = y_T + y_S$, where $y_T$ and $y_S$ correspond to the labels of $T$ and $S$ respectively. The general idea of a classifier is to find $\hat{y}_S$ based on a function $\hat{y}_S = t(S, y_S)$, where the function $t(\cdot)$ is determined by a learning function $l(T, y_T)$.
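As a minimal illustration of the $D = T + S$ partition defined above, the following sketch performs a stratified 70:30 split, the same ratio used in the experiments of Section III. scikit-learn is assumed, and all variable names and the synthetic data are illustrative only, not part of the original study.

```python
# A minimal sketch of partitioning D into a training set T and a testing set S.
# The 70:30 ratio matches the experimental setup in Section III; scikit-learn
# and the synthetic data are assumptions made for illustration.
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))               # D: n = 100 samples, d = 4 attributes
y = np.where(X[:, 0] + X[:, 1] > 0, 1, -1)  # labels y in {-1, +1}

# Stratified 70:30 split into training set T and testing set S
X_T, X_S, y_T, y_S = train_test_split(X, y, test_size=0.3, stratify=y, random_state=0)
```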

Fig. 1: A Typical ANFIS Architecture

B. ANFIS

The Adaptive Neuro-Fuzzy Inference System (ANFIS) [21] is an FNN structure based on the TSK fuzzy inference system model [22]. It processes the data in a fuzzy space, and its adaptive and learning abilities are realized by the ANN structure. The architecture of a typical ANFIS is shown in Fig. 1: there are six layers (a six-layer feed-forward neural network). The example ANFIS architecture has two inputs $x_1, x_2$ and one output $y$. Each input is fuzzified by two fuzzy sets $A_1, A_2, B_1, B_2$, corresponding to four membership functions.

Layer 1 is the input layer. No specific function is applied:
$$O_i^1 = x_i \quad (1)$$

Layer 2 is the fuzzification layer. Inputs from Layer 1 are converted into fuzzy numbers by the membership functions $\mu(\cdot)$, which are usually chosen to be non-linear bell-shaped functions:
$$O_i^2 = \mu_{A_i}(x_i) \quad (2)$$

Layer 3 is the rule layer. Each fuzzified input undergoes an IF-THEN rule and the antecedents are combined by multiplication (T-norm):
$$O_i^3 = \mu_{A_i}(x_1)\,\mu_{B_i}(x_2) \quad (3)$$

Layer 4 is the normalization layer. The outputs of the rule layer (the firing strengths, i.e., the weights) are normalized by their sum (S-norm):
$$O_i^4 = \frac{O_i^3}{\sum_j O_j^3} \quad (4)$$

Layer 5 is the defuzzification layer. It forms a series of linear equations based on the inputs and the outputs of the normalization layer:
$$O_i^5 = O_i^4 f_i = O_i^4 (p_i x_1 + q_i x_2 + r_i) \quad (5)$$

Layer 6 is the output layer. The linear equations are summed to form a single linear equation, whose output is the output of the ANFIS:
$$O^6 = \sum_i O_i^5 \quad (6)$$

The learning algorithm of ANFIS comprises two parts: the back-propagation gradient descent method commonly used in ANNs, and the least-squares estimator commonly used in linear statistics. Gradient descent is mainly used for updating the antecedent parameters, and the least-squares estimator is mainly used for updating the consequent parameters. Users can choose different combinations of learning algorithms, such as gradient descent only, or gradient descent plus one pass of the least-squares estimator, based on the design objectives (speed vs. accuracy). However, ANFIS has a fixed architecture for a particular dataset: the learning algorithm can only change the parameters of the membership functions and the coefficients of the output linear equations, but not the fuzzy rules. This limitation reduces the adaptability of ANFIS for complex datasets with a large number of attributes and samples. In addition, the output of ANFIS is a floating-point number, so it is rounded to the nearest value of $y$ for the final decision.
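To make the layer equations (1)-(6) concrete, the following is a minimal NumPy sketch of a single ANFIS forward pass for the two-input example architecture. It is only an illustrative sketch, not the authors' implementation (the paper uses the Matlab Fuzzy Logic Toolbox); the bell-shaped membership parameters, the consequent coefficients and the four-rule grid partition are all assumptions made for this example.

```python
import numpy as np

def bell_mf(x, a, b, c):
    # Generalized bell membership function, a common choice for mu(.) in Eq. (2)
    return 1.0 / (1.0 + np.abs((x - c) / a) ** (2 * b))

def anfis_forward(x1, x2, mf_params, consequents):
    """One forward pass through the six ANFIS layers of Eqs. (1)-(6).

    mf_params:   bell parameters (a, b, c) for fuzzy sets A1, A2, B1, B2
    consequents: array of shape (n_rules, 3) holding (p_i, q_i, r_i) per rule
    """
    # Layer 1: inputs pass through unchanged, Eq. (1)
    # Layer 2: fuzzification, Eq. (2)
    mu_A = [bell_mf(x1, *mf_params[k]) for k in ("A1", "A2")]
    mu_B = [bell_mf(x2, *mf_params[k]) for k in ("B1", "B2")]
    # Layer 3: rule firing strengths via the product T-norm, Eq. (3)
    # (here all four combinations of A_i and B_j form the rule base)
    w = np.array([mu_A[i] * mu_B[j] for i in range(2) for j in range(2)])
    # Layer 4: normalize the firing strengths, Eq. (4)
    w_bar = w / w.sum()
    # Layer 5: TSK consequents f_i = p_i*x1 + q_i*x2 + r_i, Eq. (5)
    f = consequents @ np.array([x1, x2, 1.0])
    # Layer 6: weighted sum of the rule outputs, Eq. (6)
    return float(np.sum(w_bar * f))

# Illustrative, untuned parameters; a real ANFIS learns these by gradient
# descent (antecedents) and least squares (consequents).
mf_params = {"A1": (1.0, 2.0, -1.0), "A2": (1.0, 2.0, 1.0),
             "B1": (1.5, 2.0, -1.0), "B2": (1.5, 2.0, 1.0)}
consequents = np.array([[0.5, -0.2, 0.1],
                        [0.3,  0.4, -0.1],
                        [-0.2, 0.6, 0.0],
                        [0.1,  0.1, 0.2]])
print(anfis_forward(0.4, -0.7, mf_params, consequents))
```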
C. SVM

Fig. 2: A Linearly Separable SVM Case

$d$-dimensional samples can be mapped into a $d$-dimensional hyperspace. The basic idea of SVM is to draw the best hyperplane in this hyperspace to separate the samples according to their corresponding labels $y$. The hyperplane is optimal when the margin of separation between samples with different labels is maximized, and the samples sitting on the margins are called support vectors [14]. Suppose the samples are linearly separable; then a hyperplane in the hyperspace can be defined as
$$w x + b = 0 \quad (7)$$
where $w$ is a vector perpendicular to the hyperplane and $b$ is a bias.

For a linearly separable case, the two classes are separated by $w x + b = 0$ (the solid line in Fig. 2), and there is also a region bounded by $w x + b = 1$ and $w x + b = -1$ (the two dashed lines in Fig. 2) between the two classes, so that a margin is created. The distance between the two margins is $\frac{2}{\|w\|}$, so the best hyperplane is obtained by solving
$$\min \; \frac{1}{2} w^T w \quad (8)$$
subject to $y_i (w x_i + b) \geq 1$.

For linearly non-separable samples, certain samples will be misclassified. They are represented by non-negative slack variables $\xi_i$ such that
$$y_i (w x_i + b) \geq 1 - \xi_i. \quad (9)$$
The optimization problem therefore becomes
$$\min \; \frac{1}{2} w^T w + C \sum_{i=1}^{n} \xi_i \quad (10)$$
subject to $y_i (w \phi(x_i) + b) \geq 1 - \xi_i$ and $\xi_i \geq 0$, where $C$ is the penalty factor that weights the misclassification and $\phi(\cdot)$ is a nonlinear function mapping the input to a higher-dimensional space. If $C$ is large, the solution converges towards that of the optimal separating hyperplane, with little tolerance of misclassification; if $C$ is small, the solution converges to one where margin maximization dominates [23]. The optimization problem (10) can be represented more formally in its Lagrangian dual form:
$$\max_{\alpha} \; L(\alpha) = \sum_{i=1}^{n} \alpha_i - \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} \alpha_i \alpha_j y_i y_j \phi(x_i)^T \phi(x_j) \quad (11)$$
subject to $\sum_{i=1}^{n} \alpha_i y_i = 0$ and $0 \leq \alpha_i \leq C$, where the term $\phi(x_i)^T \phi(x_j)$ can be replaced by a kernel function $K(x_i, x_j)$. The advantage of the kernel function is that it saves the computational effort of computing the higher-dimensional mapping of $x_i$ and $x_j$ separately: the inner product of $x_i$ and $x_j$ is computed first and then mapped to the specific higher-dimensional space. There are several popular kernel functions, such as the polynomial, radial basis function (RBF) and sigmoid kernels; the user can choose a proper kernel with proper kernel parameters for a specific dataset.
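As an illustration of the soft-margin, kernelized formulation of Eqs. (10)-(11), the short sketch below trains an RBF-kernel SVM. The paper itself uses LIBSVM [27]; scikit-learn's SVC (which wraps LIBSVM) is assumed here as an equivalent stand-in, and the synthetic dataset and the values of C and gamma are arbitrary, not the tuned values of Table II.

```python
# Hedged sketch: an RBF-kernel soft-margin SVM as in Eqs. (10)-(11).
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

clf = SVC(kernel="rbf", C=2.0, gamma=0.25)   # K(x_i, x_j) = exp(-gamma * ||x_i - x_j||^2)
clf.fit(X_tr, y_tr)
print("test accuracy:", clf.score(X_te, y_te))
print("number of support vectors:", clf.n_support_.sum())
```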

D. ELM

ELM has a neural-network-like structure, but it is advantageous over a conventional neural network because the hidden layer need not be neuron-like and need not be tuned [17]. The output function of an ELM binary classifier is a function of the hidden-layer feature map $h(\cdot)$ and the output weights $\beta$:
$$f_L(x) = \mathrm{sign}(h(x)\beta) \quad (12)$$
This function is equivalent to mapping a $d$-dimensional input to an $L$-dimensional hidden-layer feature space $h$ and then scaling by the output weights of the $L$ hidden neurons. The output weights $\beta$ are determined by the hidden-layer output matrix $H = [h(x_1); \ldots; h(x_{n_T})]$ and the predefined labels $y_T$:
$$\beta = \begin{cases} (H^T H)^{-1} H^T y_T, & \text{if } H^T H \text{ is nonsingular,} \\ H^T (H H^T)^{-1} y_T, & \text{if } H H^T \text{ is nonsingular.} \end{cases} \quad (13)$$
To minimize both the training error $\|H\beta - y_T\|$ and the norm of the output weights $\|\beta\|$, the Lagrange multiplier method is used, which finally yields the binary ELM classifier
$$f(x) = \mathrm{sign}\!\left( h(x) H^T \left( \tfrac{I}{C} + H H^T \right)^{-1} y_T \right) \quad (14)$$
ELM can easily be adapted to multi-class classification and regression. The advantage of ELM is its fast training speed, owing to its simple neural-network-like structure.
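The following is a minimal NumPy sketch of Eqs. (12)-(14): a random sigmoid hidden layer whose output weights are solved in closed form with the regularization term I/C. It is an assumed illustration, not the reference implementation from [17], and the values of L, C and the synthetic data are arbitrary.

```python
import numpy as np

def elm_train(X_T, y_T, L=100, C=1.0, seed=0):
    """Train a binary ELM: random sigmoid hidden layer, then solve the
    regularized least-squares problem behind Eq. (14)."""
    rng = np.random.default_rng(seed)
    d = X_T.shape[1]
    W = rng.uniform(-1, 1, size=(d, L))        # random input weights (not tuned)
    b = rng.uniform(-1, 1, size=L)             # random hidden biases
    H = 1.0 / (1.0 + np.exp(-(X_T @ W + b)))   # hidden-layer output matrix H
    # beta = H^T (I/C + H H^T)^{-1} y_T, cf. Eq. (14)
    n = H.shape[0]
    beta = H.T @ np.linalg.solve(np.eye(n) / C + H @ H.T, y_T)
    return W, b, beta

def elm_predict(X, W, b, beta):
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))
    return np.sign(H @ beta)                   # Eq. (12): f(x) = sign(h(x) beta)

# Tiny usage example with synthetic labels in {-1, +1}
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 5))
y = np.where(X[:, 0] - X[:, 2] > 0, 1.0, -1.0)
W, b, beta = elm_train(X[:150], y[:150], L=50, C=10.0)
acc = np.mean(elm_predict(X[150:], W, b, beta) == y[150:])
print("test accuracy:", acc)
```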
E. Bagging

Usually $T$ is small and has a large variance. The bootstrap phase in bagging samples data with replacement from $T$ according to the uniform probability distribution and creates $B$ independent datasets (bags) $T^{(1)}, \ldots, T^{(B)}$. The purpose of making $B$ bags is to reduce the variance of the original dataset $T$. The aggregation phase in bagging combines the decisions drawn from each bag by voting or other equivalent methods to reduce bias [7], [16]. Therefore, bagging can reduce variance and at the same time reduce bias. The schematic diagram of bagging is shown in Fig. 3.

Fig. 3: A Schematic Diagram of Bagging (the training data is sampled with replacement into bags $1 \ldots B$; each bag trains a classifier; the classifiers' predictions on the testing data are combined by a majority vote into the aggregated prediction)

The reasons why bagging outperforms a single classifier are listed in [24]. There are three reasons: statistical, computational and expressive. The statistical reason is that learning with bagging aggregates different possible hypotheses to approximate the best hypothesis, which is advantageous over committing to a single hypothesis. The computational reason is that the learning algorithm in bagging has less chance of being trapped in a local minimum than a single classifier. The expressive reason is that the bagging classifier expands the search space to a richer hypothesis space, which improves the expressiveness over a single classifier.

F. Random Forest

Random forest is developed from decision trees. The basic idea of a decision tree is to classify a sample by recursive binary partition: at each node of the tree, a test is applied to the input sample and a decision (the branch to follow next) is made by evaluating the outcome of the test. The procedure terminates at a leaf node (no further branches), where the final classification is made. Decision trees are fast, easy to understand and insensitive to missing data [25]. Random forest improves on decision trees by bagging and the random subspace method: a random forest consists of a set of decision tree classifiers where each classifier is trained on independent, identically distributed random vectors and the final decision is aggregated by a majority vote [2]. The advantage of using bagging for creating the decision trees is to increase the diversity of the base classifiers, and at each node the random forest employs the random subspace method for node splitting, which adds one more degree of randomness to the base classifiers. Therefore, the author claimed that random forest is a promising ensemble method [2].
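For reference, the sketch below fits a random forest with scikit-learn. The paper itself uses the randomForest R package and its Matlab port [11], [28], so this is only an assumed equivalent: n_estimators plays the role of the number of bagged trees and max_features="sqrt" the random subspace of attributes considered at each split. Parameter values are arbitrary.

```python
# Hedged stand-in for the random forest of Section II.F (the paper uses the
# randomForest R package / its Matlab port); scikit-learn is used here only
# for illustration.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

rf = RandomForestClassifier(
    n_estimators=80,       # number of bagged trees (the "bags" B)
    max_features="sqrt",   # random subspace of attributes tried at each split
    random_state=0,
)
rf.fit(X_tr, y_tr)
print("test accuracy:", rf.score(X_te, y_te))
```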

III. ENSEMBLE CLASSIFIERS

This section presents the detailed algorithms of the bagging ANFIS, the bagging SVM, the bagging ELM and the random forest. The generalized algorithm of the bagging-based ensemble classifier is shown in Fig. 4. There are four nested loops in the algorithm: the outer loop repeats the classification process to obtain an average testing result; the first inner loop trains a classification model for each bag; and the second and third inner loops find the best parameters for the single classifier based on k-fold CV. The testing phase includes a majority vote to obtain the final decision. For the bagging ANFIS, the best-performing ANFIS model is obtained by k-fold CV only, because ANFIS learns and updates its parameters by back-propagation and the least-squares estimator. For the bagging SVM, the penalty parameter $C$ and the kernel parameter $\gamma$ (for the RBF kernel) are tuned by a grid search. For the bagging ELM (sigmoid activation), the parameter $C$ and the number of hidden-layer nodes $L$ are tuned by a grid search as well. In order not to overfit each base classifier, the parameters are searched once, before bootstrapping; if the grid search were carried out for each base classifier after bootstrapping, the base classifiers would very likely be overfitted to the training data. The random forest is pre-compiled from an R package [11]. Several parameters of that program can be tuned, but the only parameter tuned in the experiments is the number of bags, so that the comparison against the other three ensemble classifiers is fair.

The number of stratified testing runs $S$ is set to 50 in the empirical analysis; the number of bags $B$ is set to 20, 50 and 80 for three different experiments; and the ratio of training data to testing data is $n_T : n_S = 70\% : 30\%$. The number of training epochs of ANFIS is set to 10. The grid search of the bagging SVM has two dimensions: $C \in \{2^i, i = -5, -4, \ldots, 5\}$ and $\gamma \in \{2^i, i = -4, -3, \ldots, 4\}$. The grid search of the bagging ELM also has two dimensions: $C \in \{2^i, i = -5, -4, \ldots, 5\}$ and $L \in \{10, 20, \ldots, 1000\}$.

Inputs: D: dataset; y: labels; B: number of bags; S: number of stratified testing runs;
        C, σ, L: parameters of the classifier
FOR st = 1 TO S DO {
    Bootstrapping phase:
        Randomly PARTITION D, y into T, y_T and S, y_S, where n_T : n_S = 70% : 30%;
        BOOTSTRAP T, y_T into T^(i), y_T^(i), i = 1...B;
    Training phase:
        FOR each parameter setting DO {
            FOR each fold of the k-fold CV on T DO {
                M = Train(T, y_T, parameters); } }
        SAVE the best-performing parameters P_best;
        FOR i = 1 TO B DO {
            M^(i) = Train(T^(i), y_T^(i), P_best); }
    Testing phase:
        FOR i = 1 TO B DO {
            ŷ_S^(i) = Test(M^(i), S); }
    Aggregation phase:
        Majority vote over ŷ_S^(i), i = 1...B, to obtain ŷ_S;
        A^(st) = Accuracy(ŷ_S, y_S); }
RETURN the mean accuracy A and its standard deviation σ_A;

Fig. 4: The Bagging-based Ensemble Classification Algorithm
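The following Python sketch mirrors the structure of Fig. 4 for a single stratified run, using a scikit-learn SVC as the base classifier. It is an assumed illustration rather than the authors' Matlab/LIBSVM implementation; the parameter grid matches the ranges stated above, the synthetic dataset is a stand-in, and the labels here are {0, 1} rather than {-1, 1}.

```python
# Illustrative single-run implementation of the Fig. 4 algorithm with an SVM
# base classifier (scikit-learn is assumed; the paper's own code is Matlab/LIBSVM).
import numpy as np
from sklearn.base import clone
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=600, n_features=15, random_state=0)

# Partition D into T (70%) and S (30%), stratified on the labels
X_T, X_S, y_T, y_S = train_test_split(X, y, test_size=0.3, stratify=y, random_state=0)

# Parameter search once, before bootstrapping (k-fold CV on the whole training set)
param_grid = {"C": [2.0 ** i for i in range(-5, 6)],
              "gamma": [2.0 ** i for i in range(-4, 5)]}
search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=5).fit(X_T, y_T)
best = search.best_estimator_

# Bootstrapping and training phases: B bags, one base classifier per bag
B = 20
models = []
for i in range(B):
    idx = rng.integers(0, len(X_T), size=len(X_T))   # sample with replacement
    models.append(clone(best).fit(X_T[idx], y_T[idx]))

# Testing and aggregation phases: majority vote over the B predictions
votes = np.stack([m.predict(X_S) for m in models])   # shape (B, n_S)
y_hat = (votes.mean(axis=0) >= 0.5).astype(int)      # majority vote for {0, 1} labels
print("aggregated test accuracy:", np.mean(y_hat == y_S))
```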
IV. EMPIRICAL ANALYSIS

The bagging ANFIS, the bagging SVM, the bagging ELM and the random forest were evaluated using the thirteen UCI binary classification datasets [26] listed in Table I.

The ANFIS algorithm is the standard release in the Matlab Fuzzy Logic Toolbox based on [21]; the SVM algorithm is built from the libsvm package [27]; the ELM algorithm is obtained from [17]; and the random forest algorithm is obtained from [28], which is converted from the R package reported in [11]. In order to speed up the computation of ANFIS, three attributes were chosen for each dataset for the ANFIS and bagging ANFIS evaluations. The attributes were chosen based on one-pass single-attribute ANFIS classification, ranked by ascending classification error.

TABLE I: Description of Datasets
Dataset (No. of Samples, No. of Attributes): australian (credit), breast (wisconsin), bright, credit, diabetes, dimdata, heart, hepatitis, ionosphere, liver, mammo(graphic), mushroom, pima

The tuned parameters for SVM and ELM, averaged over 5-fold CV, are shown in Table II. The tuned parameters of SVM did not vary much across the folds of CV, whereas the tuned parameters of ELM, especially L, had a large standard deviation across the folds.

TABLE II: Parameters of SVM and ELM (μ[σ] over 5-fold CV)
Dataset      SVM log2 C   SVM log2 γ   ELM log2 C   ELM L
australian   0[0]         -2[0]        -1.8[3.56]   364[376.94]
breast       0.2[0.45]    -2[0]        3.8[0.84]    650[409.39]
bright       3.8[1.79]    -2[0]        2.6[2.79]    880[188.28]
credit       0.2[0.45]    -2[0]        0[3.08]      424[310.45]
diabetes     0[0]         -1.8[0.45]   1.4[2.88]    348[341.20]
dimdata      1[0]         -2[0]        0[1.87]      730[236.43]
heart        1.2[2.17]    -2[0]        -2.8[3.49]   586[386.43]
hepatitis    5[0]         -0.4[2.19]   1.8[3.35]    294[191]
ionosphere   4[2.24]      -2[0]        1.6[1.82]    590[389.55]
liver        1.2[0.84]    -1.2[0.84]   0.8[5.31]    500[445.93]
mammo        2.2[1.3]     -2[0]        4.4[1.34]    412[470.61]
mushroom     5[0]         2[0]         -2[2.92]     514[439.29]
pima         0.2[0.45]    -2[0]        -0.6[2.97]   258[394.23]
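One plausible way to obtain mean[std] entries like those in Table II is to tune (C, gamma) separately on each of 5 folds and report the spread across the folds; the sketch below does this for the SVM over the paper's stated grid (log2 C in [-5, 5], log2 gamma in [-4, 4]). scikit-learn stands in for the Matlab/LIBSVM tooling, and the bundled breast-cancer dataset stands in for the UCI data, so this is an assumed reading of the procedure rather than the authors' script.

```python
# Hedged sketch of a per-fold grid search whose spread resembles Table II.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV, KFold
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)
X = StandardScaler().fit_transform(X)

param_grid = {"C": [2.0 ** i for i in range(-5, 6)],      # log2 C in [-5, 5]
              "gamma": [2.0 ** i for i in range(-4, 5)]}  # log2 gamma in [-4, 4]

log2_C, log2_gamma = [], []
for train_idx, _ in KFold(n_splits=5, shuffle=True, random_state=0).split(X):
    search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=5, n_jobs=-1)
    search.fit(X[train_idx], y[train_idx])
    log2_C.append(np.log2(search.best_params_["C"]))
    log2_gamma.append(np.log2(search.best_params_["gamma"]))

print("log2 C     : %.1f[%.2f]" % (np.mean(log2_C), np.std(log2_C, ddof=1)))
print("log2 gamma : %.1f[%.2f]" % (np.mean(log2_gamma), np.std(log2_gamma, ddof=1)))
```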
The base classifier accuracies (averaged over the 80 base classifiers) before the majority vote are shown in Table III. Compared with the aggregated accuracies shown in Table VI, the average base classifier accuracy is generally 1-5% lower. For example, the improvements in the average testing accuracy for ionosphere from before to after aggregation are 8.03%, 2.54%, 2.26% and 7.80% for the bagging ANFIS, bagging SVM, bagging ELM and random forest, respectively. In terms of diversity, the standard deviations of the base classifier accuracies are generally greater than unity; for example, the standard deviations of the base classifiers' testing accuracies on ionosphere are 1.94, 2.27, 3.35 and 3.15 for the bagging ANFIS, bagging SVM, bagging ELM and random forest, respectively. Together, these two tables show that bagging indeed increases accuracy by aggregating the base classifiers' decisions.

TABLE III: Base Classifier Accuracy (%, μ[σ]) of Four Ensemble Classifiers with 80 Bags
Dataset      Bagging ANFIS   Bagging SVM    Bagging ELM    Random Forest
australian   83.29[0.49]     84.36[1.67]    85.20[2.41]    79.96[4.43]
breast       92.77[2.00]     95.87[1.45]    95.01[1.26]    93.76[1.23]
bright       97.10[0.38]     99.03[0.31]    99.06[0.38]    97.59[0.77]
credit       85.12[1.69]     82.96[2.32]    81.55[2.53]    79.44[3.68]
diabetes     73.86[1.22]     74.67[2.30]    72.98[3.40]    67.51[3.35]
dimdata      88.34[0.48]     95.53[0.51]    95.60[0.54]    92.01[0.88]
heart        80.68[2.04]     77.27[5.23]    80.86[5.14]    76.63[4.37]
hepatitis    80.20[4.45]     79.21[1.85]    78.09[4.70]    76.36[5.98]
ionosphere   81.02[1.94]     92.15[2.27]    85.76[3.35]    85.17[3.15]
liver        59.92[3.55]     68.17[4.11]    67.21[4.19]    61.72[5.44]
mammo        81.40[1.61]     82.20[1.42]    81.03[2.09]    78.30[2.00]
mushroom     74.69[0.75]     88.90[0.75]    88.99[0.76]    83.04[3.11]
pima         75.14[1.34]     74.79[2.21]    75.93[2.58]    68.36[2.75]

Tables IV, V and VI show the testing accuracies of the four ensemble classifiers with 20, 50 and 80 bags, respectively. The testing accuracies are averaged over 50 stratified testing runs. Generally, the testing accuracies of the four bagging-based ensemble classifiers increase as the number of bags increases, because the data diversity is related to the number of base classifiers [24]. For the same number of bags, the bagging SVM is the most favorable of the four ensemble classifiers: with 20 bags it has the best predicted accuracy on 7 of the 13 datasets, with 50 bags on 6 of 13, and with 80 bags on 8 of 13. The second most favorable ensemble classifier is the random forest: with 20 bags it has the best predicted accuracy on 3 of the 13 datasets, with 50 bags on 5 of 13, and with 80 bags on 4 of 13. The four ensemble classifiers are also ranked by their testing accuracies on each dataset for 20, 50 and 80 bags, and the overall rank is calculated by summing the individual ranks. The overall rank of the four ensemble classifiers across the different numbers of bags is: bagging SVM < random forest < bagging ELM < bagging ANFIS.
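As a concrete check of the ranking procedure just described, the following sketch ranks the four ensembles on each dataset of Table IV (20 bags) and sums the per-dataset ranks, reproducing the rank totals reported there. SciPy is assumed; the accuracy matrix is copied from Table IV.

```python
# Rank-sum computation for the 20-bag results (Table IV): rank 1 is the best
# mean testing accuracy on a dataset, and the overall rank is the column sum.
import numpy as np
from scipy.stats import rankdata

# Columns: bagging ANFIS, bagging SVM, bagging ELM, random forest (mean accuracy, %)
acc_20_bags = np.array([
    [86.73, 85.57, 85.19, 86.66],   # australian
    [96.07, 96.40, 96.25, 96.51],   # breast
    [98.95, 99.32, 99.20, 99.12],   # bright
    [84.54, 84.77, 85.14, 86.52],   # credit
    [76.22, 76.61, 75.53, 75.75],   # diabetes
    [94.92, 96.23, 95.71, 95.44],   # dimdata
    [78.36, 81.89, 79.89, 81.07],   # heart
    [76.00, 82.30, 80.04, 82.00],   # hepatitis
    [88.13, 94.57, 88.48, 93.30],   # ionosphere
    [65.96, 69.34, 67.61, 69.90],   # liver
    [83.19, 82.53, 81.75, 81.88],   # mammo
    [78.66, 89.10, 89.12, 88.45],   # mushroom
    [75.02, 76.06, 75.56, 75.43],   # pima
])

ranks = np.vstack([rankdata(-row, method="min") for row in acc_20_bags])
print("rank sums (ANFIS, SVM, ELM, RF):", ranks.sum(axis=0))   # expected: 44 21 36 29
```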

TABLE IV: Testing Accuracy (%, μ[σ](rank)) of Four Ensemble Classifiers with 20 Bags
Dataset      Bagging ANFIS      Bagging SVM        Bagging ELM        Random Forest
australian   86.73[2.23] (1)    85.57[1.99] (3)    85.19[2.33] (4)    86.66[1.82] (2)
breast       96.07[1.05] (4)    96.40[1.09] (2)    96.25[1.24] (3)    96.51[1.12] (1)
bright       98.95[0.29] (4)    99.32[0.29] (1)    99.20[0.29] (2)    99.12[0.33] (3)
credit       84.54[2.18] (4)    84.77[2.16] (3)    85.14[2.32] (2)    86.52[2.07] (1)
diabetes     76.22[1.90] (2)    76.61[2.27] (1)    75.53[2.57] (4)    75.75[2.14] (3)
dimdata      94.92[0.49] (4)    96.23[0.51] (1)    95.71[0.57] (2)    95.44[0.56] (3)
heart        78.36[3.74] (4)    81.89[2.61] (1)    79.89[2.88] (3)    81.07[2.89] (2)
hepatitis    76.00[4.21] (4)    82.30[4.05] (1)    80.04[4.82] (3)    82.00[4.14] (2)
ionosphere   88.13[2.91] (4)    94.57[1.77] (1)    88.48[3.04] (3)    93.30[1.84] (2)
liver        65.96[3.55] (4)    69.34[4.22] (2)    67.61[4.19] (3)    69.90[3.64] (1)
mammo        83.19[1.89] (1)    82.53[2.05] (2)    81.75[1.80] (4)    81.88[1.86] (3)
mushroom     78.66[1.12] (4)    89.10[0.50] (2)    89.12[0.52] (1)    88.45[0.77] (3)
pima         75.02[2.54] (4)    76.06[2.07] (1)    75.56[2.85] (2)    75.43[2.41] (3)
Rank (sum)   (44)               (21)               (36)               (29)

TABLE V: Testing Accuracy (%, μ[σ](rank)) of Four Ensemble Classifiers with 50 Bags
Dataset      Bagging ANFIS      Bagging SVM        Bagging ELM        Random Forest
australian   87.10[1.78] (2)    86.25[1.91] (3)    86.19[2.07] (4)    87.52[2.05] (1)
breast       95.83[1.23] (4)    96.55[1.05] (3)    96.62[1.07] (2)    96.73[1.10] (1)
bright       98.83[0.32] (4)    99.35[0.23] (1)    99.24[0.26] (2)    99.15[0.28] (3)
credit       84.45[2.35] (4)    85.50[2.14] (3)    85.74[2.17] (2)    86.91[1.85] (1)
diabetes     76.83[2.33] (2)    77.30[2.42] (1)    76.33[3.07] (4)    76.52[2.31] (3)
dimdata      94.99[0.49] (4)    96.31[0.35] (1)    95.96[0.42] (2)    95.77[0.48] (3)
heart        77.77[3.56] (4)    81.02[3.65] (1)    79.84[4.19] (3)    80.52[3.97] (2)
hepatitis    75.26[4.79] (4)    82.22[3.31] (2)    81.87[4.23] (3)    82.70[3.83] (1)
ionosphere   88.25[2.42] (4)    94.82[1.92] (1)    88.29[2.36] (3)    93.35[2.01] (2)
liver        67.22[3.65] (4)    70.02[4.22] (2)    69.38[4.55] (3)    70.87[3.83] (1)
mammo        83.53[1.78] (1)    82.61[1.85] (2)    81.84[2.08] (4)    81.85[1.75] (3)
mushroom     79.00[0.82] (4)    89.21[0.59] (2)    89.27[0.55] (1)    88.91[0.70] (3)
pima         75.37[2.38] (4)    76.38[2.47] (1)    75.97[2.89] (2)    75.73[2.61] (3)
Rank (sum)   (45)               (23)               (35)               (27)

TABLE VI: Testing Accuracy (%, μ[σ](rank)) of Four Ensemble Classifiers with 80 Bags
Dataset      Bagging ANFIS      Bagging SVM        Bagging ELM        Random Forest
australian   86.33[2.10] (2)    85.52[2.04] (4)    85.52[1.92] (3)    86.78[2.06] (1)
breast       95.91[1.01] (4)    96.56[1.12] (3)    96.58[1.10] (2)    96.77[0.95] (1)
bright       98.86[0.37] (4)    99.38[0.24] (1)    99.27[0.23] (2)    99.17[0.32] (3)
credit       84.35[1.88] (4)    85.08[1.81] (3)    85.27[1.98] (2)    86.94[1.64] (1)
diabetes     76.92[2.41] (3)    77.36[1.92] (1)    76.65[2.07] (4)    76.98[1.64] (2)
dimdata      94.84[0.54] (4)    96.29[0.48] (1)    96.00[0.52] (2)    95.72[0.49] (3)
heart        78.82[3.64] (4)    81.57[3.13] (1)    80.32[3.64] (3)    80.89[3.05] (2)
hepatitis    75.13[4.08] (4)    83.61[3.17] (1)    81.30[3.83] (3)    82.48[3.61] (2)
ionosphere   89.05[2.53] (3)    94.69[1.89] (1)    88.02[2.89] (4)    92.97[2.06] (2)
liver        66.91[4.13] (4)    69.96[4.20] (3)    70.95[4.24] (2)    71.83[4.43] (1)
mammo        83.48[2.08] (1)    82.59[2.06] (2)    82.03[2.17] (3)    81.90[1.84] (4)
mushroom     79.04[0.68] (4)    89.27[0.52] (1)    89.25[0.57] (2)    88.90[0.57] (3)
pima         75.86[2.14] (4)    77.03[1.92] (1)    76.04[2.41] (3)    76.10[1.97] (2)
Rank (sum)   (45)               (23)               (35)               (27)

V. CONCLUSION

This paper has discussed four bagging-based ensemble classifiers, namely the ensemble ANFIS, the ensemble SVM, the ensemble ELM and the random forest. The comparison of these four ensemble classifiers is novel because it has not been reported in the existing literature. The ensemble classifiers have been evaluated on thirteen UCI binary-class datasets with different numbers of bags (20, 50 and 80). The empirical results have shown that all four ensemble classifiers return high testing accuracy. The effect of data diversity is reflected in the comparison between the base classifier accuracies and the ensemble classifier accuracies. In addition, the effect of data diversity is also reflected in the number of bags, because the results show that the testing accuracies increase as the number of base classifiers increases.
Among the four ensemble classifiers, the ensemble SVM has been identified as the most favorable ensemble classifier and the random forest as the second most favorable one; the bagging ELM and the bagging ANFIS rank third and fourth, respectively.

REFERENCES

[1] M. Bhardwaj, T. Gupta, T. Grover, and V. Bhatnagar, "An efficient classifier ensemble using SVM," in International Conference on Methods and Models in Computer Science, Dec. 2009.
[2] L. Breiman, "Random forests," Machine Learning, vol. 45, no. 1, pp. 5-32, 2001.
[3] D. Opitz and R. Maclin, "Popular ensemble methods: An empirical study," Journal of Artificial Intelligence Research, vol. 11, pp. 169-198, 1999.
[4] L. Yu, S. Wang, and K. K. Lai, "Investigation of diversity strategies in SVM ensemble learning," in International Conference on Natural Computation, vol. 7, Oct. 2008.
[5] R. Lei, X. Kong, and X. Wang, "An ensemble SVM using entropy-based attribute selection," in Chinese Control and Decision Conference, May 2010.
[6] R. Kohavi, "A study of cross-validation and bootstrap for accuracy estimation and model selection," in International Joint Conference on Artificial Intelligence, vol. 2, 1995.
[7] L. Breiman, "Bagging predictors," Machine Learning, vol. 24, no. 2, pp. 123-140, 1996.
[8] Y. Freund and R. E. Schapire, "A short introduction to boosting," Journal of Japanese Society for Artificial Intelligence, vol. 14, no. 5, pp. 771-780, Sept. 1999.
[9] S. Kurogi and K. Harashima, "Improving generalization performance of bagging ensemble via Bayesian approach," in IEEE International Symposium on Computational Intelligence in Robotics and Automation, Mar. 2009.

[10] S. Clémençon, M. Depecker, and N. Vayatis, "Bagging ranking trees," in International Conference on Machine Learning and Applications, Dec. 2009.
[11] A. Liaw and M. Wiener, "Classification and regression by randomForest," R News, vol. 2/3, pp. 18-22, Dec. 2002.
[12] R. Chen and J. Yu, "An improved bagging neural network ensemble algorithm and its application," in International Conference on Natural Computation, Aug. 2007.
[13] J. Canul-Reich, L. Shoemaker, and L. O. Hall, "Ensembles of fuzzy classifiers," in IEEE International Fuzzy Systems Conference, July 2007.
[14] C. Cortes and V. Vapnik, "Support-vector networks," Machine Learning, vol. 20, no. 3, pp. 273-297, 1995.
[15] F. M. Zaman and H. Hirose, "Double SVMbagging: A new double bagging with support vector machine," Engineering Letters, vol. 17, no. 2, 2009.
[16] G. Valentini and T. G. Dietterich, "Low bias bagged support vector machines," in International Conference on Machine Learning, 2003.
[17] G.-B. Huang, H. Zhou, X. Ding, and R. Zhang, "Extreme learning machine for regression and multi-class classification," IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, vol. PP, no. 99, pp. 1-17, Oct. 2011.
[18] N. Liu and H. Wang, "Ensemble based extreme learning machine," IEEE Signal Processing Letters, vol. 17, no. 8, pp. 754-757, Aug. 2010.
[19] Y. Liu, X. Xu, and C. Wang, "Simple ensemble of extreme learning machine," in International Congress on Image and Signal Processing, Oct. 2009.
[20] H. Tian and B. Meng, "A new modeling method based on bagging ELM for day-ahead electricity price prediction," in IEEE International Conference on Bio-Inspired Computing: Theories and Applications, Sept. 2010.
[21] J.-S. R. Jang, "ANFIS: Adaptive-network-based fuzzy inference system," IEEE Transactions on Systems, Man and Cybernetics, vol. 23, no. 3, pp. 665-685, May 1993.
[22] M. Sugeno and G. Kang, "Structure identification of fuzzy model," Fuzzy Sets and Systems, vol. 28, pp. 15-33, 1988.
[23] J. Suykens and C. Alzate, "Support vector machines and kernel methods: new approaches in unsupervised learning," Tutorial, WCCI 2010, July 2010.
[24] T. G. Dietterich, "Machine-learning research: Four current directions," The AI Magazine, vol. 18, no. 4, pp. 97-136, 1997.
[25] W.-Y. Loh, "Classification and regression trees," Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, vol. 1, no. 1, pp. 14-23, Jan. 2011.
[26] UCI Machine Learning Repository. [Online]. Available: http://archive.ics.uci.edu/ml
[27] C.-C. Chang and C.-J. Lin, "LIBSVM: a library for support vector machines," ACM Transactions on Intelligent Systems and Technology, vol. 2, pp. 1-27, 2011. [Online]. Available: http://www.csie.ntu.edu.tw/~cjlin/libsvm
[28] A. Jaiantilal, "Classification and regression by randomForest-Matlab," 2009. [Online]. Available: randomforest-matlab


More information

Univariate and Multivariate Decision Trees

Univariate and Multivariate Decision Trees Univariate and Multivariate Decision Trees Olcay Taner Yıldız and Ethem Alpaydın Department of Computer Engineering Boğaziçi University İstanbul 80815 Turkey Abstract. Univariate decision trees at each

More information

Lab 2: Support vector machines

Lab 2: Support vector machines Artificial neural networks, advanced course, 2D1433 Lab 2: Support vector machines Martin Rehn For the course given in 2006 All files referenced below may be found in the following directory: /info/annfk06/labs/lab2

More information

CS 8520: Artificial Intelligence. Machine Learning 2. Paula Matuszek Fall, CSC 8520 Fall Paula Matuszek

CS 8520: Artificial Intelligence. Machine Learning 2. Paula Matuszek Fall, CSC 8520 Fall Paula Matuszek CS 8520: Artificial Intelligence Machine Learning 2 Paula Matuszek Fall, 2015!1 Regression Classifiers We said earlier that the task of a supervised learning system can be viewed as learning a function

More information

Support Vector Machines + Classification for IR

Support Vector Machines + Classification for IR Support Vector Machines + Classification for IR Pierre Lison University of Oslo, Dep. of Informatics INF3800: Søketeknologi April 30, 2014 Outline of the lecture Recap of last week Support Vector Machines

More information

NEAREST-INSTANCE-CENTROID-ESTIMATION LINEAR DISCRIMINANT ANALYSIS (NICE LDA) Rishabh Singh, Kan Li (Member, IEEE) and Jose C. Principe (Fellow, IEEE)

NEAREST-INSTANCE-CENTROID-ESTIMATION LINEAR DISCRIMINANT ANALYSIS (NICE LDA) Rishabh Singh, Kan Li (Member, IEEE) and Jose C. Principe (Fellow, IEEE) NEAREST-INSTANCE-CENTROID-ESTIMATION LINEAR DISCRIMINANT ANALYSIS (NICE LDA) Rishabh Singh, Kan Li (Member, IEEE) and Jose C. Principe (Fellow, IEEE) University of Florida Department of Electrical and

More information

Combine the PA Algorithm with a Proximal Classifier

Combine the PA Algorithm with a Proximal Classifier Combine the Passive and Aggressive Algorithm with a Proximal Classifier Yuh-Jye Lee Joint work with Y.-C. Tseng Dept. of Computer Science & Information Engineering TaiwanTech. Dept. of Statistics@NCKU

More information

Neural Networks. CE-725: Statistical Pattern Recognition Sharif University of Technology Spring Soleymani

Neural Networks. CE-725: Statistical Pattern Recognition Sharif University of Technology Spring Soleymani Neural Networks CE-725: Statistical Pattern Recognition Sharif University of Technology Spring 2013 Soleymani Outline Biological and artificial neural networks Feed-forward neural networks Single layer

More information

Bayesian model ensembling using meta-trained recurrent neural networks

Bayesian model ensembling using meta-trained recurrent neural networks Bayesian model ensembling using meta-trained recurrent neural networks Luca Ambrogioni l.ambrogioni@donders.ru.nl Umut Güçlü u.guclu@donders.ru.nl Yağmur Güçlütürk y.gucluturk@donders.ru.nl Julia Berezutskaya

More information

Improving the Random Forest Algorithm by Randomly Varying the Size of the Bootstrap Samples for Low Dimensional Data Sets

Improving the Random Forest Algorithm by Randomly Varying the Size of the Bootstrap Samples for Low Dimensional Data Sets Improving the Random Forest Algorithm by Randomly Varying the Size of the Bootstrap Samples for Low Dimensional Data Sets Md Nasim Adnan and Md Zahidul Islam Centre for Research in Complex Systems (CRiCS)

More information

Machine Learning in Biology

Machine Learning in Biology Università degli studi di Padova Machine Learning in Biology Luca Silvestrin (Dottorando, XXIII ciclo) Supervised learning Contents Class-conditional probability density Linear and quadratic discriminant

More information

An Empirical Comparison of Spectral Learning Methods for Classification

An Empirical Comparison of Spectral Learning Methods for Classification An Empirical Comparison of Spectral Learning Methods for Classification Adam Drake and Dan Ventura Computer Science Department Brigham Young University, Provo, UT 84602 USA Email: adam drake1@yahoo.com,

More information

Statistical foundations of machine learning

Statistical foundations of machine learning Statistical foundations of machine learning INFO-F-422 Gianluca Bontempi Machine Learning Group Computer Science Department mlg.ulb.ac.be Some algorithms for nonlinear modeling Feedforward neural network

More information

Introduction to Support Vector Machines

Introduction to Support Vector Machines Introduction to Support Vector Machines CS 536: Machine Learning Littman (Wu, TA) Administration Slides borrowed from Martin Law (from the web). 1 Outline History of support vector machines (SVM) Two classes,

More information

SVM in Analysis of Cross-Sectional Epidemiological Data Dmitriy Fradkin. April 4, 2005 Dmitriy Fradkin, Rutgers University Page 1

SVM in Analysis of Cross-Sectional Epidemiological Data Dmitriy Fradkin. April 4, 2005 Dmitriy Fradkin, Rutgers University Page 1 SVM in Analysis of Cross-Sectional Epidemiological Data Dmitriy Fradkin April 4, 2005 Dmitriy Fradkin, Rutgers University Page 1 Overview The goals of analyzing cross-sectional data Standard methods used

More information

Data-driven Kernels for Support Vector Machines

Data-driven Kernels for Support Vector Machines Data-driven Kernels for Support Vector Machines by Xin Yao A research paper presented to the University of Waterloo in partial fulfillment of the requirement for the degree of Master of Mathematics in

More information

FEATURE EXTRACTION TECHNIQUES USING SUPPORT VECTOR MACHINES IN DISEASE PREDICTION

FEATURE EXTRACTION TECHNIQUES USING SUPPORT VECTOR MACHINES IN DISEASE PREDICTION FEATURE EXTRACTION TECHNIQUES USING SUPPORT VECTOR MACHINES IN DISEASE PREDICTION Sandeep Kaur 1, Dr. Sheetal Kalra 2 1,2 Computer Science Department, Guru Nanak Dev University RC, Jalandhar(India) ABSTRACT

More information

Probabilistic Approaches

Probabilistic Approaches Probabilistic Approaches Chirayu Wongchokprasitti, PhD University of Pittsburgh Center for Causal Discovery Department of Biomedical Informatics chw20@pitt.edu http://www.pitt.edu/~chw20 Overview Independence

More information

The exam is closed book, closed notes except your one-page (two-sided) cheat sheet.

The exam is closed book, closed notes except your one-page (two-sided) cheat sheet. CS 189 Spring 2015 Introduction to Machine Learning Final You have 2 hours 50 minutes for the exam. The exam is closed book, closed notes except your one-page (two-sided) cheat sheet. No calculators or

More information

Applied Statistics for Neuroscientists Part IIa: Machine Learning

Applied Statistics for Neuroscientists Part IIa: Machine Learning Applied Statistics for Neuroscientists Part IIa: Machine Learning Dr. Seyed-Ahmad Ahmadi 04.04.2017 16.11.2017 Outline Machine Learning Difference between statistics and machine learning Modeling the problem

More information

An Ensemble of Classifiers using Dynamic Method on Ambiguous Data

An Ensemble of Classifiers using Dynamic Method on Ambiguous Data An Ensemble of Classifiers using Dynamic Method on Ambiguous Data Dnyaneshwar Kudande D.Y. Patil College of Engineering, Pune, Maharashtra, India Abstract- The aim of proposed work is to analyze the Instance

More information

Image Compression: An Artificial Neural Network Approach

Image Compression: An Artificial Neural Network Approach Image Compression: An Artificial Neural Network Approach Anjana B 1, Mrs Shreeja R 2 1 Department of Computer Science and Engineering, Calicut University, Kuttippuram 2 Department of Computer Science and

More information

Combined Weak Classifiers

Combined Weak Classifiers Combined Weak Classifiers Chuanyi Ji and Sheng Ma Department of Electrical, Computer and System Engineering Rensselaer Polytechnic Institute, Troy, NY 12180 chuanyi@ecse.rpi.edu, shengm@ecse.rpi.edu Abstract

More information

Transactions on Information and Communications Technologies vol 16, 1996 WIT Press, ISSN

Transactions on Information and Communications Technologies vol 16, 1996 WIT Press,  ISSN Comparative study of fuzzy logic and neural network methods in modeling of simulated steady-state data M. Järvensivu and V. Kanninen Laboratory of Process Control, Department of Chemical Engineering, Helsinki

More information

Rule Based Learning Systems from SVM and RBFNN

Rule Based Learning Systems from SVM and RBFNN Rule Based Learning Systems from SVM and RBFNN Haydemar Núñez 1, Cecilio Angulo 2 and Andreu Català 2 1 Laboratorio de Inteligencia Artificial, Universidad Central de Venezuela. Caracas, Venezuela hnunez@strix.ciens.ucv.ve

More information

STA 4273H: Statistical Machine Learning

STA 4273H: Statistical Machine Learning STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Statistics! rsalakhu@utstat.toronto.edu! http://www.utstat.utoronto.ca/~rsalakhu/ Sidney Smith Hall, Room 6002 Lecture 12 Combining

More information

Lecture 20: Bagging, Random Forests, Boosting

Lecture 20: Bagging, Random Forests, Boosting Lecture 20: Bagging, Random Forests, Boosting Reading: Chapter 8 STATS 202: Data mining and analysis November 13, 2017 1 / 17 Classification and Regression trees, in a nut shell Grow the tree by recursively

More information

CS294-1 Final Project. Algorithms Comparison

CS294-1 Final Project. Algorithms Comparison CS294-1 Final Project Algorithms Comparison Deep Learning Neural Network AdaBoost Random Forest Prepared By: Shuang Bi (24094630) Wenchang Zhang (24094623) 2013-05-15 1 INTRODUCTION In this project, we

More information

Bioinformatics - Lecture 07

Bioinformatics - Lecture 07 Bioinformatics - Lecture 07 Bioinformatics Clusters and networks Martin Saturka http://www.bioplexity.org/lectures/ EBI version 0.4 Creative Commons Attribution-Share Alike 2.5 License Learning on profiles

More information

Rough Set Approach to Unsupervised Neural Network based Pattern Classifier

Rough Set Approach to Unsupervised Neural Network based Pattern Classifier Rough Set Approach to Unsupervised Neural based Pattern Classifier Ashwin Kothari, Member IAENG, Avinash Keskar, Shreesha Srinath, and Rakesh Chalsani Abstract Early Convergence, input feature space with

More information

Business Club. Decision Trees

Business Club. Decision Trees Business Club Decision Trees Business Club Analytics Team December 2017 Index 1. Motivation- A Case Study 2. The Trees a. What is a decision tree b. Representation 3. Regression v/s Classification 4. Building

More information

Machine Learning Classifiers and Boosting

Machine Learning Classifiers and Boosting Machine Learning Classifiers and Boosting Reading Ch 18.6-18.12, 20.1-20.3.2 Outline Different types of learning problems Different types of learning algorithms Supervised learning Decision trees Naïve

More information