Empirical Comparison of Bagging-based Ensemble Classifiers
|
|
- Jocelin Hall
- 6 years ago
- Views:
Transcription
1 Empirical Comparison of Bagging-based Ensemble Classifiers Ren Ye, P.N. Suganthan School of Electrical and Electronic Engineering Nanyang Technological University, Singapore {re0003ye, Abstract This paper compares empirically four baggingbased ensemble classifiers, namely the ensemble adaptive neurofuzzy inference system (ANFIS), the ensemble support vector machine (SVM), the ensemble extreme learning machine (ELM) and the random forest. The comparison of these four ensemble classifiers is novel because it has not been reported in the existing literature. The classifiers are evaluated with thirteen binary class datasets and the empirical results show that the ensemble methods employed in the four ensemble classifiers boost the testing accuracy by 1-5% on average from their base classifiers. In addition, the testing accuracy can be improved by increasing the number of base classifiers. The empirical results also show that the bagging SVM is the most favorable ensemble classifier among them. I. INTRODUCTION Classification is a common problem in machine learning, which is to group the new observations/samples based on a training set whose identifications (labels) are provided (supervised learning) [1] or unprovided (unsupervised learning, not discussed in the paper) [2]. Classification can be applied in various areas such as medicine, manufacturing, bioinformatics, finance, etc. There are several well-developed tools for classification such as Bayesian, K-nearest neighbors, decision trees, fuzzy neural networks (FNN), support vector machines (SVM) and so on. There are two phases to design a classifier: (i) supervised learning (unsupervised learning is not discussed in the paper) and (ii) evaluation of generalization performance. However, due to the variance in the training data, the bias of the attributes, etc., different accuracies may be obtained with different classifiers, or same classifiers with different parameters. In other words, there is no universal classifier for all types of data distributions [1]. In order to improve the performance of classifiers, ensemble learning can be used. Ensemble learning is to use multiple models to achieve a better predictive performance than that could be obtained from any of the constituent models [3]. There are two categories of ensemble learning, one is competitive ensemble learning and the other is cooperative ensemble learning [4]. The base classifiers work individually on one set of a problem and vote for the final decision is competitive ensemble learning whereas the base classifiers work associatively with each other and aggregate the decisions to the final decision is cooperative ensemble learning. The advantage of ensemble learning is the improvement of the accuracy [5]. The key determinant to better accuracy and generalization is ensemble diversity. There are mainly three strategies for ensemble diversity: (i) kernel diversity, (ii) parameter diversity and (iii) data diversity [4]. Kernel diversity: Kernel function converts the data space to a non-linear space so that a linear decision boundary can be used in the non-linear space to classify the data. Kernel functions play an important role in SVM and ELM. Varying the kernel functions for different ensembles will diversify the base classifiers. Parameter diversity: Different parameters will vary the performance of classifiers. For example, the parameters that affect the performance of fuzzy neural network (FNN) are the membership function, number of nodes, rules, aggregation methods, etc. and for SVM, the parameters affects the performance are the margin parameter and the parameters associated with the kernel functions. Data diversity: For a finite dataset, different data perturbations may lead to significantly different classification configurations. Data diversity minimizes the bias and variance and increases the generalization capability of the classifiers. Cross validation (CV) is a commonly used method in machine learning to avoid overfitting [6]. Other advanced data diversity algorithms such as bootstrap aggregation (bagging) [7] and boosting [8] are widely used in ensemble methods. The ensemble classifiers can make use one or more of the strategies mentioned above in the actual implementation. This paper will focus on data diversity, especially on bagging-based ensemble classifiers. Bagging is a method for generating multiple datasets for a classifier and use the classification results of each datasets to obtain an aggregated classifier. Bagging-based ensemble learning can be categorized as competitive learning and data diversity strategy. In the literature, there are a few bagging classifiers reported. The original bagging algorithm was applied to classification trees [7]. The bagging classification tree improved the performance of the classification tree significantly in terms of error reduction on eight datasets. In [9], the bagging algorithm was employed to a Bayesian predictor. In [10], the bagging algorithm was employed to a tree-based ranking predictor. The 917
2 empirical results showed that with bagging, both predictors outperformed the original predictors on most of the datasets. An improved bagging-based classification tree is reported in [2]. The algorithm is called Random Forest. Random forest introduces an additional mode of randomness to bagging, thereby increasing the data diversity. In addition to bootstrapping on training data, which is a standard procedure in bagging, random forest improves the construction of the classification trees. The normal way of constructing a classification tree is to split the node at the best split among all attributes whereas for random forest, the node is split using the best split among a randomly chosen subset of attributes [11]. The authors claimed that random forest yields good accuracy and robustness. Bagging can also be applied to neural networks and FNNs [12], [13]. In [13], an ensemble fuzzy classifier was created with bagging and adaptive neuro-fuzzy inference systems (ANFIS). Twenty datasets were evaluated and the bagging ANFIS outperformed the original ANFIS on most of the datasets in terms of accuracy. Support Vector Machine (SVM) is a machine learning tool for classification as well as regression [14]. The advantage of SVM is that it can be implemented in a high-dimensional linear attribute space but with a non-linear classifier by appropriately choosing a kernel function. The kernel function enables SVM to perform well for high-dimensional datasets. However, choosing a kernel function and the parameters of the kernel function usually undergoes an exhaustive search process [5]. Improperly chosen kernel functions and parameters will reduce the accuracy of the SVM. Two variations of bagging-based ensemble SVMs were reported in [15], [16]. A low bias bagged SVM was reported in [16]. The authors claimed that the bagging algorithm in the learning phase should be subjected to minimizing bias even with a trade-off on increasing variance. Therefore, the bagging algorithm was modified to a lobag algorithm. The lobag algorithm first estimated the parameters of an SVM that minimizes the bias and then based on the chosen parameters creating bags for the SVM classifiers. In [15], a double SVM bagging was presented. The authors modified the conventional bagging SVM with two phase classification: first phase was an SVM classification and second phase was a decision tree classification. Unlike the conventional bagging SVM classifiers, the reported double SVM bagging algorithm used the out-of-bag samples for training and combined the bagging sample to a probability matrix. In the testing phase, decision tree was employed and the final decision was voted by the confidence levels. Both ensemble SVMs returned a conclusion of improved performance. An ensemble SVM based on cooperative ensemble learning was presented in [1]. The data diversity was realized by iteratively learning from the training data and adding the false predicted data in the previous iteration as support vectors in the next iteration. The accuracy was determined by aggregating the decisions of all iterations. However, the authors noticed that for certain datasets, there were no convergence but oscillations and the oscillation problem is still un-solved. Extreme learning machine (ELM) is an effective machine learning algorithm based on single-hidden layer feedforward neural network (SLFN) architecture [17]. It is a fast algorithm and can provide good generalization. There are several ensemble ELM developed in the literature. A parameter diversity based ensemble ELM was reported in [18]. The ELM classifiers were created by k-fold CV. For each fold, there were a few training iterations and the parameters w and b of the output neuron feature space H were tuned for better performance in each training iteration. The best performed parameters were saved for each ELM classifiers. The testing results were decided by a majority vote. Another parameter diversity based ensemble ELM was presented in [19]. The authors created the ensembles by assigning different initial weight values to different ELM classifiers and followed by training with ELM. Therefore, a lot of ELM classifiers were created. Finally the testing result was aggregated by a majority voting. Both of the ensemble ELMs [18], [19] outperformed the original ELM in classification. A bagging-based ensemble ELM regressor was developed in [20]. The bagging algorithm was employed to bootstrap the training dataset to several sub-training datasets. ELM regressor was applied on each sub-training datasets and the error rates were calculated and stored as weights of vote. The testing results were decided by a weigh vote combining method. This paper will use the similar algorithm but with ELM classifier for empirical study on binary classification problem. The above-mentioned classifiers belong to FNN, kernel machine and decision trees. This paper will study four typical classifiers from these four categories, they are ANFIS of FNN, SVM of kernel machine, ELM of FNN and kernel machine and random forest of decision trees. The objective of this paper is to evaluate four bagging-based ensemble classifiers: the bagging ANFIS, the bagging SVM, the bagging ELM and the random forest. The performance of these four bagging ensemble methods are compared. The evaluation is based on thirteen UCI binary class datasets. To the authors knowledge, the comparison of the four ensemble classifiers has not been studied in the existing literature. The remaining of the paper is organized as follows: Section II elaborates the details of the bagging algorithm and four classifiers; Section III details the ensemble SVM, the ensemble ANFIS, the ensemble ELM and the random forest; Section IV discusses and compares the empirical results and finally Section V concludes the paper. II. CLASSIFIERS A. Preliminary Definition Let D = {x be a dataset with n samples and each sample has d attributes. Let T be a training set and S beatestingset, where D = T + S. There is an array of identification labels y { 1, 1 mapping to D and therefore y = y T +y S, where y T and y S correspond to the labels of T and S respectively. The general idea of a classifier is to find ŷ S based on a 918
3 wx+b=1 wx+b=0 Wx+b=-1 Fig. 1: A Typical ANFIS Architecture function ŷ S = t(s, y S ), where the function t( ) is determined by a learning function l(t, y T ). B. ANFIS Adaptive Neural Fuzzy Inference System (ANFIS) [21] is an FNN structure based on TSK fuzzy inference system model [22]. It processes the data in a fuzzy space. The adaptive and learning abilities are realized by the ANN structure. The architecture of a typical ANFIS is shown in Fig. 1. As shown in Fig. 1, there are six layers in the ANFIS architecture (six-layer feed-forward neural network). The example ANFIS architecture has two inputs x 1,x 2, and one output y. Each input is fuzzified by two fuzzy sets A 1,A 2,B 1,B 2 corresponding to four membership functions. Layer 1 is the input layer. No specific function is applied. O 1 i = x i (1) Layer 2 is the fuzzification layer. Inputs from Layer 1 are converted into fuzzy numbers by the membership functions μ( ). The membership functions are usually chosen to be non-linear bell shaped functions. O 2 i = μ Ai (x i ) (2) Layer 3 is the rule layer. Each fuzzified input undergoes an IF-ELSE rule and combined by multiplication (T- NORM). O 3 i = μ Ai (x 1 ) μ Bi (x 2 ) (3) Layer 4 is the normalization layer. The outputs of the rule layers are normalized by the firing strength (the weights) and then are combined by summation (S-NORM). Oi 4 = Oi 3 Oj 3 (4) j Layer 5 is the defuzzification layer. The defuzzification layer formed a series of linear equations based on the inputs and the outputs from the normalization layer. O 5 i = O 4 i f i = O 4 i (p i x 1 + q i x 2 + r i ) (5) Layer 6 is the output layer. The linear equations are summed to form a single linear equation. The output of the ANFIS is the output of the linear equation. O 6 = i O 5 i (6) Fig. 2: A Linearly Separable SVM Case The learning algorithm of ANFIS comprises two parts, one is the back-propagation gradient descent method which is usually used in ANN and the other is the least-square estimator which is usually used in linear statistics. The gradient descent method is mainly used for updating the antecedent parameters and the least-square estimator is mainly used for updating the consequent parameters. The users can choose different combinations of learning algorithms, such as gradient descent only, gradient descent plus one pass of least square estimator, etc. based on the design objectives (speed v.s. accuracy). However, ANFIS has a fixed architecture for a particular dataset. The learning algorithm can only change the parameters of the membership functions and the coefficients of the output linear equations but not the fuzzy rules. This limitation reduces the adaptability of ANFIS for complex datasets with large number of attributes and samples. In addition, the output of ANFIS is a floating number and therefore, the output should be round off to the nearest values of y for final decision. C. SVM d-dimensional samples can be mapped on a d-dimensional hyperspace. The basic idea of SVM is to draw the best hyperplane in the hyperspace to separate the samples according to their corresponding labels y. The hyperplane is optimum when the margin of separation between samples with different labels is maximized. The samples sitting on the margins are called support vectors [14]. Suppose the samples are linearly separable, then a hyperplane in the hyperspace can be defined as: wx i + b (7) where w is a vector perpendicular to the hyperplane and b is abias. For a linearly-separable case, two classes are separated by wx + b =0(a solid line in Fig. 2) and there is also a region bounded by wx + b =1and wx + b = 1 (two dashed lines in Fig. 2) in between the two classes so that a soft margin is created. 919
4 2 The distance between two margins is w. So, the best hyperplane is obtained by solving: subject to min 1 2 wt w (8) y i (wx i + b) 1 For linearly un-separable samples, certain samples will be misclassified. They are represented by non-negative variables ξ i such that: y i (wx i + b) 1 ξ i. (9) Therefore, the optimization problem becomes: min 1 n 2 wt w + C ξ i (10) subject to i=1 y i (wφ(x i )+b) 1 ξ i and ξ i 0 where C is the penalty factor to weight the misclassification and φ( ) is a nonlinear function to map the input to higher dimensional space. If C is large, the solution converges towards the solution obtained by the optimal separating hyperplane with more tolerance of misclassification. If C is small, the solution converges to one where the margin maximization dominates [23]. The optimization equation (10) can be represented more formally by Lagrange method as shown: n max L(α) = α 1 n n α i α j y i y j φ(x i ) T φ(x j ) (11) 2 subject to i=1 i=1 j=1 n α i y i =0and 0 α i C i=1 where the term φ(x i ) T φ(x j ) can be represented by a kernel function K(x i,x j ). The advantage of the kernel function is to save the computation effort of computing the higher-dimensional mapping of x i and x j separately by computing the internal product of x i and x j first and then map to the specific higher-dimensional space. There are several popular kernel functions such as polynomial, radial basis function (RBF) and sigmoid, etc. The users can choose a proper kernel with proper kernel parameters for a specific dataset. D. ELM ELM has a neural network like structure, but it is advantageous over neural network because the hidden layer needs not be neuron alike and needs not be tuned [17]. The output function of an ELM binary classifier is a function of the hidden layer feature space h( ) and the output weights β as shown: f L (x) =sign(h(x)β) (12) This function is equivalent to mapping a d-dimensional input datatoanl-dimensional hidden layer feature space h and then scale by the output weights of L hidden neurons. The output weights β is determined by the hidden layer output matrix H = [h(x 1 )...h(x T )] as well as the predefined labels y T shown: { (H T H) 1 H T y T H T H is nonsingular, β = H(HH T ) 1 y T HH T (13) is nonsingular. To minimize the error ŷ S y S = Hβ y S and β, Lagrange multiplier method is used and finally the equation of binary ELM classifier is shown: f(x) =sign(h(x)h T ( I C + HHT ) 1 )y T ) (14) ELM can be easily adapted to multi-class classification and regression. The advantage of ELM is the fast training speed due to its neural network alike structure. E. Bagging Usually T is small and with a large variance. The bootstrap phase in bagging under-samples data with replacement from T according to the uniform probability distribution and creates B independent datasets (bags) T (1)...T (B). The purpose of making B bags is to reduce the variance of the original dataset T. The aggregation phase in bagging aggregates the decisions drawn from each bag by voting or other equivalent methods to reduce bias [7], [16]. Therefore, bagging can reduce variance and in the mean time reduce bias. The schematic diagram of bagging is shown in Fig. 3. The reasons why bagging over performs a single classifier are listed in [24]. There are three reasons: statistical, computational and expressiveness. The statistical reason is that the learning in bagging aggregates different possible hypotheses to obtain the best hypotheses, which is advantageous over a single hypothesis. The computational reason is that the learning algorithm in bagging has less chance to be trapped in a local minimum than the single classifier. The expressive reason is that the bagging classifier can expand the search space to higher dimensional space which improves the expressiveness from the single classifier. F. Random Forest Random forest is developed from decision trees. The basic idea of decision trees is to classify a sample by binary partition. At each node of the tree, a test is applied to the input sample and a decision (the direction of the preceding branches) is made by evaluating the outcome of the test. The test terminates at a leaf node (no more preceding branches) and the final classification is made. Decision tree is fast, easy to understand and insensitive to a missing data [25]. Random forest improves the decision trees by bagging and random subspace method. A random forest consists of a set of decision tree classifiers where each classifier is trained by independent identically distributed random vectors and the finial decision is aggregated by a majority vote [2]. 920
5 Testing Data Bag (1) Classifier (1) Prediction (1) Training Data Sampling with Replacement Bag (2) Bag (3) Classifier (2) Classifier (3) Prediction (2) Prediction (3) Majority Vote Aggregated Prediction Bag (B) Classifier (B) Prediction (B) Fig. 3: A Schematic Diagram of Bagging The advantage of using bagging for creating the decision trees is to increase the diversity of the base classifiers. In each node, random forest employs random subspace method for node splitting, which adds one more degree of randomness to the base classifier. Therefore, the author claimed that the random forest is a promising ensemble method [2]. III. ENSEMBLE CLASSIFIERS This section shows the detailed algorithms of the bagging ANFIS, the bagging SVM, the bagging ELM and the random forest. The generalized algorithm of the bagging-based ensemble classifier is shown in Fig. 4. There are four nested loops in the algorithm: the outer loop is to repeat the classification process to get an average testing result; the first inner loop is to train a classification model for each bag; and the second and third inner loop is to find the best parameters for the single classifier for that particular bag based on a k-fold CV. The testing phase includes a majority vote to obtain the final decision. For the bagging ANFIS, the best performing ANFIS model is obtained by k-fold CV only because ANFIS will learn and update the parameters by its back-propagation and least-square estimator. For the bagging SVM, the parameter C and kernel parameter γ (for RBF kernel) will be tuned by a grid search. For the bagging ELM (sigmoid kernel), the parameter C and the number of hidden layer nodes L will be tuned by a grid search as well. In order not to overfit each base classifier, the parameter is searched once before bootstrapping. If the grid search is implemented for each base classifier after bootstrapping, the base classifiers will have high chance to be overfitted to the training data. The random forest is pre-compiled from a R package [11]. There are several parameters can be tuned in the program but the only parameter to be tuned in the experiment is the number of bags so that there will be a fair comparison against the other three ensemble classifiers. The number of stratified testing S is set to be 50 in the empirical analysis; the number of bags B is set to 20, 50 and 80 for three different experiments; and the ratio of the training data and the testing data is n T : n S = 70% : 30%. The training epoches of ANFIS is set to 10. The grid search of the bagging SVM has two dimensions: C {2 i,i = 5, 4,...,5 and γ {2 i,i = 4, 3,...,4. The grid search of the bagging ELM has two dimensions as well: C {2 i,i = 5, 4,...,5 and L {10, 20,...,1000. Inputs: D: Dataset y: Labels B: Number of bags S: Number of Stratified Testing C, σ, L: Parameters of the classifier FOR st =1to S, DO{ BootStrapping Phase: Randomly PARTITION D, y T, y T + S, y S, where T : S = n T : n S ; BOOTSTRAP T, y T T (i), y (i), i =1...B; Training Phase: FOR each parameters, DO{ FOR each fold in k-fold CV, DO{ M = Train(T, y T, P arameters); SAVE the best-performed parameters P best ; FOR i =1to B, DO{ M (i) = Train(T (i), y (i) T,P best); Testing Phase: FOR i =1to B, DO{ ŷ (i) S = Test(M (i), S, y S ); Aggregation Phase: Majority Vote ŷ (i) S ŷ S, i =1...B; A (st) = ŷ S y S y S ; RETURN A and σ A ; Fig. 4: The Bagging-based Ensemble Classification Algorithm IV. EMPIRICAL ANALYSIS The bagging ANFIS, the bagging SVM, the bagging ELM and the random forest were evaluated using thirteen UCI T 921
6 binary classification datasets [26] and shown in Table I. The ANFIS algorithm is the standard release in the Matlab Fuzzy Logic Toolbox based on [21]; The SVM algorithm is build from a package called libsvm [27]; The ELM algorithm is obtained from [17] and the random forest algorithm is obtained from [28] that is converted from the R package reported in [11]. In order to speed up the computation of ANFIS, three attributes were chosen for each datasets under ANFIS and the bagging ANFIS evaluation. The attributes were chosen based on one-pass single attribute ANFIS classification with ascending classification error. TABLE I: Description of Datasets Dataset No. of Samples No. of Attributes australian (credit) breast (wisconsin) bright credit diabetes dimdata heart hepatitis ionosphere liver mammo(graphic) mushroom pima The tuned parameters for SVM and ELM averaged over 5-fold CV are shown in Table II. The tuned parameters of SVM did not vary much among each fold of CV whereas the tuned parameters of ELM, especially L, had a large standard deviation among each fold of CV. TABLE II: Parameters of SVM and ELM Dataset SVM ELM log 2 C log 2 γ log 2 C L australian 0[0] -2[0] -1.8[3.56] 364[376.94] breast 0.2[0.45] -2[0] 3.8[0.84] 650[409.39] bright 3.8[1.79] -2[0] 2.6[2.79] 880[188.28] credit 0.2[0.45] -2[0] 0[3.08] 424[310.45] diabetes 0[0] -1.8[0.45] 1.4[2.88] 348[341.20] dimdata 1[0] -2[0] 0[1.87] 730[236.43] heart 1.2[2.17] -2[0] -2.8[3.49] 586[386.43] hepatitis 5[0] -0.4[2.19] 1.8[3.35] 294[191] ionosphere 4[2.24] -2[0] 1.6[1.82] 590[389.55] liver 1.2[0.84] -1.2[0.84] 0.8[5.31] 500[445.93] mammo 2.2[1.3] -2[0] 4.4[1.34] 412[470.61] mushroom 5[0] 2[0] -2[2.92] 514[439.29] pima 0.2[0.45] -2[0] -0.6[2.97] 258[394.23] The base classifier accuracies (averaged over 80 base classifiers) before majority vote is shown in Table III. Compared with the aggregated accuracy shown in Table VI, the average of base classifier accuracies is generally 1 5% less. For example, the improvements of the average testing accuracies for ionosphere from before to after aggregation are 8.03%, 2.54%, 2.26% and 7.80% for bagging ANFIS, bagging SVM, bagging ELM and random forest, respectively. In terms of diversity, the standard deviations of the base classifiers are generally more than unity. These two tables show that bagging indeed increases the accuracy by aggregating base classifiers decisions. For example, the standard deviations of the base classifiers testing accuracies of ionosphere are 1.94, 2.27, 3.35 and 3.15 for bagging ANFIS, bagging SVM, bagging ELM and random forest, respectively. TABLE III: Base Classifier Accuracy (%, μ[σ]) of Four Ensemble Classifiers with 80 Bags Dataset Bagging Bagging Bagging Random ANFIS SVM ELM Forest australian 83.29[0.49] 84.36[1.67] 85.20[2.41] 79.96[4.43] breast 92.77[2.00] 95.87[1.45] 95.01[1.26] 93.76[1.23] bright 97.10[0.38] 99.03[0.31] 99.06[0.38] 97.59[0.77] credit 85.12[1.69] 82.96[2.32] 81.55[2.53] 79.44[3.68] diabetes 73.86[1.22] 74.67[2.30] 72.98[3.40] 67.51[3.35] dimdata 88.34[0.48] 95.53[0.51] 95.60[0.54] 92.01[0.88] heart 80.68[2.04] 77.27[5.23] 80.86[5.14] 76.63[4.37] hepatitis 80.20[4.45] 79.21[1.85] 78.09[4.70] 76.36[5.98] ionosphere 81.02[1.94] 92.15[2.27] 85.76[3.35] 85.17[3.15] liver 59.92[3.55] 68.17[4.11] 67.21[4.19] 61.72[5.44] mammo 81.40[1.61] 82.20[1.42] 81.03[2.09] 78.30[2.00] mushroom 74.69[0.75] 88.90[0.75] 88.99[0.76] 83.04[3.11] pima 75.14[1.34] 74.79[2.21] 75.93[2.58] 68.36[2.75] Tables IV, V and VI show the testing accuracies of four ensemble classifiers with 20 bags, 50 bags and 80 bags, respectively. The testing accuracies are averaged over 50 stratified testings. Generally, the testing accuracies of the four bagging-based ensemble classifiers increase when the number of bags increases. This is because that the data diversity is related to the number of base classifiers [24]. For the four bagging-based ensemble classifiers with the same number of bags, the bagging SVM is the most favorable ensemble classifier among them. When there are 20 bags, the bagging SVM has the best predicted accuracies on 7 datasets out of 13; when there are 50 bags, the bagging SVM has the best predicted accuracies on 6 datasets out of 13; and when there are 80 bags, the bagging SVM has the best predicted accuracies on 8 datasets out of 13. The second most favorable ensemble classifier is the random forest. When there are 20 bags, the random forest has the best predicted accuracies on 3 datasets out of 13; when there are 50 bags, the random forest has the best predicted accuracies on 5 datasets out of 13; and when there are 80 bags, the random forest has the best predicted accuracies on 4 datasets out of 13. The four ensemble classifiers are ranked by their testing accuracies of each dataset for 20 bags, 50 bags and 80 bags, respectively. The overall rank is calculated by the summation of the individual ranks. The overall rank of the four ensemble classifiers for different bags is: bagging SVM < random forest < bagging ELM < bagging ANFIS. V. CONCLUSION This paper has discussed four bagging-based ensemble classifiers, namely the ensemble ANFIS, the ensemble SVM, the ensemble ELM and the random forest. The discussion of the four ensemble classifiers is novel because it is not reported in the existing literature. The ensemble classifiers have been evaluated on thirteen UCI binary class datasets with different bagging numbers (20, 50 and 80). The empirical results have 922
7 TABLE IV: Testing Accuracy (%, μ[σ](rank)) offouren- semble Classifiers with 20 Bags Dataset Bagging Bagging Bagging Random ANFIS SVM ELM Forest australian 86.73[2.23] 85.57[1.99] 85.19[2.33] 86.66[1.82] (1) (3) (4) (2) breast 96.07[1.05] 96.40[1.09] 96.25[1.24] 96.51[1.12] (4) (2) (3) (1) bright 98.95[0.29] 99.32[0.29] 99.20[0.29] 99.12[0.33] credit 84.54[2.18] 84.77[2.16] 85.14[2.32] 86.52[2.07] diabetes 76.22[1.90] 76.61[2.27] 75.53[2.57] 75.75[2.14] (2) (1) (4) (3) dimdata 94.92[0.49] 96.23[0.51] 95.71[0.57] 95.44[0.56] heart 78.36[3.74] 81.89[2.61] 79.89[2.88] 81.07[2.89] hepatitis 76.00[4.21] 82.30[4.05] 80.04[4.82] 82.00[4.14] ionosphere 88.13[2.91] 94.57[1.77] 88.48[3.04] 93.30[1.84] liver 65.96[3.55] 69.34[4.22] 67.61[4.19] 69.90[3.64] (4) (2) (3) (1) mammo 83.19[1.89] 82.53[2.05] 81.75[1.80] 81.88[1.86] (1) (2) (4) (3) mushroom 78.66[1.12] 89.10[0.50] 89.12[0.52] 88.45[0.77] (4) (2) (1) (3) pima 75.02[2.54] 76.06[2.07] 75.56[2.85] 75.43[2.41] Rank (44) (21) (36) (29) TABLE VI: Testing Accuracy (%, μ[σ](rank)) of Four Ensemble Classifiers with 80 Bags Dataset Bagging Bagging Bagging Random ANFIS SVM ELM Forest australian 86.33[2.10] 85.52[2.04] 85.52[1.92] 86.78[2.06] (2) (4) (3) (1) breast 95.91[1.01] 96.56[1.12] 96.58[1.10] 96.77[0.95] bright 98.86[0.37] 99.38[0.24] 99.27[0.23] 99.17[0.32] credit 84.35[1.88] 85.08[1.81] 85.27[1.98] 86.94[1.64] diabetes 76.92[2.41] 77.36[1.92] 76.65[2.07] 76.98[1.64] (3) (1) (4) (2) dimdata 94.84[0.54] 96.29[0.48] 96.00[0.52] 95.72[0.49] heart 78.82[3.64] 81.57[3.13] 80.32[3.64] 80.89[3.05] hepatitis 75.13[4.08] 83.61[3.17] 81.30[3.83] 82.48[3.61] ionosphere 89.05[2.53] 94.69[1.89] 88.02[2.89] 92.97[2.06] (3) (1) (4) (2) liver 66.91[4.13] 69.96[4.20] 70.95[4.24] 71.83[4.43] mammo 83.48[2.08] 82.59[2.06] 82.03[2.17] 81.90[1.84] (1) (2) (3) (4) mushroom 79.04[0.68] 89.27[0.52] 89.25[0.57] 88.90[0.57] pima 75.86[2.14] 77.03[1.92] 76.04[2.41] 76.10[1.97] Rank (45) (23) (35) (27) TABLE V: Testing Accuracy (%, μ[σ](rank)) of Four Ensemble Classifiers with 50 Bags Dataset Bagging Bagging Bagging Random ANFIS SVM ELM Forest australian 87.10[1.78] 86.25[1.91] 86.19[2.07] 87.52[2.05] (2) (3) (4) (1) breast 95.83[1.23] 96.55[1.05] 96.62[1.07] 96.73[1.10] bright 98.83[0.32] 99.35[0.23] 99.24[0.26] 99.15[0.28] credit 84.45[2.35] 85.50[2.14] 85.74[2.17] 86.91[1.85] diabetes 76.83[2.33] 77.30[2.42] 76.33[3.07] 76.52[2.31] (2) (1) (4) (3) dimdata 94.99[0.49] 96.31[0.35] 95.96[0.42] 95.77[0.48] heart 77.77[3.56] 81.02[3.65] 79.84[4.19] 80.52[3.97] hepatitis 75.26[4.79] 82.22[3.31] 81.87[4.23] 82.70[3.83] (4) (2) (3) (1) ionosphere 88.25[2.42] 94.82[1.92] 88.29[2.36] 93.35[2.01] liver 67.22[3.65] 70.02[4.22] 69.38[4.55] 70.87[3.83] (4) (2) (3) (1) mammo 83.53[1.78] 82.61[1.85] 81.84[2.08] 81.85[1.75] (1) (2) (4) (3) mushroom 79.00[0.82] 89.21[0.59] 89.27[0.55] 88.91[0.70] (4) (2) (1) (3) pima 75.37[2.38] 76.38[2.47] 75.97[2.89] 75.73[2.61] Rank (45) (23) (35) (27) shown that all four ensemble classifiers have returned high testing accuracy. The effect of data diversity has been reflected from the comparison of base classifier accuracies and the ensemble classifier accuracies. In addition, the effect of data diversity has also been reflected from the number of bags because the results have shown that the testing accuracies increase with increasing of the base classifiers. Among the four ensemble classifiers, the ensemble SVM has been identified to be the most favorable ensemble classifier and the random forest has been identified to be the second most favorable one. The bagging ELM and the bagging ANFIS have been identified to be the third and fourth most favorable classifiers. REFERENCES [1] M. Bhardwaj, T. Gupta, T. Grover, and V. Bhatnagar, An efficient classifier ensemble using SVM, in International Conference on Methods and Models in Computer Science, Dec. 2009, pp [2] L. Breiman, Random forests, Machine Learning, vol. 45, pp. 5 32, [3] D. Opitz and R. Maclin, Popular ensemble methods: An empirical study, Journal of Artificial Intelligence Research, vol. 11, pp , [4] L. Yu, S. Wang, and K. K. Lai, Investigation of diversity strategies in SVM ensemble learning, in International Conference on Natural Computation, vol. 7, Oct. 2008, pp [5] R. Lei, X. Kong, and X. Wang, An ensemble SVM using entropy-based attribute selection, in Chinese Control and Decision Conference, May 2010, pp [6] R. Kohavi, A study of cross-validation and bootstrap for accuracy estimation and model selection, in International Joint Conference on Artificial Intelligence, vol. 2, no. 12, 1995, pp [7] L. Breiman, Bagging predictors, Machine Learning, vol. 24, pp , [8] Y. Freund and R. E. Schapire, A short introduction to boosting, Journal of Japanese Scociety for Artificial Intelligence, vol. 14, no. 5, pp , Sept [9] S. Kurogi and K. Harashima, Improving generalization performance of bagging ensemble via Bayesian approach, in IEEE International Symposium of Computational Intelligence in Robotics and Automation, Mar. 2009, pp
8 [10] S. Clemenon, M. Depecker, and N. Vayatis, Bagging ranking trees, in International Conference on Machine Learning and Applications, Dec. 2009, pp [11] A. Liaw and M. Wiener, Classification and regression by randomforest, RNews, vol. 2/3, pp , Dec [12] R. Chen and J. Yu, An improved bagging neural network ensemble algorithm and its application, in International Conference on Natural Computation, Aug. 2007, pp [13] J. Canul-Reich, L. Shoemaker, and L. O. Hall, Ensembles of fuzzy classifiers, in IEEE International Fuzzy Systems Conference, July 2007, pp [14] C. Cortes and V. Vapnik, Support-vector networks, Machine Learning, vol. 20, no. 3, pp , [15] F. M. Zaman and H. Hirose, Double SVMbagging: A new double bagging with support vector machine, Engineering Letters, vol. 17, no. 2, pp , [16] G. Valentini and T. G. Dietterich, Low bias bagged support vector machines, in International Conference on Machine Learning, 2003, pp [17] G.-B. Huang, H. Zhou, X. Ding, and R. Zhang, Extreme learning machine for regression and multi-class classification, IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, vol. PP, no. 99, pp. 1 17, Oct [18] N. Liu and H. Wang, Ensemble based extreme learning machine, IEEE Signal Processing Letters, vol. 17, no. 8, pp , Aug [19] Y. Liu, X. Xu, and C. Wang, Simple ensemble of extreme learning machine, in International Congress on Image and Signal Processing, Oct. 2009, pp [20] H. Tian and B. Meng, A new modeling method based on bagging elm for day-ahead electricity price prediction, in IEEE International Conference on Bio-Inspired Computing: Theories and Applications, Sept. 2010, pp [21] J.-S. R. Jang, ANFIS: Adaptive-network-based fuzzy inference system, IEEE Transactions on Systems, Man and Cybernetics, vol. 23, no. 3, pp , May [22] M. Sugeno and G. Kang, Structure identification of fuzzy model, Fuzzy Sets and Systems, vol. 28, pp , [23] J. Suyken and C. Alzate, Support vector machines and kernel methods: new approaches in unsupervised learning, in Tutorial WCCI 2010, July [24] T. G. Dietterich, Machine-learning research: Four current directions, The AI Magazine, vol. 18, no. 4, pp , [25] W.-Y. Loh, Classification and regression trees, Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, vol. 1, no. 1, pp , Jan [26] [Online]. Available: [27] C.-C. Chang and C.-J. Lin, LIBSVM : a library for support vector machines, ACM Transactions on Intelligent Systems and Technology, vol. 2, pp. 1 27, [Online]. Available: cjlin/libsvm [28] A. Jaiantilal. (2009) Classification and regression by randomforest-matlab. [Online]. Available: randomforest-matlab 924
Comparison of different preprocessing techniques and feature selection algorithms in cancer datasets
Comparison of different preprocessing techniques and feature selection algorithms in cancer datasets Konstantinos Sechidis School of Computer Science University of Manchester sechidik@cs.man.ac.uk Abstract
More informationNaïve Bayes for text classification
Road Map Basic concepts Decision tree induction Evaluation of classifiers Rule induction Classification using association rules Naïve Bayesian classification Naïve Bayes for text classification Support
More informationEnsemble Learning. Another approach is to leverage the algorithms we have via ensemble methods
Ensemble Learning Ensemble Learning So far we have seen learning algorithms that take a training set and output a classifier What if we want more accuracy than current algorithms afford? Develop new learning
More informationBagging and Boosting Algorithms for Support Vector Machine Classifiers
Bagging and Boosting Algorithms for Support Vector Machine Classifiers Noritaka SHIGEI and Hiromi MIYAJIMA Dept. of Electrical and Electronics Engineering, Kagoshima University 1-21-40, Korimoto, Kagoshima
More informationThe Effects of Outliers on Support Vector Machines
The Effects of Outliers on Support Vector Machines Josh Hoak jrhoak@gmail.com Portland State University Abstract. Many techniques have been developed for mitigating the effects of outliers on the results
More informationSUPPORT VECTOR MACHINES
SUPPORT VECTOR MACHINES Today Reading AIMA 8.9 (SVMs) Goals Finish Backpropagation Support vector machines Backpropagation. Begin with randomly initialized weights 2. Apply the neural network to each training
More informationCS 229 Midterm Review
CS 229 Midterm Review Course Staff Fall 2018 11/2/2018 Outline Today: SVMs Kernels Tree Ensembles EM Algorithm / Mixture Models [ Focus on building intuition, less so on solving specific problems. Ask
More informationInformation Management course
Università degli Studi di Milano Master Degree in Computer Science Information Management course Teacher: Alberto Ceselli Lecture 20: 10/12/2015 Data Mining: Concepts and Techniques (3 rd ed.) Chapter
More informationSupport Vector Machines
Support Vector Machines Chapter 9 Chapter 9 1 / 50 1 91 Maximal margin classifier 2 92 Support vector classifiers 3 93 Support vector machines 4 94 SVMs with more than two classes 5 95 Relationshiop to
More informationAll lecture slides will be available at CSC2515_Winter15.html
CSC2515 Fall 2015 Introduc3on to Machine Learning Lecture 9: Support Vector Machines All lecture slides will be available at http://www.cs.toronto.edu/~urtasun/courses/csc2515/ CSC2515_Winter15.html Many
More informationContents. Preface to the Second Edition
Preface to the Second Edition v 1 Introduction 1 1.1 What Is Data Mining?....................... 4 1.2 Motivating Challenges....................... 5 1.3 The Origins of Data Mining....................
More informationRandom Forest A. Fornaser
Random Forest A. Fornaser alberto.fornaser@unitn.it Sources Lecture 15: decision trees, information theory and random forests, Dr. Richard E. Turner Trees and Random Forests, Adele Cutler, Utah State University
More informationSupport Vector Machines
Support Vector Machines RBF-networks Support Vector Machines Good Decision Boundary Optimization Problem Soft margin Hyperplane Non-linear Decision Boundary Kernel-Trick Approximation Accurancy Overtraining
More informationCGBoost: Conjugate Gradient in Function Space
CGBoost: Conjugate Gradient in Function Space Ling Li Yaser S. Abu-Mostafa Amrit Pratap Learning Systems Group, California Institute of Technology, Pasadena, CA 91125, USA {ling,yaser,amrit}@caltech.edu
More informationNetwork Traffic Measurements and Analysis
DEIB - Politecnico di Milano Fall, 2017 Sources Hastie, Tibshirani, Friedman: The Elements of Statistical Learning James, Witten, Hastie, Tibshirani: An Introduction to Statistical Learning Andrew Ng:
More informationApplying Supervised Learning
Applying Supervised Learning When to Consider Supervised Learning A supervised learning algorithm takes a known set of input data (the training set) and known responses to the data (output), and trains
More informationSupervised Learning (contd) Linear Separation. Mausam (based on slides by UW-AI faculty)
Supervised Learning (contd) Linear Separation Mausam (based on slides by UW-AI faculty) Images as Vectors Binary handwritten characters Treat an image as a highdimensional vector (e.g., by reading pixel
More informationChap.12 Kernel methods [Book, Chap.7]
Chap.12 Kernel methods [Book, Chap.7] Neural network methods became popular in the mid to late 1980s, but by the mid to late 1990s, kernel methods have also become popular in machine learning. The first
More informationMODELING FOR RESIDUAL STRESS, SURFACE ROUGHNESS AND TOOL WEAR USING AN ADAPTIVE NEURO FUZZY INFERENCE SYSTEM
CHAPTER-7 MODELING FOR RESIDUAL STRESS, SURFACE ROUGHNESS AND TOOL WEAR USING AN ADAPTIVE NEURO FUZZY INFERENCE SYSTEM 7.1 Introduction To improve the overall efficiency of turning, it is necessary to
More informationCS570: Introduction to Data Mining
CS570: Introduction to Data Mining Classification Advanced Reading: Chapter 8 & 9 Han, Chapters 4 & 5 Tan Anca Doloc-Mihu, Ph.D. Slides courtesy of Li Xiong, Ph.D., 2011 Han, Kamber & Pei. Data Mining.
More informationEuropean Journal of Science and Engineering Vol. 1, Issue 1, 2013 ADAPTIVE NEURO-FUZZY INFERENCE SYSTEM IDENTIFICATION OF AN INDUCTION MOTOR
ADAPTIVE NEURO-FUZZY INFERENCE SYSTEM IDENTIFICATION OF AN INDUCTION MOTOR Ahmed A. M. Emam College of Engineering Karrary University SUDAN ahmedimam1965@yahoo.co.in Eisa Bashier M. Tayeb College of Engineering
More informationAnalytical model A structure and process for analyzing a dataset. For example, a decision tree is a model for the classification of a dataset.
Glossary of data mining terms: Accuracy Accuracy is an important factor in assessing the success of data mining. When applied to data, accuracy refers to the rate of correct values in the data. When applied
More informationIn the Name of God. Lecture 17: ANFIS Adaptive Network-Based Fuzzy Inference System
In the Name of God Lecture 17: ANFIS Adaptive Network-Based Fuzzy Inference System Outline ANFIS Architecture Hybrid Learning Algorithm Learning Methods that Cross-Fertilize ANFIS and RBFN ANFIS as a universal
More informationBIOINF 585: Machine Learning for Systems Biology & Clinical Informatics
BIOINF 585: Machine Learning for Systems Biology & Clinical Informatics Lecture 12: Ensemble Learning I Jie Wang Department of Computational Medicine & Bioinformatics University of Michigan 1 Outline Bias
More informationImproving Classification Accuracy for Single-loop Reliability-based Design Optimization
, March 15-17, 2017, Hong Kong Improving Classification Accuracy for Single-loop Reliability-based Design Optimization I-Tung Yang, and Willy Husada Abstract Reliability-based design optimization (RBDO)
More informationA New Fuzzy Membership Computation Method for Fuzzy Support Vector Machines
A New Fuzzy Membership Computation Method for Fuzzy Support Vector Machines Trung Le, Dat Tran, Wanli Ma and Dharmendra Sharma Faculty of Information Sciences and Engineering University of Canberra, Australia
More information5 Learning hypothesis classes (16 points)
5 Learning hypothesis classes (16 points) Consider a classification problem with two real valued inputs. For each of the following algorithms, specify all of the separators below that it could have generated
More informationMachine Learning: Think Big and Parallel
Day 1 Inderjit S. Dhillon Dept of Computer Science UT Austin CS395T: Topics in Multicore Programming Oct 1, 2013 Outline Scikit-learn: Machine Learning in Python Supervised Learning day1 Regression: Least
More informationModel combination. Resampling techniques p.1/34
Model combination The winner-takes-all approach is intuitively the approach which should work the best. However recent results in machine learning show that the performance of the final model can be improved
More informationSupport Vector Machines
Support Vector Machines RBF-networks Support Vector Machines Good Decision Boundary Optimization Problem Soft margin Hyperplane Non-linear Decision Boundary Kernel-Trick Approximation Accurancy Overtraining
More informationNelder-Mead Enhanced Extreme Learning Machine
Philip Reiner, Bogdan M. Wilamowski, "Nelder-Mead Enhanced Extreme Learning Machine", 7-th IEEE Intelligent Engineering Systems Conference, INES 23, Costa Rica, June 9-2., 29, pp. 225-23 Nelder-Mead Enhanced
More informationVersion Space Support Vector Machines: An Extended Paper
Version Space Support Vector Machines: An Extended Paper E.N. Smirnov, I.G. Sprinkhuizen-Kuyper, G.I. Nalbantov 2, and S. Vanderlooy Abstract. We argue to use version spaces as an approach to reliable
More informationRobust 1-Norm Soft Margin Smooth Support Vector Machine
Robust -Norm Soft Margin Smooth Support Vector Machine Li-Jen Chien, Yuh-Jye Lee, Zhi-Peng Kao, and Chih-Cheng Chang Department of Computer Science and Information Engineering National Taiwan University
More informationCLASSIFICATION WITH RADIAL BASIS AND PROBABILISTIC NEURAL NETWORKS
CLASSIFICATION WITH RADIAL BASIS AND PROBABILISTIC NEURAL NETWORKS CHAPTER 4 CLASSIFICATION WITH RADIAL BASIS AND PROBABILISTIC NEURAL NETWORKS 4.1 Introduction Optical character recognition is one of
More informationLecture 9: Support Vector Machines
Lecture 9: Support Vector Machines William Webber (william@williamwebber.com) COMP90042, 2014, Semester 1, Lecture 8 What we ll learn in this lecture Support Vector Machines (SVMs) a highly robust and
More informationENSEMBLE RANDOM-SUBSET SVM
ENSEMBLE RANDOM-SUBSET SVM Anonymous for Review Keywords: Abstract: Ensemble Learning, Bagging, Boosting, Generalization Performance, Support Vector Machine In this paper, the Ensemble Random-Subset SVM
More informationTable of Contents. Recognition of Facial Gestures... 1 Attila Fazekas
Table of Contents Recognition of Facial Gestures...................................... 1 Attila Fazekas II Recognition of Facial Gestures Attila Fazekas University of Debrecen, Institute of Informatics
More informationMachine Learning for NLP
Machine Learning for NLP Support Vector Machines Aurélie Herbelot 2018 Centre for Mind/Brain Sciences University of Trento 1 Support Vector Machines: introduction 2 Support Vector Machines (SVMs) SVMs
More informationAn Empirical Comparison of Ensemble Methods Based on Classification Trees. Mounir Hamza and Denis Larocque. Department of Quantitative Methods
An Empirical Comparison of Ensemble Methods Based on Classification Trees Mounir Hamza and Denis Larocque Department of Quantitative Methods HEC Montreal Canada Mounir Hamza and Denis Larocque 1 June 2005
More informationBig Data Methods. Chapter 5: Machine learning. Big Data Methods, Chapter 5, Slide 1
Big Data Methods Chapter 5: Machine learning Big Data Methods, Chapter 5, Slide 1 5.1 Introduction to machine learning What is machine learning? Concerned with the study and development of algorithms that
More informationClassification Lecture Notes cse352. Neural Networks. Professor Anita Wasilewska
Classification Lecture Notes cse352 Neural Networks Professor Anita Wasilewska Neural Networks Classification Introduction INPUT: classification data, i.e. it contains an classification (class) attribute
More informationAllstate Insurance Claims Severity: A Machine Learning Approach
Allstate Insurance Claims Severity: A Machine Learning Approach Rajeeva Gaur SUNet ID: rajeevag Jeff Pickelman SUNet ID: pattern Hongyi Wang SUNet ID: hongyiw I. INTRODUCTION The insurance industry has
More informationApplication of Support Vector Machine In Bioinformatics
Application of Support Vector Machine In Bioinformatics V. K. Jayaraman Scientific and Engineering Computing Group CDAC, Pune jayaramanv@cdac.in Arun Gupta Computational Biology Group AbhyudayaTech, Indore
More informationData Mining: Concepts and Techniques. Chapter 9 Classification: Support Vector Machines. Support Vector Machines (SVMs)
Data Mining: Concepts and Techniques Chapter 9 Classification: Support Vector Machines 1 Support Vector Machines (SVMs) SVMs are a set of related supervised learning methods used for classification Based
More informationContents Machine Learning concepts 4 Learning Algorithm 4 Predictive Model (Model) 4 Model, Classification 4 Model, Regression 4 Representation
Contents Machine Learning concepts 4 Learning Algorithm 4 Predictive Model (Model) 4 Model, Classification 4 Model, Regression 4 Representation Learning 4 Supervised Learning 4 Unsupervised Learning 4
More informationModule 4. Non-linear machine learning econometrics: Support Vector Machine
Module 4. Non-linear machine learning econometrics: Support Vector Machine THE CONTRACTOR IS ACTING UNDER A FRAMEWORK CONTRACT CONCLUDED WITH THE COMMISSION Introduction When the assumption of linearity
More information1) Give decision trees to represent the following Boolean functions:
1) Give decision trees to represent the following Boolean functions: 1) A B 2) A [B C] 3) A XOR B 4) [A B] [C Dl Answer: 1) A B 2) A [B C] 1 3) A XOR B = (A B) ( A B) 4) [A B] [C D] 2 2) Consider the following
More informationDynamic Ensemble Construction via Heuristic Optimization
Dynamic Ensemble Construction via Heuristic Optimization Şenay Yaşar Sağlam and W. Nick Street Department of Management Sciences The University of Iowa Abstract Classifier ensembles, in which multiple
More informationDESIGN AND EVALUATION OF MACHINE LEARNING MODELS WITH STATISTICAL FEATURES
EXPERIMENTAL WORK PART I CHAPTER 6 DESIGN AND EVALUATION OF MACHINE LEARNING MODELS WITH STATISTICAL FEATURES The evaluation of models built using statistical in conjunction with various feature subset
More informationAn Endowed Takagi-Sugeno-type Fuzzy Model for Classification Problems
Available Online at www.ijcsmc.com International Journal of Computer Science and Mobile Computing A Monthly Journal of Computer Science and Information Technology IJCSMC, Vol. 3, Issue. 11, November 2014,
More informationSoft computing algorithms
Chapter 1 Soft computing algorithms It is indeed a surprising and fortunate fact that nature can be expressed by relatively low-order mathematical functions. Rudolf Carnap Mitchell [Mitchell, 1997] defines
More informationPredictive Analytics: Demystifying Current and Emerging Methodologies. Tom Kolde, FCAS, MAAA Linda Brobeck, FCAS, MAAA
Predictive Analytics: Demystifying Current and Emerging Methodologies Tom Kolde, FCAS, MAAA Linda Brobeck, FCAS, MAAA May 18, 2017 About the Presenters Tom Kolde, FCAS, MAAA Consulting Actuary Chicago,
More informationA Practical Guide to Support Vector Classification
A Practical Guide to Support Vector Classification Chih-Wei Hsu, Chih-Chung Chang, and Chih-Jen Lin Department of Computer Science and Information Engineering National Taiwan University Taipei 106, Taiwan
More informationMore on Learning. Neural Nets Support Vectors Machines Unsupervised Learning (Clustering) K-Means Expectation-Maximization
More on Learning Neural Nets Support Vectors Machines Unsupervised Learning (Clustering) K-Means Expectation-Maximization Neural Net Learning Motivated by studies of the brain. A network of artificial
More informationKernel Combination Versus Classifier Combination
Kernel Combination Versus Classifier Combination Wan-Jui Lee 1, Sergey Verzakov 2, and Robert P.W. Duin 2 1 EE Department, National Sun Yat-Sen University, Kaohsiung, Taiwan wrlee@water.ee.nsysu.edu.tw
More informationHALF&HALF BAGGING AND HARD BOUNDARY POINTS. Leo Breiman Statistics Department University of California Berkeley, CA
1 HALF&HALF BAGGING AND HARD BOUNDARY POINTS Leo Breiman Statistics Department University of California Berkeley, CA 94720 leo@stat.berkeley.edu Technical Report 534 Statistics Department September 1998
More informationUsing Machine Learning to Optimize Storage Systems
Using Machine Learning to Optimize Storage Systems Dr. Kiran Gunnam 1 Outline 1. Overview 2. Building Flash Models using Logistic Regression. 3. Storage Object classification 4. Storage Allocation recommendation
More informationCombining SVMs with Various Feature Selection Strategies
Combining SVMs with Various Feature Selection Strategies Yi-Wei Chen and Chih-Jen Lin Department of Computer Science, National Taiwan University, Taipei 106, Taiwan Summary. This article investigates the
More informationApril 3, 2012 T.C. Havens
April 3, 2012 T.C. Havens Different training parameters MLP with different weights, number of layers/nodes, etc. Controls instability of classifiers (local minima) Similar strategies can be used to generate
More informationSoftware Documentation of the Potential Support Vector Machine
Software Documentation of the Potential Support Vector Machine Tilman Knebel and Sepp Hochreiter Department of Electrical Engineering and Computer Science Technische Universität Berlin 10587 Berlin, Germany
More informationAdvanced Video Content Analysis and Video Compression (5LSH0), Module 8B
Advanced Video Content Analysis and Video Compression (5LSH0), Module 8B 1 Supervised learning Catogarized / labeled data Objects in a picture: chair, desk, person, 2 Classification Fons van der Sommen
More informationCS 8520: Artificial Intelligence
CS 8520: Artificial Intelligence Machine Learning 2 Paula Matuszek Spring, 2013 1 Regression Classifiers We said earlier that the task of a supervised learning system can be viewed as learning a function
More informationSUPPORT VECTOR MACHINES
SUPPORT VECTOR MACHINES Today Reading AIMA 18.9 Goals (Naïve Bayes classifiers) Support vector machines 1 Support Vector Machines (SVMs) SVMs are probably the most popular off-the-shelf classifier! Software
More informationSELF-ORGANIZING methods such as the Self-
Proceedings of International Joint Conference on Neural Networks, Dallas, Texas, USA, August 4-9, 2013 Maximal Margin Learning Vector Quantisation Trung Le, Dat Tran, Van Nguyen, and Wanli Ma Abstract
More informationGenerating the Reduced Set by Systematic Sampling
Generating the Reduced Set by Systematic Sampling Chien-Chung Chang and Yuh-Jye Lee Email: {D9115009, yuh-jye}@mail.ntust.edu.tw Department of Computer Science and Information Engineering National Taiwan
More informationECG782: Multidimensional Digital Signal Processing
ECG782: Multidimensional Digital Signal Processing Object Recognition http://www.ee.unlv.edu/~b1morris/ecg782/ 2 Outline Knowledge Representation Statistical Pattern Recognition Neural Networks Boosting
More informationPreface to the Second Edition. Preface to the First Edition. 1 Introduction 1
Preface to the Second Edition Preface to the First Edition vii xi 1 Introduction 1 2 Overview of Supervised Learning 9 2.1 Introduction... 9 2.2 Variable Types and Terminology... 9 2.3 Two Simple Approaches
More informationEvaluation Measures. Sebastian Pölsterl. April 28, Computer Aided Medical Procedures Technische Universität München
Evaluation Measures Sebastian Pölsterl Computer Aided Medical Procedures Technische Universität München April 28, 2015 Outline 1 Classification 1. Confusion Matrix 2. Receiver operating characteristics
More informationFast or furious? - User analysis of SF Express Inc
CS 229 PROJECT, DEC. 2017 1 Fast or furious? - User analysis of SF Express Inc Gege Wen@gegewen, Yiyuan Zhang@yiyuan12, Kezhen Zhao@zkz I. MOTIVATION The motivation of this project is to predict the likelihood
More informationUnivariate and Multivariate Decision Trees
Univariate and Multivariate Decision Trees Olcay Taner Yıldız and Ethem Alpaydın Department of Computer Engineering Boğaziçi University İstanbul 80815 Turkey Abstract. Univariate decision trees at each
More informationLab 2: Support vector machines
Artificial neural networks, advanced course, 2D1433 Lab 2: Support vector machines Martin Rehn For the course given in 2006 All files referenced below may be found in the following directory: /info/annfk06/labs/lab2
More informationCS 8520: Artificial Intelligence. Machine Learning 2. Paula Matuszek Fall, CSC 8520 Fall Paula Matuszek
CS 8520: Artificial Intelligence Machine Learning 2 Paula Matuszek Fall, 2015!1 Regression Classifiers We said earlier that the task of a supervised learning system can be viewed as learning a function
More informationSupport Vector Machines + Classification for IR
Support Vector Machines + Classification for IR Pierre Lison University of Oslo, Dep. of Informatics INF3800: Søketeknologi April 30, 2014 Outline of the lecture Recap of last week Support Vector Machines
More informationNEAREST-INSTANCE-CENTROID-ESTIMATION LINEAR DISCRIMINANT ANALYSIS (NICE LDA) Rishabh Singh, Kan Li (Member, IEEE) and Jose C. Principe (Fellow, IEEE)
NEAREST-INSTANCE-CENTROID-ESTIMATION LINEAR DISCRIMINANT ANALYSIS (NICE LDA) Rishabh Singh, Kan Li (Member, IEEE) and Jose C. Principe (Fellow, IEEE) University of Florida Department of Electrical and
More informationCombine the PA Algorithm with a Proximal Classifier
Combine the Passive and Aggressive Algorithm with a Proximal Classifier Yuh-Jye Lee Joint work with Y.-C. Tseng Dept. of Computer Science & Information Engineering TaiwanTech. Dept. of Statistics@NCKU
More informationNeural Networks. CE-725: Statistical Pattern Recognition Sharif University of Technology Spring Soleymani
Neural Networks CE-725: Statistical Pattern Recognition Sharif University of Technology Spring 2013 Soleymani Outline Biological and artificial neural networks Feed-forward neural networks Single layer
More informationBayesian model ensembling using meta-trained recurrent neural networks
Bayesian model ensembling using meta-trained recurrent neural networks Luca Ambrogioni l.ambrogioni@donders.ru.nl Umut Güçlü u.guclu@donders.ru.nl Yağmur Güçlütürk y.gucluturk@donders.ru.nl Julia Berezutskaya
More informationImproving the Random Forest Algorithm by Randomly Varying the Size of the Bootstrap Samples for Low Dimensional Data Sets
Improving the Random Forest Algorithm by Randomly Varying the Size of the Bootstrap Samples for Low Dimensional Data Sets Md Nasim Adnan and Md Zahidul Islam Centre for Research in Complex Systems (CRiCS)
More informationMachine Learning in Biology
Università degli studi di Padova Machine Learning in Biology Luca Silvestrin (Dottorando, XXIII ciclo) Supervised learning Contents Class-conditional probability density Linear and quadratic discriminant
More informationAn Empirical Comparison of Spectral Learning Methods for Classification
An Empirical Comparison of Spectral Learning Methods for Classification Adam Drake and Dan Ventura Computer Science Department Brigham Young University, Provo, UT 84602 USA Email: adam drake1@yahoo.com,
More informationStatistical foundations of machine learning
Statistical foundations of machine learning INFO-F-422 Gianluca Bontempi Machine Learning Group Computer Science Department mlg.ulb.ac.be Some algorithms for nonlinear modeling Feedforward neural network
More informationIntroduction to Support Vector Machines
Introduction to Support Vector Machines CS 536: Machine Learning Littman (Wu, TA) Administration Slides borrowed from Martin Law (from the web). 1 Outline History of support vector machines (SVM) Two classes,
More informationSVM in Analysis of Cross-Sectional Epidemiological Data Dmitriy Fradkin. April 4, 2005 Dmitriy Fradkin, Rutgers University Page 1
SVM in Analysis of Cross-Sectional Epidemiological Data Dmitriy Fradkin April 4, 2005 Dmitriy Fradkin, Rutgers University Page 1 Overview The goals of analyzing cross-sectional data Standard methods used
More informationData-driven Kernels for Support Vector Machines
Data-driven Kernels for Support Vector Machines by Xin Yao A research paper presented to the University of Waterloo in partial fulfillment of the requirement for the degree of Master of Mathematics in
More informationFEATURE EXTRACTION TECHNIQUES USING SUPPORT VECTOR MACHINES IN DISEASE PREDICTION
FEATURE EXTRACTION TECHNIQUES USING SUPPORT VECTOR MACHINES IN DISEASE PREDICTION Sandeep Kaur 1, Dr. Sheetal Kalra 2 1,2 Computer Science Department, Guru Nanak Dev University RC, Jalandhar(India) ABSTRACT
More informationProbabilistic Approaches
Probabilistic Approaches Chirayu Wongchokprasitti, PhD University of Pittsburgh Center for Causal Discovery Department of Biomedical Informatics chw20@pitt.edu http://www.pitt.edu/~chw20 Overview Independence
More informationThe exam is closed book, closed notes except your one-page (two-sided) cheat sheet.
CS 189 Spring 2015 Introduction to Machine Learning Final You have 2 hours 50 minutes for the exam. The exam is closed book, closed notes except your one-page (two-sided) cheat sheet. No calculators or
More informationApplied Statistics for Neuroscientists Part IIa: Machine Learning
Applied Statistics for Neuroscientists Part IIa: Machine Learning Dr. Seyed-Ahmad Ahmadi 04.04.2017 16.11.2017 Outline Machine Learning Difference between statistics and machine learning Modeling the problem
More informationAn Ensemble of Classifiers using Dynamic Method on Ambiguous Data
An Ensemble of Classifiers using Dynamic Method on Ambiguous Data Dnyaneshwar Kudande D.Y. Patil College of Engineering, Pune, Maharashtra, India Abstract- The aim of proposed work is to analyze the Instance
More informationImage Compression: An Artificial Neural Network Approach
Image Compression: An Artificial Neural Network Approach Anjana B 1, Mrs Shreeja R 2 1 Department of Computer Science and Engineering, Calicut University, Kuttippuram 2 Department of Computer Science and
More informationCombined Weak Classifiers
Combined Weak Classifiers Chuanyi Ji and Sheng Ma Department of Electrical, Computer and System Engineering Rensselaer Polytechnic Institute, Troy, NY 12180 chuanyi@ecse.rpi.edu, shengm@ecse.rpi.edu Abstract
More informationTransactions on Information and Communications Technologies vol 16, 1996 WIT Press, ISSN
Comparative study of fuzzy logic and neural network methods in modeling of simulated steady-state data M. Järvensivu and V. Kanninen Laboratory of Process Control, Department of Chemical Engineering, Helsinki
More informationRule Based Learning Systems from SVM and RBFNN
Rule Based Learning Systems from SVM and RBFNN Haydemar Núñez 1, Cecilio Angulo 2 and Andreu Català 2 1 Laboratorio de Inteligencia Artificial, Universidad Central de Venezuela. Caracas, Venezuela hnunez@strix.ciens.ucv.ve
More informationSTA 4273H: Statistical Machine Learning
STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Statistics! rsalakhu@utstat.toronto.edu! http://www.utstat.utoronto.ca/~rsalakhu/ Sidney Smith Hall, Room 6002 Lecture 12 Combining
More informationLecture 20: Bagging, Random Forests, Boosting
Lecture 20: Bagging, Random Forests, Boosting Reading: Chapter 8 STATS 202: Data mining and analysis November 13, 2017 1 / 17 Classification and Regression trees, in a nut shell Grow the tree by recursively
More informationCS294-1 Final Project. Algorithms Comparison
CS294-1 Final Project Algorithms Comparison Deep Learning Neural Network AdaBoost Random Forest Prepared By: Shuang Bi (24094630) Wenchang Zhang (24094623) 2013-05-15 1 INTRODUCTION In this project, we
More informationBioinformatics - Lecture 07
Bioinformatics - Lecture 07 Bioinformatics Clusters and networks Martin Saturka http://www.bioplexity.org/lectures/ EBI version 0.4 Creative Commons Attribution-Share Alike 2.5 License Learning on profiles
More informationRough Set Approach to Unsupervised Neural Network based Pattern Classifier
Rough Set Approach to Unsupervised Neural based Pattern Classifier Ashwin Kothari, Member IAENG, Avinash Keskar, Shreesha Srinath, and Rakesh Chalsani Abstract Early Convergence, input feature space with
More informationBusiness Club. Decision Trees
Business Club Decision Trees Business Club Analytics Team December 2017 Index 1. Motivation- A Case Study 2. The Trees a. What is a decision tree b. Representation 3. Regression v/s Classification 4. Building
More informationMachine Learning Classifiers and Boosting
Machine Learning Classifiers and Boosting Reading Ch 18.6-18.12, 20.1-20.3.2 Outline Different types of learning problems Different types of learning algorithms Supervised learning Decision trees Naïve
More information