COMPARISON OF SUBSAMPLING TECHNIQUES FOR RANDOM SUBSPACE ENSEMBLES

SANTHOSH PATHICAL, GURSEL SERPEN
Electrical Engineering and Computer Science Department, University of Toledo, Toledo, OH, 43606, USA
santhosh.pathical@utoledo.edu, gursel.serpen@utoledo.edu

Abstract: This paper presents a comparison of three subsampling techniques for random subspace ensemble classifiers through an empirical study. A version of the random subspace ensemble designed to address the challenges of high dimensional classification, called the random subsample ensemble, was evaluated within the voting combiner framework for three different sampling methods: random sampling without replacement, random sampling with replacement, and random partitioning. The random subsample ensemble was instantiated using three different base learners, namely C4.5, k-nearest neighbor, and naïve Bayes, and tested on five high-dimensional benchmark data sets in machine learning. Simulation results helped ascertain the optimal sampling technique for the ensemble, which turned out to be sampling without replacement.

Keywords: Random subsampling; curse of dimensionality; ensemble classification; random subspace

1. Introduction

The need for classification in high dimensional feature spaces arises in contemporary fields such as biomedicine, finance, satellite imagery, customer relationship management, and network intrusion detection [1]-[4]. Classifying high dimensional datasets is a challenging task for several reasons. The machine learning algorithms used for classification have computational complexity that grows exponentially with the number of dimensions [2]. The required number of labeled training samples for supervised classification also increases exponentially with the dimensionality [5]. Sparsity of data points in higher dimensions makes the learning task very difficult, if not impossible. High dimensionality also causes scalability problems for machine learning algorithms [3]. This inability to scale exhibits itself as substantial computational cost, which translates into indefinitely long training periods or, in the worst case, non-learnability of the classification task associated with that dataset. This adverse effect is also called the curse of dimensionality [1], [6].

To address the challenges associated with algorithm scalability, data sparsity and information loss due to the curse of dimensionality, the authors in [7] presented a novel application of random subspace ensembles. The proposed ensemble, called the random subsample ensemble (RSE), generates random lower dimensional projections/subspaces of the original high dimensional feature space. Each subspace constitutes a much smaller classification subtask, particularly when compared to the original. The predictions of classifiers developed on these subtasks are aggregated to produce the final classification outcome within an ensemble framework. Accordingly, RSE employs a divide-and-conquer methodology to deal with the adverse effects of high dimensionality. RSE was presented as an alternative to feature selection and feature transformation methods for managing high dimensionality; among the latter, for instance, principal component analysis and the K-L transform suffer from drawbacks such as information loss and lack of interpretability.

RSE makes the high dimensional learning task scalable by using lower dimensional projections. The problem of sparsity of data points in the high dimensional feature space is inherently addressed by RSE through the same lower dimensional projections. It was established through empirical testing on five high dimensional datasets in [7] that RSE is well positioned to address the problems of high dimensionality. In this paper, we explore the effect of the sampling technique on the performance of RSE by comparing the performance results of RSE under three different sampling techniques. An introduction to the classification approach based on the random subsample ensemble and the sampling techniques is presented in Section 2. In Section 3, simulation results along with the results of the comparison of the three sampling techniques are presented and discussed. Finally, conclusions are presented in Section 4.

2. Random Subsample Ensemble

The rationale behind the Random Subsample Ensemble (RSE) is to break down a complex high dimensional problem into several lower-dimensional sub-problems, and thus conquer the computational complexity of the original problem. The high dimensional feature space is projected onto a set of lower dimensions by selecting random feature subsets, or subsamples, from the original set. The lower-dimensional projections can be generated using a variety of techniques based on random subsampling. The three specific methods used for generating the random subsamples of the original feature space, as presented in this paper, are: (1) random sampling without replacement, (2) random sampling with replacement, and (3) random mutually exclusive partitioning. Random sampling without replacement generates subsamples in which a selected feature is unique within a subsample; however, there is possible overlap amongst different subsamples, since the same feature may exist in more than one subsample. Random sampling with replacement allows a single feature to be repeated both within a subsample and across the set of subsamples. The subsamples generated using random partitioning are mutually exclusive, i.e. a selected feature is unique within a subsample and across the set of subsamples. Each feature subsample may have the same number of features for all three techniques, although this is also an adjustable parameter.

As an example, consider the original high-dimensional feature space to be represented by F = {x_1, x_2, ..., x_k}. The feature space F is randomly projected onto N d-dimensional subspaces f_i = {x_1, x_2, ..., x_d} for i = 1 to N, with d << k. The values for d (cardinality of the subsample set) and N (number of subsamples) are likely to be problem-dependent and require empirical search and/or theoretical analysis for optimality. The d-dimensional feature sets are generated either by randomly sampling d features N times from the original feature set without/with replacement, or by randomly partitioning the feature space into N d-dimensional subsamples, and hence one of the following holds, respectively:

\bigcup_{i=1}^{N} f_i \subseteq \{x_1, x_2, \ldots, x_k\},   (1)

\bigcup_{i=1}^{N} f_i = \{x_1, x_2, \ldots, x_k\}.   (2)

Random partitioning allows all the original features to be selected exactly once in the set of subsamples, which tends to minimize the potential information loss in terms of the individual features. While random sampling (with and without replacement) does not guarantee selection of all the features, the cardinality of the feature subsample and the number of subsamples can be manipulated to ensure that the majority of the original features are selected in at least one subsample.

Each lower dimensional subsample of the feature space is used to train a base learner of the random subsample ensemble. Each base learner is trained on the entire original set of training data instances, which has the advantage that all the classifiers are trained using full class representation [8]. Once the base learners are trained, their predictions are combined to obtain the final ensemble prediction. The architecture of the random subsample ensemble (RSE) is depicted in Fig. 1.

Figure 1. Conceptual architecture of RSE

The results of the simulations with RSE on a set of high dimensional datasets as presented in [7] helped affirm the effectiveness of RSE as an approach for high dimensional classification.
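The three subsampling schemes lend themselves to a compact implementation. The following is a minimal sketch in Python; the paper's experiments used WEKA, so this is purely illustrative, and the function name and use of numpy are assumptions.

```python
import numpy as np

def generate_subsamples(k, d, N, method="without", seed=None):
    """Generate N feature-index subsamples of size d from k features.

    method:
      'without'   - no repeats within a subsample; overlap across subsamples allowed
      'with'      - repeats allowed within and across subsamples
      'partition' - mutually exclusive subsamples, each feature used exactly once
    """
    rng = np.random.default_rng(seed)
    features = np.arange(k)
    if method == "without":
        return [rng.choice(features, size=d, replace=False) for _ in range(N)]
    if method == "with":
        return [rng.choice(features, size=d, replace=True) for _ in range(N)]
    if method == "partition":
        # Subsample size is dictated by N here (roughly k / N features each).
        return np.array_split(rng.permutation(features), N)
    raise ValueError(f"unknown method: {method}")
```

Note that under the 'partition' option the subsample size is no longer a free parameter, which mirrors the constraint discussed for partitioning in Section 3.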
The sampling technique used to instantiate the RSE ensembles for those particular simulations was random sampling without replacement. It was established, within the context of this sampling technique, that a small subsample size combined with a high number of subsamples (i.e. a high number of base learners) allowed RSE to perform equally well when compared with other popular ensembles; the small subsample size is critical in this case to more effectively address the curse of dimensionality. Ensemble classification methods similar to RSE, which include Attribute Bagging, Input Decimation, Random Subspace, Multiple Feature Subset (MFS), and

Classification Ensembles from Random Partitions, can be found in [8], [9], [10], [11], and [12], respectively. Each of these methods subsamples or partitions the feature space in some form, but none of them explores the subsampling process itself, with the partial exception of MFS. With MFS, the author uses both sampling with and without replacement for subsampling the feature space and compares the resulting ensembles on classification error rates, but finds no significant difference between the two sampling techniques. However, since MFS is a specialized algorithm intended to improve the performance of the nearest neighbor classifier, the context, and consequently the applicability and generality of the reported study, is very limited.

3. Simulation study

The aim of the simulation study is to profile the effect of feature space subsampling techniques on the performance of the random subsample ensemble (RSE). The three sampling techniques are sampling without replacement, sampling with replacement, and mutually exclusive partitioning. The datasets employed and their characteristics are presented in Table 1. The datasets are from the UCI Machine Learning Repository. Madelon (Made) is a two-class artificial dataset with continuous input variables. With the Isolet (Isol) dataset the goal is to predict which letter name was spoken by the human subjects, while the Multiple Features (Mu Fe) dataset consists of features of handwritten numerals ('0' through '9') extracted from a collection of Dutch utility maps. The Internet Ads (In Ad) dataset represents a set of possible advertisements on Internet pages. Dexter (Dext) is a text classification problem in a bag-of-words representation.

Table 1. Data Sets
Dataset           | Feature Count | Training Instances | Testing Instances
Madelon           |               |                    |
Isolet            |               |                    |
Multiple Features |               |                    |
Internet Ads      |               |                    |
Dexter            |               |                    |

There are two parameters that need to be set for sampling without replacement and sampling with replacement, namely the subsample size and the number of subsamples. For partitioning, however, only one of the two parameters needs to be set, as the other is dictated by it. For example, if the number of subsamples is set to 4, then the subsample size would be 25% of the original feature set size, and vice versa. Accordingly, different parameter values for the cardinality of the feature subsamples and the number of subsamples were explored for each dataset for both sampling without and with replacement. The subsample size/cardinality was varied from 5% to 50% of the original feature count in increments of 5 percentage points. The subsample count parameter, which directly corresponds to the number of base learners/classifiers, was explored for the values 3, 5, 7, 9, 11, 15, 19 and 25. In the case of partitioning, the values selected for the number of partitions (which is the same as the classifier count) were 3, 5, 7, 9, 11, 15, 19 and 25, and accordingly the subsample sizes were dictated to be approximately 33%, 20%, 14%, 11%, 9%, 7%, 5% and 4%, respectively.

Simulation experiments were conducted using the algorithm implementations in the WEKA environment (version 3.5.8) [13]. The machine learning algorithms experimented with as base learners for RSE are C4.5, naïve Bayes (nb), and k-nearest neighbor (knn). Default parameter values in WEKA were used for the learning algorithms, which is a common practice in the literature [14], [15].
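For reference, the parameter grid described above can be enumerated directly. The short Python snippet below is a sketch of that enumeration; the values are taken from the text and the variable names are illustrative.

```python
# Subsample sizes from 5% to 50% of the feature count, in 5-point steps.
subsample_pcts = [p / 100 for p in range(5, 55, 5)]
# Base classifier counts used for sampling with and without replacement.
classifier_counts = [3, 5, 7, 9, 11, 15, 19, 25]

grid = [(pct, n) for pct in subsample_pcts for n in classifier_counts]
assert len(grid) == 80  # combinations explored per dataset and sampling scheme

# For partitioning only the classifier count is free; the subsample
# fraction is dictated as roughly 1/N (33%, 20%, 14%, ..., 4%).
partition_fractions = {n: round(1.0 / n, 2) for n in classifier_counts}
```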
In all of the simulation experiments, ten-fold cross validation was used for the base learners in the absence of a test set. The performance metrics recorded are prediction accuracy, SAR, and cpu time. SAR is computed as the average of the squared error, the accuracy, and the area under the ROC curve. The cpu time was calculated from the processor time taken by the process of an ensemble/learning algorithm as implemented within WEKA, as measured by the time utility on the Solaris OS (version 9) platform. All base classifiers were of the same type, i.e. one of C4.5, knn or nb only, for a specific instantiation of RSE. Consequently, for each dataset there are 80 possible combinations of subsampling percentage vs. base classifier count to be explored for sampling without and with replacement. There are eight instantiations of RSE for partitioning, one for each base classifier count. The ensemble architecture chosen for modeling RSE was voting (average of the posterior probabilities), given its implementation simplicity, low computational overhead, and proven track record for a variety of applications as reported in the literature.

The prediction accuracy results of RSE for the two random sampling techniques (not the partitioning) across the different combinations of subsampling percentages and numbers of base classifiers, using C4.5 as the base learning algorithm, for the Dexter dataset are shown in Tables 2 and 3. The results of the simulations for partitioning are shown in Table 4.
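To make the voting combiner and the SAR metric concrete, the sketch below shows one plausible realization with scikit-learn-style base learners. The class name, the use of scikit-learn instead of WEKA, and the exact SAR formulation, (accuracy + AUC + (1 - RMSE)) / 3 for a binary task, are assumptions consistent with the description above, not the authors' implementation.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score, roc_auc_score, mean_squared_error


class RandomSubsampleEnsemble:
    """Voting RSE sketch: every base learner sees all training instances
    but only the features in its own subsample; predictions are combined
    by averaging the posterior probabilities."""

    def __init__(self, subsamples, base_learner=DecisionTreeClassifier):
        self.subsamples = subsamples          # list of feature-index arrays
        self.base_learner = base_learner

    def fit(self, X, y):
        self.models_ = [self.base_learner().fit(X[:, cols], y)
                        for cols in self.subsamples]
        return self

    def predict_proba(self, X):
        probs = [m.predict_proba(X[:, cols])
                 for m, cols in zip(self.models_, self.subsamples)]
        return np.mean(probs, axis=0)         # voting: average of posteriors

    def predict(self, X):
        idx = np.argmax(self.predict_proba(X), axis=1)
        return self.models_[0].classes_[idx]


def sar(y_true, prob_pos, y_pred):
    """SAR for a binary task, assuming the (ACC + AUC + (1 - RMSE)) / 3 form."""
    acc = accuracy_score(y_true, y_pred)
    auc = roc_auc_score(y_true, prob_pos)
    rmse = np.sqrt(mean_squared_error(y_true, prob_pos))
    return (acc + auc + (1.0 - rmse)) / 3.0
```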

This entire scenario, as presented in Tables 2, 3, and 4, was repeated for the other two learning algorithms, namely knn and naïve Bayes, and for all three performance metrics, namely prediction accuracy, SAR, and cpu time, although those results are not presented here due to space limitations. Similar evaluations, again omitted due to space limitations, were also carried out for the other four datasets and the three base learning algorithms, recording values for all three performance metrics.

Table 2. Accuracy results (%) for the Dexter dataset across subsampling percentage vs. base classifier count for sampling without replacement using C4.5

Table 3. Accuracy results (%) for the Dexter dataset across subsampling percentage vs. base classifier count for sampling with replacement using C4.5

Table 4. Accuracy results (%) for the Dexter dataset for different base classifier counts for partitioning using C4.5

The comparative performance analysis procedure was divided into two parts. In the first part, sampling without replacement and sampling with replacement are compared using the Wilcoxon signed-ranks test [16]. The second part compares the three techniques using three-dimensional plots of the established performance results of the RSE instantiations for each sampling technique. The analysis had to be divided into two parts because it was not suitable to compare all three sampling techniques together with a formal statistical procedure, due to the much smaller sample of performance values for random partitioning compared to the other two techniques (8 vs. 80).

For the Wilcoxon signed-ranks test, a single sample consists of the performance values of RSE recorded for a single metric, for the sampling technique under consideration, for a single base learning algorithm, on a single dataset. The simulation experiments were conducted on five datasets using three learning algorithms and recording three performance metrics; thus, there are 45 pairs of samples for the two sampling techniques, without and with replacement. Each pair of samples was compared using a single Wilcoxon signed-ranks test. It was hypothesized for each test that sampling without replacement performs better than sampling with replacement. The tests were conducted using α = 0.05. The values of the z-statistic for each test are shown in Table 5. With α = 0.05, the observed value of z is significant if it is greater than 1.645. The cases where sampling without replacement performs significantly better than sampling with replacement are shown in bold in the table, while the cases where it appears to perform worse are shown in italics. The case with NA as its entry is the one for which the number of non-zero pair differences is too small for the normal approximation to apply; in that case the value of the Wilcoxon statistic W is 12, which is greater than the tabulated critical value, and hence there is no statistically significant difference between the two techniques. Table 5 shows that, for the accuracy metric, seven z scores, covering all three algorithms and four of the five datasets (Multiple Features excluded), favor random subsampling without replacement. In four tests the z scores favor random subsampling with replacement, and these four cover all three algorithms and three of the five datasets (Multiple Features and Internet Ads excluded). For the remaining four cases, the tests indicate no statistically significant difference.
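As an illustration of the pairwise test just described, the following sketch runs one such one-sided comparison with SciPy. Treating the alternative hypothesis as "without replacement is better" comes from the text above; the function name and the use of scipy.stats.wilcoxon in place of the paper's own z-approximation procedure are assumptions.

```python
from scipy.stats import wilcoxon

def compare_sampling_schemes(perf_without, perf_with, alpha=0.05):
    """One-sided Wilcoxon signed-ranks test for one (dataset, learner, metric) pair.

    perf_without / perf_with: the 80 RSE performance values obtained with
    sampling without and with replacement over the same parameter grid.
    Returns the test statistic, the p-value, and whether 'without replacement
    is better' is accepted at the given significance level.
    """
    stat, p_value = wilcoxon(perf_without, perf_with, alternative="greater")
    return stat, p_value, p_value < alpha
```

For the cpu time metric, where lower values are better, the alternative hypothesis would be reversed.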
For the SAR performance metric, the analysis indicates that there is hardly any statistically significant difference in performance between the two subsampling schemes: in ten of the 15 cases, the z scores indicate no statistically significant difference. Among the cases that are significant, however, the bias in favor of random subsampling without replacement is even more apparent. For four z scores, random subsampling without replacement claims

statistically significant superiority in performance. This happens for all three learning algorithms but only two datasets, Madelon and Internet Ads. In only one case, knn on Dexter, is there a statistically significant difference in favor of random subsampling with replacement. It is also possible to suggest that there is no apparent bias toward either sampling technique, with respect to either the accuracy or the SAR metric, when the learning algorithm is knn. Overall, the statistical tests show that the performance values for sampling without replacement are somewhat, but not to a large degree, better than those for sampling with replacement at a statistically significant level.

Table 5. The values of the z-statistic for the Wilcoxon signed-rank tests comparing sampling without replacement against sampling with replacement (rows: algorithm C4.5, knn, nb crossed with metric accuracy, SAR, cpu time; columns: datasets Made, Isol, Mu Fe, In Ad, Dext)

The time complexity, or cost, appears to have been noticeably affected by the choice of subsampling scheme. In eight of the fifteen cases, the measurements suggest that random subsampling without replacement is more affordable at a statistically significant level. Interestingly, in four cases there are statistically significant differences in favor of random subsampling with replacement, and all four of these cases involve the knn algorithm only.

A single sample for random partitioning has 8 data points, while for the other two techniques the sample size is 80. Due to this disparity in sample sizes, it was decided not to use a formal statistical procedure to compare partitioning against the other two sampling techniques. Instead, three-dimensional scatter plots of the performance values of the three sampling techniques were employed to facilitate visual comparison. Overall, there are 45 individual cases for comparison. Sample plots depicting the performance of the three sampling techniques on the Dexter dataset with C4.5 as the learning algorithm, for all three performance measures, are shown in Figures 2 through 4. The green (lighter gray in the non-color version) dots represent the performance values for sampling without replacement, the orange (darker gray in the non-color version) asterisks represent the values for sampling with replacement, and the blue diamonds represent the values for random partitioning.

Referring to the rest of the plots, which could not be included here due to space limitations, there are 12 cases where the two sampling techniques, with and without replacement, appear to be significantly better than random partitioning. On the Madelon dataset, the two techniques have higher values of prediction accuracy and SAR for the C4.5 and knn algorithms. The two are better than random partitioning on accuracy for C4.5 on the Isolet dataset, and they are better than partitioning on both accuracy and SAR for all three algorithms on the Internet Ads dataset. Finally, the SAR values for sampling without and with replacement are higher than those for partitioning for C4.5 on Dexter. Random partitioning is better than the other two sampling techniques on the SAR measure for nb on the Dexter dataset. In all the other cases for accuracy and SAR, the performances of the three techniques are comparable. In conclusion, then, random sampling with/without replacement has a lead over the random partitioning technique.

Figure 2. Accuracy values for the three sampling techniques using C4.5 as the learning algorithm on the Dexter dataset
For all the cpu time results, partitioning appears to be better than the other two techniques. However, this is expected, since for partitioning there are no combinations of high classifier counts with high subsampling percentages; these are the combinations that incur the highest cpu time costs for the other two sampling techniques.

4. Conclusions

Three sampling techniques were explored for their effect on the performance of the random subsample ensemble, a variant of the so-called random subspace ensembles in the literature. A simulation study was performed on high dimensional datasets from the UCI repository to establish the optimal sampling technique amongst sampling without replacement, sampling with replacement, and mutually exclusive partitioning. The results indicate better performance of the ensemble for the sampling without replacement method over the other two techniques within the context of the simulation study.

Figure 3. SAR values for the three sampling techniques using C4.5 as the learning algorithm on the Dexter dataset

Figure 4. cpu time values for the three sampling techniques using C4.5 as the learning algorithm on the Dexter dataset

References
[1] D. L. Donoho, High dimensional data analysis: The curses and blessings of dimensionality, Aide-memoire of an American Math Society Lecture, Math Challenges of the 21st Century.
[2] L. Yu, H. Liu, Feature selection for high-dimensional data: A fast correlation-based filter solution, Proc. 20th ICML, 2003.
[3] R. Caruana, N. Karampatziakis, A. Yessenalina, An Empirical Evaluation of Supervised Learning in High Dimensions, Proc. 25th ICML, 2008.
[4] Y. Zhao, Y. Chen, X. Zhang, A Novel Ensemble Approach for Cancer Data Classification, Proc. 4th Intl. Symposium on Neural Networks, LNCS, vol. 4492, Springer, 2007.
[5] O. Maimon, L. Rokach, Improving Supervised Learning by Feature Decomposition, Foundations of Information and Knowledge Systems, 2002.
[6] R. Bellman, Adaptive Control Processes: A Guided Tour, Princeton University Press, Princeton.
[7] G. Serpen, S. Pathical, Classification in High-Dimensional Feature Spaces: Random Subsample Ensemble, Proc. ICMLA, 2009.
[8] R. Bryll, R. Gutierrez-Osuna, F. Quek, Attribute Bagging: Improving Accuracy of Classifier Ensembles by Using Random Feature Subsets, Pattern Recognition, vol. 36, 2003.
[9] N. C. Oza, K. Tumer, Input Decimation Ensembles: Decorrelation through Dimensionality Reduction, Proc. Intl. Workshop on Multiple Classifier Systems, LNCS, vol. 2096, Springer, 2001.
[10] T. K. Ho, The Random Subspace Method for Constructing Decision Forests, IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 20, 1998.
[11] S. Bay, Combining Nearest Neighbor Classifiers through Multiple Feature Subsets, Proc. 15th ICML, 1998.
[12] H. Ahn, H. Moon, M. J. Fazzari, N. Lim, J. Chen, R. Kodell, Classification by Ensembles from Random Partitions of High-Dimensional Data, Computational Statistics and Data Analysis, vol. 51, 2007.
[13] I. H. Witten, E. Frank, Data Mining: Practical Machine Learning Tools and Techniques, Morgan Kaufmann, San Francisco.
[14] S. Dzeroski, B. Zenko, Is Combining Classifiers with Stacking Better than Selecting the Best One?, Machine Learning, vol. 54, 2004.
[15] J. J. Rodriguez, L. I. Kuncheva, C. J. Alonso, Rotation Forest: A New Classifier Ensemble Method, IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 28, 2006.
[16] F. Wilcoxon, Individual Comparisons by Ranking Methods, Biometrics, vol. 1, 1945.
