Using Decision Boundary to Analyze Classifiers


Zhiyong Yan, Congfu Xu
College of Computer Science, Zhejiang University, Hangzhou, China

Abstract

In this paper we propose to use the decision boundary to analyze classifiers. Two algorithms, decision boundary point set (DBPS) and decision boundary neuron set (DBNS), are proposed to obtain data on the decision boundary. Based on DBNS, a visualization algorithm called SOM-based decision boundary visualization (SOMDBV) is proposed to visualize high-dimensional classifiers. The decision boundary gives an insight into classifiers that accuracy alone cannot supply. It can be applied to select a proper classifier, to analyze the tradeoff between accuracy and comprehensibility, to detect the risk of over-fitting, and to calculate the similarity of models generated by different classifiers. Experimental results demonstrate the usefulness of the method.

1. Introduction

Classification is an important problem in machine learning with many real-world applications. Many classifiers exist [1], and their performance is usually estimated by accuracy, the proportion of correct predictions among all predictions [1]. But accuracy is a raw performance score and gives little insight into a classifier [2]. It cannot tell which data are classified correctly and which are not, and it cannot reveal the relative positions of the correctly and incorrectly predicted data. Real-world data sets are mostly high-dimensional. Users usually gain intuition from high-dimensional data visualization algorithms, which are unsupervised; the class boundary cannot be clearly visualized by these algorithms [3]. Lacking powerful tools, users cannot understand classifiers very well. [2] proposes decision region connectivity analysis for high-dimensional classifiers, which can be used to analyze the convexity of decision regions.
That algorithm is independent of the dimension of the data set. [3] proposes an algorithm named SVMV to visualize classification results of the Support Vector Machine (SVM) [4] using the self-organizing map (SOM) [5]. SVMV can clearly visualize the SVM classification boundary and the distance between data and the boundary in a 2-D map, but it substitutes the weight matrix of the SOM into the SVM decision function, which prevents its application to other classifiers.

This paper proposes a method for using the decision boundary to analyze classifiers. The decision boundary is the boundary a classifier uses to distinguish data when predicting, so the predicted labels of the data on the two sides of the boundary are different. Two algorithms are provided to obtain data on a classifier's decision boundary. The first, decision boundary point set (DBPS), obtains points near the decision boundary; the second, decision boundary neuron set (DBNS), obtains the SOM neurons near the decision boundary. Based on DBNS, an algorithm named SOM-based decision boundary visualization (SOMDBV) is proposed to visualize the decision boundary of high-dimensional classifiers on a 2-D SOM map.

In the next section, the procedures of DBPS, DBNS and SOMDBV are described. In section 3, applications of decision boundary analysis are given. In section 4, experiments demonstrate the usefulness of the proposed algorithms and analysis. Conclusions are drawn in section 5. We assume the output of a classifier is a discrete class label rather than a class probability, although the latter is easily transformed into the former.

2. Decision boundary algorithms

In this section, we describe the details of the three decision boundary algorithms: DBPS, DBNS and SOMDBV.

A model is obtained after a classifier is trained on the training data set; when new data arrive, the model is used to predict their labels. That is the normal usage of classifiers. Some classifiers behave like a white box and provide users with comprehensible results. For example, RIPPER [6], a well-known rule-based algorithm, learns a set of rules that gives users a good understanding. Other classifiers behave like a black box, and users are unable to understand what they have learned; SVM is an example. The knowledge obtained by a trained SVM model is hidden in the decision function, which is complicated and abstract for users to understand. Nevertheless, every classifier predicts labels according to some guideline: RIPPER predicts according to the rule set it has learned, while SVM predicts according to the decision function it has trained. These guidelines, whatever their form, define decision boundaries in the input data space, and prediction can be seen as finding the relation between the input data and these boundaries. Using a trained classifier to classify data is equivalent to using its decision boundary to partition the data. If the decision boundary is obtained and visualized, users gain an insight into the classifier, which helps them select a proper one. The forms of knowledge classifiers use to construct decision boundaries are diverse, so acquiring analytical equations of the boundary is an exhausting task. Instead, we obtain sample points on the decision boundary to analyze classifiers.
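As a toy illustration of this equivalence (a hypothetical linear model with illustrative weights, not one of the classifiers studied in this paper): for a linear classifier, predicting a label and testing which side of the decision boundary a point falls on are the same computation.

```python
# Hypothetical linear model; w and b are illustrative, not from the paper.
w, b = (1.0, -2.0), 0.5

def score(x):
    # The decision boundary is the set of points where score(x) == 0,
    # i.e. the line w·x + b = 0.
    return w[0] * x[0] + w[1] * x[1] + b

def predict(x):
    # Predicting a label is exactly testing which side of the
    # boundary x falls on.
    return 1 if score(x) > 0 else 0

# (3.0, 0.5) has score 2.5 (positive side), (-2.0, 2.0) has score -5.5.
```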
DBPS is used to obtain the sample point set near the decision boundary in the input space, DBNS is used to obtain the sample neurons near the decision boundary on the 2-D SOM map, and SOMDBV makes use of DBNS to visualize the decision boundary on the 2-D SOM map.

2.1. DBPS algorithm

There are two methods for obtaining points on the decision boundary. The first is the internal method, which uses the classifier's internal form of knowledge; for example, using the decision function of SVM we can compute points on the boundary directly. The second is the external method, which uses approximation to get points near the boundary. The advantage of the internal method is that it generates accurate points on the boundary; its disadvantage is that every classifier needs its own implementation, because classifiers' forms of knowledge are diverse. The external method can be applied to more classifiers, but the points generated are not as accurate. DBPS adopts the external method and generates points near the boundary. It uses binary search to locate the intersection of the decision boundary with the segment connecting two data points that the classifier predicts differently. The details are given in Algorithm 1; users can control the precision of the points by adjusting iter_no and toler.

Algorithm 1*. The decision boundary point set algorithm generates the point set near the boundary.
X is the set of sample points. B is the set of decision boundary points. c(x) is the classifier function. iter_no is the limit on the number of iterations. toler is the tolerance of the boundary.
for all x ∈ X do
    if c(x) == a then Xa ← Xa ∪ {x}
    else Xb ← Xb ∪ {x}
for all xa ∈ Xa do
    for all xb ∈ Xb do
        B ← B ∪ {DBP(xa, xb, c, iter_no, toler)}

function DBP(x1, x2, c, iter_no, toler)
    x_bound ← (x1 + x2) / 2
    for i = 1 : iter_no do
        if distance(x1, x2) / 2 < toler then break
        if c(x_bound) == c(x1) then x1 ← x_bound
        else x2 ← x_bound
        x_bound ← (x1 + x2) / 2
    return x_bound

* The function distance(x1, x2) is trivial, so we do not describe it here.
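A minimal Python sketch of Algorithm 1, assuming a two-class classifier c that returns discrete labels; the unit-circle classifier at the end is an illustrative toy, not from the paper.

```python
import math

def dbp(x1, x2, c, iter_no=50, toler=1e-6):
    """Binary search for a point near the decision boundary on the
    segment between x1 and x2, which c must label differently."""
    for _ in range(iter_no):
        if math.dist(x1, x2) / 2 < toler:
            break
        mid = [(a + b) / 2 for a, b in zip(x1, x2)]
        if c(mid) == c(x1):
            x1 = mid          # boundary lies between mid and x2
        else:
            x2 = mid          # boundary lies between x1 and mid
    return [(a + b) / 2 for a, b in zip(x1, x2)]

def dbps(X, c, iter_no=50, toler=1e-6):
    """Algorithm 1: collect a boundary point for every pair of
    samples that the classifier predicts differently."""
    a = c(X[0])
    Xa = [x for x in X if c(x) == a]
    Xb = [x for x in X if c(x) != a]
    return [dbp(xa, xb, c, iter_no, toler) for xa in Xa for xb in Xb]

# Toy classifier (illustrative only): label 1 inside the unit circle.
c = lambda x: 1 if x[0] ** 2 + x[1] ** 2 < 1.0 else 0
pts = dbps([[0.0, 0.0], [2.0, 0.0]], c)
# The single returned point lies within toler of (1, 0) on the circle.
```

As in Algorithm 1, precision is controlled by iter_no and toler; since every oppositely-labelled pair is searched, the number of boundary points grows quadratically with the sample size.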

If the classifier is high-dimensional, a high-dimensional data visualization algorithm is needed to visualize the obtained decision boundary point set.

2.2. DBNS algorithm

There are two methods for visualizing the decision boundary of a high-dimensional classifier. The first is to use DBPS to obtain the decision boundary point set in input space and visualize it with some high-dimensional data visualization method. The second is to project the input data onto a low-dimensional map and calculate the boundary point set in the map space. SOMDBV adopts the second method, and DBNS is used to obtain the neurons near the decision boundary on the 2-D SOM map. SVMV uses the decision function of SVM to calculate the distance between a neuron and the classification boundary [3]. In DBNS, the classifier takes the weights of the neurons as input to predict the labels of the data projected to those neurons. [7] adopts interpolation to obtain an extended weight matrix, which avoids high computational complexity; we adopt the same process to obtain the neurons near the decision boundary. The method used to get the neurons near the boundary is the external method of section 2.1, as in DBPS. The SOM topology used in this paper is a rectangular grid, but the algorithm can easily be applied to other topologies. As shown in Figure 1(a), if the four corner neurons of a rectangle are predicted the same label, we assume no neuron inside the rectangle is near the boundary. Otherwise we use interpolation to obtain neurons e, f, g, h, i and partition the rectangle into four smaller ones, and we continue partitioning the small rectangles whose corner labels differ until the user-given number of partitioning steps is reached (Figure 1(b)). Finally, the center neuron of such a rectangle is selected as the one near the decision boundary (Figure 1(c)).

Figure 1.
Three cases of finding the neurons near the decision boundary: (a) predictions are the same; (b) predictions are not the same; (c) the last step.

The detailed procedure of DBNS is given in Algorithm 2.

Algorithm 2*. The decision boundary neuron set algorithm finds the neuron set near the boundary.
N is the set of SOM neurons, of size m × n. B is the set of neurons near the decision boundary. c(x) is the classification model. iter_no is the limit on the number of iterations.

for i = 1 : m-1 do
    for j = 1 : n-1 do
        N[] ← {N(i,j), N(i+1,j), N(i+1,j+1), N(i,j+1)}
        B ← B ∪ GetDBNeuron(N[], c, iter_no)

function GetDBNeuron(N[], c, iter_no)
    dbn ← {}
    if c(N[]) are not all the same then
        if iter_no == 1 then
            dbn ← {GetCenterNeuron(N[])}
        else
            N2[][] ← Partition(N[])
            for i = 1 : 4 do
                dbn ← dbn ∪ GetDBNeuron(N2[i], c, iter_no-1)
    return dbn

* The functions GetCenterNeuron(N[]) and Partition(N[]) are trivial, so we do not describe them here.

2.3. SOMDBV algorithm

SOMDBV adopts the second method of section 2.2. It first projects the data onto a 2-D SOM map, then uses DBNS to obtain the neurons near the decision boundary, and finally displays the labels of the data, the classifier's prediction for each neuron, and the neurons near the decision boundary. The procedure of SOMDBV is as follows:
1) The classifier is trained on data set X to get the classification model C.
2) The SOM algorithm is trained on the same data set X to get the weights W.
3) C is used to classify W, giving predictions L.
4) DBNS is used to get the neuron set N near the decision boundary.

5) The input data set X, the classifier's predictions L and the decision boundary neuron set N are displayed on the 2-D SOM map.

3. Applications of decision boundary

The decision boundary can be used for the following analyses:
1) The distance between the data and the decision boundary, which accuracy cannot provide, becomes clear to users and helps them select a proper classifier. A classifier whose boundary lies in the middle of the data of different classes is usually better than one whose boundary is near the data of one class and far from the data of the other. Visualization can also show in which region of the data space the classifier makes incorrect predictions; if users know the region new data are likely to fall into, they may be able to choose the proper classifier among several.
2) There is a tradeoff between accuracy and comprehensibility in data mining models [8]. Visualizing the decision boundary gives an insight into classifiers whose high accuracy usually comes with low comprehensibility, and so helps users analyze this tradeoff.
3) Users strive to avoid over-fitting, and visualization of the decision boundary can reveal it. Given the same accuracy, a classifier with a complicated decision boundary usually generalizes worse than one with a simpler boundary. This helps users select the classifier with higher generalization, or set proper parameters to obtain a more general model.
4) The decision boundary can be used to define the similarity of two models obtained by different classifiers. For example, the proportion of the region where two classifiers predict the same labels, relative to the whole region the data fall into, may serve as a similarity measure; two models can then be considered the same above some given similarity.
Given such a similarity, one model may be transformable into a model trained by another classifier, which can overcome the drawbacks of some classifiers. For example, a trained artificial neural network (ANN) can be transformed into a rule set by rule extraction, which improves the comprehensibility of a highly accurate ANN [9]. The similarity calculation can also be used to compute the fidelity of rules extracted from an ANN [9].
5) Diversity among base classifiers is important when constructing a classifier ensemble [10], and the decision boundary can be used to calculate it. For example, the integral of the difference between two classifiers' decision boundaries may serve as a diversity measure, reflecting how differently the two classifiers partition the data space.

4. Experimental results

In this section, two experiments demonstrate the usefulness of the proposed algorithms. The classifiers used are RIPPER and SVM, implemented in WEKA [11]. A Gaussian kernel with parameter gamma is used as the kernel function of SVM. The SOM implementation is from the MATLAB SOM Toolbox; the total number of iterations is 1000 and the topology is a grid. The size of the SOM is .

4.1. Experimental results of DBPS

DBPS is used to generate the decision boundary point sets of RIPPER and SVM for the diamond data. The diamond data is a two-class simulated data set with 2 dimensions whose boundary is a diamond; the length of its two diagonals is . Each class has 100 randomly generated data points. The results are shown in Figure 2, where crosses denote the data inside the diamond, stars denote the data outside the diamond, and the line between data of different classes is the decision boundary. As seen in Figure 2, the decision boundary generated by RIPPER is like a cross, while the one generated by SVM is like a diamond.
The decision boundary generated by SVM is almost in the middle of the data of the two classes, while the boundary generated by RIPPER is close to the data of one class and far from the data of the other. The position of the SVM boundary is more proper, so SVM is the more suitable classifier for the diamond data. At the same time, the boundary generated by RIPPER is more regular in shape and can be understood better by users, while the SVM boundary is more complicated and more difficult to understand. This experiment thus exhibits the tradeoff between a powerful model with high accuracy and a transparent model with high comprehensibility.
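For reproducibility, a sketch of how diamond data of this kind could be generated. The function name, the sampling box and the half-diagonal d are assumptions: the paper gives neither its data generator nor the diagonal length.

```python
import random

def make_diamond_data(n_per_class=100, d=1.0, seed=0):
    """Two-class 2-D data whose true boundary is the diamond
    |x| + |y| = d. The half-diagonal d is an assumed value."""
    rng = random.Random(seed)
    inside, outside = [], []
    # Rejection-sample uniformly from a box around the diamond.
    while len(inside) < n_per_class or len(outside) < n_per_class:
        p = (rng.uniform(-2 * d, 2 * d), rng.uniform(-2 * d, 2 * d))
        if abs(p[0]) + abs(p[1]) < d:
            if len(inside) < n_per_class:
                inside.append(p)       # class: inside the diamond
        elif len(outside) < n_per_class:
            outside.append(p)          # class: outside the diamond
    return inside, outside

inside, outside = make_diamond_data()  # 100 points per class
```

Any classifier trained on these two point sets can then be probed with the DBPS procedure of Algorithm 1 to recover its boundary.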

Figure 2. Decision boundary point set: (a) by RIPPER; (b) by SVM using a Gaussian kernel with gamma = .

4.2. Experimental results of SOMDBV

The data set for SOMDBV is the Johns Hopkins University Ionosphere database from the UCI machine learning repository [12]. It contains 351 records with 34 dimensions, of which 225 records are labeled Good and 126 are labeled Bad. The results are shown in Figure 3, where squares denote the data of class Bad, triangles denote the data of class Good, crosses denote the neurons predicted Bad, and dots denote the neurons predicted Good. The line in Figure 3 denotes the decision boundary. By the analysis of section 4.1, the SVM of Figure 3(b) is more proper than RIPPER.

Figure 3. Visualization of the Ionosphere data set: (a) by RIPPER; (b) by SVM using a Gaussian kernel with gamma = 2; (c) by SVM using a Gaussian kernel with gamma = 20.

As seen in Figures 3(b) and 3(c), the decision boundary generated by SVM with gamma = 20 is more complicated than that generated by SVM with gamma = 2, so the SVM with gamma = 20 is more likely to over-fit the data; this conclusion agrees with experience and common sense. The number of neurons predicted the same label by RIPPER and by SVM with gamma = 2 is larger than the number predicted the same by the two SVMs with gamma = 2 and gamma = 20. So although the two SVM models are generated by the same classifier with different parameters, their similarity is lower than that between the SVM with gamma = 2 and RIPPER.

5. Conclusion and future work

In this paper, a novel method for using the decision boundary to analyze classifiers is proposed. Two algorithms obtain data on the decision boundary in different spaces: DBPS obtains a point set on the decision boundary in the input data space, while DBNS obtains a neuron set on the decision boundary on the 2-D SOM map. SOMDBV, which uses DBNS, is proposed to visualize the decision boundary of high-dimensional classifiers. With the help of the decision boundary, users gain an insight into classifiers. The decision boundary can be used to select a proper classifier, to reveal the tradeoff between accuracy and comprehensibility, to detect over-fitting, to calculate the similarity of classifiers, and to calculate diversity in ensemble learning. This paper has not supplied concrete calculation methods for similarity and diversity; this will be done in future work, and the decision boundary will be used to analyze rule extraction from ANNs and ensemble learning.

Acknowledgements

This paper is supported by the 863 plan (No. 2007AA01Z197) and the National Natural Science Foundation of China (No. ).

References

[1] S.B. Kotsiantis, I.D. Zaharakis, and P.E. Pintelas, "Machine learning: a review of classification and combining techniques", Artificial Intelligence Review, Springer, Berlin Heidelberg, 2006.
[2] O. Melnik, "Decision region connectivity analysis: a method for analyzing high-dimensional classifiers", Machine Learning, Kluwer, Netherlands, 2002.
[3] X. Wang, S. Wu, and Q. Li, "SVMV: a novel algorithm for the visualization of SVM classification results", Advances in Neural Networks - ISNN 2006, Springer-Verlag, Berlin Heidelberg, 2006.
[4] V.N. Vapnik, The Nature of Statistical Learning Theory, Springer, Berlin Heidelberg.
[5] T. Kohonen, Self-Organizing Maps, Springer, Berlin Heidelberg.
[6] W. Cohen, "Fast effective rule induction", Proceedings of the 12th International Conference on Machine Learning, Morgan Kaufmann, Tahoe City, CA, 1995.
[7] S. Wu and W.S. Chow, "Support vector visualization and clustering using self-organizing map and support vector one-class classification", Proceedings of the IEEE International Joint Conference on Neural Networks, Portland, USA, 2003.
[8] U. Johansson, L. Niklasson, and R. König, "Accuracy vs. comprehensibility in data mining models", Proceedings of the 7th International Conference on Information Fusion, Stockholm, Sweden, 2004.
[9] R. Andrews, J. Diederich, and A.B. Tickle, "Survey and critique of techniques for extracting rules from trained artificial neural networks", Knowledge-Based Systems, Elsevier, Amsterdam, 1995.
[10] E.K. Tang, P.N. Suganthan, and X. Yao, "An analysis of diversity measures", Machine Learning, Springer, Berlin Heidelberg, 2006.
[11] I.H. Witten and E. Frank, Data Mining: Practical Machine Learning Tools and Techniques, 2nd Edition, Morgan Kaufmann, San Francisco.
[12] P.M. Murphy and D.W. Aha, UCI Repository of Machine Learning Databases, University of California, Department of Information and Computer Science, Irvine, CA.


More information

Efficient SQL-Querying Method for Data Mining in Large Data Bases

Efficient SQL-Querying Method for Data Mining in Large Data Bases Efficient SQL-Querying Method for Data Mining in Large Data Bases Nguyen Hung Son Institute of Mathematics Warsaw University Banacha 2, 02095, Warsaw, Poland Abstract Data mining can be understood as a

More information

Data Mining and Knowledge Discovery: Practice Notes

Data Mining and Knowledge Discovery: Practice Notes Data Mining and Knowledge Discovery: Practice Notes Petra Kralj Novak Petra.Kralj.Novak@ijs.si 8.11.2017 1 Keywords Data Attribute, example, attribute-value data, target variable, class, discretization

More information

Two-step Modified SOM for Parallel Calculation

Two-step Modified SOM for Parallel Calculation Two-step Modified SOM for Parallel Calculation Two-step Modified SOM for Parallel Calculation Petr Gajdoš and Pavel Moravec Petr Gajdoš and Pavel Moravec Department of Computer Science, FEECS, VŠB Technical

More information

The Effects of Outliers on Support Vector Machines

The Effects of Outliers on Support Vector Machines The Effects of Outliers on Support Vector Machines Josh Hoak jrhoak@gmail.com Portland State University Abstract. Many techniques have been developed for mitigating the effects of outliers on the results

More information

Univariate Margin Tree

Univariate Margin Tree Univariate Margin Tree Olcay Taner Yıldız Department of Computer Engineering, Işık University, TR-34980, Şile, Istanbul, Turkey, olcaytaner@isikun.edu.tr Abstract. In many pattern recognition applications,

More information

Figure (5) Kohonen Self-Organized Map

Figure (5) Kohonen Self-Organized Map 2- KOHONEN SELF-ORGANIZING MAPS (SOM) - The self-organizing neural networks assume a topological structure among the cluster units. - There are m cluster units, arranged in a one- or two-dimensional array;

More information

CLASSIFICATION WITH RADIAL BASIS AND PROBABILISTIC NEURAL NETWORKS

CLASSIFICATION WITH RADIAL BASIS AND PROBABILISTIC NEURAL NETWORKS CLASSIFICATION WITH RADIAL BASIS AND PROBABILISTIC NEURAL NETWORKS CHAPTER 4 CLASSIFICATION WITH RADIAL BASIS AND PROBABILISTIC NEURAL NETWORKS 4.1 Introduction Optical character recognition is one of

More information

Learning to Match. Jun Xu, Zhengdong Lu, Tianqi Chen, Hang Li

Learning to Match. Jun Xu, Zhengdong Lu, Tianqi Chen, Hang Li Learning to Match Jun Xu, Zhengdong Lu, Tianqi Chen, Hang Li 1. Introduction The main tasks in many applications can be formalized as matching between heterogeneous objects, including search, recommendation,

More information

Controlling the spread of dynamic self-organising maps

Controlling the spread of dynamic self-organising maps Neural Comput & Applic (2004) 13: 168 174 DOI 10.1007/s00521-004-0419-y ORIGINAL ARTICLE L. D. Alahakoon Controlling the spread of dynamic self-organising maps Received: 7 April 2004 / Accepted: 20 April

More information

Efficient Pairwise Classification

Efficient Pairwise Classification Efficient Pairwise Classification Sang-Hyeun Park and Johannes Fürnkranz TU Darmstadt, Knowledge Engineering Group, D-64289 Darmstadt, Germany {park,juffi}@ke.informatik.tu-darmstadt.de Abstract. Pairwise

More information

A *69>H>N6 #DJGC6A DG C<>C::G>C<,8>:C8:H /DA 'D 2:6G, ()-"&"3 -"(' ( +-" " " % '.+ % ' -0(+$,

A *69>H>N6 #DJGC6A DG C<>C::G>C<,8>:C8:H /DA 'D 2:6G, ()-&3 -(' ( +-   % '.+ % ' -0(+$, The structure is a very important aspect in neural network design, it is not only impossible to determine an optimal structure for a given problem, it is even impossible to prove that a given structure

More information

Individualized Error Estimation for Classification and Regression Models

Individualized Error Estimation for Classification and Regression Models Individualized Error Estimation for Classification and Regression Models Krisztian Buza, Alexandros Nanopoulos, Lars Schmidt-Thieme Abstract Estimating the error of classification and regression models

More information

Practice EXAM: SPRING 2012 CS 6375 INSTRUCTOR: VIBHAV GOGATE

Practice EXAM: SPRING 2012 CS 6375 INSTRUCTOR: VIBHAV GOGATE Practice EXAM: SPRING 0 CS 6375 INSTRUCTOR: VIBHAV GOGATE The exam is closed book. You are allowed four pages of double sided cheat sheets. Answer the questions in the spaces provided on the question sheets.

More information

Weka ( )

Weka (  ) Weka ( http://www.cs.waikato.ac.nz/ml/weka/ ) The phases in which classifier s design can be divided are reflected in WEKA s Explorer structure: Data pre-processing (filtering) and representation Supervised

More information

Using Real-valued Meta Classifiers to Integrate and Contextualize Binding Site Predictions

Using Real-valued Meta Classifiers to Integrate and Contextualize Binding Site Predictions Using Real-valued Meta Classifiers to Integrate and Contextualize Binding Site Predictions Offer Sharabi, Yi Sun, Mark Robinson, Rod Adams, Rene te Boekhorst, Alistair G. Rust, Neil Davey University of

More information

Advanced visualization techniques for Self-Organizing Maps with graph-based methods

Advanced visualization techniques for Self-Organizing Maps with graph-based methods Advanced visualization techniques for Self-Organizing Maps with graph-based methods Georg Pölzlbauer 1, Andreas Rauber 1, and Michael Dittenbach 2 1 Department of Software Technology Vienna University

More information

Cluster analysis of 3D seismic data for oil and gas exploration

Cluster analysis of 3D seismic data for oil and gas exploration Data Mining VII: Data, Text and Web Mining and their Business Applications 63 Cluster analysis of 3D seismic data for oil and gas exploration D. R. S. Moraes, R. P. Espíndola, A. G. Evsukoff & N. F. F.

More information

Linear Models. Lecture Outline: Numeric Prediction: Linear Regression. Linear Classification. The Perceptron. Support Vector Machines

Linear Models. Lecture Outline: Numeric Prediction: Linear Regression. Linear Classification. The Perceptron. Support Vector Machines Linear Models Lecture Outline: Numeric Prediction: Linear Regression Linear Classification The Perceptron Support Vector Machines Reading: Chapter 4.6 Witten and Frank, 2nd ed. Chapter 4 of Mitchell Solving

More information

ANALYZING AND OPTIMIZING ANT-CLUSTERING ALGORITHM BY USING NUMERICAL METHODS FOR EFFICIENT DATA MINING

ANALYZING AND OPTIMIZING ANT-CLUSTERING ALGORITHM BY USING NUMERICAL METHODS FOR EFFICIENT DATA MINING ANALYZING AND OPTIMIZING ANT-CLUSTERING ALGORITHM BY USING NUMERICAL METHODS FOR EFFICIENT DATA MINING Md. Asikur Rahman 1, Md. Mustafizur Rahman 2, Md. Mustafa Kamal Bhuiyan 3, and S. M. Shahnewaz 4 1

More information

Combination of PCA with SMOTE Resampling to Boost the Prediction Rate in Lung Cancer Dataset

Combination of PCA with SMOTE Resampling to Boost the Prediction Rate in Lung Cancer Dataset International Journal of Computer Applications (0975 8887) Combination of PCA with SMOTE Resampling to Boost the Prediction Rate in Lung Cancer Dataset Mehdi Naseriparsa Islamic Azad University Tehran

More information

Cartographic Selection Using Self-Organizing Maps

Cartographic Selection Using Self-Organizing Maps 1 Cartographic Selection Using Self-Organizing Maps Bin Jiang 1 and Lars Harrie 2 1 Division of Geomatics, Institutionen för Teknik University of Gävle, SE-801 76 Gävle, Sweden e-mail: bin.jiang@hig.se

More information

WEIGHTED K NEAREST NEIGHBOR CLASSIFICATION ON FEATURE PROJECTIONS 1

WEIGHTED K NEAREST NEIGHBOR CLASSIFICATION ON FEATURE PROJECTIONS 1 WEIGHTED K NEAREST NEIGHBOR CLASSIFICATION ON FEATURE PROJECTIONS 1 H. Altay Güvenir and Aynur Akkuş Department of Computer Engineering and Information Science Bilkent University, 06533, Ankara, Turkey

More information

Some questions of consensus building using co-association

Some questions of consensus building using co-association Some questions of consensus building using co-association VITALIY TAYANOV Polish-Japanese High School of Computer Technics Aleja Legionow, 4190, Bytom POLAND vtayanov@yahoo.com Abstract: In this paper

More information

Image Classification Using Wavelet Coefficients in Low-pass Bands

Image Classification Using Wavelet Coefficients in Low-pass Bands Proceedings of International Joint Conference on Neural Networks, Orlando, Florida, USA, August -7, 007 Image Classification Using Wavelet Coefficients in Low-pass Bands Weibao Zou, Member, IEEE, and Yan

More information

Non-linear gating network for the large scale classification model CombNET-II

Non-linear gating network for the large scale classification model CombNET-II Non-linear gating network for the large scale classification model CombNET-II Mauricio Kugler, Toshiyuki Miyatani Susumu Kuroyanagi, Anto Satriyo Nugroho and Akira Iwata Department of Computer Science

More information

Machine Learning for NLP

Machine Learning for NLP Machine Learning for NLP Support Vector Machines Aurélie Herbelot 2018 Centre for Mind/Brain Sciences University of Trento 1 Support Vector Machines: introduction 2 Support Vector Machines (SVMs) SVMs

More information

A Modular Reduction Method for k-nn Algorithm with Self-recombination Learning

A Modular Reduction Method for k-nn Algorithm with Self-recombination Learning A Modular Reduction Method for k-nn Algorithm with Self-recombination Learning Hai Zhao and Bao-Liang Lu Department of Computer Science and Engineering, Shanghai Jiao Tong University, 800 Dong Chuan Rd.,

More information

Machine Learning : Clustering, Self-Organizing Maps

Machine Learning : Clustering, Self-Organizing Maps Machine Learning Clustering, Self-Organizing Maps 12/12/2013 Machine Learning : Clustering, Self-Organizing Maps Clustering The task: partition a set of objects into meaningful subsets (clusters). The

More information

6. NEURAL NETWORK BASED PATH PLANNING ALGORITHM 6.1 INTRODUCTION

6. NEURAL NETWORK BASED PATH PLANNING ALGORITHM 6.1 INTRODUCTION 6 NEURAL NETWORK BASED PATH PLANNING ALGORITHM 61 INTRODUCTION In previous chapters path planning algorithms such as trigonometry based path planning algorithm and direction based path planning algorithm

More information

Efficient Pairwise Classification

Efficient Pairwise Classification Efficient Pairwise Classification Sang-Hyeun Park and Johannes Fürnkranz TU Darmstadt, Knowledge Engineering Group, D-64289 Darmstadt, Germany Abstract. Pairwise classification is a class binarization

More information

SOMSN: An Effective Self Organizing Map for Clustering of Social Networks

SOMSN: An Effective Self Organizing Map for Clustering of Social Networks SOMSN: An Effective Self Organizing Map for Clustering of Social Networks Fatemeh Ghaemmaghami Research Scholar, CSE and IT Dept. Shiraz University, Shiraz, Iran Reza Manouchehri Sarhadi Research Scholar,

More information

An Empirical Study on feature selection for Data Classification

An Empirical Study on feature selection for Data Classification An Empirical Study on feature selection for Data Classification S.Rajarajeswari 1, K.Somasundaram 2 Department of Computer Science, M.S.Ramaiah Institute of Technology, Bangalore, India 1 Department of

More information

CloNI: clustering of JN -interval discretization

CloNI: clustering of JN -interval discretization CloNI: clustering of JN -interval discretization C. Ratanamahatana Department of Computer Science, University of California, Riverside, USA Abstract It is known that the naive Bayesian classifier typically

More information

A Comparative Study of SVM Kernel Functions Based on Polynomial Coefficients and V-Transform Coefficients

A Comparative Study of SVM Kernel Functions Based on Polynomial Coefficients and V-Transform Coefficients www.ijecs.in International Journal Of Engineering And Computer Science ISSN:2319-7242 Volume 6 Issue 3 March 2017, Page No. 20765-20769 Index Copernicus value (2015): 58.10 DOI: 18535/ijecs/v6i3.65 A Comparative

More information

Improving Classifier Performance by Imputing Missing Values using Discretization Method

Improving Classifier Performance by Imputing Missing Values using Discretization Method Improving Classifier Performance by Imputing Missing Values using Discretization Method E. CHANDRA BLESSIE Assistant Professor, Department of Computer Science, D.J.Academy for Managerial Excellence, Coimbatore,

More information

Simulation of Zhang Suen Algorithm using Feed- Forward Neural Networks

Simulation of Zhang Suen Algorithm using Feed- Forward Neural Networks Simulation of Zhang Suen Algorithm using Feed- Forward Neural Networks Ritika Luthra Research Scholar Chandigarh University Gulshan Goyal Associate Professor Chandigarh University ABSTRACT Image Skeletonization

More information

Lecture #11: The Perceptron

Lecture #11: The Perceptron Lecture #11: The Perceptron Mat Kallada STAT2450 - Introduction to Data Mining Outline for Today Welcome back! Assignment 3 The Perceptron Learning Method Perceptron Learning Rule Assignment 3 Will be

More information

AKA: Logistic Regression Implementation

AKA: Logistic Regression Implementation AKA: Logistic Regression Implementation 1 Supervised classification is the problem of predicting to which category a new observation belongs. A category is chosen from a list of predefined categories.

More information

Discrete Particle Swarm Optimization With Local Search Strategy for Rule Classification

Discrete Particle Swarm Optimization With Local Search Strategy for Rule Classification Discrete Particle Swarm Optimization With Local Search Strategy for Rule Classification Min Chen and Simone A. Ludwig Department of Computer Science North Dakota State University Fargo, ND, USA min.chen@my.ndsu.edu,

More information

Automatic Group-Outlier Detection

Automatic Group-Outlier Detection Automatic Group-Outlier Detection Amine Chaibi and Mustapha Lebbah and Hanane Azzag LIPN-UMR 7030 Université Paris 13 - CNRS 99, av. J-B Clément - F-93430 Villetaneuse {firstname.secondname}@lipn.univ-paris13.fr

More information

CS570: Introduction to Data Mining

CS570: Introduction to Data Mining CS570: Introduction to Data Mining Classification Advanced Reading: Chapter 8 & 9 Han, Chapters 4 & 5 Tan Anca Doloc-Mihu, Ph.D. Slides courtesy of Li Xiong, Ph.D., 2011 Han, Kamber & Pei. Data Mining.

More information

A Boosting-Based Framework for Self-Similar and Non-linear Internet Traffic Prediction

A Boosting-Based Framework for Self-Similar and Non-linear Internet Traffic Prediction A Boosting-Based Framework for Self-Similar and Non-linear Internet Traffic Prediction Hanghang Tong 1, Chongrong Li 2, and Jingrui He 1 1 Department of Automation, Tsinghua University, Beijing 100084,

More information

Evolving SQL Queries for Data Mining

Evolving SQL Queries for Data Mining Evolving SQL Queries for Data Mining Majid Salim and Xin Yao School of Computer Science, The University of Birmingham Edgbaston, Birmingham B15 2TT, UK {msc30mms,x.yao}@cs.bham.ac.uk Abstract. This paper

More information

Traffic Signs Recognition using HP and HOG Descriptors Combined to MLP and SVM Classifiers

Traffic Signs Recognition using HP and HOG Descriptors Combined to MLP and SVM Classifiers Traffic Signs Recognition using HP and HOG Descriptors Combined to MLP and SVM Classifiers A. Salhi, B. Minaoui, M. Fakir, H. Chakib, H. Grimech Faculty of science and Technology Sultan Moulay Slimane

More information

Improving the Random Forest Algorithm by Randomly Varying the Size of the Bootstrap Samples for Low Dimensional Data Sets

Improving the Random Forest Algorithm by Randomly Varying the Size of the Bootstrap Samples for Low Dimensional Data Sets Improving the Random Forest Algorithm by Randomly Varying the Size of the Bootstrap Samples for Low Dimensional Data Sets Md Nasim Adnan and Md Zahidul Islam Centre for Research in Complex Systems (CRiCS)

More information

Performance Measure of Hard c-means,fuzzy c-means and Alternative c-means Algorithms

Performance Measure of Hard c-means,fuzzy c-means and Alternative c-means Algorithms Performance Measure of Hard c-means,fuzzy c-means and Alternative c-means Algorithms Binoda Nand Prasad*, Mohit Rathore**, Geeta Gupta***, Tarandeep Singh**** *Guru Gobind Singh Indraprastha University,

More information

Well Analysis: Program psvm_welllogs

Well Analysis: Program psvm_welllogs Proximal Support Vector Machine Classification on Well Logs Overview Support vector machine (SVM) is a recent supervised machine learning technique that is widely used in text detection, image recognition

More information

Neural Networks. CE-725: Statistical Pattern Recognition Sharif University of Technology Spring Soleymani

Neural Networks. CE-725: Statistical Pattern Recognition Sharif University of Technology Spring Soleymani Neural Networks CE-725: Statistical Pattern Recognition Sharif University of Technology Spring 2013 Soleymani Outline Biological and artificial neural networks Feed-forward neural networks Single layer

More information

Version Space Support Vector Machines: An Extended Paper

Version Space Support Vector Machines: An Extended Paper Version Space Support Vector Machines: An Extended Paper E.N. Smirnov, I.G. Sprinkhuizen-Kuyper, G.I. Nalbantov 2, and S. Vanderlooy Abstract. We argue to use version spaces as an approach to reliable

More information

SOM+EOF for Finding Missing Values

SOM+EOF for Finding Missing Values SOM+EOF for Finding Missing Values Antti Sorjamaa 1, Paul Merlin 2, Bertrand Maillet 2 and Amaury Lendasse 1 1- Helsinki University of Technology - CIS P.O. Box 5400, 02015 HUT - Finland 2- Variances and

More information

ORT EP R RCH A ESE R P A IDI! " #$$% &' (# $!"

ORT EP R RCH A ESE R P A IDI!  #$$% &' (# $! R E S E A R C H R E P O R T IDIAP A Parallel Mixture of SVMs for Very Large Scale Problems Ronan Collobert a b Yoshua Bengio b IDIAP RR 01-12 April 26, 2002 Samy Bengio a published in Neural Computation,

More information

Instantaneously trained neural networks with complex inputs

Instantaneously trained neural networks with complex inputs Louisiana State University LSU Digital Commons LSU Master's Theses Graduate School 2003 Instantaneously trained neural networks with complex inputs Pritam Rajagopal Louisiana State University and Agricultural

More information

Discretizing Continuous Attributes Using Information Theory

Discretizing Continuous Attributes Using Information Theory Discretizing Continuous Attributes Using Information Theory Chang-Hwan Lee Department of Information and Communications, DongGuk University, Seoul, Korea 100-715 chlee@dgu.ac.kr Abstract. Many classification

More information

Local Linear Approximation for Kernel Methods: The Railway Kernel

Local Linear Approximation for Kernel Methods: The Railway Kernel Local Linear Approximation for Kernel Methods: The Railway Kernel Alberto Muñoz 1,JavierGonzález 1, and Isaac Martín de Diego 1 University Carlos III de Madrid, c/ Madrid 16, 890 Getafe, Spain {alberto.munoz,

More information

AN APPROACH FOR LOAD BALANCING FOR SIMULATION IN HETEROGENEOUS DISTRIBUTED SYSTEMS USING SIMULATION DATA MINING

AN APPROACH FOR LOAD BALANCING FOR SIMULATION IN HETEROGENEOUS DISTRIBUTED SYSTEMS USING SIMULATION DATA MINING AN APPROACH FOR LOAD BALANCING FOR SIMULATION IN HETEROGENEOUS DISTRIBUTED SYSTEMS USING SIMULATION DATA MINING Irina Bernst, Patrick Bouillon, Jörg Frochte *, Christof Kaufmann Dept. of Electrical Engineering

More information

Enhancing Forecasting Performance of Naïve-Bayes Classifiers with Discretization Techniques

Enhancing Forecasting Performance of Naïve-Bayes Classifiers with Discretization Techniques 24 Enhancing Forecasting Performance of Naïve-Bayes Classifiers with Discretization Techniques Enhancing Forecasting Performance of Naïve-Bayes Classifiers with Discretization Techniques Ruxandra PETRE

More information

Efficient Object Tracking Using K means and Radial Basis Function

Efficient Object Tracking Using K means and Radial Basis Function Efficient Object Tracing Using K means and Radial Basis Function Mr. Pradeep K. Deshmuh, Ms. Yogini Gholap University of Pune Department of Post Graduate Computer Engineering, JSPM S Rajarshi Shahu College

More information

5.6 Self-organizing maps (SOM) [Book, Sect. 10.3]

5.6 Self-organizing maps (SOM) [Book, Sect. 10.3] Ch.5 Classification and Clustering 5.6 Self-organizing maps (SOM) [Book, Sect. 10.3] The self-organizing map (SOM) method, introduced by Kohonen (1982, 2001), approximates a dataset in multidimensional

More information

Table of Contents. Recognition of Facial Gestures... 1 Attila Fazekas

Table of Contents. Recognition of Facial Gestures... 1 Attila Fazekas Table of Contents Recognition of Facial Gestures...................................... 1 Attila Fazekas II Recognition of Facial Gestures Attila Fazekas University of Debrecen, Institute of Informatics

More information

Robust PDF Table Locator

Robust PDF Table Locator Robust PDF Table Locator December 17, 2016 1 Introduction Data scientists rely on an abundance of tabular data stored in easy-to-machine-read formats like.csv files. Unfortunately, most government records

More information

Recent Progress on RAIL: Automating Clustering and Comparison of Different Road Classification Techniques on High Resolution Remotely Sensed Imagery

Recent Progress on RAIL: Automating Clustering and Comparison of Different Road Classification Techniques on High Resolution Remotely Sensed Imagery Recent Progress on RAIL: Automating Clustering and Comparison of Different Road Classification Techniques on High Resolution Remotely Sensed Imagery Annie Chen ANNIEC@CSE.UNSW.EDU.AU Gary Donovan GARYD@CSE.UNSW.EDU.AU

More information

COMBINED METHOD TO VISUALISE AND REDUCE DIMENSIONALITY OF THE FINANCIAL DATA SETS

COMBINED METHOD TO VISUALISE AND REDUCE DIMENSIONALITY OF THE FINANCIAL DATA SETS COMBINED METHOD TO VISUALISE AND REDUCE DIMENSIONALITY OF THE FINANCIAL DATA SETS Toomas Kirt Supervisor: Leo Võhandu Tallinn Technical University Toomas.Kirt@mail.ee Abstract: Key words: For the visualisation

More information