An Effective Performance of Feature Selection with Classification of Data Mining Using SVM Algorithm


A. Veeraswamy and S. Appavu Alias Balamurugan

A. Veeraswamy, Research Scholar, Veltech Dr. RR & Dr. SR Technical University, Chennai, Tamil Nadu, India. S. Appavu Alias Balamurugan, Professor & Research Coordinator, KLN College of Information Technology, Madurai, Tamil Nadu, India.

Proceedings of the National Conference on Recent Trends in Mathematical Computing (NCRTMC '13), pp. 427-431.

Abstract--- This paper proposes a feature selection method based on the Support Vector Machine (SVM). The aim of the proposed method is to reduce computational complexity and to increase the classification accuracy of the selected feature subsets. The dependence between two binary attributes is determined from the probabilities of their joint values that contribute to the zero-probability and one-probability classification decisions: if the joint probabilities factorize into the product of the individual probabilities, the two attributes are considered independent of each other; otherwise they are dependent, and one of them can be removed, reducing the number of attributes. The process is repeated over all combinations of attributes. The paper also evaluates the approach by comparing it with existing feature selection algorithms on six datasets from the University of California, Irvine (UCI) machine learning repository. The proposed method shows better results than most existing algorithms in terms of the number of selected features, classification accuracy, and running time.

Keywords--- Feature Selection, Classification, Data Mining, J48, K-Star, SMO

I. INTRODUCTION

Data mining is a form of knowledge discovery essential for solving problems in a specific domain. Classification is a technique used for discovering classes of unknown data. As the world grows in complexity, overwhelming us with the data it generates, data mining becomes our only hope for elucidating the patterns that underlie it [1]. Manual data analysis becomes tedious as the size of the data grows and the number of dimensions increases, so the analysis process needs to be computerized. The term Knowledge Discovery from Data (KDD) refers to this automated process of knowledge discovery from databases. KDD comprises several steps: data cleaning, data integration, data selection, data transformation, data mining, pattern evaluation, and knowledge representation [1].

A "feature" (also "attribute" or "variable") refers to an aspect of the data. Features are usually specified or chosen before data are collected, and they can be discrete, continuous, or nominal. Features are generally characterized as follows. Relevant: features that have an influence on the output and whose role cannot be assumed by the rest [1]. Irrelevant: features that have no influence on the output and whose values are generated at random for each example. Redundant: a redundancy exists whenever a feature can take the role of another (perhaps the simplest way to model redundancy).

Feature selection is the problem of selecting a subset of a learning algorithm's input variables upon which it should focus attention while ignoring the rest. It means choosing the best features among all available features, because not all of them are useful for learning: some features may be redundant or irrelevant and thus contribute nothing to the learning process [1]. As a process of choosing a subset of the original features, feature selection is frequently used as a preprocessing technique in data mining. It has proven effective in reducing dimensionality, improving mining efficiency, increasing mining accuracy, and enhancing result comprehensibility.
Feature selection methods broadly fall into the wrapper model and the filter model [1]. The quality of the data is a key factor: if information is irrelevant or redundant, or the data are noisy and unreliable, then knowledge discovery during training is more difficult. Feature subset selection is the process of identifying and removing as much of this irrelevant and redundant information as possible. Machine learning algorithms differ in the amount of emphasis they place on feature selection [2].

II. PROPOSED WORK

In this paper, we introduce a novel approach to feature selection in high-dimensional data using the Support Vector Machine. Support Vector Machines [5] are, at their core, binary classification algorithms.
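This binary decision rule can be made concrete with a short sketch. The following is not the authors' implementation; it is a minimal illustration, with all names (supportVectors, alphas, bias, gamma) assumed, of how a trained SVM labels a point as the sign of a kernel-weighted sum over the support vectors.

// Minimal sketch of the SVM decision rule discussed in Section II:
// f(x) = sign( sum_i alpha_i * y_i * K(s_i, x) + b ), where the s_i are
// the support vectors. All names here are illustrative assumptions.
public class SvmDecision {
    // RBF kernel; a linear kernel would simply be the dot product a . b.
    static double rbf(double[] a, double[] b, double gamma) {
        double d = 0.0;
        for (int i = 0; i < a.length; i++) {
            double diff = a[i] - b[i];
            d += diff * diff;
        }
        return Math.exp(-gamma * d);
    }

    // Returns +1 or -1: the binary classification decision for point x.
    static int classify(double[][] supportVectors, double[] alphas,
                        int[] labels, double bias, double gamma, double[] x) {
        double f = bias;
        for (int i = 0; i < supportVectors.length; i++) {
            f += alphas[i] * labels[i] * rbf(supportVectors[i], x, gamma);
        }
        return f >= 0 ? +1 : -1;
    }
}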

Proceedings of the National Conference on Recent Trends in Mathematical Computing NCRTMC 13 428 a classification system derived from statistical learning theory. It has been applied successfully in fields such as text categorization, hand-written character recognition, image classification, bio sequences analysis, etc. The SVM separates the classes with a decision surface that maximizes the margin between the classes. The surface is often called the optimal hyper plane, and the data points closest to the hyper plane are called support vectors. The support vectors are the critical elements of the training set. The mechanism that defines the mapping process is called the kernel function [5]. The SVM can be adapted to become a nonlinear classifier through the use of nonlinear kernels. SVM can function as a multiclass classifier by combining several binary SVM classifiers. The output of SVM classification is the decision values of each pixel for each class, which are used for probability estimates. The probability values represent "true" probability in the sense that each probability falls in the range of 0 to 1, and the sum of these values for each pixel equals 1.Classification is then performed by selecting the highest probability. SVM includes a penalty parameter that allows a certain degree of misclassification, which is particularly important for no separable training sets. The penalty parameter controls the trade-off between allowing training errors and forcing rigid margins. It creates a soft margin that permits some misclassifications, such as it allows some training points on the wrong side of the hyper plane. Increasing the value of the penalty parameter increases the cost of misclassifying points and forces the creation of amore accurate model that may not generalize well [4]. The paper also evaluates the approach by comparing it with existing feature selection algorithms over 6 datasets from University of California, Irvine (UCI) machine learning databases [6]. The proposed method shows better results in terms of number of selected features, classification accuracy, and running time than most existing algorithms. Fig 1: Data Flow Analysis Diagram for Feature Selection with Classification III. SYSTEM IMPLEMENTATION The proposed algorithm is implemented using Java. The stepwise approach is as follows. The input to the system is given as an attribute-relation file format (ARFF) file. A table is created in Oracle using the name specified in @relation". The attributes specified under @attribute" and instances specified under @data" are retrieved from the ARFF le and then they are added to the created table. This procedure is followed for providing the training set as well as test set. The created table acts as the dataset and is given as the input to the proposed algorithm. The combination of attribute value should occur at least once in the dataset, because while finding the dependency between attribute values if a combination of attribute value did not occur once, then it will lead to alternate zeros resulting in zero probability and dependency that cannot be found. Thus, the above condition is checked before a combination of attribute value is given to the proposed method. The probabilities are calculated for the given input. Based on the probabilities, the dependent attributes are identified. IV. EXPERIMENTAL RESULTS AND DISCUSSION The feature selection using SVM is applied to many datasets, and the performance evaluation is done. 
IV. EXPERIMENTAL RESULTS AND DISCUSSION

The feature selection using SVM is applied to many datasets, and its performance is evaluated. Altogether, eight datasets were selected from the UCI machine learning repository and the UCI Knowledge Discovery in Databases (KDD) archive [6]; the performance evaluation is presented here for six of them.
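The evaluator names used in this section (Info Gain attribute eval, Gain Ratio attribute eval, and so on) match the Weka toolkit, so a plausible way to load the ARFF datasets and reproduce the attribute and instance counts of Table 1 is sketched below. The paper states only that the implementation is in Java, so the use of Weka and of the last attribute as the class are assumptions.

import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

// Hedged sketch: load each UCI ARFF file and report its dimensions,
// as summarized in Table 1.
public class DatasetSummary {
    public static void main(String[] args) throws Exception {
        String[] files = {"anneal.arff", "audiology.arff", "balance-scale.arff",
                          "kr-vs-kp.arff", "tic-tac-toe.arff", "vehicle.arff"};
        for (String f : files) {
            Instances data = new DataSource(f).getDataSet();
            // Convention (assumed): the class attribute is the last one.
            data.setClassIndex(data.numAttributes() - 1);
            System.out.printf("%s: %d attributes, %d instances%n",
                    data.relationName(), data.numAttributes(), data.numInstances());
        }
    }
}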

A summary of the datasets is presented in Table 1. For each dataset, we run the proposed SVM-based method alongside seven existing feature selection algorithms (CFS, consistency subset evaluation, Info Gain attribute evaluation, Gain Ratio attribute evaluation, OneR attribute evaluation, Chi-Squared attribute evaluation, and principal components) and record the number of features selected by each algorithm. We then apply SMO, Naïve Bayes, decision tree (J48), K-Star, Random Tree, and OneR to the original dataset as well as to each newly obtained dataset containing only the features selected by each algorithm, and we record the overall accuracy by 10-fold cross-validation; a sketch of this pipeline follows Table 2.

From Table 2, it is found that for every dataset some feature selection algorithms select fewer attributes than the proposed method does. However, when the attributes selected by those existing methods are used for classification, the resulting accuracy is lower (Table 4). The main idea of the proposed method is to find the dependency between attributes in deciding the class attribute's value, with the probabilities deciding the dependency between a set of attributes. The proposed method therefore removes the dependent attributes and identifies the attributes that are sufficient for classifying the datasets, which also improves the classification accuracy.

Table 1: Description of the Datasets Used in the Experiments

   Dataset         No. of Attributes   No. of Instances   Features Selected by Proposed Algorithm
1  anneal          39                   898                5
2  audiology       70                   226               24
3  balance-scale    5                   625                3
4  kr-vs-kp        37                  3196                2
5  tic-tac-toe     10                   958                2
6  vehicle         19                   846                4

Fig 2: Number of Features Selected by the Proposed Method

Table 2: Number of Features Selected by Each Feature Selection Algorithm

   Dataset         CFS   Chi-Squared   Gain Ratio   Info Gain   OneR   PCA   Consistency Subset   Proposed (SMO)
1  anneal           7    38            38           38          38     37     8                    5
2  audiology        6    69            69           69          69     63    13                   24
3  balance-scale    1     4             4            4           4      4     4                    3
4  kr-vs-kp         3    36            36           36          36     31     6                    2
5  tic-tac-toe      1     9             9            9           9     16     8                    2
6  vehicle         11    18            18           18          18      7    18                    4
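As a concrete, hedged rendering of this protocol, the sketch below reproduces one cell of the comparison: rank attributes with one Weka evaluator, keep the top k, and score SMO on the reduced data by 10-fold cross-validation. The choice of Info Gain, k = 5, and the random seed are illustrative assumptions.

import java.util.Random;
import weka.attributeSelection.AttributeSelection;
import weka.attributeSelection.InfoGainAttributeEval;
import weka.attributeSelection.Ranker;
import weka.classifiers.Evaluation;
import weka.classifiers.functions.SMO;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

// Hedged sketch of the evaluation protocol: select features, then score
// a classifier on the reduced data with 10-fold cross-validation.
public class SelectAndEvaluate {
    public static void main(String[] args) throws Exception {
        Instances data = new DataSource("anneal.arff").getDataSet();
        data.setClassIndex(data.numAttributes() - 1);

        // Example: Info Gain ranking, keeping the top 5 attributes
        // (5 matches the anneal row of Table 1; the choice is ours).
        AttributeSelection selector = new AttributeSelection();
        selector.setEvaluator(new InfoGainAttributeEval());
        Ranker ranker = new Ranker();
        ranker.setNumToSelect(5);
        selector.setSearch(ranker);
        selector.SelectAttributes(data);
        Instances reduced = selector.reduceDimensionality(data);

        // 10-fold cross-validated accuracy of SMO on the reduced data,
        // mirroring the SMO column of Tables 3 and 4.
        Evaluation eval = new Evaluation(reduced);
        eval.crossValidateModel(new SMO(), reduced, 10, new Random(1));
        System.out.printf("Accuracy: %.2f%%%n", eval.pctCorrect());
    }
}

Swapping the evaluator and the classifier would, in principle, reproduce the other columns of Tables 2, 3 and 4.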

Fig 3: Number of Features Selected by Each Feature Selection Algorithm

Table 3: Accuracy (%) of Each Classification Algorithm on Each Dataset

   Dataset           K-Star   OneR    J48     SMO     Naïve Bayes   Random Tree
1  anneal            95.66    83.63   97.22   97.33   85.97         96.43
2  audiology         70.80    46.46   69.47   81.86   68.14         59.29
3  balance-scale     63.52    56.32   63.52   87.52   63.52         77.76
4  kr-vs-kp          90.43    66.45   90.43   95.56   90.43         88.45
5  tic-tac-toe       69.94    69.93   69.94   98.32   69.94         73.48
6  vehicle           66.43    51.53   68.32   74.34   48.46         64.30
   Average accuracy  76.13    62.39   76.48   89.15   71.08         76.62

Table 3 reports the accuracy of each classification algorithm (K-Star, OneR, J48, SMO, Naïve Bayes, and Random Tree); the accuracies are compared against the SVM (SMO) results.

Fig 4: Accuracy of Each Classification Algorithm on Each Dataset

Table 4: Accuracy (%) of K-Star Applied to the Features Selected by Each Feature Selection Algorithm, Compared with the Proposed Algorithm (SMO)

   Dataset         CFS     Chi-Squared   Gain Ratio   Info Gain   OneR    PCA     Consistency Subset   K-Star   Proposed (SMO)
1  anneal          88.08   98.99         95.76        98.66       98.66   96.10   88.08                95.66    97.33
2  audiology       53.53   99.55         99.55        99.55       99.55   99.11   50.01                70.80    81.86
3  balance-scale   74.72   45.76         45.76        45.76       88.48   45.76   45.76                63.52    87.52
4  kr-vs-kp        70.91   72.62         72.62        72.62       72.62   92.45   70.52                90.43    95.56
5  tic-tac-toe     82.46   17.84         17.84        17.84       17.84   96.86   39.03                69.94    98.32
6  vehicle         66.78   25.65         25.65        25.65       25.65   68.55   25.65                66.43    74.34
   Average         72.75   60.07         59.53        60.01       67.13   83.14   53.18                76.13    89.15

Fig 5: Performance Analysis Graph Using the Proposed Algorithm (SVM)

V. CONCLUSION

This paper proposes a feature selection algorithm based on the SVM. The algorithm can remove redundancy from the original dataset: the main idea is to find the dependent attributes and remove the redundant ones among them, with the dependency information obtained by means of the SVM. The new attribute reduction algorithm is implemented and evaluated through extensive experiments comparing it with related attribute reduction algorithms. In this paper, we considered the task of feature selection and investigated the performance of several feature selection algorithms. Our findings can be summarized as follows.

1) We have shown that the SVM is a promising approach for automatic feature selection, outperforming most existing algorithms in terms of the number of selected features and classification accuracy. Well-established algorithms such as Info Gain attribute evaluation, Gain Ratio attribute evaluation, OneR attribute evaluation, Chi-Squared attribute evaluation, and principal components are also more complex than SVM-based feature selection, which runs very efficiently on large datasets and is therefore attractive for feature selection in high-dimensional data.

2) We have implemented a new feature selector using the SVM and found that it performs better than the popular and computationally expensive traditional algorithms.

3) We have compared the performance of these algorithms on datasets from the UCI machine learning repository.

REFERENCES

[1] S. Beniwal and J. Arora, "Classification and Feature Selection Techniques in Data Mining," International Journal of Engineering Research & Technology (IJERT), Vol. 1, Issue 6, August 2012, ISSN: 2278-0181.
[2] M. A. Hall, "Feature Selection for Discrete and Numeric Class Machine Learning," University of Waikato, Hamilton, New Zealand.
[3] S. Appavu Alias Balamurugan and R. Rajaram, "Effective and Efficient Feature Selection for Large-scale Data Using Bayes' Theorem," International Journal of Automation and Computing, February 2009, pp. 62-71.
[4] C.-W. Hsu, C.-C. Chang, and C.-J. Lin, "A Practical Guide to Support Vector Classification," http://www.csie.ntu.edu.tw/~cjlin/papers/guide/guide.pdf, 2003.
[5] V. N. Vapnik, Statistical Learning Theory, Wiley, New York, 1998.
[6] C. L. Blake and C. J. Merz, UCI Repository of Machine Learning Databases, Department of Information and Computer Science, University of California, Irvine, USA. [Online]. Available: http://www.ics.uci.edu/~mlearn, 1998.