Feature Selection as an Improving Step for Decision Tree Construction

2009 International Conference on Machine Learning and Computing, IPCSIT vol. 3 (2011), IACSIT Press, Singapore

Mahdi Esmaeili 1, Fazekas Gabor 2
1 Department of Computer Science, Islamic Azad University (Kashan Branch), Iran
2 Faculty of Informatics, University of Debrecen, Hungary
Tel.: +98 361 5550055; fax: +98 361 5550056. E-mail addresses: M.Esmaeili@iaukashan.ac.ir, Fazekas.Gabor@crc.unideb.hu

Abstract. The removal of irrelevant or redundant attributes can benefit us in making decisions and analyzing data efficiently. Feature selection is one of the most important and frequently used techniques in data preprocessing for data mining. In this paper, special attention is paid to feature selection for classification with labeled data. An algorithm is used that ranks attributes by importance using two independent criteria. The ranked attributes are then used as input to a simple and powerful algorithm for constructing a decision tree (an oblivious tree). Results indicate that a decision tree built from the features selected by the proposed algorithm outperforms a decision tree built without feature selection. From the experimental results, it is observed that this method generates smaller trees with acceptable accuracy.

Keywords: Decision Tree, Feature Selection, Classification Rules, Oblivious Tree

1. Introduction

Feature selection plays an important role in data mining tasks. Methods generally perform better with lower-dimensional data than with higher-dimensional data, because irrelevant or redundant attributes carry useless information that interferes with the useful attributes. In the classification task, the main aim of feature selection is to reduce the number of attributes used in classification while maintaining acceptable classification accuracy.

In optimal feature selection, all possible feature combinations should be searched. This search space grows exponentially, so exhaustive search is prohibitive even with a moderate number of attributes, and the high computational cost remains an unsolved problem. Under such circumstances, suboptimal feature selection algorithms are an alternative. Though suboptimal feature selection algorithms do not guarantee the optimal solution, the selected feature subset usually leads to higher performance in the induction system (such as a classifier). Search may also be started with a randomly selected subset in order to avoid being trapped in a local optimum [1].

Each feature selection algorithm needs to be evaluated using a certain criterion, and an optimal subset selected by one criterion may not be optimal according to another. Evaluation criteria can be broadly categorized into two groups based on their dependency on the mining algorithm that will finally be applied to the selected feature subset [2]. An independent criterion, as the name suggests, evaluates a feature subset by characteristics of the training data, without involving any mining algorithm. Some popular independent criteria are distance measures, information measures, dependency measures, and consistency measures [3][4][5]. A dependent criterion, by contrast, requires a predetermined mining algorithm and uses the performance of that algorithm on the selected subset to determine which features are selected.

There are two main techniques for feature subset selection, i.e., the filter and wrapper methods. Filter methods use heuristics based on general characteristics of the data, rather than a learning algorithm, to evaluate the merit of feature subsets. Wrapper methods use an induction algorithm to estimate the merit of feature subsets. Filter methods are in general much faster than wrapper methods and more practical for use on high-dimensional data; feature wrappers, however, often achieve better results than filters because they are tuned to the specific interaction between an induction algorithm and its training data [6].
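To make the filter/wrapper distinction concrete, the following sketch (ours, not from the paper; it assumes scikit-learn and uses mutual information as the filter criterion) ranks attributes with a filter and, for contrast, runs the kind of exhaustive wrapper search that the previous paragraphs describe as prohibitive for many attributes:

```python
# Illustrative sketch (ours, not from the paper): a filter criterion ranks
# attributes from data characteristics alone; a wrapper scores candidate
# subsets by running the induction algorithm itself. Assumes scikit-learn.
from itertools import combinations

import numpy as np
from sklearn.datasets import load_iris
from sklearn.feature_selection import mutual_info_classif
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Filter: rank attributes by mutual information with the class; no learner runs.
mi = mutual_info_classif(X, y, random_state=0)
print("filter ranking (best first):", np.argsort(mi)[::-1])

# Wrapper: exhaustively score every non-empty subset with the learner itself.
best_subset, best_score = None, -1.0
for k in range(1, X.shape[1] + 1):
    for subset in combinations(range(X.shape[1]), k):
        score = cross_val_score(DecisionTreeClassifier(random_state=0),
                                X[:, list(subset)], y, cv=5).mean()
        if score > best_score:
            best_subset, best_score = subset, score
print("wrapper choice:", best_subset, "accuracy:", round(best_score, 3))
```

With four attributes the exhaustive wrapper evaluates only 15 subsets; with d attributes it needs 2^d - 1, which is exactly the exponential blow-up that motivates the suboptimal, ranking-based approach taken in this paper.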

Early research efforts mainly focused on feature selection for classification with labeled data, where class information is available [1][2][7][8]. Divide-and-conquer algorithms such as ID3 choose an attribute to maximize the information gain; the algorithm proposed here chooses an attribute to maximize the probability of the desired classification. Experiments with a decision tree learner (C4.5) have shown that adding to standard datasets a random binary attribute, generated by tossing an unbiased coin, causes classification performance to deteriorate (typically by 5% to 10% in the situations tested). This happens because at some point in the learned trees the irrelevant attribute is invariably chosen to branch on, causing random errors when test data is processed [9]. We should keep in mind that there is no single machine learning method appropriate for all possible learning problems; the universal learner is an idealistic fantasy.

In this paper, an algorithm is used that ranks attributes by importance according to two independent criteria. The ranked attributes are then used as input for constructing a decision tree. Our goal is to examine the influence of data preprocessing (feature selection) on classification.

This paper is organized as follows. Section 2 is a detailed description of the proposed method. Section 3 describes the data sets, results, and discussion. Finally, Section 4 concludes the research.

2. Proposed Method

The proposed method is described in this section; a schematic diagram is shown in Figure 1.

Fig. 1: Schematic diagram of the proposed method. Phase 1, the Attribute Ranking Algorithm (ARA), feeds Phase 2, Top-Down Induction of Decision Trees (TDIDT).

2.1. The First Phase

In the first phase, the attribute ranking algorithm (ARA) is applied before rule generation. In particular, we want the inducer to optimize the model through feature selection. The ARA algorithm uses a measure of a kind proposed in [10] for determining the importance of the original attributes. The ranked attributes obtained by this algorithm are then fed as inputs to the second phase.

As mentioned previously, distance measures and dependency measures are two popular independent criteria. With distance measures we try to find the feature that separates the classes as far as possible. Dependency measures are also known as correlation measures or similarity measures; they measure the ability to predict the value of one variable from the value of another. In feature selection for classification, we look at how strongly a feature is associated with the class [2].

The ARA measure includes two parts, a class distance ratio and an attribute-class correlation measure. The class distance ratio is built from two quantities, each calculated with the kth attribute omitted from every instance (denoted below by the superscript (k)). Equations (1) and (2), reconstructed here from the surrounding definitions, show how this is done:

$$\mathrm{Distance1} = \sum_{i=1}^{c} P_i \, \frac{1}{n_i} \sum_{X \in \omega_i} \left[ \left( X^{(k)} - m_i^{(k)} \right)^{T} \left( X^{(k)} - m_i^{(k)} \right) \right]^{1/2} \qquad (1)$$

$$\mathrm{Distance2} = \sum_{i=1}^{c} P_i \left[ \left( m_i^{(k)} - m^{(k)} \right)^{T} \left( m_i^{(k)} - m^{(k)} \right) \right]^{1/2} \qquad (2)$$

Here c is the number of classes in the data set, \omega_i denotes the ith class, and P_i is the probability of the ith class. m_i and m are the mean vector of the ith class and the mean of all instances in the data set, respectively. n_i is the number of instances in the ith class, and N is the total number of instances in the data set, i.e., N = n_1 + n_2 + ... + n_c.

On the other side, the attribute-class correlation measure is used to evaluate how strongly each attribute affects the class label of each instance. The larger the correlation factor, the more important the attribute is for determining the class labels of instances: a large attribute-class correlation indicates a close correlation between class labels and the attribute, which in turn indicates the great importance of this attribute in classifying the instances, and vice versa. Equation (3) gives the attribute-class correlation for the kth attribute; the sum runs over pairs of instances that do not belong to the same class:

$$\text{Attribute-class correlation} = \sum_{i \neq j} \left| X_{ik} - X_{jk} \right| \qquad (3)$$

where i and j now index instances and X_{ik} is the value of the kth attribute of the ith instance.
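The following is a minimal sketch (ours) of the two ARA criteria as reconstructed in Eqs. (1)-(3). The paper does not spell out the direction of the class distance ratio or how the two scores are merged into a single ranking, so taking the ratio as Distance2/Distance1 and the normalized sum below are assumptions:

```python
# Minimal sketch (ours) of the two ARA criteria as reconstructed in
# Eqs. (1)-(3). The ratio direction (between-class over within-class) and
# the way the two scores are combined into one ranking are assumptions.
import numpy as np

def class_distance_ratio_without(X, y, k):
    """Distance2 / Distance1 (Eqs. 1-2), computed with attribute k omitted."""
    Xk = np.delete(X, k, axis=1)   # omit the kth attribute from every instance
    m = Xk.mean(axis=0)            # mean of all instances
    n = len(y)
    d1 = d2 = 0.0
    for c in np.unique(y):
        Xc = Xk[y == c]                 # instances of class i
        mi = Xc.mean(axis=0)            # class mean vector m_i
        pi = len(Xc) / n                # class prior P_i
        d1 += pi * np.mean(np.linalg.norm(Xc - mi, axis=1))  # Eq. (1)
        d2 += pi * np.linalg.norm(mi - m)                    # Eq. (2)
    return d2 / d1

def attribute_class_correlation(X, y, k):
    """Eq. (3): sum of |X_ik - X_jk| over pairs with different class labels."""
    diffs = np.abs(X[:, k, None] - X[None, :, k])
    different_class = y[:, None] != y[None, :]
    return diffs[different_class].sum() / 2.0  # count each unordered pair once

def ara_ranking(X, y):
    d = X.shape[1]
    ratio = np.array([class_distance_ratio_without(X, y, k) for k in range(d)])
    corr = np.array([attribute_class_correlation(X, y, k) for k in range(d)])
    # Omitting an informative attribute lowers the class distance ratio, and an
    # informative attribute has a large class correlation, so combine both.
    score = corr / corr.max() + 1.0 - ratio / ratio.max()  # assumed combination
    return np.argsort(score)[::-1] + 1   # 1-based indices, most important first
```

Whether this reproduces the orderings reported in Table 3 below depends on scaling and combination details the paper leaves open.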

2.2. The Second Phase

In the second phase, a simple but very powerful rule-generating algorithm is used, called Top-Down Induction of Decision Trees (TDIDT). It has been known since the mid-1960s and has formed the basis for many classification systems, two of the best known being ID3 and C4.5, and it is used in many commercial data mining packages [11]. Figure 2 shows the algorithm.

IF all the instances in the training set belong to the same class
THEN return the value of the class
ELSE (a) select an attribute A from the ranked list
     (b) sort the instances in the training set into subsets, one for each value of attribute A
     (c) return a tree with one branch for each non-empty subset, each branch having a descendant subtree or a class value produced by applying the algorithm recursively

Fig. 2: Top-Down Induction of Decision Trees (TDIDT)
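The Figure 2 pseudocode transcribes almost line for line into code. The sketch below is ours; it assumes categorical attribute values and adds a majority-class fallback for when the ranked list is exhausted, which the pseudocode leaves implicit. Because every recursive call at a given depth tests the same next attribute from the ranked list, the result is the oblivious tree discussed in Section 3:

```python
# Sketch (ours) of the Fig. 2 pseudocode. Assumes categorical attribute
# values; the majority-class fallback for an exhausted ranked list is an
# addition the pseudocode leaves implicit. Each recursion level tests the
# same next attribute from the ranked list, so the tree is oblivious.
from collections import Counter

def tdidt(instances, labels, ranked_attrs):
    # IF all instances belong to the same class THEN return that class value
    if len(set(labels)) == 1 or not ranked_attrs:
        return Counter(labels).most_common(1)[0][0]
    # (a) select an attribute A from the ranked list
    a, rest = ranked_attrs[0], ranked_attrs[1:]
    # (b) sort the instances into subsets, one for each value of attribute A
    subsets = {}
    for x, y in zip(instances, labels):
        xs, ys = subsets.setdefault(x[a], ([], []))
        xs.append(x)
        ys.append(y)
    # (c) one branch per non-empty subset, each a recursively built subtree
    return {(a, v): tdidt(xs, ys, rest) for v, (xs, ys) in subsets.items()}

# Toy usage with attribute 0 ranked most important:
X = [("sunny", "hot"), ("sunny", "cool"), ("rain", "hot")]
y = ["no", "yes", "yes"]
print(tdidt(X, y, ranked_attrs=[0, 1]))
```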

3. Results and Discussion

The effectiveness of the newly proposed method has to be evaluated in practical experiments. For this purpose we selected four data sets from the UCI repository [12]. Table 1 shows the training datasets and their characteristics.

Table 1: Description of the test datasets

Dataset                Number of Attributes   Number of Instances   Number of Classes
Iris                   4                      150                   3
Monk's Problems        7                      432                   2
Glass Identification   10                     214                   6
Ionosphere             34                     351                   2

The above datasets are used as input to the ARA algorithm, the first phase of the proposed method. A ranked attribute list is obtained from this phase. Table 2 and Table 3 show the output of ARA and the resulting attribute ordering, respectively.

Table 2: Output of the ARA algorithm

Iris:                 1662, 3471, 3727, 27
Monk's Problems:      21388, 22750, 20231, 22198, 22725, 524
Glass Identification: 2452, 6280, 3210, 2554, 1413, 2206, 2127, 2875, 59
Ionosphere:           1796, 10766, 9392, 11534, 9338, 10705, 10385, 10217, 9408, 10496, 9777, 11543, 9947, 12124, 9096, 11158, 9973, 11337, 10297, 11666, 10074, 11562, 10454, 11081, 9995, 9458, 11398, 11370, 9837, 11535, 10115, 10441, 8927, 1800

Table 3: Importance ranking results obtained by the first phase of the proposed algorithm

Iris:                 3, 2, 1, 4
Monk's Problems:      2, 5, 4, 1, 3, 6
Glass Identification: 2, 3, 8, 4, 1, 6, 7, 5, 9
Ionosphere:           14, 20, 22, 12, 30, 4, 27, 28, 18, 16, 24, 2, 6, 10, 23, 32, 7, 19, 8, 31, 21, 25, 17, 13, 29, 11, 26, 9, 3, 5, 15, 33, 34, 1

On the basis of the attribute ordering in Table 3, the attributes are passed to the second phase, which constructs a decision tree. As mentioned before, a simple and very strong algorithm is used in this phase. Figure 3 shows its output for the Iris dataset. It is obvious from Figure 3 that the rule ordering is the same as the attribute ranking: the most important attribute is compared in the first term of each rule, and all rules include this attribute.

Field3 <= 1.7 : Iris-setosa (48)
Field3 > 1.7 AND Field2 <= 2.2 : Iris-versicolor (4/1)
Field3 > 1.7 AND Field2 > 2.2 AND Field1 <= 4.9 : Iris-versicolor (3/2)
Field3 > 1.7 AND Field2 > 2.2 AND Field1 > 4.9 AND Field4 <= 1.4 : Iris-versicolor (34/2)
Field3 > 1.7 AND Field2 > 2.2 AND Field1 > 4.9 AND Field4 > 1.4 : Iris-virginica (61/14)

Fig. 3: Rules generated by the second phase of the algorithm

After tree construction, evaluation parameters such as recall, F-measure, precision, and accuracy are calculated from the confusion matrix. This step is done for all of the data sets (Table 4).

Table 4: Detailed accuracy by class. The classes, in order, are: Iris Setosa, Versicolour, Virginica; Monk's Problems Class 0, Class 1; Glass Identification Building_w_f_p, Building_w_nf_p, Vehicle_w_f_p, Containers, Tableware, Headlamps; Ionosphere Bad, Good. Some per-class values were lost in transcription; the recovered figures are:

TP Rate:   0.96, 0.10, 0.96, 0.06, 0.54, 0.86, 0.70
FP Rate:   0, 0.05, 0.14, 0.03, 0.58, 0.02, 0.10, 0.17
Recall:    0.90, 0.07, 0.84, 0.07, 0.08, 0.22, 0.70
Precision: 1.00, 0.88, 0.77, 0.64, 0.48, 0.34, 0.64, 0.93, 0.87, 0.85
F-measure: 0.84, 0.80, 0.83, 0.13, 0.61, 0.02, 0.13, 0.14, 0.36, 0.89

As explained, we need other algorithms to compare with the proposed one. One of the most common tools is Weka; the methods we use from it are J48, BFTree, REPTree, and NBTree. Weka uses 10-fold cross-validation for accuracy estimation: the standard way of predicting the error rate of a learning technique from a single, fixed sample of data is stratified 10-fold cross-validation. The size of the induced decision tree is another of the evaluation criteria. Finally, we complete our overview with a comparison between the proposed algorithm and the output of the Weka algorithms. The results of this comparison are summarized in Table 5 and Table 6; a sketch of the underlying evaluation protocol follows Table 5.

Table 5: Number of leaves / size of tree for all datasets (columns ordered as in Table 1)

Method            Iris   Monk's Problems   Glass Identification   Ionosphere
J48               5/9    2/3               30/59                  18/35
BFTree            6/11   2/3               16/31                  11/21
REPTree           3/5    8/15              12/23                  5/9
NBTree            4/7    1/1               9/17                   8/15
Proposed Method   5/8    4/6               14/30                  13/24
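For reference, the kind of computation behind Table 4 and Table 6 can be sketched as follows, using scikit-learn's stratified 10-fold cross-validation in place of Weka (an illustration of the protocol, not the authors' setup):

```python
# Sketch of the evaluation protocol described above: stratified 10-fold
# cross-validation, then per-class precision/recall/F-measure and the error
# rate from the pooled predictions. scikit-learn stands in for Weka here.
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score, classification_report
from sklearn.model_selection import StratifiedKFold, cross_val_predict
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
y_pred = cross_val_predict(DecisionTreeClassifier(random_state=0), X, y, cv=cv)

print(classification_report(y, y_pred, target_names=load_iris().target_names))
print("error rate:", round(1.0 - accuracy_score(y, y_pred), 3))
```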

Table 6: Comparison of error rate (two Glass Identification entries could not be recovered from the source)

Dataset                J48    BFTree   REPTree   NBTree   Proposed Method
Iris                   0.04   0.06     0.06      0.06     0.12
Monk's Problems        0.25   0.25     0.15      0.25     0.34
Glass Identification   0.38   0.30     0.45
Ionosphere             0.09   0.10     0.11      0.10     0.18

The resulting tree is an oblivious tree: every node at the same level tests the same attribute. For this reason, the error rate of the proposed method is higher than that of the other algorithms.

4. Conclusion and Recommendations

In this paper, feature selection for decision tree construction was presented. Feature selection, as one form of data preprocessing, can affect every step of a data mining algorithm. The attribute importance ranking is obtained by running the ARA algorithm, the first phase of the proposed method; in the next phase, a simple algorithm is used to generate rules. Finally, evaluation parameters such as size of tree, number of leaves, error rate, recall, and precision are computed. The comparison shows that the average number of leaves and the size of the decision trees generated by the proposed method are better than those of the other algorithms. As with other data mining algorithms, the results of the proposed algorithm depend on the characteristics of the dataset. Nevertheless, the method generated smaller trees than algorithms such as J48 or BFTree, and its error rate was found to be acceptable. To improve accuracy, the two phases of the algorithm can be repeated instead of running the TDIDT method once, yielding an algorithm with higher time complexity but better accuracy.

5. Acknowledgments

The authors wish to thank Mansour Tarafdar, whose programming and constructive comments and suggestions helped us significantly improve this work.

6. References

[1] J. Doak. An Evaluation of Feature Selection Methods and Their Application to Computer Security. Technical report, University of California at Davis, Department of Computer Science, 1992.
[2] Huan Liu, Lei Yu. Toward Integrating Feature Selection Algorithms for Classification and Clustering. IEEE Transactions on Knowledge and Data Engineering, Vol. 17, No. 4, pp. 491-502, April 2005.
[3] H. Almuallim, T.G. Dietterich. Learning Boolean Concepts in the Presence of Many Irrelevant Features. Artificial Intelligence, Vol. 69, pp. 279-305, 1994.
[4] M.A. Hall. Correlation-Based Feature Selection for Discrete and Numeric Class Machine Learning. Proc. 17th Int'l Conf. Machine Learning, pp. 359-366, 2000.
[5] H. Liu, H. Motoda. Feature Selection for Knowledge Discovery and Data Mining. Boston: Kluwer Academic, 1998.
[6] Oded Maimon, Lior Rokach. The Data Mining and Knowledge Discovery Handbook. Springer, pp. 93-111, pp. 149-164, 2005.
[7] M. Dash, H. Liu. Feature Selection for Classification. Intelligent Data Analysis: An Int'l J., Vol. 1, No. 3, pp. 131-156, 1997.
[8] W. Siedlecki, J. Sklansky. On Automatic Feature Selection. Int'l J. Pattern Recognition and Artificial Intelligence, Vol. 2, pp. 197-220, 1988.
[9] Ian H. Witten, Eibe Frank. Data Mining: Practical Machine Learning Tools and Techniques, Second Edition. Morgan Kaufmann, pp. 288-296, 2005.
[10] Lipo Wang, Xiuju Fu. Data Mining with Computational Intelligence. Springer, pp. 117-123, 2005.
[11] Max Bramer. Principles of Data Mining. Springer, pp. 47-48, 2007.
[12] C.L. Blake, C.J. Merz. UCI Repository of Machine Learning Databases. Irvine, CA: University of California, Department of Information and Computer Science. [http://www.ics.uci.edu/~mlearn/MLRepository.html]