BOOSTING CLASSIFICATION ACCURACY WITH SAMPLES CHOSEN FROM A VALIDATION SET


TZU-CHENG CHUANG, School of Electrical and Computer Engineering, Purdue University, West Lafayette, Indiana 47907

SAUL B. GELFAND, School of Electrical and Computer Engineering, Purdue University, West Lafayette, Indiana 47907

OKAN K. ERSOY, School of Electrical and Computer Engineering, Purdue University, West Lafayette, Indiana 47907

ABSTRACT

It is common to train a classifier with a training set and to test it with a testing set to study the classification accuracy. In this paper, we show how to effectively use a number of validation sets obtained from the original training data to improve the performance of a classifier. The proposed validation boosting algorithm is illustrated with a support vector machine (SVM) applied to lymphography classification. A number of runs of the algorithm are generated to show its robustness as well as to generate consensus results. At each run, a number of validation datasets are generated by randomly picking a portion of the original training dataset. At each iteration during a run, the trained classifier is used to classify the current validation dataset. The misclassified validation vectors are added to the training set for the next iteration. Every time the training set changes, the classifier generates new classification borders. Experimental results on a lymphography dataset show that the proposed method with validation boosting achieves much better generalization performance on a testing set than the case without validation boosting.

INTRODUCTION

Machine learning has been used in cancer prediction and prognosis for nearly 20 years (Cruz and Wishart 2006). Several methods are widely used for this purpose, such as decision trees, naive Bayes, k-nearest neighbors, neural networks and support vector machines. New algorithms are still being developed to improve classification accuracy. One approach is to use feature extraction to select fewer features for training the classifier. Bagging and boosting techniques that generate different training samples are also used for this purpose. In this way, a number of different classifiers can be generated, and consensus techniques such as majority voting and least squares estimation-based weighting (Kim 2003) can be used to achieve better and more stable classification accuracy.

In bagging (Breiman 1996), several classifiers are trained independently via a bootstrap method, and their results are combined to obtain the final decision. In this procedure, a single training set TR = {(x_i, y_i), i = 1, 2, ..., n} is used to generate K different classifiers. In order to get K different training sets that are independent of each other, the original training set is resampled. Each of the K new training sets has the same size as the original dataset, but some instances may appear more than once, and some instances may not appear in a given resampled set.
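
To make the bootstrap step concrete, here is a minimal sketch in Python; NumPy and the function name bootstrap_sets are our own illustrative choices rather than anything specified in the paper.

    import numpy as np

    def bootstrap_sets(X, y, K, seed=None):
        # Draw K bootstrap training sets, each the same size as (X, y).
        # Sampling with replacement means some instances appear more than
        # once in a resampled set while others are left out entirely.
        rng = np.random.default_rng(seed)
        n = len(X)
        for _ in range(K):
            idx = rng.integers(0, n, size=n)  # indices drawn with replacement
            yield X[idx], y[idx]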

The AdaBoost algorithm of Freund and Schapire (1994, 1996, 1997) is generally considered a first step towards more practical boosting algorithms. A boosting algorithm defines different distributions over the training samples and uses a weak learner to generate hypotheses with respect to the generated distributions. From the different distributions of training samples, different classifiers are generated, and they are then combined with different weights to obtain the final results.

Although the resampling technique of our proposed method is similar to bagging and boosting, our approach is different in that we utilize a number of validation sets obtained from the training set, and these are used to modify the training sets. Initially, we divide the original training data into two groups, one for training and the other for validation. We use the training portion to train the classifier, and then we validate it with the validation set. The misclassified validation samples are added to the current training set to generate the next training set. At each iteration, the current validation set is regenerated as a randomly chosen part of the original training dataset with a fixed percentage. At each run, the procedure is repeated over several iterations until the validation accuracy reaches its maximum; at this point, a classifier is generated. Due to the random initialization of the training set and the validation set at each run, different independent classifiers are obtained over a number of runs. The results from different runs can be combined by a consensus rule such as majority voting to get the final results.

DATASET

The lymphography data is obtained from the UCI machine learning repository (Kononenko and Cestnik 1988). The examples in this dataset have 18 attributes, with four possible final diagnostic classes. The attributes include lymph node dimension, number of nodes, types of lymphatics, etc. For convenient representation, the attributes are transformed to integer type. There are 148 samples in total: 2 are normal, 81 are metastases, 61 are malign lymph and 4 are fibrosis. Because the normal and fibrosis cases are scarce compared to the other two classes, we used the remaining 142 samples to classify whether a sample is metastases or malign lymph.

SUPPORT VECTOR MACHINES

Vapnik introduced SVMs with kernel functions in the 1990s (Vapnik 1992). The algorithm was initially designed for the two-class classification problem: one class output is labeled 1, and the other -1. The algorithm finds the separating hyperplane with the largest margin width; by obtaining a better hyperplane from the training samples, better testing accuracy is expected. In the SVM, the hyperplane for the nonseparable case is determined by solving the following optimization problem:

    \min_{w, b, \xi} \; \frac{1}{2} \|w\|^2 + C \sum_i \xi_i
    \quad \text{subject to} \quad y_i (x_i^T w + b) \ge 1 - \xi_i, \quad \xi_i \ge 0    (1)

where x_i is the i-th data vector, y_i is the binary (-1 or 1) class label of the i-th data vector, \xi_i is the slack variable, w is the weight vector normal to the hyperplane, C is the regularization parameter, and b is the bias. It can be shown that the margin width is equal to 2/\|w\|. Usually the original data is mapped to a higher-dimensional representation by a kernel function before classification. Some common kernel functions are the linear, polynomial, radial basis and sigmoid functions. In our case, we used the radial basis function given by

    K(x_i, x_j) = C \exp(-\gamma \|x_i - x_j\|^2)    (2)

In the experiments conducted, the SVM-Light software (Joachims 2004) was used. We picked \gamma = 1 and C = 1 in these experiments.

TRAINING AND VALIDATION RESAMPLING TECHNIQUE

In the training phase, we initially choose the percentage of the training set, p_train, and the percentage of the validation set, p_val. We divide the original training set into two groups according to p_train and p_val; these two initial training and validation sets do not overlap. The training set is used to train the classifier, which is then validated with the validation set (Figure 1). The misclassified validation samples are then included in the training set to generate the next iteration's training set. In the next iteration, the validation set is randomly picked from the complete original training set with percentage p_val. With the new training set and validation set, further misclassified validation samples are generated and included in the training set for the following iteration. After several iterations, the classifier trained in this way performs better than a classifier trained on all of the original training set without any validation set. The iterations are stopped after reaching nearly 100% validation accuracy.

Figure 1. The misclassified validation samples are added to the training samples of the previous stage.
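
The following is a minimal sketch of one run of this procedure. It substitutes scikit-learn's SVC for the SVM-Light package used in the experiments, and the names validation_boosting_run, p_train and p_val are illustrative; the original implementation is not available, so this is only an interpretation of the steps described above.

    import numpy as np
    from sklearn.svm import SVC

    def validation_boosting_run(X, y, p_train=0.5, p_val=0.5,
                                max_iters=10, seed=None):
        # One run: train on the training portion, validate, add the
        # misclassified validation samples to the training set, and
        # repeat until validation accuracy (nearly) reaches 100%.
        rng = np.random.default_rng(seed)
        n = len(X)
        perm = rng.permutation(n)
        train_idx = list(perm[:int(p_train * n)])    # initial non-overlapping split
        clf = SVC(kernel="rbf", gamma=1.0, C=1.0)    # parameter values from the paper
        for _ in range(max_iters):
            clf.fit(X[train_idx], y[train_idx])
            # Each iteration draws a fresh validation set from the full
            # original training data with percentage p_val.
            val_idx = rng.choice(n, size=int(p_val * n), replace=False)
            wrong = val_idx[clf.predict(X[val_idx]) != y[val_idx]]
            if len(wrong) == 0:                      # 100% validation accuracy: stop
                break
            train_idx.extend(wrong)                  # emphasize the mistakes
        return clf

Repeating this with different seeds gives the independent classifiers that are later combined by majority voting.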

The proposed method emphasizes misclassified validation samples. If a misclassified sample is misclassified again the next time, it is re-emphasized, resulting in the following weighting:

    w(x) \leftarrow \begin{cases} 2\,w(x) + 1, & \text{misclassified}(x) \\ w(x), & \text{correctly classified}(x) \end{cases}    (3)

where the class membership of a sample is encoded as

    \mathrm{type}(x) = \begin{cases} 1, & x \in \text{type 1} \\ 0, & x \in \text{other type} \end{cases}    (4)

Due to the random initializations of the training and validation sets, a different classifier is obtained at each run. In order to get better results, a consensus rule such as majority voting can be applied across these classifiers.

EXPERIMENTS

We initially picked 50% of all the data for training and the other 50% for testing. In the training phase, we chose p_train = 0.5 and p_val = 0.5. The results of four runs with different training and testing data are shown in Table 1. From Table 1, we can see that when the validation accuracy nearly reaches 100%, the testing accuracy also reaches its maximum. Because 100% validation accuracy means there are no misclassified validation samples left, the iteration process is stopped after nearly reaching this value.

Table 1. Comparison of the testing classification accuracy between the classifier trained on all training data and the classifier trained by the proposed method. "TestByAll" is the testing accuracy of the classifier trained on all training data; "Valid." is the validation accuracy at that iteration; "Test" is the testing accuracy at that iteration.

    TestByAll        0.5634          0.5775          0.5916          0.5493
    Iteration    Valid.  Test    Valid.  Test    Valid.  Test    Valid.  Test
    1            0.5714  0.5634  0.6286  0.5775  0.5143  0.5775  0.6000  0.5493
    2            0.7143  0.4366  0.7714  0.4507  0.8286  0.4225  0.6571  0.4789
    3            0.9714  0.7183  0.8571  0.4789  0.8286  0.5493  1.0000  0.5916
    4            1.0000  0.6620  1.0000  0.6620  1.0000  0.6197  1.0000  0.5916
    5            1.0000  0.6620  1.0000  0.6620  1.0000  0.6197  1.0000  0.5916

To test whether our proposed method yields a significant improvement in accuracy, we convert the decimal accuracies to percentile values and compute the chi-square statistic

    \chi^2 = \sum_{i=1}^{k} \frac{(x_i - E_i)^2}{E_i}    (5)

where x_i is the percentile testing accuracy obtained by our proposed method, and E_i is the percentile testing accuracy of the classifier trained on all training data. Picking \alpha = 0.05, if the chi-square statistic is larger than the critical value 3.841, we can say that our proposed method is significantly different. Due to limited space, only four cases are shown here; after taking more runs, the difference becomes significant.
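
As a sketch of this test (assuming, as one reading of the threshold above, one degree of freedom; SciPy is used only to look up the critical value):

    import numpy as np
    from scipy.stats import chi2

    def chi_square_stat(test_acc, baseline_acc):
        # Equation (5) over percentile accuracies: x_i from the proposed
        # method, E_i from the classifier trained on all training data.
        x = 100.0 * np.asarray(test_acc)    # convert decimal accuracies to percentiles
        e = 100.0 * np.asarray(baseline_acc)
        return np.sum((x - e) ** 2 / e)

    # Final-iteration testing accuracies of the four runs in Table 1:
    stat = chi_square_stat([0.6620, 0.6620, 0.6197, 0.5916],
                           [0.5634, 0.5775, 0.5916, 0.5493])
    print(stat, chi2.ppf(0.95, df=1))       # statistic vs. the 3.841 critical value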

As Figure 2 shows, the testing accuracy first drops and then rises above its initial value.

Figure 2. The testing classification accuracy varies with the iterations.

To test the consensus results, we fixed the same training data and testing data for 3 classifiers, and then used majority voting to combine the different classifier results. The results are shown in Tables 2 and 3.

Table 2. Combining 3 different classifiers by majority voting. For this training set and testing set, if all the training data is used to train the classifier, the testing accuracy is 0.5493. The three classifiers are generated from different initializations of the training set and the validation set.

                 Classifier 1      Classifier 2      Classifier 3      Consensus
    Iteration    Valid.   Test.    Valid.   Test.    Valid.   Test.    Test.
    1            0.5429   0.5493   0.5714   0.5493   0.6286   0.5493   0.5493
    2            0.7429   0.4648   0.7429   0.4648   0.6857   0.4507   0.4507
    3            1.0000   0.6056   1.0000   0.6056   1.0000   0.6056   0.6056
    4            0.9714   0.6056   1.0000   0.6056   1.0000   0.6056   0.6056

Table 3. Combining 3 different classifiers by majority voting. For this training set and testing set, if all the training data is used to train the classifier, the testing accuracy is 0.5634.

                 Classifier 1      Classifier 2      Classifier 3      Consensus
    Iteration    Valid.   Test.    Valid.   Test.    Valid.   Test.    Test.
    1            0.6000   0.5634   0.3143   0.4648   0.5429   0.5634   0.5634
    2            0.6571   0.4507   0.7714   0.5634   0.7143   0.4507   0.4648
    3            1.0000   0.6479   0.9429   0.5775   1.0000   0.6620   0.6479
    4            1.0000   0.6479   0.9714   0.5775   1.0000   0.6620   0.6479
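
The majority vote used in these consensus experiments can be sketched as follows for the two-class {-1, +1} labels of the SVM setup; the function name is illustrative:

    import numpy as np

    def majority_vote(predictions):
        # predictions: array of shape (n_classifiers, n_samples) with
        # entries in {-1, +1}; the sign of the column sum is the vote.
        votes = np.sum(np.asarray(predictions), axis=0)
        return np.where(votes >= 0, 1, -1)  # an odd number of voters avoids ties

    # Example: three classifiers vote on three test samples.
    preds = [[ 1, -1,  1],
             [ 1, -1, -1],
             [-1, -1,  1]]
    print(majority_vote(preds))             # -> [ 1 -1  1]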

DISCUSSION AND CONCLUSIONS

From the experimental results, it is apparent that the resampling technique generates a better training set for the classifier, resulting in better classification accuracy. Another approach would be to use all the training data to train the classifier, and then use the classifier to search for the misclassified vectors. However, this is likely to yield 100% training accuracy, in which case we would not know which samples to emphasize. Including validation sets works better for this reason.

In some cases, we noticed that the testing accuracy was lower in iteration 2 or 3. However, the testing results always improved once 100% validation accuracy was reached in the succeeding iterations. In previous boosting methods, it is possible to overfit by running too many rounds. With our approach, we only add the misclassified validation samples to the training set; when 100% validation accuracy is reached, no more rounds are run.

We also noticed that if the validation accuracy in the first iteration is not sufficiently high, say above 50%, it is a good idea to regenerate the initialization of the training and validation sets. This reduces the number of iterations needed to reach the best training set.

We also considered rates of convergence. In all the experiments, the maximum validation accuracy was always reached within 5 iterations. This takes extra computation time compared to training one classifier on the entire training set, but the number of iterations required for better results is not excessive.

By using random initializations of the training set and the validation set, we can generate a number of different classifiers whose results can be combined, for example by majority voting, to achieve better results. However, in our consensus experiments, the results did not improve further; this topic needs further investigation. We only generated 3 classifiers and then aggregated their results, and it is possible that more classifiers would increase performance. In the experiments, we chose p_train = 0.5 and p_val = 0.5; estimating the optimal values of these parameters requires further research.

ACKNOWLEDGEMENT

This research was supported by NSF Grant MCB-9873139 and partly by NSF Grant #0325544.

REFERENCES

Joseph A. Cruz and David S. Wishart, 2006, "Applications of Machine Learning in Cancer Prediction and Prognosis," Cancer Informatics.

Hyun-Chul Kim, Shaoning Pang, Hong-Mo Je, Daijin Kim, and Sung Yang Bang, 2003, "Constructing support vector machine ensemble," Pattern Recognition 36, pp. 2757-2767.

Leo Breiman, 1996, "Bagging Predictors," Machine Learning 24(2), pp. 123-140.

Y. Freund and R. E. Schapire, 1994, "A decision-theoretic generalization of on-line learning and an application to boosting," in EuroCOLT: European Conference on Computational Learning Theory, LNCS.

Y. Freund and R. E. Schapire, 1996, "Experiments with a new boosting algorithm," in Proceedings of the 13th International Conference on Machine Learning, pp. 146-148, Morgan Kaufmann.

Y. Freund and R. E. Schapire, 1997, "A decision-theoretic generalization of on-line learning and an application to boosting," Journal of Computer and System Sciences 55(1), pp. 119-139.

Igor Kononenko and Bojan Cestnik, 1988, Repository of machine learning databases, http://www.ics.uci.edu/~mlearn/MLRepository.html, Irvine, CA: University of California, Department of Information and Computer Science.

B. E. Boser, I. M. Guyon, and V. N. Vapnik, 1992, "A training algorithm for optimal margin classifiers," in D. Haussler, ed., Proceedings of the 5th Annual ACM Workshop on COLT, pp. 144-152, Pittsburgh, PA, ACM Press.

Thorsten Joachims, 2004, SVM-Light, http://www.cs.cornell.edu/people/tj/svm_light/.