A classification scheme for applications with ambiguous data


Thomas P. Trappenberg
Centre for Cognitive Neuroscience, Department of Psychology, University of Oxford, Oxford OX1 3UD, England
Thomas.Trappenberg@psy.ox.ac.uk

Andrew D. Back
Katestone Scientific, 64 MacGregor Tce, Bardon, QLD 4065, Australia
back@usa.net

Abstract: We propose a scheme for pattern classification in applications which include ambiguous data, that is, where patterns occupy overlapping areas in the feature space. Such situations frequently occur with noisy data and/or where some features are unknown. We demonstrate that it is advantageous to first detect those ambiguous areas with the help of training data and then to re-classify the data in these areas as ambiguous before making class predictions on test sets. The scheme is demonstrated with a simple example and benchmarked on two real-world applications.

Keywords: data classification, ambiguous data, probabilistic ANN, k-NN algorithm.

1. Introduction

Adaptive data classification is a core issue in data mining, pattern recognition, and forecasting. Many algorithms have been developed, including classical methods such as linear discriminant analysis and Bayesian classifiers, more recent statistical techniques such as k-NN (k-nearest neighbors) and MARS (multivariate adaptive regression splines), machine learning approaches for decision trees, including C4.5, CART, C5, and Bayes trees, and neural network approaches such as multilayer perceptrons and neural trees [1-8].

Most of these classification schemes work well enough when the classes are separable. However, in many real-world problems the data may not be separable; that is, there may exist regions in the feature space that are occupied by more than one class. In many problems this ambiguity in the data is unavoidable. A similar problem occurs when the data are very closely spaced and a highly nonlinear decision boundary is required to separate them. Accordingly, the aim of much recent work on classification has been to find better nonlinear classifiers. Particularly notable in this area is the field of support vector machines [9, 10]. SVMs have attracted much interest, as they are able to find nonlinear classification boundaries while minimizing the empirical risk of false classification.

However, it is important to consider what the data really mean and what the practical, real-world goals are. In many cases it is desirable to find a simple classifier which gives the user a rough but understandable guide to what the data mean. Moreover, the data themselves may be contaminated by noise, some input variables may be missing entirely (i.e., the data then lie in a feature space of implicitly too low a dimension), or the data may be flawed in other ways. This issue is commonly discussed in the context of robust statistics and outlier detection.

In this paper we propose a method for preprocessing ambiguous data. The basic idea is quite straightforward: rather than seek a complex classifier, we first examine the data with the aim of removing any ambiguities. Once the ambiguous data are removed, we apply whatever classifier is required, hopefully one which leads to a much simpler solution than would otherwise be obtained.

In doing this we acknowledge that data in some regions of the state space either cannot be classified at all, or our confidence in doing so is low. Hence those data points are labeled in a different manner to facilitate better classification. Our proposed scheme is to identify ambiguous data and to re-classify those data with an additional class. We call this additional class IDK ("I don't know") to indicate that predicting the class of these data should not be attempted due to their ambiguity. By doing so one loses some predictions; however, we will show that one gains in return a drastic increase in the confidence of the classification of the remaining data.

Our proposed scheme is outlined in the next section, where we also introduce the particular implementation used in the examples discussed in this paper. The synthetic example in section 3 illustrates the underlying problem and the proposed solution in more detail. In section 4 we report on the performance of our algorithms on two real-world (benchmark) data sets, the classical Iris benchmark [11] and a medical data set [12].

2. Detection of Ambiguous Data

As stressed in the introduction, our scheme is to detect ambiguous state-space areas and to re-classify data in these areas. Hence, our proposed scheme has the following steps:

1. Test all data points against a criterion of ambiguity.
2. Re-classify training data which are ambiguous.
3. Classify test data with an algorithm trained on the re-classified data.

Note that the scheme is independent of any particular classification algorithm. In practice it might be critical to choose appropriate algorithms for each of these steps. As a means of illustrating the proposed method, we use the particular algorithms described below.

For the first step we employ a k-NN algorithm [3]. This algorithm takes the k data points closest to the data point in question into account to decide on the new class of that point. If an overwhelming majority of the neighboring data is of one particular class, then this class is taken to be the class of the data point. If no overwhelming majority can be reached, the data point is declared ambiguous and classified as a member of the class IDK.

The next step requires a classification method for the predictive classification of further data (test data). While any type of adaptive classifier could be chosen, in the following tests we use a feedforward neural network,

    Hidden layer: a_j^{(1)} = \sum_k w_{jk}^{(1)} x_k + \theta_j^{(1)}, \quad h_j = f(a_j^{(1)})
    Output layer: a_i^{(2)} = \sum_j w_{ij}^{(2)} h_j + \theta_i^{(2)}, \quad y_i = f(a_i^{(2)})

with a softmax output layer defined by the normalized transfer function

    y_i = \exp(a_i^{(2)}) / \sum_k \exp(a_k^{(2)}),

so that the outputs can be treated as probabilities, or confidence values, that the input data belong to the class indicated by the particular output node. This network is trained on the negative cross entropy

    E = -\sum_\mu \sum_i t_i^\mu \ln y_i(x^\mu; w),

which is appropriate to give the network outputs the probabilistic interpretation [13]. Here t^\mu is the target vector of training example \mu, with component t_i^\mu = 1 if the example belongs to class i. In the examples below we use the Levenberg-Marquardt (LM) algorithm [14] to train the network on the training data set.
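To make this concrete, the following is a minimal Python/numpy sketch of the two ingredients just described. It is an illustration rather than the original implementation: encoding the IDK class as the label -1 is a choice made here, the hidden transfer function f is assumed to be tanh (the text does not specify it), and the LM training loop is omitted.

```python
import numpy as np

IDK = -1  # encoding chosen here for the "I don't know" class

def knn_relabel(X, y, k=10, majority=0.8):
    """Steps 1 and 2: flag ambiguous points and re-classify them as IDK.

    Each point is compared with its k nearest neighbors (the point itself
    included). If at least a fraction `majority` of them share one class,
    the point keeps that class; otherwise it is re-labeled as IDK.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    # Pairwise Euclidean distances; adequate for the small sets used here.
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    y_new = np.empty_like(y)
    for i, row in enumerate(d):
        neigh = y[np.argsort(row)[:k]]        # self is included (distance 0)
        labels, counts = np.unique(neigh, return_counts=True)
        best = counts.argmax()
        y_new[i] = labels[best] if counts[best] >= majority * k else IDK
    return y_new

def mlp_forward(X, W1, b1, W2, b2):
    """Hidden layer h = f(W1 x + theta1) with f = tanh (assumed),
    followed by a softmax output layer, so each row of Y sums to 1."""
    H = np.tanh(X @ W1 + b1)
    A = H @ W2 + b2
    A -= A.max(axis=1, keepdims=True)         # numerical stability
    expA = np.exp(A)
    return expA / expA.sum(axis=1, keepdims=True)

def cross_entropy(Y, T):
    """E = -sum_mu sum_i t_i^mu ln y_i(x^mu; w), averaged over examples;
    T holds the one-hot target vectors t^mu."""
    return -np.mean(np.sum(T * np.log(Y + 1e-12), axis=1))
```

Including the point itself in its own neighborhood, as the example in section 3 does, means an isolated point of a minority class still counts toward its own majority.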

Figure 1: Example with overlapping data. The left column shows results of a standard classification procedure, whereas the right column shows results of the proposed re-classification scheme. (a) The raw input data (training set) with data from class a (circles) and class b (squares). (b) Re-classified training set including ambiguous data in class IDK (triangles). (c) Classification of the original training data after training with a probabilistic MLP; false classifications are marked with solid symbols. (d) Classification of the re-classified training set. (e, f) Performance on a test set. (g, h) Probability surface of class a generated by the two networks.

3. Partially Overlapping Classes: An Example

Here we illustrate the proposed scheme using two overlapping classes in a two-dimensional state space. We define two classes a and b whose features x_1 and x_2 are drawn from uniform distributions with overlapping areas:

    class a: x_1 \in [0, 1],     x_2 \in [0, 1]
    class b: x_1 \in [0.6, 1.6], x_2 \in [0.6, 1.6]
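Such a training set is straightforward to generate; a sketch under the conventions of the listing above (the even 50/50 split of the 100 points is an assumption, as the text states only the total):

```python
import numpy as np

rng = np.random.default_rng(0)
n_per_class = 50                                     # assumed split of 100 points

Xa = rng.uniform(0.0, 1.0, size=(n_per_class, 2))    # class a: [0,1] x [0,1]
Xb = rng.uniform(0.6, 1.6, size=(n_per_class, 2))    # class b: [0.6,1.6] x [0.6,1.6]
X = np.vstack([Xa, Xb])
y = np.array([0] * n_per_class + [1] * n_per_class)
# The classes overlap in the square [0.6,1] x [0.6,1], where labels are ambiguous.
```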

An example of 100 data points from these two classes is shown in Figure 1(a), with data from class a as circles and data from class b as squares. For comparison we trained a classification network directly on these training data. The networks always had 10 hidden nodes and were trained with 100 LM training steps. The classifications of the training data with this network after training are shown in Figure 1(c). Only 4 data points were not classified correctly; the network even learned to classify most of the training data in the ambiguous area.

The re-classification of these training data with the k-NN algorithm described above is shown in Figure 1(b). Data with components 0.6 < x_1, x_2 < 1 are ambiguous. K = 10 nearest neighbors (including the data point itself) were taken into account when choosing the new class structure for this data set. The class of a data point was set to the majority class if 80% or more of the neighboring data (including the data point itself) belonged to that majority. If such a majority could not be reached, the data were classified as class IDK, symbolized by open triangles in the figures.

These re-classified data were used as training data for a second classification network, similar to the previous one; we only added one output node to account for the additional class IDK. The classification of the re-classified training data with this network after training is shown in Figure 1(d). Only one data point was not correctly classified.

More important than the performance of the classification network on the training data is its performance on test data. Examples are shown in Figures 1(e) and 1(f) for the two classification networks, respectively. The network that was trained with ambiguous data (Figure 1(e)) falsely classified 1/3 of the test data, corresponding to a standard performance value of P' := n_c/n = 0.67, where n_c is the number of correct classifications and n the total number of data points. As might be expected, there are numerous false classifications in the area with ambiguous data. What is even more disturbing, however, is that there are many false classifications of data far away from this area. This can also be seen clearly from the prediction surface of this network, illustrated with gray values in Figure 1(g). White corresponds to a confidence (the value of the first output node) of 1 in predicting class a, whereas black corresponds to a confidence of 0 in predicting class a (which in this example corresponds to a confidence of 1 in predicting class b). The attempt of the network to find a classification scheme in the area with ambiguous data clearly led it to propose structure that does not correspond to the underlying problem. This structure is extrapolated to areas without ambiguous data, leading to the poor performance on data in these areas of the input space.

The situation is much better with the re-classified data. The results of classifying the same test data with the network trained on the re-classified data are shown in Figure 1(f). Only five patterns were falsely classified, all of which are close to the boundaries of the area with ambiguous data. This corresponds to a standard performance of P' = 0.95 (compared to 0.67) when taking all classes into account. The improvement comes largely from the fact that the underlying problem no longer contains ambiguous data, so that perfect classification can be expected in the limit of infinite data. This is contrary to the original problem, which includes ambiguous data, so that a perfect classification cannot be achieved even in the limit of infinite training data.

Indeed, in the example shown in Figure 1(f), no data were classified wrongly when taking only predictions of classes a and b into account. This will not always be the case, but it will be true in the infinite-data limit. Moreover, the areas far away from the ambiguous area can be predicted with high confidence, and the false classifications have much lower confidence values.
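In outline, the comparison of this section can be reproduced with the sketch below. It reuses X, y, knn_relabel, and IDK from the earlier listings and substitutes scikit-learn's MLPClassifier (10 hidden nodes and a softmax/log-loss output, but a different optimizer than LM) for the network described in section 2, so the exact numbers will differ from those reported here.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(1)

# Fresh test set drawn from the same distributions as the training data.
Xt = np.vstack([rng.uniform(0.0, 1.0, (50, 2)),
                rng.uniform(0.6, 1.6, (50, 2))])
yt = np.array([0] * 50 + [1] * 50)

def fit(labels):
    # 10 hidden nodes as in the text; the optimizer is a substitution for LM.
    return MLPClassifier(hidden_layer_sizes=(10,), max_iter=2000,
                         random_state=0).fit(X, labels)

plain = fit(y)                        # trained on the raw, ambiguous labels
relab = fit(knn_relabel(X, y))        # trained on the re-classified labels

for name, clf in (("plain", plain), ("re-classified", relab)):
    pred = clf.predict(Xt)
    decided = pred != IDK             # IDK outputs count as abstentions
    print(f"{name}: P'(all) = {np.mean(pred == yt):.2f}, "
          f"P'(a/b only) = {np.mean(pred[decided] == yt[decided]):.2f}, "
          f"IDK rate = {1.0 - decided.mean():.2f}")
```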
4. Real-World Data: Some Benchmark Examples

The previous example was intended to describe our scheme and to illustrate why it should be useful. However, only data taken from real-world examples can tell whether the scheme is useful in practice. Hence, in the following we report an initial study of the application of this scheme to some real-world data. The following examples are all taken from the UCI repository of machine learning databases [15], which can be accessed via the Internet.

4.1. Iris Dataset

We first tested our scheme on the classical Iris benchmark. The dataset contains 150 examples with 4 physical properties of 3 members of the family of iris flowers. The dataset was first used by Fisher in 1936 [11] to illustrate multivariate discriminant analysis techniques in taxonomic problems. We divided the dataset evenly into a training set and a test set by taking 25 examples from each class into each subset. The training dataset was re-classified with the same procedure as in the example of section 3, without adjusting any parameters. This re-classification run classified 8 examples of the training data set as class IDK. The class labels of all members of the first class were preserved, which shows that the training set of this class did not include ambiguous data and was easily separable from the other classes. This finding is in accordance with similar findings of Duch et al. [16].

We used a network similar to that of the previous example for the classification task itself; only the number of output nodes was adjusted to represent the required number of classes. After 100 training steps the network was able to represent all data in the training sets, for the original data as well as for the re-classified data. The network trained on the original data made 4% false classifications (3 examples) on the test set. In contrast, no false classifications were made by the network trained on the re-classified data when taking only classifications of iris types into account. The price to pay was that 11 examples of the test set were labeled as IDK.
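This protocol maps onto the same sketch form as before (knn_relabel and IDK from the listing in section 2; scikit-learn's load_iris and MLPClassifier as stand-ins for the data files and the LM-trained network; the random choice of the 25 training examples per class is an assumption, as the text does not state how the split was made):

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.neural_network import MLPClassifier

X, y = load_iris(return_X_y=True)
rng = np.random.default_rng(0)

# 25 examples of each class into the training set, the rest into the test set.
train, test = [], []
for c in np.unique(y):
    idx = rng.permutation(np.flatnonzero(y == c))
    train.extend(idx[:25]); test.extend(idx[25:])

Xtr, ytr = X[train], y[train]
Xte, yte = X[test], y[test]

ytr_re = knn_relabel(Xtr, ytr)   # unchanged parameters: k=10, 80% majority

clf = MLPClassifier(hidden_layer_sizes=(10,), max_iter=2000,
                    random_state=0).fit(Xtr, ytr_re)
pred = clf.predict(Xte)
decided = pred != IDK
print("false classifications among iris-type predictions:",
      int(np.sum(pred[decided] != yte[decided])))
print("test examples labeled IDK:", int(np.sum(~decided)))
```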

4.2. Wisconsin Breast Cancer Data

The second test was made on medical data compiled by Dr. William H. Wolberg at the University of Wisconsin Hospitals, Madison. A smaller database of these data was initially studied in [12]. The version of the dataset we used contains data from 699 patients, with 9 predictive attributes used in breast cancer diagnosis. The data are classified in two classes: benign (458 instances, 65.5%) and malignant (241 instances, 34.5%). Data from 16 patients were incomplete; we ignored these records in the following test. However, it should be stressed that incomplete information should lead to more ambiguous data and should therefore favor our approach. The effect of incomplete data will be discussed in more detail elsewhere.

We again used the same k-NN re-classification algorithm as in the previous examples, without adjusting any parameters. The data were randomly divided into 340 training data and 343 test data. 14 data points were classified as IDK by the k-NN re-classification algorithm. All data points of both training sets, the original and the re-classified, were classified correctly after training the networks. However, the network trained on the original data classified 23 instances (6.7%) of the test data incorrectly, whereas the second network made only 6 mistakes (1.75%). 22 instances of the test data were classified as IDK.

5. Conclusion and Outlook

In this paper we have proposed a scheme to solve classification problems with ambiguous data. We showed that ambiguous data can lead to poor classification results, not only for data in the ambiguous areas but also for data in areas which should have much better predictability. We showed that the identification of ambiguous input areas and the use of re-classified data for training the classification algorithms can lead to a drastic reduction of false predictive classifications. Hence one should consider abstaining from predictions for data in areas which are highly ambiguous. We think this approach is particularly suitable when predictions have to be made with particular caution.

There are many issues within this scheme that have to be addressed in the future. In particular, we used only a simple k-NN re-classification scheme to identify ambiguous areas in the input space. We neither studied systematically the dependence on the parameters of this algorithm, nor did we explore which algorithms might be best suited for a particular problem. There is also a variety of classification algorithms available, each of which might have advantages for particular applications.
Our network can be improved with a Bayesian regularization scheme, which can also provide additional information on the complexity of the underlying problem. Some work in this direction is in progress. Other advanced classification methods, such as SVMs, can also be used.

Acknowledgment: We would like to thank Wlodek Duch for the discussions of his results on rule extraction during his visit in Japan.

References

[1] Breiman, L., Friedman, J.H., Olshen, R.A., Stone, C.J., Classification and Regression Trees, Wadsworth, Belmont, CA, 1984.
[2] Buntine, W.L., Learning classification trees, Statistics and Computing 2, 63-73, 1992.
[3] Cover, T.M., Hart, P.E., Nearest neighbor pattern classification, IEEE Transactions on Information Theory 13, 21-27, 1967.
[4] Duda, R.O., Hart, P.E., Pattern Classification and Scene Analysis, Wiley, New York, 1973.
[5] Hanson, R., Stutz, J., Cheeseman, P., Bayesian classification with correlation and inheritance, Proceedings of the 12th International Joint Conference on Artificial Intelligence 2, Sydney, Australia, Morgan Kaufmann, 692-698, 1991.
[6] Michie, D., Spiegelhalter, D.J., Taylor, C.C. (editors), Machine Learning, Neural and Statistical Classification, Ellis Horwood, 1994.
[7] Richard, M.D., Lippmann, R.P., Neural network classifiers estimate Bayesian a-posteriori probabilities, Neural Computation 3, 461-483, 1991.
[8] Tsoi, A.C., Pearson, R.A., Comparison of three classification techniques, CART, C4.5, and multilayer perceptrons, in Advances in Neural Information Processing Systems 3, 963-969, 1991.
[9] Vapnik, V., The Nature of Statistical Learning Theory, Springer Verlag, New York, 1995.
[10] Vapnik, V., Golowich, S., Smola, A., Support vector method for function approximation, regression estimation, and signal processing, in Advances in Neural Information Processing Systems 9, 1997.
[11] Fisher, R., The use of multiple measurements in taxonomic problems, Annals of Eugenics 7, 179-188, 1936.
[12] Mangasarian, O.L., Wolberg, W.H., Cancer diagnosis via linear programming, SIAM News 23(5), 1-18, September 1990.
[13] Amari, S., Backpropagation and stochastic gradient descent methods, Neurocomputing 5, 185-196, 1993.
[14] Hagan, M.T., Menhaj, M., Training feedforward networks with the Marquardt algorithm, IEEE Transactions on Neural Networks 5(6), 989-993, 1994.
[15] Mertz, C.J., Murphy, P.M., UCI repository of machine learning databases, http://www.ics.uci.edu/pub/machine-learning-databases
[16] Duch, W., Adamczak, R., Grabczewski, K., Zal, G., Hybrid neural-global minimization method of logical rule extraction, Int. Journal of Advanced Computational Intelligence, 1999 (in print).