THE CONDENSED FUZZY K-NEAREST NEIGHBOR RULE BASED ON SAMPLE FUZZY ENTROPY

Proceedings of the 2011 International Conference on Machine Learning and Cybernetics, Guilin, 10-13 July, 2011

JUN-HAI ZHAI, NA LI, MENG-YAO ZHAI
Key Lab. of Machine Learning and Computational Intelligence, College of Mathematics and Computer Science, Hebei University, Baoding 071002, China
E-MAIL: mczjh@hbu.cn

Abstract:
The fuzzy k-nearest neighbor (F-KNN) algorithm was originally developed by Keller in 1985. It generalizes the k-nearest neighbor (KNN) algorithm and overcomes the drawback of KNN that all instances are considered equally important. However, F-KNN still suffers from the same large memory requirement as KNN. To deal with this problem, this paper proposes the condensed fuzzy k-nearest neighbor rule (CFKNN), which selects the important instances based on sample fuzzy entropy. The experimental results show that the proposed method is feasible and effective.

Keywords:
Nearest neighbor; Condensed nearest neighbor; Fuzzy nearest neighbor; Instance selection; Sample fuzzy entropy

1. Introduction

The nearest neighbor rule (NN) was originally proposed by Cover and Hart [1] and is widely used in many fields such as pattern recognition [2], data mining [3], and machine learning [4-7]. The main reasons for its popularity are its conceptual simplicity and ease of understanding. However, the nearest neighbor rule suffers from the following three problems: (1) to classify a test instance x, all instances in the training set must be stored, i.e. the space complexity is O(n), where n is the number of instances in the training set; (2) the distances between x and all instances in the training set must be computed, i.e. the time complexity is also O(n); (3) each training sample is given equal importance in classifying an unseen sample, regardless of its actual contribution to classification.

Many researchers have worked on these problems. Hart proposed the condensed nearest neighbor rule (CNN) to deal with problems (1) and (2) [8]. Keller proposed the F-KNN algorithm to deal with problem (3) [9]. However, the F-KNN algorithm still suffers from problems (1) and (2). In this paper, we propose the condensed fuzzy k-nearest neighbor rule (CFKNN) based on sample fuzzy entropy. The experimental results show that the proposed method is feasible and effective.

The paper is organized as follows. Preliminaries for the proposed method are given in Section 2. Section 3 presents the CFKNN rule, experimental results and analysis are provided in Section 4, and Section 5 concludes the paper.

2. Preliminaries

In this section, we introduce several basic concepts and algorithms related to our method: the decision table, fuzzy entropy, the algorithm used to determine the fuzzy membership degrees of the instances in the training set, and the F-KNN algorithm.

Definition 1. A decision table (DT for short) is a 2-tuple $DT = (U, A \cup C)$, where $U = \{x_1, x_2, \ldots, x_N\}$ is a non-empty finite set of objects (instances) called the training set, and $A$ is a set of real-valued conditional attributes. $C$ is the decision attribute; without loss of generality we suppose that the instances in $U$ are classified into $p$ categories $C_1, C_2, \ldots, C_p$.

Definition 2. Given a decision table $DT = (U, A \cup C)$, an instance $x \in U$ and classes $C_i$ ($1 \le i \le p$), let $\mu_i(x)$ be the fuzzy membership degree of instance $x$ in class $C_i$. The fuzzy entropy of instance $x$ is defined as

$$Entr(x) = -\sum_{i=1}^{p} \mu_i(x) \log_2 \mu_i(x) \qquad (1)$$

where $\mu_i(x)$ denotes the fuzzy membership degree of instance $x$ in class $C_i$.
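For concreteness, Eq. (1) can be computed as in the following minimal Python/NumPy sketch; the function name and the convention $0 \cdot \log_2 0 = 0$ are our own choices, not part of the paper:

```python
import numpy as np

def fuzzy_entropy(mu):
    """Fuzzy entropy of one instance, Eq. (1):
    Entr(x) = -sum_i mu_i(x) * log2(mu_i(x))."""
    mu = np.asarray(mu, dtype=float)
    nz = mu[mu > 0.0]                     # skip zero terms: 0 * log2(0) := 0
    return float(-(nz * np.log2(nz)).sum())

# An ambiguous, boundary-like membership vector scores higher than a crisp one:
print(fuzzy_entropy([0.5, 0.5]))          # 1.0
print(fuzzy_entropy([0.9, 0.1]))          # about 0.469
```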
In this paper, we use the following algorithm to determine the fuzzy membership degrees of the instances in the training set $U$ [10].

Algorithm 1
Input: A DT with real-valued conditional attributes.
Output: The fuzzy membership degrees of the instances.
STEP 1: For each class $C_i$, calculate the center $c_i$ of class $C_i$;
STEP 2: For each instance $x \in U$ and each class $C_i$ ($1 \le i \le p$), calculate the distance $d_i$ between $x$ and $c_i$;
STEP 3: For each instance $x \in U$, calculate the fuzzy membership degrees as

$$\mu_i(x) = \frac{1/d_i^2}{\sum_{j=1}^{p} 1/d_j^2} \qquad (2)$$
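A vectorized sketch of Algorithm 1 in the same style (the class centers are taken as attribute-wise means, and the small epsilon guarding against an instance that coincides with a center is our addition):

```python
import numpy as np

def membership_degrees(X, y):
    """Algorithm 1 (sketch): class-center based memberships, Eq. (2).
    X is an (n, d) array; y holds integer labels 0..p-1. Returns an
    (n, p) array whose k-th row is (mu_1(x_k), ..., mu_p(x_k))."""
    classes = np.unique(y)
    # STEP 1: the center of each class is its attribute-wise mean
    centers = np.stack([X[y == c].mean(axis=0) for c in classes])
    # STEP 2: distances d_i from every instance to every class center
    d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    # STEP 3: mu_i(x) = (1/d_i^2) / sum_j (1/d_j^2)
    inv = 1.0 / (d ** 2 + 1e-12)
    return inv / inv.sum(axis=1, keepdims=True)
```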
For convenience, we list the F-KNN algorithm as follows.

Algorithm 2
Input: A DT with real-valued conditional attributes, and a test instance $x$.
Output: The class of $x$ given by the fuzzy k-nearest neighbor rule.
STEP 1: Initialize $K = K_0$;
STEP 2: For each test instance $x$, find the $K_0$ nearest neighbors of $x$ in the DT:
STEP 2.1: Initialize $i = 1$;
STEP 2.2: Do
STEP 2.3: Compute the distance from $x$ to $x_i$;
STEP 2.4: IF $i \le K_0$ THEN include $x_i$ in the set of $K_0$ nearest neighbors; ELSE IF $x_i$ is closer to $x$ than any previous nearest neighbor, THEN delete the farthest of the $K_0$ nearest neighbors and include $x_i$ in the set;
STEP 3: For each test instance $x$, compute $\mu(x, C_i)$ using formulas (2) and (3):

$$\mu(x, C_i) = \frac{\sum_{j=1}^{K_0} \mu_{ij} \left( 1 / \lVert x - x_j \rVert^{2/(m-1)} \right)}{\sum_{j=1}^{K_0} \left( 1 / \lVert x - x_j \rVert^{2/(m-1)} \right)} \qquad (3)$$

where $\mu_{ij}$ is the membership degree of the $j$-th nearest neighbor $x_j$ in class $C_i$ (obtained from (2)) and $m > 1$ is the fuzzifier parameter.
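Eq. (3) and the final classification step admit a compact sketch (an illustration rather than the authors' code; the epsilon guard and the argmax tie-breaking are our own choices, and $m = 2$ is a common default):

```python
import numpy as np

def fknn_memberships(X_train, mu_train, x, k=5, m=2.0):
    """Eq. (3): distance-weighted average of the k nearest neighbors'
    class memberships (mu_train as produced by membership_degrees)."""
    dist = np.linalg.norm(X_train - x, axis=1)   # distances to all instances
    nn = np.argsort(dist)[:k]                    # the K_0 nearest neighbors
    w = 1.0 / (dist[nn] ** (2.0 / (m - 1.0)) + 1e-12)
    return (mu_train[nn] * w[:, None]).sum(axis=0) / w.sum()

def fknn_classify(X_train, mu_train, x, k=5, m=2.0):
    """Assign x to the class with the largest fuzzy membership."""
    return int(np.argmax(fknn_memberships(X_train, mu_train, x, k, m)))
```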
3. The Condensed Fuzzy K-Nearest Neighbor Rule Based on Sample Fuzzy Entropy

In this section, we present our CFKNN method. In KNN and F-KNN, high computational complexity is unavoidable because all instances in the training set must be stored. In fact, for a given fuzzy information system, different instances in the training set have different degrees of importance and make different contributions to classification; some instances are more important than others. Our method selects the set of important samples based on the fuzzy entropy of the samples in the training set. The bigger the fuzzy entropy of a sample is, the more important the sample, because samples with bigger fuzzy entropy provide more information for classification and lie closer to the class boundaries. The set of important samples therefore usually carries the same classification information as the original dataset. If the important sample set is used as the training set to classify an unseen test sample, the efficiency of classification can be increased and the computational complexity can be decreased.

In the following, we provide the CFKNN algorithm for the method proposed above (a code sketch follows the listing).

Algorithm 3
Input: A DT, parameters $K$ and $\alpha$ (suppose $|DT| = n$, and the samples in the DT are classified into $p$ classes).
Output: $S \subseteq DT$.
STEP 1: For each instance $x \in DT$, determine the fuzzy membership degree of $x$ with (2);
STEP 2: Randomly select one instance belonging to each class from the training set, and put the selected samples into $S$;
STEP 3: Repeat the following process for each $x$ remaining in the DT:
STEP 3.1: Find the $K$ nearest neighbors of $x$ in $S$;
STEP 3.2: Determine the class membership degrees $(\mu(x, C_1), \mu(x, C_2), \ldots, \mu(x, C_p))$ of $x$ with (3);
STEP 3.3: Compute the fuzzy entropy of instance $x$ with (1), $Entr(x) = -\sum_{i=1}^{p} \mu_i(x) \log_2 \mu_i(x)$;
STEP 3.4: If $Entr(x) > \alpha$ then $S = S \cup \{x\}$, else discard $x$;
STEP 4: Return $S$.
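Putting the pieces together, the following sketch condenses a training set as Algorithm 3 prescribes; the index bookkeeping and the min(k, |S|) guard for the first few scans are our assumptions:

```python
import numpy as np

def cfknn_condense(X, y, k=5, alpha=0.75, m=2.0, seed=None):
    """Algorithm 3 (sketch): returns the indices of the condensed set S.
    Labels in y are assumed to be integers 0..p-1."""
    rng = np.random.default_rng(seed)
    mu = membership_degrees(X, y)                    # STEP 1, Eq. (2)
    # STEP 2: seed S with one randomly chosen instance per class
    S = [int(rng.choice(np.flatnonzero(y == c))) for c in np.unique(y)]
    seeds = set(S)
    for i in range(len(X)):                          # STEP 3: scan the rest
        if i in seeds:
            continue
        k_eff = min(k, len(S))                       # S may still be small
        mu_x = fknn_memberships(X[S], mu[S], X[i], k=k_eff, m=m)  # STEPs 3.1-3.2
        if fuzzy_entropy(mu_x) > alpha:              # STEPs 3.3-3.4
            S.append(i)
    return np.asarray(S)                             # STEP 4
```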
4. Experimental results

The effectiveness of the proposed method is demonstrated through numerical experiments in the environment of Matlab 7.0 on a Pentium 4 PC. In our experiments we select 9 datasets in total, including 7 UCI datasets [11] and 2 real-world datasets [12]. The 7 UCI datasets are the Iris, Breast Cancer (WDBC), Breast Cancer (WPBC), Glass, Image Segmentation, Parkinsons, and Pima datasets. The 2 real-world datasets are the CT Image dataset and the RenRu dataset. The CT Image dataset was obtained by collecting 212 medical CT images from a Baoding local hospital; all instances, each with 35 numerical attributes, are classified into 2 classes (normal and abnormal). The RenRu dataset was created by the Key Laboratory of Machine Learning and Computational Intelligence of Hebei Province, China. It was obtained by collecting 148 Chinese characters REN and RU with different typefaces, fonts and sizes, of which 92 are the character REN and 56 are the character RU. Each Chinese character is described by 26 numerical features. The basic information of the 9 datasets is listed in Table 1.

In the experiment, we set K = 5 and randomly select 70% of the data in each dataset as the training set and the other 30% as the testing set. For each dataset, we run 10-fold cross-validation ten times; the experimental results are the averages of the 10 outputs and are listed in Table 2. The experimental results demonstrate that the proposed method is effective and efficient.

In the experiment, we also explore the relation between the value of α and the testing accuracy of CFKNN. We change the parameter α from 0.5 to 1.0 in steps of 0.05, record the classification accuracy of CFKNN for each value, and plot the resulting curves in Figure 1. The curves show that the value of α does affect the classification result. For the Iris dataset, it is appropriate for α to take values in the interval [0.5, 0.75]. For the WDBC and WPBC datasets, the appropriate intervals are [0.5, 0.7] and [0.5, 0.85] respectively. For the other datasets, it is appropriate for α to take values in the interval [0.5, 0.95].

In addition, we study the relationship between the value of α and the number of instances selected by CFKNN; the corresponding curves are shown in Figure 2. We observe that the value of α does affect the number of instances selected by CFKNN: as α increases, the number of selected instances decreases continually, and for α > 0.5 most of the curves become smoother and smoother, except for WPBC. So the number of instances selected by CFKNN changes little as α increases beyond 0.5. Considering the testing accuracy, it is reasonable for α to take different values between 0.5 and 0.95 for different datasets.

[Figure 1. The average accuracy of CFKNN on the 9 datasets with different thresholds α]

[Figure 2. The number of instances selected by CFKNN on the 9 datasets with different thresholds α]
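As a rough harness for the protocol above (a single 70/30 split rather than the paper's repeated runs; dataset loading, and integer labels 0..p-1 with every class present in the training split, are left as assumptions):

```python
import numpy as np

def evaluate(X, y, alphas, k=5, m=2.0, seed=0):
    """Accuracy and condensed-set size of CFKNN per threshold alpha,
    measured on one random 70/30 train/test split."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    cut = int(0.7 * len(X))
    tr, te = idx[:cut], idx[cut:]
    mu_tr = membership_degrees(X[tr], y[tr])
    for alpha in alphas:
        S = cfknn_condense(X[tr], y[tr], k=k, alpha=alpha, m=m, seed=seed)
        pred = [fknn_classify(X[tr][S], mu_tr[S], X[i], k=min(k, len(S)), m=m)
                for i in te]
        acc = float(np.mean(np.asarray(pred) == y[te]))
        print(f"alpha={alpha:.2f}  |S|={len(S)}  accuracy={acc:.3f}")

# Example sweep matching the paper's range: alpha from 0.5 to 1.0 in steps of 0.05
# evaluate(X, y, alphas=np.arange(0.5, 1.01, 0.05))
```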

5. Conclusions

In this paper, in order to overcome the drawback of the large computational complexity of F-KNN, we propose the condensed fuzzy k-nearest neighbor rule (CFKNN) based on the fuzzy entropy of instances. The experimental results show that the proposed method is feasible and effective.

Acknowledgments

This research is supported by the National Natural Science Foundation of China (60903088, 60903089), by the Natural Science Foundation of Hebei Province (F200000323, F2020063), by the Key Scientific Research Foundation of the Education Department of Hebei Province (ZD20039), by the Scientific Research Foundation of the Education Department of Hebei Province (200932, 200940), and by the Undergraduate Science and Technology Innovation Projects of Hebei University (20043).

References
[1] T. Cover, P. Hart. Nearest neighbor pattern classification. IEEE Transactions on Information Theory, 1967, 13(1):21-27.
[2] B. Dasarathy. Nearest Neighbor (NN) Norms: NN Pattern Classification Techniques. IEEE Computer Society Press, 1991.
[3] X. Wu, V. Kumar, J. R. Quinlan, et al. Top 10 algorithms in data mining. Knowledge and Information Systems, 2008, 14(1):1-37.
[4] T. M. Mitchell. Machine Learning. McGraw-Hill Companies, Inc., 2003.
[5] K. Small, D. Roth. Margin-based active learning for structured predictions. International Journal of Machine Learning and Cybernetics, 2010, 1(1-4):3-25.
[6] L. Wang. An improved multiple fuzzy NNC system based on mutual information and fuzzy integral. International Journal of Machine Learning and Cybernetics, 2011, 2(1):25-36.
[7] Z. Liu, Q. Wu, Y. Zhang, et al. Adaptive least squares support vector machines filter for hand tremor canceling in microsurgery. International Journal of Machine Learning and Cybernetics, 2011, 2(1):37-47.
[8] P. Hart. The condensed nearest neighbor rule. IEEE Transactions on Information Theory, 1968, 14(3):515-516.
[9] J. M. Keller, M. R. Gray, J. A. Givens. A fuzzy k-nearest neighbor algorithm. IEEE Transactions on Systems, Man, and Cybernetics, 1985, 15(4):580-585.
[10] J. H. Zhai. Fuzzy decision tree based on fuzzy-rough technique. Soft Computing, 2010, DOI: 10.1007/s00500-010-0584-0.
[11] C. L. Blake, C. J. Merz. UCI Repository of machine learning databases. 1996, http://www.ics.uci.edu/~mlearn/MLRepository.html.
[12] X. Z. Wang, J. H. Zhai, S. X. Lu. Induction of multiple fuzzy decision trees based on rough set technique. Information Sciences, 2008, 178(16):3188-3202.

Table 1. The basic information of the 9 datasets used in our experiments

Dataset      Number of instances   Number of attributes   Number of classes
Iris         150                   4                      3
WDBC         555                   30                     2
WPBC         191                   33                     2
Glass        60                    9                      6
Image        194                   19                     7
Parkinsons   195                   22                     2
Pima         768                   8                      2
CT Image     212                   35                     2
RenRu        148                   26                     2

Table 2. Experimental results with K = 5

Dataset      α      Number of selected instances   Average accuracy   CPU time (s)
Iris         0.65   50                             0.96               0.026
WDBC         0.55   74                             0.92               0.284
WPBC         0.80   90                             0.68               0.083
Glass        0.90   13                             0.68               0.0580
Image        0.90   103                            0.80               0.056
Parkinsons   0.95   68                             0.75               0.082
Pima         0.75   377                            0.69               0.4828
CT Image     0.90   83                             0.84               0.049
RenRu        0.95   59                             0.81               0.0256