An Anti-Noise Text Categorization Method based on Support Vector Machines *

Chen Lin, Huang Jie and Gong Zheng-Hu
School of Computer Science, National University of Defense Technology, Changsha, 410073, China
chenlin@nudt.edu.cn, agnes_nudt@yahoo.com.cn

Abstract. With the rapid growth of online information, text categorization has become one of the key techniques for handling and organizing text data. Although the native features of SVM (Support Vector Machines) make it better suited than Naïve Bayes to text categorization in theory, the classification precision of SVM is lower than that of the Bayesian method in the real world. This paper investigates the reasons by analyzing the shortcomings of SVM and presents an anti-noise SVM method. The improved method has two characteristics: 1) it chooses the classification space by defining the optimal n-dimensional classifying hyperspace; 2) it separates noise samples by preprocessing and trains the classifier on noise-free samples. Compared with the Naïve Bayes method, the classification precision of anti-noise SVM is increased by about 3 to 9 percent.

Keywords: Support Vector Machines; Outlier detection; Bayes method

1 Introduction

With the rapid growth of the Internet, text categorization has become one of the key techniques for handling and organizing text data. Text categorization classifies text documents into categories of like documents, which reduces the required overhead and provides smaller domains in which users may explore similar documents. Since building text classifiers by hand is difficult and time-consuming, researchers have more recently explored the use of machine learning techniques to automatically associate documents with categories, using a training set to adapt the classifier. Many statistical classification and machine learning techniques have been applied to text categorization, including Naïve Bayes models [1-4], nearest neighbor classifiers [5], decision trees [6][7], neural networks [8][9], symbolic rule learning [10] and SVM learning [11-13].

* This work is supported by the National Grand Fundamental Research 973 Program of China under Grant No. 2003CB314802.

In this paper, we intend to find out how to improve the precision of SVM by comparing it with the Naïve Bayes method in text categorization. The native virtues of SVM make it more appropriate for text categorization than the Bayesian method in theory. However, when the training samples contain noise, the constructed hyperplane can deviate badly from the real optimal hyperplane; for example, a positive sample may have characteristics that are closer to the negative samples. The classification precision of SVM then declines sharply, even below that of the Bayesian method. To solve this problem, this paper presents an anti-noise classifying method based on SVM. The improved method first optimizes the high-dimensional space, and then builds the classifier after removing noise from the training samples. Experiments show that the classification precision of anti-noise SVM is about 3 to 9 percent higher than that of the Bayesian method.

The rest of the paper is organized as follows. Section 2 introduces the theory of SVM and the Naïve Bayes method. Section 3 measures the precision of SVM and the Bayesian method, and then analyzes the shortcomings of SVM in text categorization. Section 4 presents an optimal hyperspace choosing method and an anti-noise SVM classification method. Simulated experiments are offered in Section 5. Section 6 concludes the paper.

2 Related works

2.1 SVM (Support Vector Machines)

SVM [13] solves two-class classification problems based on finding a separation between the classes of data. Label the training data $\{x_i, y_i\}$, $i = 1, \dots, l$, $y_i \in \{-1, +1\}$, $x_i \in R^d$. Suppose we have some hyperplane which separates the positive from the negative examples (a separating hyperplane). The points $x$ which lie on the hyperplane satisfy $w \cdot x + b = 0$, where $w$ is normal to the hyperplane, $|b| / \|w\|$ is the perpendicular distance from the hyperplane to the origin, and $\|w\|$ is the Euclidean norm of $w$. Let $d_+$ ($d_-$) be the shortest distance from the separating hyperplane to the closest positive (negative) example. Define the margin of a separating hyperplane to be $d_+ + d_-$. For the linearly separable case, the support vector algorithm simply looks for the separating hyperplane with the largest margin, which is called the optimal hyperplane. This can be formulated as follows: suppose that all the training data satisfy the constraints

$$w \cdot x_i + b \geq +1, \quad \text{for } y_i = +1 \qquad (1)$$

$$w \cdot x_i + b \leq -1, \quad \text{for } y_i = -1 \qquad (2)$$

These can be combined into one set of inequalities:

$$y_i (w \cdot x_i + b) - 1 \geq 0, \quad i = 1, \dots, l \qquad (3)$$

We now introduce positive Lagrange multipliers $\alpha_i$, $i = 1, \dots, l$, one for each of the inequality constraints (3) (for equality constraints the Lagrange multipliers would be unconstrained). This gives the Lagrangian:

$$L_P = \frac{1}{2}\|w\|^2 - \sum_{i=1}^{l} \alpha_i \left\{ y_i (x_i \cdot w + b) - 1 \right\} \qquad (4)$$

We must now minimize $L_P$ with respect to $w$ and $b$, and require that the derivatives of $L_P$ with respect to all the $\alpha_i$ vanish, all subject to the constraints $\alpha_i \geq 0$. Requiring that the gradient of $L_P$ with respect to $w$ and $b$ vanish gives the conditions:

$$w = \sum_i \alpha_i y_i x_i \qquad (5)$$

$$\sum_i \alpha_i y_i = 0 \qquad (6)$$

Since these are equality constraints in the dual formulation, we can substitute them into Eq. (4) to give

$$L_D = \sum_i \alpha_i - \frac{1}{2} \sum_{i,j} \alpha_i \alpha_j y_i y_j \, x_i \cdot x_j \qquad (7)$$

By applying the KKT (Karush-Kuhn-Tucker) conditions, the solution must satisfy:

$$\alpha_i \left\{ y_i (\omega \cdot x_i + b) - 1 \right\} = 0, \quad i = 1, 2, \dots, l \qquad (8)$$

There is a Lagrange multiplier $\alpha_i$ for every training point. In the solution, the points with $\alpha_i > 0$ are called support vectors; all other points have $\alpha_i = 0$ and play no role in training. The offset $b^*$ can be obtained from $b^* = y_i - \omega^* \cdot x_i$ by choosing any support vector. Finally, the classifier labels texts with the following function:

$$H(x) = \mathrm{sgn}(\omega^* \cdot x + b^*) \qquad (9)$$
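As an illustration of this formulation, the following sketch solves the dual problem (7) for a small linearly separable training set and recovers $w$, $b^*$ and the decision function (9). It is a minimal sketch, not code from the paper, and it assumes NumPy and the generic quadratic-programming solver from cvxopt are available.

```python
# Minimal hard-margin linear SVM sketch (illustrative, not from the paper):
# solve the dual (7) with a generic QP solver, then recover w via Eq. (5),
# b* from a support vector, and the decision function H(x) of Eq. (9).
import numpy as np
from cvxopt import matrix, solvers

def train_linear_svm(X, y):
    """X: (l, d) array of samples, y: (l,) array of labels in {-1, +1}.
    Assumes the data are linearly separable (hard margin)."""
    l = X.shape[0]
    Yx = y[:, None] * X
    Q = Yx @ Yx.T                                   # Q_ij = y_i y_j x_i . x_j
    P = matrix(Q)
    q = matrix(-np.ones(l))                         # maximize sum(alpha) - 1/2 a'Qa
    G = matrix(-np.eye(l))                          # -alpha_i <= 0, i.e. alpha_i >= 0
    h = matrix(np.zeros(l))
    A = matrix(np.asarray(y, dtype=float).reshape(1, -1))   # sum_i alpha_i y_i = 0, Eq. (6)
    b = matrix(0.0)
    solvers.options['show_progress'] = False
    alpha = np.array(solvers.qp(P, q, G, h, A, b)['x']).ravel()

    sv = alpha > 1e-6                               # support vectors: alpha_i > 0
    w = ((alpha[sv] * y[sv])[:, None] * X[sv]).sum(axis=0)   # Eq. (5)
    b_star = y[sv][0] - w @ X[sv][0]                # b* = y_i - w . x_i at a support vector
    return w, b_star

def H(x, w, b_star):
    """Decision function of Eq. (9)."""
    return np.sign(w @ x + b_star)
```

In practice a soft-margin variant (with an upper bound on the $\alpha_i$) or an off-the-shelf SVM library would be used; the hard-margin sketch only mirrors the derivation given here.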

2.2 Naïve Bayes classifier

The Naïve Bayes classifier learns from the training data the conditional probability of each attribute $A_i$ given the class label $C$. Classification is then done by applying Bayes' rule to compute the probability of $C$ given the particular instance of $A_1, \dots, A_n$, and then predicting the class with the highest posterior probability. This computation is rendered feasible by making a strong independence assumption: all the attributes $A_i$ are conditionally independent given the value of the class $C$. By independence we mean probabilistic independence, that is, $A$ is independent of $B$ given $C$ whenever $\Pr(A \mid B, C) = \Pr(A \mid C)$ for all possible values of $A$, $B$ and $C$ with $\Pr(C) > 0$ [2].

2.3 SVM is better than the Bayesian method in theory

Thorsten Joachims [11] gives several advantages of SVM for text categorization. We compare them with the Bayesian method.

1) SVM has the potential to handle high-dimensional input spaces. The number of different words that may appear in text is very large, so the input space of a text classifier is composed of many features. Since SVM uses over-fitting protection that does not necessarily depend on the number of features, it has the potential to handle these large feature spaces. The Bayesian method, by contrast, must calculate posterior probabilities from prior probabilities, which in a high-dimensional space may suffer from over-fitting. SVM is therefore more efficient than the Bayesian method, because it can use the raw statistical values.

2) SVM can process relevant features effectively. Naïve Bayes classification relies on assuming that many features are irrelevant and discarding them. Unfortunately, there are very few irrelevant features in text categorization, so feature selection is more difficult in the Bayesian method: when features are wrongly assumed irrelevant, classification precision decreases. SVM avoids this, since it can process both relevant and irrelevant features.

3) SVM is designed to classify two kinds of samples. Most text categorization problems are linearly separable. SVM can completely separate two classes by finding an optimal hyperplane under linearly separable conditions, and multi-class classification can be transformed into multiple two-class classification problems. The Bayesian method can deal with the multi-class problem directly.

4) SVM is well suited to problems with dense concepts and sparse instances [13]. Document vectors are sparse: for each document, the corresponding vector contains only a few non-zero entries. It has been shown that SVM is well suited to such problems. When document vectors are sparse, the results of Naïve Bayes based on statistical theory are poor.

The native features of SVM thus make it more appropriate for text categorization than the Bayesian method.
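To make the decision rule of Section 2.2 concrete, the sketch below implements a two-class multinomial Naïve Bayes classifier over bag-of-words counts. It is illustrative only: the Laplace smoothing constant and the count-matrix representation are assumptions, not details given in the paper.

```python
# Two-class multinomial Naive Bayes sketch (illustration of Section 2.2):
# pick the class with the highest posterior probability under the
# conditional-independence assumption over word attributes.
import numpy as np

def train_naive_bayes(X, y, alpha=1.0):
    """X: (n_docs, n_words) term-count matrix; y: (n_docs,) labels in {0, 1};
    alpha: Laplace smoothing constant (an assumption, not from the paper)."""
    log_prior = np.log(np.bincount(y, minlength=2) / len(y))   # log Pr(C)
    log_like = np.empty((2, X.shape[1]))
    for c in (0, 1):
        counts = X[y == c].sum(axis=0) + alpha                 # smoothed word counts
        log_like[c] = np.log(counts / counts.sum())            # log Pr(w_i | C = c)
    return log_prior, log_like

def predict(x, log_prior, log_like):
    """argmax_c [ log Pr(c) + sum_i x_i log Pr(w_i | c) ] for a count vector x."""
    return int(np.argmax(log_prior + log_like @ x))
```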

3 Measurements and analysis

3.1 Measurements

We choose 1000 texts about news and science as test samples, and select 200 texts from the candidates as training samples. When the two methods are compared, the results in practice do not support the standpoint of Section 2.3. In the following tables, n is the number of features we selected.

Table 1. Precision of SVM

                 n=300   n=800   n=1000  n=1500
true positives   85.3%   88.5%   90.9%   93.1%
false positives  86.2%   87.6%   92.6%   92.2%

Table 2. Precision of the Naïve Bayes method

                 n=300   n=800   n=1000  n=1500
true positives   87.1%   89.4%   93.9%   96.6%
false positives  88.7%   90.3%   94.1%   95.9%

These unexpected results made us look for what influences SVM when it is applied to the real world.

3.2 Shortcomings of SVM

SVM has better native features than Naïve Bayes, but in the real world it gets the opposite result. We try to find the reasons by analyzing the shortcomings of SVM, and draw the following conclusions.

1) SVM has no criterion for feature choice. SVM can classify text well. However, if every word that appears in the text is simply used as a dimension of the hyperspace, the computation of the hyperplane becomes very difficult and the classification precision is low. Thus, one of our research emphases is how to choose important and useful features to optimize the multi-dimensional space.

2) The anti-noise ability of SVM is weak. Although SVM is treated as a good text categorization method, its anti-noise ability is very weak. A support vector is a training sample with the shortest distance to the hyperplane. The number of support vectors is small, but they contain all the information needed for classification. The classifying effect is decided by the minority of support vectors among the samples, so removing or reducing samples that are not support vectors has no influence on the classifier. However, if a noise sample is treated as a support vector, it greatly reduces the classification precision of SVM. If we get rid of noise samples first and then train the SVM on the optimized samples, we can achieve higher classification precision.

4 SVM-based anti-noise text categorization methods

In order to obtain higher precision, we need to overcome the shortcomings of SVM. In this section, we enhance the method in two respects.

4.1 Constructing an optimal classifying hyperspace

The efficiency and effectiveness of SVM are largely influenced by the number of dimensions of the hyperspace and by each individual dimension. Although SVM has advantages in text classification, it has no criterion for dimension choice. This section uses a statistical method to choose the most important features as the dimensions of the classification space.

Texts consist of words, and the frequency of a word can be treated as one dimension of the hyperspace. Nevertheless, the number of words in texts is in general very large, and it is difficult to decide which words should be chosen as dimensions. As Figure 1 shows, the upper dots denote samples in class C and the lower squares denote samples in class C̄ (the complement class). The hyperspace in Figure 1(b) is better than that in Figure 1(a), because the difference between C and C̄ is more apparent.

Fig. 1. (a) n-dimensional hyperspace HS1; (b) n-dimensional hyperspace HS2. Each panel shows the optimal hyperplane separating the two classes.

Therefore, we need a criterion to choose certain words according to the initial learning samples and construct an optimal hyperspace for classification. Assume HS is an n-dimensional hyperspace in which each dimension is the frequency of a word.

Definition 1. The barycentre of the samples that belong to class C in HS is

$$B_C = \frac{\sum_{t_d \in Sample_C} t_d}{|Sample_C|},$$

where $t_d = (Fr_d(w_1), Fr_d(w_2), \dots, Fr_d(w_n))$ denotes a sample point in the n-dimensional hyperspace HS and $Fr_d(w_i)$ denotes the frequency of word $w_i$ in text $d$.

Definition 2. HS is called the optimal classifying n-dimensional hyperspace for C if and only if $\|B_C - B_{\bar{C}}\|$ over all samples is maximal under some word set $w$ whose cardinality is $n$.
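As a concrete reading of Definitions 1 and 2, the short sketch below (illustrative only, not from the paper) computes the two class barycentres in a candidate hyperspace and the distance $\|B_C - B_{\bar{C}}\|$ that Definition 2 asks to maximize; the selection rule that actually constructs the hyperspace follows in Definitions 3-4 and Theorems 1-2.

```python
# Barycentre distance of Definitions 1-2: given the word-frequency vectors of
# the two classes in a candidate hyperspace HS, a larger distance between the
# class barycentres means a more clearly separable hyperspace.
import numpy as np

def barycentre_distance(pos_points, neg_points):
    """pos_points / neg_points: (m, n) arrays of t_d = (Fr_d(w_1), ..., Fr_d(w_n))."""
    b_pos = np.mean(pos_points, axis=0)    # B_C, barycentre of the class-C samples
    b_neg = np.mean(neg_points, axis=0)    # barycentre of the complement class
    return np.linalg.norm(b_pos - b_neg)   # the quantity Definition 2 maximizes
```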

Definition 3. The prior odds on class C are $O(C) = P(C) / P(\bar{C})$; $O(C)$ measures the predictive or prospective support accorded to C by background knowledge alone. In practice, we can calculate the prior odds on C by the following formula [14]:

$$O(C) = \frac{|\{ t \mid t \in C,\ t \in Sample \}|}{|\{ t \mid t \in \bar{C},\ t \in Sample \}|} \qquad (10)$$

Definition 4. The likelihood ratio of word $w_i$ on C is defined as:

$$L(w_i \mid C) = \frac{P(w_i \mid C)}{P(w_i \mid \bar{C})} \qquad (11)$$

$L(w_i \mid C)$ denotes the retrospective support given to C by the evidence actually observed; $P(w_i \mid C)$ denotes the average frequency of word $w_i$ in the sample texts of the class.

Theorem 1. The posterior odds are given by the product:

$$O(C \mid w_i) = L(w_i \mid C)\, O(C) \qquad (12)$$

In practice, we can calculate $P(w_i \mid C)$ from the frequency of $w_i$ in the samples of C and $P(w_i \mid \bar{C})$ from its frequency in C̄, and then work out $O(C \mid w_i)$ from equations (10)-(12). $O(C \mid w_i)$ represents the effect of classifying according to the frequency of $w_i$.

Theorem 2. Choosing the n words $w_i$ with the largest $O(C \mid w_i)$ and using the corresponding $Fr(w_i)$ as dimensions constructs the optimal hyperspace HS, the hyperspace in which the difference between C and C̄ is most apparent. A text d in HS is then represented as $t_d^{HS} = (Fr(w_1), Fr(w_2), \dots, Fr(w_n))$, where each $w_i$ is one of the n words with maximal $O(C \mid w_i)$.
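The following sketch puts the feature-choice rule of Section 4.1 together: score every candidate word by the posterior odds of Theorem 1 and keep the n best-scoring words as the dimensions of the hyperspace (Theorem 2). It is an illustrative sketch, not code from the paper; the tokenized input format and the smoothing constant eps are assumptions.

```python
# Optimal-hyperspace construction of Section 4.1: rank words by the posterior
# odds O(C|w) = L(w|C) * O(C) of Eqs. (10)-(12) and keep the top n as dimensions.
from collections import Counter

def choose_dimensions(pos_texts, neg_texts, n, eps=1e-6):
    """pos_texts / neg_texts: lists of token lists for class C and its complement.
    eps guards against zero frequencies (an assumption, not from the paper)."""
    o_prior = len(pos_texts) / len(neg_texts)              # prior odds O(C), Eq. (10)
    pos_freq = Counter(w for t in pos_texts for w in t)
    neg_freq = Counter(w for t in neg_texts for w in t)
    pos_total = sum(pos_freq.values())
    neg_total = sum(neg_freq.values())

    def posterior_odds(w):
        p_w_pos = pos_freq[w] / pos_total + eps            # P(w|C)
        p_w_neg = neg_freq[w] / neg_total + eps            # P(w|not C)
        return (p_w_pos / p_w_neg) * o_prior               # L(w|C) * O(C), Eqs. (11)-(12)

    vocab = set(pos_freq) | set(neg_freq)
    return sorted(vocab, key=posterior_odds, reverse=True)[:n]

def text_to_point(tokens, dims):
    """Map a text onto the chosen hyperspace: t_d = (Fr(w_1), ..., Fr(w_n))."""
    counts = Counter(tokens)
    total = max(len(tokens), 1)
    return [counts[w] / total for w in dims]
```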

4.2 Improving the anti-noise ability of SVM

SVM has high classification precision when the training data contain no noise; under noisy conditions the precision drops sharply. As Figure 2 shows, point x is a noise sample in an n-dimensional hyperspace. Although x belongs to the positive samples, it differs greatly from the other positive samples. If x is treated as a support vector when computing the optimal hyperplane, the hyperplane deviates greatly from the real optimal hyperplane, and the classification precision is seriously affected.

Fig. 2. Noise sample x affects the optimal hyperplane: the hyperplane computed with x deviates from the optimal hyperplane of the noise-free condition.

Although x is a positive sample, its characteristics are very different from those of the other positive samples and may, under some conditions, be close to the negative samples. That is, the corresponding point x in the high-dimensional space is an outlier. Noise in the negative samples has the same characteristic. If we eliminate such noise from the samples before training the SVM, the classification precision increases greatly. As Figure 3 shows, we obtain a more reasonable optimal hyperplane by ignoring the influence of x during training.

Fig. 3. The optimal hyperplane when noise sample x is ignored.

In order to construct an anti-noise text classifier, we present a method that filters noise samples by outlier detection in the high-dimensional space before training the SVM. Suppose D is a classified sample set, o, p, q are samples in D, and d(p, q) represents the distance between samples p and q [15].

Definition 5. (k-distance of sample p, k-dist(p)) d(p, o) is called the k-distance of sample p, denoted k-dist(p), if there are at least k samples o' ∈ D such that d(p, o') ≤ d(p, o) and at most (k − 1) samples o' ∈ D such that d(p, o') < d(p, o).

Definition 6. (k nearest neighbors of sample p, N_k(p)) The set of samples in D whose distance to p does not exceed k-dist(p): N_k(p) = { q ∈ D \ {p} | d(p, q) ≤ k-dist(p) }.

Definition 7. (Local density of sample p, den_k(p)) The local density of sample p is the reciprocal of the average k-distance over N_k(p), that is, den_k(p) = 1 / avg{ k-dist(q) | q ∈ N_k(p) }.

Definition 8. (Local outlier coefficient of sample p, LOF_k(p)) The local outlier coefficient of sample p is the ratio between the average density of N_k(p) and den_k(p), that is, LOF_k(p) = avg{ den_k(q) | q ∈ N_k(p) } / den_k(p). The local outlier coefficient reflects how isolated sample p is relative to its nearest neighbors.

In order to separate noise samples, we calculate LOF_k(t) for each text t in class C and in C̄. If LOF_k(t) is greater than a threshold θ, we conclude that t is an outlier, that is, text t is noise in the samples.
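A compact sketch of this noise-filtering step (Definitions 5-8) is given below. It is illustrative only: the brute-force distance computation and the function names are assumptions, and the choice of k and of the threshold θ (the experiments in Section 5 use θ = 20%) is left to the caller.

```python
# Outlier-based noise filter of Section 4.2: compute the local outlier
# coefficient LOF_k for every training point (Definitions 5-8) and drop the
# points whose LOF_k exceeds the threshold theta before training the SVM.
import numpy as np

def lof_scores(X, k):
    """X: (m, n) matrix of training points in the chosen hyperspace; requires k < m."""
    dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)  # pairwise d(p, q)
    np.fill_diagonal(dists, np.inf)                                # exclude p itself
    order = np.argsort(dists, axis=1)
    k_dist = dists[np.arange(len(X)), order[:, k - 1]]             # k-dist(p), Def. 5
    neighbors = order[:, :k]                                       # N_k(p), Def. 6 (ties ignored)
    den = 1.0 / k_dist[neighbors].mean(axis=1)                     # den_k(p), Def. 7
    return den[neighbors].mean(axis=1) / den                       # LOF_k(p), Def. 8

def remove_noise(X, y, k, theta):
    """Keep only the samples whose local outlier coefficient does not exceed theta."""
    keep = lof_scores(X, k) <= theta
    return X[keep], y[keep]
```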

Finally, by filtering out the noise samples we obtain a reasonable classification function $H(x) = \mathrm{sgn}(\omega^* \cdot x + b^*)$.

5 Validity test

Considering the problem of classifying texts, we first partition the training samples into the sets C and C̄ manually. Then we select n words according to Section 4.1, and remove noise samples according to the threshold θ by calculating LOF_k(t) for each text in C or C̄. Finally, the classification function $H(x) = \mathrm{sgn}(\omega^* \cdot x + b^*)$ is obtained.

We select 1000 test samples and 200 training samples as in Section 3.1, and test the method for different values of the parameter n (the number of dimensions) with θ = 20%.

Table 3. Precision of the anti-noise method for different parameters n and θ = 20%

                 n=300   n=800   n=1000  n=1500
true positives   96.7%   97.8%   99.5%   99.8%
false positives  97.2%   98.1%   99.7%   99.9%

From Tables 1 and 2 we can conclude that although SVM fits text categorization better in theory, its precision is worse than that of the Bayesian method in practice. From Tables 1 and 3 we find that the precision of the classifier increases by about 6 to 11 percent after the anti-noise method is applied. And from Tables 2 and 3 we see that the anti-noise SVM method shows its advantage in text categorization: the precision of the classifier increases by about 3 to 9 percent compared with the Naïve Bayes method.

6 Conclusions

This paper enhances support vector machines for text categorization. Recognizing that SVM has better native features than the Naïve Bayes method, we would expect SVM to be preferable, at least for text categorization; but in practice the classification precision of SVM is lower than that of Naïve Bayes. These unexpected results led us to look for what influences SVM when it is applied to the real world. We found that SVM has no criterion for feature choice, so we construct an optimal hyperspace for classification by defining the optimal n-dimensional classifying hyperspace. We also found that the anti-noise ability of SVM is weak, so we separate noise samples by preprocessing and build a text classifier trained on noise-free samples. In the overall comparison of anti-noise SVM and the Naïve Bayes method on 1000 test samples, the precision results over different parameters n indicate significant differences in performance: the classification precision of anti-noise SVM is increased by about 3 to 9 percent.

References

[1] Yang, Y. An evaluation of statistical approaches to text categorization. CMU Technical Report CMU-CS-97-127, April 1997.
[2] Friedman, N., Goldszmidt, M. Building classifiers using Bayesian networks. In: Proc. National Conference on Artificial Intelligence, Menlo Park, CA: AAAI Press, 1996: 1277-1284.
[3] Ion Androutsopoulos, John Koutsias, Konstantinos V. Chandrinos, George Paliouras and Constantine D. Spyropoulos. An Evaluation of Naive Bayesian Anti-Spam Filtering. 2000.
[4] Cross Validation for the Naive Bayes Classifier of SPAM. http://stat-www.berkeley.edu/users/nolan/stat133/spr04/projects/spampart2.pdf, March 2004.
[5] Lewis, D.D. and Ringuette, M. A comparison of two learning algorithms for text categorization. In: Third Annual Symposium on Document Analysis and Information Retrieval, 81-93, 1994.
[6] Sholom M. Weiss, et al. Maximizing Text-Mining Performance. IEEE Intelligent Systems, 2-8, July/August 1999.
[7] Micheline, K., Lara, W., Wang, G., et al. Generalization and decision tree induction: efficient classification in data mining. ftp://ftp.fas.sfu.ca/pub/cs/han/kdd/ride97.ps, 1997-02-13.
[8] Zhou, Z., Chen, S., Chen, Z. FANNC: A fast adaptive neural network classifier. International Journal of Knowledge and Information Systems, 2000, 2(1): 115-129.
[9] Lee, J., Tsai, J. On-line fault detection using integrated neural networks. In: Proc. of Applications of Artificial Neural Networks, SPIE, 1992: 436-446.
[10] J. Kivinen, M. Warmuth, and P. Auer. The Perceptron algorithm vs. Winnow: linear vs. logarithmic mistake bounds when few input variables are relevant. In: Conference on Computational Learning Theory, 1995.
[11] Thorsten Joachims. Text Categorization with Support Vector Machines: Learning with Many Relevant Features. In: Proceedings of ECML-98, 10th European Conference on Machine Learning, 1998.
[12] A. Basu, C. Watters, and M. Shepherd. Support Vector Machines for Text Categorization. In: Proceedings of the 36th Hawaii International Conference on System Sciences (HICSS'03), 2003.
[13] Burges, C. A Tutorial on Support Vector Machines for Pattern Recognition. Data Mining and Knowledge Discovery, 2(2): 121-167, 1998.
[14] Judea Pearl. Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. ISBN 0-934613-73-7.
[15] Xu LongFei, Xiong JunLi, et al. Study on Algorithm for Rough Set based Outlier Detection in High Dimension Space. Computer Science, 2003, Vol. 30, No. 10 (in Chinese).