Classification Methods

Aijun An, York University, Canada

INTRODUCTION

Generally speaking, classification is the action of assigning an object to a category according to the characteristics of the object. In data mining, classification refers to the task of analyzing a set of pre-classified data objects to learn a model (or a function) that can be used to classify an unseen data object into one of several predefined classes. A data object, referred to as an example, is described by a set of attributes or variables. One of the attributes describes the class that an example belongs to and is thus called the class attribute or class variable. Other attributes are often called independent or predictor attributes (or variables). The set of examples used to learn the classification model is called the training data set. Tasks related to classification include regression, which builds a model from training data to predict numerical values, and clustering, which groups examples to form categories.

Classification belongs to the category of supervised learning, distinguished from unsupervised learning. In supervised learning, the training data consists of pairs of input data (typically vectors) and desired outputs, while in unsupervised learning there is no a priori output.

Classification has various applications, such as learning from a patient database to diagnose a disease based on the symptoms of a patient, analyzing credit card transactions to identify fraudulent transactions, automatic recognition of letters or digits based on handwriting samples, and distinguishing highly active compounds from inactive ones based on compound structures for drug discovery.

BACKGROUND

Classification has been studied in statistics and machine learning. In statistics, classification is also referred to as discrimination. Early work on classification focused on discriminant analysis, which constructs a set of discriminant functions, such as linear functions of the predictor variables, based on a set of training examples to discriminate among the groups defined by the class variable. Modern studies explore more flexible classes of models, such as providing an estimate of the joint distribution of the features within each class (e.g., Bayesian classification), classifying an example based on distances in the feature space (e.g., the k-nearest neighbor method), and constructing a classification tree that classifies examples based on tests on one or more predictor variables (i.e., classification tree analysis).

In the field of machine learning, attention has focused more on generating classification expressions that are easily understood by humans. The most popular machine learning technique is decision tree learning, which learns the same tree structure as classification trees but uses different criteria during the learning process; the technique was developed in parallel with classification tree analysis in statistics. Other machine learning techniques include classification rule learning, neural networks, Bayesian classification, instance-based learning, genetic algorithms, the rough set approach and support vector machines. These techniques mimic human reasoning in different aspects to provide insight into the learning process.

The data mining community inherits the classification techniques developed in statistics and machine learning, and applies them to various real-world problems. Most statistical and machine learning algorithms are memory-based, in which the whole training data set is loaded into main memory before learning starts. In data mining, much effort has been spent on scaling up classification algorithms to deal with large data sets. There is also a newer classification technique, called association-based classification, which is based on association rule learning.
MAIN THRUST

Major classification techniques are described below. The techniques differ in the learning mechanism and in the representation of the learned model.

Decision Tree Learning

Decision tree learning is one of the most popular classification algorithms. It induces a decision tree from data. A decision tree is a tree-structured prediction model where each internal node denotes a test on an attribute, each outgoing branch represents an outcome of the test, and each leaf node is labeled with a class or class distribution. A simple decision tree is shown in Figure 1. With a decision tree, an object is classified by following a path from the root to a leaf, taking the edges corresponding to the values of the attributes in the object.

[Figure 1. A decision tree with tests on attributes X and Y]

A typical decision tree learning algorithm adopts a top-down recursive divide-and-conquer strategy to construct a decision tree. Starting from a root node representing the whole training data, the data is split into two or more subsets based on the values of an attribute chosen according to a splitting criterion. For each subset a child node is created and the subset is associated with the child. The process is then repeated separately on the data in each of the child nodes, and so on, until a termination criterion is satisfied. Many decision tree learning algorithms exist. They differ mainly in their attribute-selection criteria, such as information gain, gain ratio (Quinlan, 1993) and the Gini index (Breiman, Friedman, Olshen, & Stone, 1984), in their termination criteria, and in their post-pruning strategies. Post-pruning is a technique that removes some branches of the tree after the tree is constructed, to prevent the tree from over-fitting the training data. Representative decision tree algorithms include CART (Breiman et al., 1984) and C4.5 (Quinlan, 1993). There are also studies on fast and scalable construction of decision trees; representative algorithms of this kind include RainForest (Gehrke, Ramakrishnan, & Ganti, 1998) and SPRINT (Shafer, Agrawal, & Mehta, 1996).
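To make the divide-and-conquer procedure concrete, here is a minimal sketch of decision tree induction in Python. The use of scikit-learn, the toy data set mirroring Figure 1 (with Y ordinal-encoded as A=0, B=1, C=2), and all variable names are illustrative assumptions, not part of the original chapter:

```python
# A minimal decision tree induction sketch; data and encoding are assumptions.
from sklearn.tree import DecisionTreeClassifier, export_text

# Toy training set: attributes X (numeric) and Y (categorical, encoded A=0, B=1, C=2).
X_train = [[0.5, 0], [2.0, 0], [0.8, 1], [1.5, 1], [0.3, 2], [3.0, 2]]
y_train = ["Class 1", "Class 2", "Class 2", "Class 1", "Class 1", "Class 2"]

# criterion="entropy" selects splits by information gain, one of the
# attribute-selection criteria mentioned above; "gini" would use the Gini index.
tree = DecisionTreeClassifier(criterion="entropy")
tree.fit(X_train, y_train)

# Classify an unseen example by following a root-to-leaf path.
print(tree.predict([[0.9, 1]]))
# Print the induced tree of attribute tests.
print(export_text(tree, feature_names=["X", "Y"]))
```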

Decision Rule Learning

Decision rules are a set of if-then rules. They are the most expressive and human-readable representation of classification models (Mitchell, 1997). An example of a decision rule is "if X < 1 and Y = B, then the example belongs to Class 2." This type of rule is referred to as a propositional rule. Rules can be generated by translating a decision tree into a set of rules, one rule for each leaf node in the tree. A second way to generate rules is to learn them directly from the training data, and there is a variety of rule induction algorithms for doing so. These algorithms induce rules by searching a hypothesis space for a hypothesis that best matches the training data. They differ in the search method (e.g., general-to-specific, specific-to-general, or two-way search), in the search heuristics that control the search, and in the pruning method used. The most widespread approach to rule induction is sequential covering, in which a greedy general-to-specific search is conducted to learn a disjunctive set of conjunctive rules. It is called sequential covering because it sequentially learns a set of rules that together cover the positive examples of a class. Algorithms belonging to this category include CN2 (Clark & Boswell, 1991), RIPPER (Cohen, 1995) and ELEM2 (An & Cercone, 1998).
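As an illustration of sequential covering, the following is a heavily simplified greedy general-to-specific rule learner. The attribute-value dictionary representation and the precision-based scoring are assumptions made for this sketch, not details of CN2, RIPPER or ELEM2:

```python
# A simplified sequential-covering sketch; representation and scoring are assumptions.
def learn_one_rule(examples, target):
    """Greedily add attribute-value tests until the covered examples
    are pure (or no further test is available)."""
    rule, covered = {}, examples
    while any(cls != target for _, cls in covered):
        best = None
        for attrs, _ in covered:
            for a, v in attrs.items():
                if a in rule:
                    continue
                subset = [(x, c) for x, c in covered if x.get(a) == v]
                score = sum(1 for _, c in subset if c == target) / len(subset)
                if best is None or score > best[0]:
                    best = (score, a, v, subset)
        if best is None:          # every attribute already tested
            break
        _, a, v, covered = best
        rule[a] = v
    return rule, covered

def sequential_covering(examples, target):
    """Learn a disjunctive set of conjunctive rules that together
    cover the positive examples of the target class."""
    rules, remaining = [], list(examples)
    while any(cls == target for _, cls in remaining):
        rule, covered = learn_one_rule(remaining, target)
        rules.append(rule)
        remaining = [e for e in remaining if e not in covered]
    return rules

data = [({"X": "<1", "Y": "B"}, "Class 2"), ({"X": ">=1", "Y": "B"}, "Class 1"),
        ({"X": "<1", "Y": "A"}, "Class 1"), ({"X": ">=1", "Y": "C"}, "Class 2")]
print(sequential_covering(data, "Class 2"))
```

On this toy data the learner recovers, among others, the rule "if X < 1 and Y = B, then Class 2" quoted above.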
Naive Bayesian Classifier

The naive Bayesian classifier is based on Bayes' theorem. Suppose that there are m classes, C_1, C_2, ..., C_m. The classifier predicts an unseen example X as belonging to the class with the highest posterior probability conditioned on X. In other words, X is assigned to class C_i if and only if P(C_i | X) > P(C_j | X) for 1 ≤ j ≤ m, j ≠ i. By Bayes' theorem, we have

P(C_i | X) = P(X | C_i) P(C_i) / P(X).

As P(X) is constant for all classes, only P(X | C_i) P(C_i) needs to be maximized. Given a set of training data, P(C_i) can be estimated by counting how often each class occurs in the training data. To reduce the computational expense of estimating P(X | C_i) for all possible X, the classifier makes the naive assumption that the attributes used to describe X are conditionally independent of each other given the class of X. Thus, given the attribute values (x_1, x_2, ..., x_n) that describe X, we have

P(X | C_i) = Π_{j=1}^{n} P(x_j | C_i).

The probabilities P(x_1 | C_i), P(x_2 | C_i), ..., P(x_n | C_i) can be estimated from the training data. The naive Bayesian classifier is simple to use and efficient to learn; it requires only one scan of the training data. Despite the fact that the independence assumption is often violated in practice, naive Bayes often competes well with more sophisticated classifiers. Recent theoretical analysis has shown why the naive Bayesian classifier is so robust (Domingos & Pazzani, 1997; Rish, 2001).
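A minimal sketch of these counting-based estimates in Python; the toy data set and the absence of smoothing are simplifying assumptions (practical implementations usually add Laplace smoothing to avoid zero probabilities):

```python
# A minimal naive Bayes sketch for categorical attributes; data is an assumption.
from collections import Counter, defaultdict

def train_naive_bayes(examples):
    """Estimate P(C) and P(x_j | C) by counting, in a single scan."""
    class_counts = Counter(cls for _, cls in examples)
    cond_counts = defaultdict(Counter)       # (class, attribute) -> value counts
    for attrs, cls in examples:
        for a, v in attrs.items():
            cond_counts[(cls, a)][v] += 1
    priors = {c: n / len(examples) for c, n in class_counts.items()}
    return priors, cond_counts, class_counts

def predict(priors, cond_counts, class_counts, attrs):
    """Pick the class maximizing P(C) * prod_j P(x_j | C)."""
    best_cls, best_score = None, -1.0
    for c, prior in priors.items():
        score = prior
        for a, v in attrs.items():
            score *= cond_counts[(c, a)][v] / class_counts[c]
        if score > best_score:
            best_cls, best_score = c, score
    return best_cls

data = [({"Y": "B", "X": "<1"}, "Class 2"), ({"Y": "B", "X": "<1"}, "Class 2"),
        ({"Y": "A", "X": ">=1"}, "Class 1"), ({"Y": "B", "X": ">=1"}, "Class 1")]
model = train_naive_bayes(data)
print(predict(*model, {"Y": "B", "X": "<1"}))  # expected: "Class 2"
```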

Bayesian Belief Networks

A Bayesian belief network, also known as a Bayesian network or a belief network, is a directed acyclic graph whose nodes represent variables and whose arcs represent dependence relations among the variables. If there is an arc from node A to another node B, then we say that A is a parent of B and B is a descendant of A. Each variable is conditionally independent of its non-descendants in the graph, given its parents. The variables may correspond to actual attributes given in the data or to hidden variables believed to form a relationship. A variable in the network can be selected as the class attribute. The classification process can return a probability distribution for the class attribute based on the network structure and conditional probabilities estimated from the training data, which predicts the probability of each class. The Bayesian network provides an intermediate approach between naive Bayesian classification and Bayesian classification without any independence assumptions: it describes dependencies among attributes, but allows conditional independence among subsets of attributes. The training of a belief network depends on the scenario. If the network structure is known and the variables are observable, training the network consists only of estimating conditional probabilities from the training data, which is straightforward. If the network structure is given and some of the variables are hidden, a method of gradient descent can be used to train the network (Russell, Binder, Koller, & Kanazawa, 1995). Algorithms also exist for learning the network structure from training data given observable variables (Buntine, 1994; Cooper & Herskovits, 1992; Heckerman, Geiger, & Chickering, 1995).

The k-Nearest Neighbour Classifier

The k-nearest neighbour classifier classifies an unknown example into the most common class among its k nearest neighbours in the training data. It assumes that all examples correspond to points in an n-dimensional space. A neighbour is deemed nearest if it has the smallest distance, in the Euclidean sense, in the n-dimensional feature space. When k = 1, the unknown example is classified into the class of its closest neighbour in the training set. The k-nearest neighbour method stores all the training examples and postpones learning until a new example needs to be classified; this type of learning is called instance-based or lazy learning. The k-nearest neighbour classifier is intuitive, easy to implement and effective in practice. It can construct a different approximation to the target function for each new example to be classified, which is advantageous when the target function is very complex but can be described by a collection of less complex local approximations (Mitchell, 1997). However, the cost of classifying new examples can be high, because almost all the computation is done at classification time. Refinements to the k-nearest neighbour method include weighting the attributes in the distance computation and weighting the contribution of each of the k neighbours during classification according to their distance to the example to be classified.
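A minimal k-nearest neighbour sketch in Python; the Euclidean metric via math.dist and the toy points are assumptions for illustration:

```python
# A minimal k-NN sketch; the training points are illustrative assumptions.
import math
from collections import Counter

def knn_classify(train, query, k=3):
    """Classify `query` as the most common class among its k nearest
    training points (Euclidean distance in the feature space)."""
    neighbours = sorted(train, key=lambda p: math.dist(p[0], query))[:k]
    votes = Counter(cls for _, cls in neighbours)
    return votes.most_common(1)[0][0]

train = [((0.0, 0.0), "Class 1"), ((0.2, 0.1), "Class 1"),
         ((1.0, 1.0), "Class 2"), ((0.9, 1.2), "Class 2"), ((1.1, 0.8), "Class 2")]
print(knn_classify(train, (0.1, 0.2), k=3))  # two of the three nearest are "Class 1"
```

A distance-weighted variant would scale each neighbour's vote by the inverse of its distance, matching the refinement mentioned above.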
Neural Networks

Neural networks, also referred to as artificial neural networks, are studied to simulate the human brain, although brains are much more complex than any artificial neural network developed so far. A neural network is composed of a few layers of interconnected computing units (neurons or nodes). Each unit computes a simple function. The inputs of the units in one layer are the outputs of the units in the previous layer, and each connection between units is associated with a weight. Parallel computing can be performed among the units in each layer. The units in the first layer take the input and are called the input units; the units in the last layer produce the output of the network and are called the output units. When the network is in operation, a value is applied to each input unit, which then passes its given value to the connections leading out from it, and on each connection the value is multiplied by the weight associated with that connection. Each unit in the next layer then receives a value which is the sum of the values produced by the connections leading into it, and in each unit a simple computation is performed on that value; a sigmoid function is typical. This process is repeated, with the results being passed through subsequent layers of nodes, until the output nodes are reached. Neural networks can be used for both regression and classification. To model a classification function, we can use one output unit per class; an example is classified into the class corresponding to the output unit with the largest output value. Neural networks differ in the way the neurons are connected, in the way the neurons process their input, and in the propagation and learning methods used (Nurnberger, Pedrycz, & Kruse, 2002). Learning a neural network is usually restricted to modifying the weights based on the training data; the structure of the initial network is usually left unchanged during the learning process. A typical network structure is the multilayer feed-forward neural network, in which none of the connections cycles back to a unit of a previous layer. The most widely used method for training a feed-forward neural network is backpropagation (Rumelhart, Hinton, & Williams, 1986).
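The following sketch traces one forward pass through a small two-layer feed-forward network with sigmoid units; the weights are arbitrary assumptions here (in practice they would be learned, e.g. by backpropagation):

```python
# A minimal forward-pass sketch; the weights and biases are assumptions.
import math

def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))

def layer_forward(inputs, weights, biases):
    """Each unit sums its weighted incoming values and applies the sigmoid."""
    return [sigmoid(sum(w * x for w, x in zip(ws, inputs)) + b)
            for ws, b in zip(weights, biases)]

x = [0.5, -1.0]                                        # values applied to input units
hidden = layer_forward(x, [[0.1, 0.8], [-0.4, 0.3]], [0.0, 0.1])
output = layer_forward(hidden, [[1.2, -0.7], [-1.1, 0.9]], [0.05, -0.05])

# One output unit per class: predict the class with the largest output value.
print(max(range(len(output)), key=lambda i: output[i]))
```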

Support Vector Machines

The support vector machine (SVM) is a recently developed technique for multidimensional function approximation. The objective of support vector machines is to determine a classifier or regression function which minimizes the empirical risk (that is, the training set error) and the confidence interval (which corresponds to the generalization or test set error) (Vapnik, 1998). Given a set of N linearly separable training examples S = {x_i ∈ R^n | i = 1, 2, ..., N}, where each example belongs to one of two classes, represented by y_i ∈ {+1, -1}, the SVM learning method seeks the optimal hyperplane w · x + b = 0 as the decision surface, which separates the positive and negative examples with the largest margin. The decision function for classifying linearly separable data is

f(x) = sgn(w · x + b),

where w and b are found from the training set by solving a constrained quadratic optimization problem. The final decision function is

f(x) = sgn( Σ_{i=1}^{N} α_i y_i (x_i · x) + b ).

The function depends only on the training examples for which α_i is non-zero; these examples are called support vectors. Often the number of support vectors is only a small fraction of the original data set. The basic SVM formulation can be extended to the nonlinear case by using nonlinear kernels that map the input space into a high-dimensional feature space, in which linear classification can then be performed. The SVM classifier has become very popular due to its high performance in practical applications such as text classification and pattern recognition.
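A minimal sketch of the final decision function in Python; the support vectors, multipliers α_i, bias b, and the RBF kernel choice are illustrative assumptions (in a real SVM, α and b come from solving the constrained quadratic optimization problem):

```python
# A minimal SVM decision-function sketch; alphas, b, and the kernel are assumptions.
import math

def rbf_kernel(u, v, gamma=1.0):
    """A nonlinear kernel: implicitly maps inputs to a high-dimensional
    feature space where linear separation is sought."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(u, v)))

def svm_decision(support_vectors, alphas, labels, b, x):
    """f(x) = sgn( sum_i alpha_i * y_i * K(x_i, x) + b )."""
    s = sum(a * y * rbf_kernel(sv, x)
            for sv, a, y in zip(support_vectors, alphas, labels)) + b
    return 1 if s >= 0 else -1

# Only the support vectors (examples with non-zero alpha) contribute.
svs = [(0.0, 0.0), (1.0, 1.0)]
alphas = [0.8, 0.8]
labels = [-1, +1]
print(svm_decision(svs, alphas, labels, b=0.0, x=(0.9, 0.8)))  # expected: +1
```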
FUTURE TRENDS

Classification is a major data mining task. As data mining becomes more popular, classification techniques are increasingly applied to provide decision support in business, biomedicine, financial analysis, telecommunications and so on. For example, there are recent applications of classification techniques to identify fraudulent usage of credit cards based on credit card transaction databases, and various classification techniques have been explored to identify highly active compounds for drug discovery. To better solve application-specific problems, there has been a trend toward the development of more application-specific data mining systems (Han & Kamber, 2001).

Traditional classification algorithms assume that the whole training data set can fit into main memory. As automatic data collection becomes a daily practice in many businesses, large volumes of data that exceed the memory capacity become available to learning systems, and scalable classification algorithms become essential. Although some scalable algorithms for decision tree learning have been proposed, there is still a need to develop scalable and efficient algorithms for other types of classification techniques, such as decision rule learning.

Previously, the study of classification techniques focused on exploring various learning mechanisms to improve classification accuracy on unseen examples. However, recent work on imbalanced data sets has shown that classification accuracy is not an appropriate measure of performance when the data set is extremely unbalanced, that is, when almost all the examples belong to one or more larger classes and far fewer examples belong to a smaller, usually more interesting class. Since many real-world data sets are unbalanced, there has been a trend toward adjusting existing classification algorithms to better identify examples in the rare class.

Another issue that has become more and more important in data mining is privacy protection. As data mining tools are applied to large databases of personal records, privacy concerns are rising. Privacy-preserving data mining is currently one of the hottest research topics in data mining and will remain so in the near future.

CONCLUSION

Classification is a form of data analysis that extracts a model from data to classify future data. It has been studied in parallel in statistics and machine learning, and is currently a major technique in data mining with a broad application spectrum. Since many application problems can be formulated as classification problems and the volume of available data has become overwhelming, developing scalable, efficient, domain-specific, and privacy-preserving classification algorithms is essential.

REFERENCES

An, A., & Cercone, N. (1998). ELEM2: A learning system for more accurate classifications. Proceedings of the 12th Canadian Conference on Artificial Intelligence, 426-441.

Breiman, L., Friedman, J., Olshen, R., & Stone, C. (1984). Classification and regression trees. Wadsworth International Group.

Buntine, W.L. (1994). Operations for learning with graphical models. Journal of Artificial Intelligence Research, 2, 159-225.

Castillo, E., Gutiérrez, J.M., & Hadi, A.S. (1997). Expert systems and probabilistic network models. New York: Springer-Verlag.

Clark, P., & Boswell, R. (1991). Rule induction with CN2: Some recent improvements. Proceedings of the 5th European Working Session on Learning, 151-163.

Cohen, W.W. (1995). Fast effective rule induction. Proceedings of the 12th International Conference on Machine Learning, 115-123. Morgan Kaufmann.

Cooper, G., & Herskovits, E. (1992). A Bayesian method for the induction of probabilistic networks from data. Machine Learning, 9, 309-347.

Domingos, P., & Pazzani, M. (1997). On the optimality of the simple Bayesian classifier under zero-one loss. Machine Learning, 29, 103-130.

Gehrke, J., Ramakrishnan, R., & Ganti, V. (1998). RainForest: A framework for fast decision tree construction of large datasets. Proceedings of the 24th International Conference on Very Large Data Bases.

Han, J., & Kamber, M. (2001). Data mining: Concepts and techniques. Morgan Kaufmann.

Heckerman, D., Geiger, D., & Chickering, D.M. (1995). Learning Bayesian networks: The combination of knowledge and statistical data. Machine Learning, 20, 197-243.

Mitchell, T.M. (1997). Machine learning. McGraw-Hill.

Nurnberger, A., Pedrycz, W., & Kruse, R. (2002). Neural network approaches. In Klosgen & Zytkow (Eds.), Handbook of data mining and knowledge discovery. Oxford University Press.

Pearl, J. (1986). Fusion, propagation, and structuring in belief networks. Artificial Intelligence, 29(3), 241-288.

Quinlan, J.R. (1993). C4.5: Programs for machine learning. Morgan Kaufmann.

Rish, I. (2001). An empirical study of the naive Bayes classifier. Proceedings of the IJCAI 2001 Workshop on Empirical Methods in Artificial Intelligence.

Rumelhart, D.E., Hinton, G.E., & Williams, R.J. (1986). Learning representations by back-propagating errors. Nature, 323, 533-536.

Russell, S., Binder, J., Koller, D., & Kanazawa, K. (1995). Local learning in probabilistic networks with hidden variables. Proceedings of the 14th International Joint Conference on Artificial Intelligence, 2, 1146-1152.

Shafer, J., Agrawal, R., & Mehta, M. (1996). SPRINT: A scalable parallel classifier for data mining. Proceedings of the 22nd International Conference on Very Large Data Bases.

Vapnik, V. (1998). Statistical learning theory. New York: John Wiley & Sons.

KEY TERMS

Backpropagation: A neural network training algorithm for feed-forward networks, in which the errors at the output layer are propagated back to the previous layer to update the connection weights during learning. If the previous layer is not the input layer, the errors at that hidden layer are propagated back to the layer before it.

Disjunctive Set of Conjunctive Rules: A conjunctive rule is a propositional rule whose antecedent consists of a conjunction of attribute-value pairs. A disjunctive set of conjunctive rules consists of a set of conjunctive rules with the same consequent. It is called disjunctive because the rules in the set can be combined into a single disjunctive rule whose antecedent consists of a disjunction of conjunctions.

Genetic Algorithm: An algorithm for optimizing a binary string based on an evolutionary mechanism that uses replication, deletion, and mutation operators carried out over many generations.
Information Gain: Given a set E of classified examples and a partition P = {E_1, ..., E_n} of E, the information gain is defined as

gain(E, P) = entropy(E) - Σ_{i=1}^{n} (|E_i| / |E|) · entropy(E_i),

where |X| is the number of examples in X and

entropy(X) = - Σ_{j=1}^{m} p_j log2(p_j)

(assuming there are m classes in X, with p_j denoting the probability of the jth class in X). Intuitively, the information gain measures the decrease of the weighted average impurity of the partitions E_1, ..., E_n compared with the impurity of the complete set of examples E.

Machine Learning: The study of computer algorithms that develop new knowledge and improve their performance automatically through past experience.

Rough Set Data Analysis: A method for modeling uncertain information in data by forming lower and upper approximations of a class. It can be used to reduce the feature set and to generate decision rules.

Sigmoid Function: A mathematical function defined by the formula

σ(t) = 1 / (1 + e^(-t)).

Its name is due to the sigmoid shape of its graph. This function is also called the standard logistic function.
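For concreteness, a small Python sketch of the entropy, information gain, and sigmoid definitions above; the toy class counts are assumptions:

```python
# Direct transcriptions of the Key Terms formulas; the example data is assumed.
import math

def entropy(class_counts):
    """entropy(X) = -sum_j p_j * log2(p_j)."""
    total = sum(class_counts)
    return -sum((n / total) * math.log2(n / total)
                for n in class_counts if n > 0)

def information_gain(parent_counts, partition_counts):
    """entropy(E) minus the weighted average entropy of the partitions."""
    total = sum(parent_counts)
    weighted = sum(sum(part) / total * entropy(part)
                   for part in partition_counts)
    return entropy(parent_counts) - weighted

def sigmoid(t):
    """The standard logistic function 1 / (1 + e^(-t))."""
    return 1.0 / (1.0 + math.exp(-t))

# E has 10 examples (6 vs. 4); a test splits it into two pure subsets,
# so the gain equals the parent entropy (~0.971 bits).
print(information_gain([6, 4], [[6, 0], [0, 4]]))
print(sigmoid(0.0))  # 0.5
```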