Qualitative classification and evaluation in possibilistic decision trees


Qualitative classification and evaluation in possibilistic decision trees

Nahla Ben Amor
Institut Supérieur de Gestion de Tunis, 41 Avenue de la liberté, 2000 Le Bardo, Tunis, Tunisia
E-mail: nahla.benamor@gmx.fr

Salem Benferhat
CRIL - CNRS, Université d'Artois, Rue Jean Souvraz SP 18, 62307 Lens Cedex, France
E-mail: benferhat@cril.univ-artois.fr

Zied Elouedi
Institut Supérieur de Gestion de Tunis, 41 Avenue de la liberté, 2000 Le Bardo, Tunis, Tunisia
E-mail: zied.elouedi@gmx.fr

Abstract - This paper presents a method for classifying objects in an uncertain context using decision trees. The uncertainty, which bears on the attribute values of the objects to classify, is handled in a qualitative possibilistic framework. An evaluation method to judge the classification efficiency in an uncertain context is then proposed.

I. INTRODUCTION

Decision trees are efficient methods used in classification problems. They consist of decision nodes for testing attributes, edges for branching on attribute values, and leaves for labeling classes [9], [7]. The decision tree technique is composed of two major procedures [2], [11]:
1) Building the tree: a decision tree is built from a given training set. Building consists in finding, for each decision node, the appropriate test attribute by means of an attribute selection measure, and in defining the class labeling each leaf satisfying one of the stopping criteria.
2) Classifying objects: we start at the root of the decision tree and test the attribute specified by this node. According to the result of the test, we move down the tree branch relative to the attribute value of the given object. This process is repeated until a leaf is encountered; this leaf is labeled by a class.
As pointed out in several works [1], [4], [5], [6], [10], the classical methods of induction of decision trees do not deal with uncertain data, and ignoring the uncertainty can affect the quality of the classification results. In order to adapt decision trees to uncertainty and imprecision, we first propose
different manners to classify objects with uncertain or missing attribute values using qualitative possibility theory. Then, we propose a criterion allowing to judge the efficiency of the classifier in an uncertain context. We illustrate our approach with the same running example throughout, drawn from the intrusion detection area.

The paper is organized as follows: Section 2 presents an overview of possibility theory. Section 3 recalls the basics of possibilistic decision trees. In Section 4, we describe our leximin/leximax classification in possibilistic decision trees. In Section 5, the evaluation of the classification efficiency of possibilistic decision trees is detailed.

II. POSSIBILITY THEORY

This section gives a brief recalling of possibility theory (for more details see [3]). Uncertainty is here assumed to be represented qualitatively by a finite totally ordered scale; in practice the degrees of this scale can be encoded in the unit interval [0, 1], and for any set of uncertainty degrees the operators max and min return its greatest and least element, respectively.

The basic concept of possibility theory, when uncertainty is represented qualitatively, is the notion of Qualitative Possibility Distribution (QPD), simply denoted by π. A QPD is a function which associates to each element ω of the universe of discourse Ω an element of the scale; π encodes our beliefs on the real world. By convention, π(ω) = 1 means that it is completely possible that ω is the real world, π(ω) = 0 means that ω cannot be the real world, and π(ω) ≥ π(ω′) means that ω is at least as possible as ω′ to be the real world. A QPD is said to be normalized if there exists at least one state which is totally possible (i.e. ∃ω, π(ω) = 1).

We define the possibility measure of any event A ⊆ Ω by:

Π(A) = max {π(ω) : ω ∈ A}    (1)

This measure evaluates at which level the event A is consistent with our knowledge represented by π.

III. POSSIBILISTIC DECISION TREES

In this section, we do not detail the construction of decision trees, which is based on a given training set where attribute values and classes are defined precisely (for more details see [11]). We are rather interested in how to classify objects characterized by uncertain attribute values, where the uncertainty is represented by qualitative possibility distributions.
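As a small illustration of the possibility-theoretic notions recalled above, the following sketch encodes a QPD as a Python dictionary mapping states to degrees; the states and degrees used here are invented for illustration.

```python
# A qualitative possibility distribution over a small universe of discourse;
# the states and degrees below are invented for illustration.
pi = {"http": 1.0, "private": 0.5, "domain_u": 0.0}

def is_normalized(pi):
    """A QPD is normalized if at least one state is totally possible."""
    return max(pi.values()) == 1.0

def possibility(pi, event):
    """Possibility measure (Equation 1): Pi(A) = max of pi over states of A."""
    return max((pi[w] for w in event), default=0.0)

assert is_normalized(pi)
assert possibility(pi, {"private", "domain_u"}) == 0.5
```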
We assign to each attribute a possibility distribution expressing the uncertainty in a qualitative way, encoded in the interval [0, 1]. Let A_1, ..., A_m be the different attributes of the problem; the instance to classify is described by a vector of possibility distributions (π_1, ..., π_m). An attribute is precisely

defined if there exists exactly one value with possibility degree 1 while all other values have degree 0. A missing value for an attribute is represented by a uniform possibility distribution (i.e. every value of the attribute gets degree 1).

In standard possibility theory, the basic operators min and max are used in order to choose the most plausible path in the tree. At first, we compute the possibility degree of each path (from the root to a leaf class) by applying the minimum operator to the degrees of its attribute values. Then, the most plausible path is the one presenting the highest possibility degree; in other words, we apply the maximum operator to the paths' degrees. Hence the class of the object to classify is the one labeling the leaf corresponding to this path.

Example 1: In order to illustrate the different notions presented in this paper, we consider an example in the intrusion detection field, where we handle formatted connections corresponding to TCP/IP dump rows. Note that, for the sake of simplicity, each connection is described by only four attributes: service, flag, count and wrong fragment. We also handle three classes: Normal (N), DOS (D) and Probing (P), where Normal corresponds to a normal connection while DOS and Probing are relative to two categories of attacks.

[Figure 1: Example of a decision tree in the intrusion detection field. The root tests service (http, domain_u, private); inner nodes test flag (SF, REJ, RSTO), count and wrong fragment; the leaves are labeled N, P and D.]

Assume that the connection to classify is given with the possibility distributions of Table I.

[Table I: possibility distributions on the attribute values of the connection to classify, covering the values http, domain_u and private of service, and SF, REJ and RSTO of flag.]

According to the decision tree (see Figure 1), we have nine paths. Applying the minimum operator to the degrees relative to each path, and then the maximum operator to the resulting paths' degrees, the most plausible paths are paths 3 and 9; thus the connection will be classified as Probing or DOS with a possibility degree 1. It is clear that the use of the maximum operator makes it difficult to choose between the equally plausible paths.
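The min/max classification described above can be sketched as follows; the toy tree, attribute values and degrees are invented stand-ins for Figure 1 and Table I, and a missing attribute is encoded as a uniform distribution.

```python
# Min/max classification sketch: each path is a list of (attribute, value)
# tests plus a leaf class. Tree and degrees are invented for illustration.
paths = [
    ([("service", "http"), ("count", "low")], "Normal"),
    ([("service", "http"), ("count", "high")], "Probing"),
    ([("service", "private"), ("flag", "SF")], "DOS"),
]

# One possibility distribution per attribute of the object to classify;
# a missing attribute (here flag) is a uniform distribution (all degrees 1).
obj = {
    "service": {"http": 1.0, "private": 1.0},
    "count":   {"low": 0.3, "high": 1.0},
    "flag":    {"SF": 1.0, "REJ": 1.0},
}

def path_degree(tests, obj):
    # Possibility degree of a path = minimum of its attribute-value degrees.
    return min(obj[attr].get(val, 0.0) for attr, val in tests)

best = max(path_degree(tests, obj) for tests, _ in paths)
candidates = {cls for tests, cls in paths if path_degree(tests, obj) == best}
# As in Example 1, several equally plausible classes may remain in candidates.
```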
IV. LEXIMIN/LEXIMAX CLASSIFICATION IN POSSIBILISTIC DECISION TREES

The min-max combination mode is not satisfactory, since it is somewhat cautious: it makes the number of candidate classes important, especially when the number of missing attributes is important. Furthermore, the min and max operators are not discriminatory. Indeed, one can check that, for any attribute, replacing any degree strictly between 0 and 1 by another such degree does not change the selected candidate classes. This is explained by the fact that if the distributions are normalized, then there exists at least one path from the root to a leaf class such that the possibility degree of each node's value in this path is equal to 1. Hence, with the min/max combination mode, only the paths where the possibility degrees of the attribute values are equal to 1 are considered.

One idea to overcome the drawbacks of the min/max combination is to extend these two operators by using the leximin and leximax criteria, which are natural extensions of the minimum and maximum operators used in the qualitative

setting [8]. They are defined as follows.

Definition 1: Let u and v be two vectors of degrees of the same length n, and let u_(1) ≤ ... ≤ u_(n) and v_(1) ≤ ... ≤ v_(n) be their reorderings in increasing order (respectively u^(1) ≥ ... ≥ u^(n) and v^(1) ≥ ... ≥ v^(n) in decreasing order). Then u is said to be leximin-preferred (resp. leximax-preferred) to v if and only if there exists i such that u_(j) = v_(j) for all j < i and u_(i) > v_(i) (resp. u^(j) = v^(j) for all j < i and u^(i) > v^(i)). u is said to be leximin-equal (resp. leximax-equal) to v if and only if u_(i) = v_(i) (resp. u^(i) = v^(i)) for all i.

Let P be the set of all the different paths from the root to the leaves. For each class C, we associate a vector containing the degrees of the paths having C as a leaf, ranked in leximin order. To apply this criterion, all paths should be described by the same attributes already defined in the training set. However, since paths are pruned, the idea is to assign the degree 1 to the missing values. The justification of adding the degree 1 is the following: if in some path a class C is obtained without an attribute A, this in fact means that C can be obtained independently of the value of A. In other terms, C can be obtained from a path composed with the most plausible instance of A (namely a value having degree 1, since only normalized distributions are considered).

Definition 2: Let u and v be two vectors relative to the paths leading to two classes, reordered as in Definition 1. u is said to be leximin-leximax preferred to v, denoted u ≻ v, if either there exists i such that the first i − 1 reordered components of u and v are equal and the i-th component of u is greater, or all common components are equal and u is longer than v (i.e. the class of u is supported by a greater number of paths than the class of v). u is said to be leximin-leximax equal to v if and only if the reordered vectors are componentwise equal.

Definition 3: Let C be a set of classes. A class C* of C is leximin-leximax preferred iff there is no class C in C such that C ≻ C*.

The selection mode based on the leximin/leximax operators proceeds in two steps:
1) Establish a total pre-order of all the paths using the leximin operator; then select a first set of candidate classes corresponding to the leaves of the best paths in this total pre-order.
2) Refine this set by selecting its leximax-preferred class(es) using Definition 3.

Example 2: Let us continue the previous example. According to the leximin criterion, we get a total pre-order between the different paths, and paths 3 and 9 are the best.
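Sorting-based keys give a direct way to implement the leximin and leximax comparisons of Definition 1: sort each vector (increasingly for leximin, decreasingly for leximax) and compare the sorted tuples lexicographically. The vectors below are invented for illustration.

```python
# Leximin/leximax comparison sketch (Definition 1); example vectors invented.

def leximin_key(v):
    # Larger key = leximin-preferred: sort increasingly, compare pointwise.
    return tuple(sorted(v))

def leximax_key(v):
    # Larger key = leximax-preferred: sort decreasingly, compare pointwise.
    return tuple(sorted(v, reverse=True))

u = [1.0, 0.3, 1.0]
v = [0.3, 0.3, 1.0]
assert leximin_key(u) > leximin_key(v)   # u is leximin-preferred to v
assert leximax_key(u) > leximax_key(v)   # u is leximax-preferred to v
```

Note that, unlike plain min/max, these keys discriminate between vectors having the same minimum (or maximum), which is exactly the drawback of the min/max mode discussed above.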
Probing and DOS are the classes labeling paths 3 and 9, respectively; in other terms, the connection will be classified as a Probing or a DOS attack. Then, the leximax refinement prefers Probing to DOS; thus, it is possible to obtain a more precise result: the connection will be classified as a Probing attack.

V. EVALUATION OF POSSIBILISTIC DECISION TREES

When dealing with an uncertain context, the evaluation of a classifier, namely a possibilistic decision tree, is not so obvious.

A. Percent of Correct Classification

In the classical case, the Percent of Correct Classification (PCC) corresponds to the proportion of well classified objects among the whole set of objects. However, since within a possibilistic decision tree a new object may not be classified in a unique class, it is necessary to adapt the PCC to the uncertainty pervading the classes. Thus, the idea is to choose for each object to classify the class having the highest possibility degree; if more than one class is obtained, then one of them is chosen randomly. The obtained class is considered as the class of the testing object. Hence, the PCC relative to the whole testing set is computed by comparing, for each testing instance, its real class (known by us) with the class obtained by the induced tree:

PCC = number of well classified objects / number of testing objects

where the number of well classified objects is the number of testing objects for which the class obtained by the possibilistic decision tree (the most plausible class) is the same as their real class.

B. Distance criterion

The limitation of the adapted PCC is that it ignores the order that exists between the different classes that may correspond to the chosen class; it only considers the most plausible class. So, we propose a criterion allowing to take into account the order of the classes characterizing the object to classify. More exactly, we propose to compare the ranking assigned to the classes with the real class of the given testing object. Such a comparison is based on a kind of distance.
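The adapted PCC of Section V-A can be sketched as follows; the candidate sets and real labels are invented, and ties between equally plausible classes are broken at random as described above.

```python
import random

# Adapted PCC sketch: one set of most-plausible candidate classes per testing
# object, compared with the real class; ties are broken at random.
def adapted_pcc(predictions, real_classes, rng=random.Random(0)):
    well_classified = sum(
        1 for candidates, real in zip(predictions, real_classes)
        if rng.choice(sorted(candidates)) == real
    )
    return well_classified / len(real_classes)

# With singleton candidate sets this reduces to the classical PCC.
assert adapted_pcc([{"D"}, {"P"}, {"N"}], ["D", "P", "D"]) == 2 / 3
```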
At first, we define a qualitative possibility distribution π_o assigned to the object o, as follows.

Assume we handle n classes C_1, ..., C_n. Then the degree π_o(C_i) is determined by the position of C_i in the decreasing ranking of the classes produced for the object o (the better the rank of C_i, the higher its degree), and π_o(C_i) = 0 if C_i does not appear in this ranking.

Next, we define the distance criterion for a testing object o (whose possibility distribution is π_o) with respect to its real class, by comparing π_o with the distribution δ_o defined by δ_o(C_i) = 1 if C_i is the real class of o and δ_o(C_i) = 0 otherwise. For instance, assume that the real class of the object to classify is the attack DOS (D); then δ_o gives the degree 1 to D and 0 to N and P, and the distance is computed between π_o and δ_o.

This distance verifies the following property: when it is close to 2, the classifier is bad, whereas when it falls to 0, it is considered as a good classifier. In order to give this distance a signification closer to the PCC, we propose to rescale it (the rescaled version will be denoted dPCC) so that it satisfies 0 ≤ dPCC ≤ 1, higher values corresponding to better classifiers. Next, we compute the average total distance relative to all the classified testing instances:

dPCC_total = (sum of dPCC over the classified objects) / (number of classified objects)

Thus, dPCC_total can be considered as a PCC calibrated on the whole testing set.

C. Example

Let us continue with our example, where we deal with the three classes N, D and P. To classify the connection given in Example 2, we get (according to the induced tree) a ranking of the classes with Probing in first position; the corresponding possibility distribution π_o is the one induced by this ranking. So, we get a dPCC of 39%: there is a 39% chance that the induced tree detects the real class of the connection, whereas applying the classical PCC directly to the most plausible class leads to an erroneous result. Obviously, we can apply this distance criterion to the whole testing set using the average total distance above.

VI. CONCLUSION

In this paper, we have presented two contributions. The first one concerns the classification, using decision trees, of objects characterized by uncertain attribute values, where uncertainty is represented in a qualitative possibilistic framework. Indeed, to overcome the limitations of the standard min-max combination, we have proposed a leximin/leximax combination mode in the classification
phase. In the second part, we have proposed a new criterion to judge the efficiency of classifiers in an uncertain context, namely for qualitative possibilistic decision trees. This criterion takes into account the total pre-order of the classes relative to each testing instance, and not only the best one as in the classical Percent of Correct Classification. A future work will be to introduce a semantic distance into this criterion, allowing to adjust the degree of similarity between classes.

REFERENCES

[1] Ben Amor N., Benferhat S., Elouedi Z., Mellouli K.: Decision Trees and Qualitative Possibilistic Inference: Application to the Intrusion Detection Problem. Proceedings of the European Conference on Symbolic and Quantitative Approaches to Reasoning with Uncertainty (ECSQARU 2003), 419-431, 2003.
[2] Breiman L., Friedman J. H., Olshen R. A., Stone C. J.: Classification and Regression Trees. Monterey, CA, Wadsworth & Brooks, 1984.
[3] Dubois D., Prade H.: Possibility Theory: An Approach to Computerized Processing of Uncertainty. Plenum Press, New York, 1988.
[4] Denoeux T., Skarstein-Bjanger M.: Induction of decision trees for partially classified data. Proceedings of the IEEE International Conference on Systems, Man and Cybernetics, Nashville, USA, 2923-2928, 2000.
[5] Elouedi Z., Mellouli K., Smets P.: Belief decision trees: theoretical foundations. International Journal of Approximate Reasoning 28, 91-124, 2001.
[6] Hüllermeier E.: Possibilistic induction in decision-tree learning. ECML 2002, 2002.
[7] Mitchell T. M.: Decision tree learning. Chapter 3 of Machine Learning, co-published by the MIT Press and the McGraw-Hill Companies, Inc., 1997.

[8] Moulin H.: Axioms for Cooperative Decision-Making. Cambridge University Press, 1988.
[9] Quinlan J. R.: Induction of decision trees. Machine Learning 1, 81-106, 1986.
[10] Quinlan J. R.: Probabilistic decision trees. Machine Learning, Vol. 3, Chap. 5, Morgan Kaufmann, 267-301, 1990.
[11] Quinlan J. R.: C4.5: Programs for Machine Learning. Morgan Kaufmann, San Mateo, CA, 1993.