International Journal of Approximate Reasoning
International Journal of Approximate Reasoning 55 (2014)

On an optimization representation of decision-theoretic rough set model

Xiuyi Jia a, Zhenmin Tang a, Wenhe Liao b, Lin Shang c

a School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing, China
b School of Mechanical Engineering, Nanjing University of Science and Technology, Nanjing, China
c State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing, China

ARTICLE INFO

Article history: Available online 14 March 2013

Keywords: Optimization representation; Attribute reduction; Parameter learning; Decision-theoretic rough set model

ABSTRACT

The decision-theoretic rough set model can derive several probabilistic rough set models when proper cost functions are provided. Learning cost functions from data automatically is the key to improving the applicability of the decision-theoretic rough set model. Many region-related attribute reductions are not appropriate for probabilistic rough set models, as the monotonicity of the regions does not always hold. In this paper, we propose an optimization representation of the decision-theoretic rough set model: an optimization problem is formulated whose objective is the minimization of the decision cost. Two significant inferences can be drawn from the solution of this problem. First, the cost functions and thresholds used in the decision-theoretic rough set model can be learned from the given data automatically; an adaptive learning algorithm and a genetic algorithm are designed for this purpose. Second, a minimum cost attribute reduction can be defined, interpreted as finding the minimal attribute set that makes the decision cost minimum; a heuristic approach and a particle swarm optimization approach are also proposed.
The optimization representation can bring some new insights into the research on the decision-theoretic rough set model.
© 2013 Elsevier Inc. All rights reserved.

1. Introduction

As a kind of probabilistic rough set model, the decision-theoretic rough set model (DTRS) [35-38,44] can derive several current probabilistic rough set models when proper cost functions are used, such as the 0.5 probabilistic rough set model [25,27], the variable precision rough set model [53] and Bayesian rough set models [30,42]. An important contribution of the decision-theoretic rough set model to rough set theory is that it provides a theoretical framework for calculating the thresholds required in probabilistic rough set models. Current studies on the decision-theoretic rough set model can be divided into two groups. One group concentrated on the intension and the extension of the model. Yao [37,38] investigated how to derive other probabilistic rough set models from the decision-theoretic rough set model. Lingras et al. [15], Liu et al. [17-20] and Zhou [51] studied the multiple-category decision-theoretic rough set model from different viewpoints. Zhou and Li [52,12] proposed a multi-view decision model based on the decision-theoretic rough set model, in which users can make optimistic, pessimistic, and equable decisions by adopting different values for the costs. For attribute reduction in the decision-theoretic rough set model, Yao and Zhao [39] and Zhao et al. [48] defined a general attribute reduction and analyzed several evaluation criteria. Li et al. [11] made a further investigation of the monotonicity property of attribute reduction in the decision-theoretic rough set model. Yao and Zhou [42] introduced a Naive Bayesian decision-theoretic rough set model that uses Bayes' theorem to estimate the conditional probabilities of objects. Yao and his colleagues [1,3] proposed a game-theoretic rough set model by applying game theory to the decision-theoretic rough set model. Qian et al.
[29] studied multigranulation decision-theoretic rough set models.

Corresponding authors. E-mail addresses: jiaxy@njust.edu.cn (X. Jia), tzm.cs@njust.edu.cn (Z. Tang), cnwho@njust.edu.cn (W. Liao), shanglin@nju.edu.cn (L. Shang).
© 2013 Elsevier Inc. All rights reserved.
The other group concentrated on applications of the decision-theoretic rough set model. Li et al. [13] proposed an instance-centric hierarchical classification framework based on the decision-theoretic rough set model and applied it to the text classification problem. Both Lingras et al. [16] and Yu et al. [45,46] applied the decision-theoretic rough set model to the clustering problem. As filtering spam is a typical three-way decision problem, many authors have tried to solve it by adopting the decision-theoretic rough set model. Zhao et al. [49] first introduced the decision-theoretic rough set model to the spam filtering problem, where the three-way decisions correspond to three kinds of emails. Zhou et al. [50] proposed a practical approach to the email filtering problem by combining a Naive Bayesian classifier and the decision-theoretic rough set model. Jia et al. [8] integrated several classifiers into the three-way decisions framework and studied the efficiency of the three-way decisions approach to filtering spam emails. Based on the Bayesian decision procedure, the decision-theoretic rough set model provides systematic methods for deriving the probability thresholds that define the three regions: the positive region, the boundary region and the negative region. For the semantic interpretation of the three regions, Yao [40,41] proposed a three-way decisions framework which consists of positive, boundary and negative rules. In the decision-theoretic rough set model, all decisions are made on the basis of minimizing the expected cost. The expected cost, also called the decision cost, is a kind of classification cost and is a core concept in the decision-theoretic rough set model. In this paper, we propose an optimization representation of the decision-theoretic rough set model, in which an optimization problem is constructed with the objective of minimizing the decision cost.
We can deal with at least two problems by solving the optimization problem: we can learn the thresholds and proper cost functions from the given data without any preliminary knowledge, and we can define a new attribute reduction. The attribute reduction can be interpreted as finding the minimal attribute set that makes the whole decision cost minimum, which is more intuitive and reasonable. With proper cost functions, we can derive different thresholds and obtain the corresponding probabilistic rough set models. The cost functions play an important role in the decision-theoretic rough set model. In general, the cost functions are given by experts, but this weakens the applicability of the model in situations lacking preliminary knowledge. In current research, few contributions address learning cost functions from data. Based on game theory, Herbert and Yao [4] proposed an approach to governing the modification of cost functions in order to improve some measures; users need to provide the measures first and define an acceptable level of tolerance to stop the repeated procedure. Compared to their method, our method does not need users' participation, and it is automatic and easy to implement. Because of the non-monotonicity of the regions in the decision-theoretic rough set model, interpretation difficulties exist for attribute reductions defined on the basis of preserving specific regions [48]. The minimum cost attribute reduction defined in this paper does not concentrate on preserving any region. Instead, the goal of the reduction is to help users make better decisions, which means less decision cost. The rest of the paper is organized as follows. In Section 2, we review the main ideas of the decision-theoretic rough set model.
In Section 3, we give a detailed explanation of the optimization representation; by solving an optimization problem, we can learn the thresholds and the cost functions from data. An adaptive learning algorithm and a genetic approach are proposed. We also define a new attribute reduction and design two feasible approaches. Section 4 gives experimental results and discusses some remarks on the optimization representation. Section 5 concludes.

2. Basic notions of decision-theoretic rough set model

In this section, we present some basic definitions of the decision-theoretic rough set model [41].

Definition 1. A decision table is the following tuple:

S = (U, At = C ∪ {D}, {V_a | a ∈ At}, {I_a | a ∈ At}),  (1)

where U is a finite nonempty set of objects, At is a finite nonempty set of attributes, C is a set of condition attributes describing the objects, and D is a decision attribute that indicates the classes of objects. V_a is a nonempty set of values of a ∈ At, and I_a : U → V_a is an information function that maps an object in U to exactly one value in V_a.

In the decision table, an object x is described by its equivalence class under a set of attributes A ⊆ At: [x]_A = {y ∈ U | ∀a ∈ A (I_a(x) = I_a(y))}. Let π_D = {D_1, D_2, ..., D_m} be the partition of the universe U defined by the decision attribute D.

Let Ω = {ω_1, ..., ω_s} be a finite set of s states and let A = {a_1, ..., a_m} be a finite set of m possible actions. Let λ(a_i | ω_j) denote the cost of taking action a_i when the state is ω_j, and let p(ω_j | x) be the conditional probability of an object x being in state ω_j. The expected cost associated with taking action a_i is given by:

R(a_i | x) = Σ_{j=1}^{s} λ(a_i | ω_j) p(ω_j | x).  (2)

In the decision-theoretic rough set model, the set of states is Ω = {X, X^c}, indicating that an object is in a decision class X or not in X, respectively. The probabilities of these two complementary states can be denoted as p(X | [x]) = |X ∩ [x]| / |[x]|
and p(X^c | [x]) = 1 − p(X | [x]). With respect to the three regions, the set of actions is given by A = {a_P, a_B, a_N}, where a_P, a_B, and a_N represent the three actions in classifying an object x, namely, deciding x ∈ POS(X), deciding x ∈ BND(X), and deciding x ∈ NEG(X), respectively. Let λ_PP, λ_BP and λ_NP denote the costs incurred for taking actions a_P, a_B and a_N, respectively, when an object belongs to X, and let λ_PN, λ_BN and λ_NN denote the costs incurred for taking the same actions when the object does not belong to X. Given the cost functions, the expected costs associated with taking the different actions for objects in [x] can be expressed as:

R_P = R(a_P | [x]) = λ_PP p(X | [x]) + λ_PN p(X^c | [x]),
R_B = R(a_B | [x]) = λ_BP p(X | [x]) + λ_BN p(X^c | [x]),
R_N = R(a_N | [x]) = λ_NP p(X | [x]) + λ_NN p(X^c | [x]).  (3)

The Bayesian decision procedure suggests the following minimum-cost decision rules:

(P) If R_P ≤ R_B and R_P ≤ R_N, decide x ∈ POS(X);
(B) If R_B ≤ R_P and R_B ≤ R_N, decide x ∈ BND(X);
(N) If R_N ≤ R_P and R_N ≤ R_B, decide x ∈ NEG(X).

Consider a special kind of cost functions with λ_PP ≤ λ_BP < λ_NP and λ_NN ≤ λ_BN < λ_PN. That is, the cost of classifying an object x belonging to X into the positive region POS(X) is less than or equal to the cost of classifying x into the boundary region BND(X), and both of these costs are strictly less than the cost of classifying x into the negative region NEG(X). The reverse order of costs is used for classifying an object not in X. The decision rules can be re-expressed as:

(P) If p(X | [x]) ≥ α and p(X | [x]) ≥ γ, decide x ∈ POS(X);
(B) If p(X | [x]) ≤ α and p(X | [x]) ≥ β, decide x ∈ BND(X);
(N) If p(X | [x]) ≤ β and p(X | [x]) ≤ γ, decide x ∈ NEG(X),

where the parameters α, β, and γ are defined as:

α = (λ_PN − λ_BN) / ((λ_PN − λ_BN) + (λ_BP − λ_PP)),
β = (λ_BN − λ_NN) / ((λ_BN − λ_NN) + (λ_NP − λ_BP)),
γ = (λ_PN − λ_NN) / ((λ_PN − λ_NN) + (λ_NP − λ_PP)).  (4)

Each rule is defined by two out of the three parameters.
The conditions of rule (B) suggest that α > β may be a reasonable constraint; it will ensure a well-defined boundary region. If we assume the following condition on the cost functions [41]:

(λ_NP − λ_BP)(λ_BN − λ_NN) > (λ_BP − λ_PP)(λ_PN − λ_BN),  (5)

then 0 ≤ β < γ < α ≤ 1. In this case, after tie-breaking, the following simplified rules are obtained:

(P1) If p(X | [x]) ≥ α, decide x ∈ POS(X);
(B1) If β < p(X | [x]) < α, decide x ∈ BND(X);
(N1) If p(X | [x]) ≤ β, decide x ∈ NEG(X).

More conditions on the cost functions were discussed in [41,43]. By using the thresholds, one can divide the universe U into three regions of a decision partition π_D based on (α, β):

POS_(α,β)(π_D | π_A) = {x ∈ U | p(D_max([x]_A) | [x]_A) ≥ α},
BND_(α,β)(π_D | π_A) = {x ∈ U | β < p(D_max([x]_A) | [x]_A) < α},
NEG_(α,β)(π_D | π_A) = {x ∈ U | p(D_max([x]_A) | [x]_A) ≤ β},  (6)

where D_max([x]_A) = arg max_{D_i ∈ π_D} |[x]_A ∩ D_i| / |[x]_A|.
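As a quick sanity check on Eq. (4), the thresholds can be computed directly from the cost functions. The sketch below is a hedged illustration (the function name is ours, not the paper's); the cost values are the ones used later in the non-monotonicity example and satisfy condition (5):

```python
# Hypothetical helper (names are ours, not the paper's): compute the
# thresholds (alpha, beta, gamma) of Eq. (4) from the six cost functions.
def thresholds(l_pp, l_bp, l_np, l_pn, l_bn, l_nn):
    alpha = (l_pn - l_bn) / ((l_pn - l_bn) + (l_bp - l_pp))
    beta = (l_bn - l_nn) / ((l_bn - l_nn) + (l_np - l_bp))
    gamma = (l_pn - l_nn) / ((l_pn - l_nn) + (l_np - l_pp))
    return alpha, beta, gamma

# Costs satisfying condition (5), so that beta < gamma < alpha:
a, b, g = thresholds(l_pp=0, l_bp=1, l_np=3, l_pn=6, l_bn=3, l_nn=0)
print(a, b, g)  # 0.75 0.6 0.666...
```

With this cost setting the ordering 0 ≤ β < γ < α ≤ 1 indeed holds, so the simplified rules (P1), (B1), (N1) apply.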
Unlike rules in classical rough set theory, all three types of rules may be uncertain. They represent the levels of tolerance in making incorrect decisions, and each rule brings a corresponding cost according to its error rate. Consider the special case where we assume zero cost for a correct classification, namely λ_PP = λ_NN = 0, and let p = p(D_max([x]_A) | [x]_A); the decision costs of the rules are then easily defined as [41]:

positive rule: (1 − p) · λ_PN,
boundary rule: p · λ_BP + (1 − p) · λ_BN,
negative rule: p · λ_NP.  (7)

For a given decision table, the decision cost of the table is defined as:

COST = COST_POS + COST_BND + COST_NEG.  (8)

This can be expressed as:

COST = Σ_{p_i ≥ α} (1 − p_i) λ_PN + Σ_{β < p_j < α} (p_j λ_BP + (1 − p_j) λ_BN) + Σ_{p_k ≤ β} p_k λ_NP,  (9)

where p_i = p(D_max([x_i]_A) | [x_i]_A).

3. An optimization representation of decision-theoretic rough set model

In Section 2, we obtained the expression for the cost of a decision table. According to the Bayesian decision principle, a smaller value of the cost is better, so we propose an optimization problem with the objective of minimizing the value of the cost, denoted as

min COST.  (10)

Reviewing the construction procedure of the decision-theoretic rough set model, we find that this formulation is actually an optimization representation of the model [7]. The optimization representation may seem too simple, but some significant inferences can be drawn from it. In what follows, we show how the optimization representation works from mathematical and semantic perspectives.

3.1. Mathematical and semantical perspectives analysis

From Eq. (9), we know that the decision cost is determined by the probabilities of the given objects and the cost functions. From a mathematical perspective, we can obtain at least two useful results by solving the optimization problem.
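The table cost of Eq. (9) is straightforward to compute once the probabilities and thresholds are fixed. A minimal sketch (the function name and the example cost values are illustrative assumptions, not the paper's code):

```python
# Overall decision cost of Eq. (9), assuming lambda_PP = lambda_NN = 0.
# probs holds p_i = p(D_max([x_i]_A) | [x_i]_A) for every object.
def decision_cost(probs, alpha, beta, l_pn, l_bp, l_bn, l_np):
    cost = 0.0
    for p in probs:
        if p >= alpha:              # positive rule: (1 - p) * lambda_PN
            cost += (1 - p) * l_pn
        elif p <= beta:             # negative rule: p * lambda_NP
            cost += p * l_np
        else:                       # boundary rule: p*lambda_BP + (1-p)*lambda_BN
            cost += p * l_bp + (1 - p) * l_bn
    return cost

# With lambda_PN=6, lambda_BP=1, lambda_BN=3, lambda_NP=3 (so alpha=0.75,
# beta=0.6), three objects fall into the three regions respectively:
print(decision_cost([0.9, 0.65, 0.1], 0.75, 0.6, 6, 1, 3, 3))
```

The three objects contribute 0.1·6, 0.65·1 + 0.35·3, and 0.1·3 respectively, i.e. a total cost of 2.6.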
Firstly, when the probabilities of the objects are known, for example, learned from other classifiers, the decision cost is related to the cost functions only; we can then calculate the cost functions from the optimization formulation. Secondly, when the cost functions are provided by experts, the decision cost is related to the probabilities of the objects only; we can then find the probabilities that make the decision cost minimum. The mathematical analysis shows the feasibility of obtaining results from the optimization representation. From a semantic perspective, we show the reasonableness of the obtained results. For the first result, the cost functions can be replaced by the required thresholds, as explained in detail in the following section; the result then shows how to learn cost functions or required thresholds from data without any preliminary knowledge. For the second result, the probabilities are usually computed on the basis of attribute reduction in rough set theory; the result therefore prompts us to define a new attribute reduction, interpreted as finding the minimal attribute set that makes the whole decision cost minimum.

3.2. Learning required thresholds and cost functions from data

Based on Eqs. (9) and (10), we can construct an optimization problem:

min_{α,β,γ}  Σ_{p_i ≥ α} (1 − p_i) λ_PN + Σ_{β < p_j < α} (p_j λ_BP + (1 − p_j) λ_BN) + Σ_{p_k ≤ β} p_k λ_NP,
s.t.  0 < β < γ < α < 1.  (11)

In Eq. (4), the three thresholds (α, β, γ) are expressed in terms of the six cost functions. Assume λ_PP = λ_NN = 0, which means making right decisions does not incur any cost. Conversely, we can now express the remaining four cost functions in terms of the three thresholds.
Table 1
An example of the replacement step in Alcofa when p_i = 0.9 and p_i = 0.2, with current (α, γ, β) = (0.8, 0.5, 0.3).

Replacement of α: p_i = 0.9 gives (0.9, 0.5, 0.3); p_i = 0.2 gives (0.2, 0.2(1 − δ), 0.2(1 − δ)²).
Replacement of γ: p_i = 0.9 gives (0.9(1 + δ), 0.9, 0.3); p_i = 0.2 gives (0.8, 0.2, 0.2(1 − δ)).
Replacement of β: p_i = 0.9 gives (0.9(1 + δ), (0.9(1 + δ) + 0.9)/2, 0.9); p_i = 0.2 gives (0.8, 0.5, 0.2).

λ_PN = λ_PN;
λ_NP = ((1 − γ)/γ) λ_PN;
λ_BN = (β(α − γ) / (γ(α − β))) λ_PN;
λ_BP = ((1 − α)(γ − β) / (γ(α − β))) λ_PN.  (12)

Combining Eq. (12) with λ_PN scaled to 1, the optimization problem in Eq. (11) can be re-expressed as:

min_{α,β,γ}  Σ_{p_i ≥ α} (1 − p_i) + Σ_{β < p_j < α} ( p_j (1 − α)(γ − β)/(γ(α − β)) + (1 − p_j) β(α − γ)/(γ(α − β)) ) + Σ_{p_k ≤ β} p_k (1 − γ)/γ,
s.t.  0 < β < γ < α < 1.  (13)

We can now obtain the three thresholds and all cost functions by solving this optimization problem. It is not easy to get the optimal result, as the search space for α, β, γ is (0, 1) and all three values are continuous. Instead, we propose two approaches for computing an approximate result: an adaptive learning approach and a genetic approach.

3.2.1. An adaptive learning algorithm

First, we propose an Adaptive Learning COst Functions Algorithm (Alcofa), a kind of heuristic approach, to obtain an approximate result. We assume the search space is the set of probabilities of all objects, so the values of α, β, γ are restricted to a finite set. The basic idea of Alcofa is as follows: the required thresholds are related to three special objects, so the threshold values are equal to those three objects' probabilities. The goal of the algorithm is to find the corresponding three probability values.
Assume the current thresholds (α, β, γ) have been learned from the objects X = {x_1, ..., x_{i−1}} in the training set. For the next object x_i from the training set, its probability p_i is added and the overall cost COST_{X∪{x_i}} is computed based on the thresholds (α, β, γ), denoted Min_COST. Then p_i replaces each of the three thresholds in turn to find a new minimal overall cost COST'_{X∪{x_i}}. If COST'_{X∪{x_i}} < Min_COST, the current thresholds (α, β, γ) are updated to the new thresholds (α', β', γ'); otherwise, (α, β, γ) remain unchanged. The same procedure is applied to the next object x_{i+1}, and the loop ends when all objects in the training set have been processed. The final thresholds are the result that gives the minimum overall cost. The final result does not depend on the initialization of the three thresholds, as this is an adaptive learning algorithm; we can set any initial values under the restriction 0 < β < γ < α < 1.

The key step of the algorithm is the replacement of one threshold by the probability. We use an example to explain this step. Assume the current thresholds learned from X = {x_1, ..., x_{i−1}} are α = 0.8, γ = 0.5, β = 0.3, and the next object is x_i with probability p_i. When x_i arrives, its value p_i is used three times, to replace α, γ, and β, respectively. Table 1 shows the replacement results for p_i = 0.9 and p_i = 0.2. If we use p_i = 0.9 to replace α, then since γ < p_i and β < p_i, γ and β are unchanged and the new thresholds are (0.9, 0.5, 0.3). We must handle some special situations in which the thresholds would no longer satisfy β < γ < α, such as using p_i = 0.9 to replace γ: if we set γ = 0.9 and keep α unchanged, the condition γ < α is violated. To handle this, we also change the α value; the probability value multiplied by a coefficient (1 + δ) is applied in Table 1. Similarly, if we set p_i = 0.2 to replace γ, β is changed to 0.2(1 − δ). If γ itself has to be changed, we set γ = (α + β)/2. Users can define different changing policies to suit their applications.
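The replacement step of Table 1 can be sketched as follows. This is a hedged illustration: the repair policy follows the description above, but the function name and the exact branch conditions are our assumptions, and users may substitute their own policies:

```python
# Given current thresholds (alpha, gamma, beta) and a new probability p,
# produce the three candidate triples of the Alcofa replacement step,
# nudging neighbouring thresholds by (1 +/- delta) whenever the ordering
# beta < gamma < alpha would otherwise be violated.
def candidates(alpha, gamma, beta, p, delta=0.05):
    out = []
    # 1) replace alpha by p
    a, g, b = p, gamma, beta
    if g >= a:
        g = a * (1 - delta)
    if b >= g:
        b = g * (1 - delta)
    out.append((a, g, b))
    # 2) replace gamma by p
    a, g, b = alpha, p, beta
    if a <= g:
        a = g * (1 + delta)
    if b >= g:
        b = g * (1 - delta)
    out.append((a, g, b))
    # 3) replace beta by p; if gamma must move, set gamma = (alpha + beta) / 2
    a, g, b = alpha, gamma, p
    if a <= b:
        a = b * (1 + delta)
    if not (b < g < a):
        g = (a + b) / 2
    out.append((a, g, b))
    return out

print(candidates(0.8, 0.5, 0.3, 0.9))  # first candidate: (0.9, 0.5, 0.3)
```

Running this with (0.8, 0.5, 0.3) and p = 0.9 or p = 0.2 reproduces the six triples of Table 1.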
Figure 1 gives the description of the adaptive learning algorithm [7]. The computational complexity of Alcofa is O(n²). To make the computation faster, we can also reduce the size of the search space, for example by letting 0.5 < α < 0.99 and 0.01 < β < 0.5. The probability p_i of object x_i can be computed by the rough set method or obtained from other classifiers, e.g. a Naive Bayesian classifier, which makes the algorithm more robust and practical. Yao and Zhou [42] have also proposed a Naive Bayesian rough set model by combining a Naive Bayesian classifier with the decision-theoretic rough set model.
Fig. 1. Adaptive learning cost functions algorithm (Alcofa).

Fig. 2. A genetic approach to learning thresholds.

3.2.2. A genetic approach to learning thresholds

Because the threshold learning problem has been described as an optimization problem, most existing optimization algorithms can be used to solve it, such as genetic algorithms [2], ant colony algorithms [5,10], simulated annealing algorithms [6,14], and so on. In this section, we briefly introduce a genetic approach to this optimization problem. The fitness function is intuitively defined as the decision cost:

f = COST.  (14)

The search space is again the set of probabilities of all objects. Each individual is a triple of probability values, in which β is the minimum and α is the maximum. The crossover procedure can be designed as the exchange of α values: for two individuals (α_1, γ_1, β_1) and (α_2, γ_2, β_2), their children are (α_2, γ_1, β_1) and (α_1, γ_2, β_2). In the mutation procedure, an individual will probably change its β value; the new β comes from the search space and must also satisfy the condition β < γ < α. The genetic approach is described in Fig. 2. We note that this genetic approach and the following two approaches proposed in this paper are just simple feasible frameworks; users can design appropriate approaches for different applications.
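The crossover and mutation operators described above can be sketched as follows (an illustrative fragment of the genetic approach, not the authors' implementation; fitness evaluation and the selection loop are omitted):

```python
import random

# Individuals are triples (alpha, gamma, beta) drawn from the set of
# object probabilities, with beta the minimum and alpha the maximum.
def crossover(ind1, ind2):
    # Exchange the alpha values of the two parents, as in the text:
    # children of (a1, g1, b1), (a2, g2, b2) are (a2, g1, b1), (a1, g2, b2).
    (a1, g1, b1), (a2, g2, b2) = ind1, ind2
    return (a2, g1, b1), (a1, g2, b2)

def mutate(ind, search_space, rate=0.1):
    # With probability `rate`, redraw beta from the search space while
    # keeping the ordering beta < gamma < alpha.
    a, g, b = ind
    if random.random() < rate:
        pool = [p for p in search_space if p < g]
        if pool:
            b = random.choice(pool)
    return (a, g, b)

print(crossover((0.8, 0.5, 0.3), (0.7, 0.6, 0.2)))
# -> ((0.7, 0.5, 0.3), (0.8, 0.6, 0.2))
```

The mutation rate and the validity check are design choices left open by the text; any policy that keeps β < γ < α is admissible.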
Table 2
A decision table with objects o_1, ..., o_9, condition attributes C = {c_1, ..., c_6}, and decision attribute D taking the labels d_1, d_2, d_3.

3.3. Attribute reduction based on minimal decision cost

Attribute reduction, an important application of rough set theory, can be interpreted as a process of finding the minimal set of attributes which preserves one or several criteria unchanged, or improves them [21-24,31,33,34,47]. Two conditions should be considered in defining a reduct: the jointly sufficient condition and the individually necessary condition [26]. In the classical rough set model, attribute reduction generally keeps the positive region unchanged, as the region does not decrease with the addition of attributes. In probabilistic rough set models, this kind of definition does not always work.

3.3.1. A non-monotonicity property in probabilistic models

Table 2 shows a decision table with 9 objects and 3 different decision labels: {d_1, d_2, d_3}. In the Pawlak rough set model, the positive region of this decision table is POS(π_D | π_{c_1,c_2,c_5}) = {o_1, o_3, o_4, o_7}. We can also obtain the following results easily: POS(π_D | π_{c_1,c_2}) = {o_3}, POS(π_D | π_{c_2,c_5}) = {o_1, o_3, o_4}, and POS(π_D | π_{c_1,c_5}) = {o_4}. According to the classical attribute reduction definition, {c_1, c_2, c_5} is an attribute reduct. Removing an attribute from the condition attribute set leads to the same or a smaller positive region; the monotonicity property of the positive region holds in the Pawlak rough set model. In probabilistic rough set models, however, the property does not always hold. In the decision-theoretic rough set model, assuming λ_PP = 0, λ_BP = 1, λ_NP = 3, λ_PN = 6, λ_BN = 3, and λ_NN = 0, we obtain α = 0.75 and β = 0.6. The positive region of Table 2 is POS(π_D | π_{c_1,c_2,c_5}) = {o_1, o_3, o_4, o_7}. Now we again check the subsets of the condition attribute set {c_1, c_2, c_5}.
It is also easy to obtain POS(π_D | π_{c_1,c_2}) = {o_3}, POS(π_D | π_{c_1,c_5}) = {o_4, o_7}, and POS(π_D | π_{c_2,c_5}) = {o_1, o_2, o_3, o_4, o_6, o_7, o_8}. Now we cannot say that {c_1, c_2, c_5} is an attribute reduct of the decision table, as the subset {c_2, c_5} makes the positive region larger. In probabilistic rough set models, the probabilistic positive region is non-monotonic with respect to set inclusion of attributes [48]. Removing an attribute can decrease, increase, or leave unchanged a probabilistic region (positive, boundary, or negative).

3.3.2. Minimum cost attribute reduction

On the basis of the non-monotonicity property of the regions, one could define a positive region preservation attribute reduction, a positive region extension attribute reduction, or a non-negative region extension attribute reduction. It is not easy to decide which kind of attribute reduction is best; for different applications, users may prefer different regions. In this section, we define a new attribute reduction which is irrelevant to those regions. Based on an attribute set A ⊆ C, the cost formulation can be represented as:

COST_A = Σ_{x_i ∈ POS_(α,β)(π_D|π_A)} (1 − p_i) λ_PN + Σ_{x_j ∈ BND_(α,β)(π_D|π_A)} (p_j λ_BP + (1 − p_j) λ_BN) + Σ_{x_k ∈ NEG_(α,β)(π_D|π_A)} p_k λ_NP.  (15)

Assume the cost functions are provided by experts; the objective of the optimization problem is then related to the object probabilities only, which can be computed on the basis of the attribute set. The problem can be described as finding a proper attribute set to make the whole decision cost minimum. Based on this, a minimum decision cost attribute reduction is proposed as follows:

Definition 2. In a decision table S = (U, At = C ∪ {D}, {V_a}, {I_a}), R ⊆ C is an attribute reduct if and only if

(1) R = arg min_{R ⊆ C} {COST_R};
(2) ∀R' ⊂ R, COST_{R'} > COST_R.
Fig. 3. A heuristic approach to minimum cost attribute reduction.

Compared to other definitions of attribute reduction, the main difference is that our definition is irrelevant to the positive region or non-negative region; the objective of the reduction is to help users make better decisions, which means the overall decision cost is minimum. Having proposed the new attribute reduction, we also introduce two approaches to computing it: a heuristic approach and a particle swarm optimization approach.

3.3.3. Two approaches to minimum cost attribute reduction

From the minimum cost attribute reduction definition, we can see that computing the reduct is actually the procedure of solving the optimization problem. It is not easy to obtain the optimal solution in linear time, as the optimization problem is combinatorial. We therefore design a heuristic approach and a particle swarm optimization (PSO) approach to approximate the optimal solution. For the heuristic approach, we employ an addition strategy to compute an approximate result. The result is approximate, as it may not satisfy the individually necessary condition. The basic idea of the heuristic approach is to construct a reduct from an empty set, successively adding condition attributes until an approximate result is reached. Condition attributes are selected by their fitness, where the fitness of attribute c_i is defined as:

δ_i = (COST_{C−{c_i}} − COST_C) / COST_C.  (16)

The fitness function represents the significance of an attribute. A negative fitness value means the attribute is irrelevant; a fitness value of zero means the attribute is redundant. In the procedure of adding attributes, if COST_R ≤ COST_C, we stop the adding procedure and output R as an approximate result.
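The addition strategy can be sketched as below. The cost oracle `cost` stands for COST_A of Eq. (15); the toy cost used in the demo is a made-up stand-in, since computing the real COST_A requires the decision table:

```python
# Greedy addition strategy: rank attributes by the significance of
# Eq. (16) and add them until COST_R <= COST_C (the stopping condition).
def heuristic_reduct(attrs, cost):
    full = cost(set(attrs))
    # significance delta_i: relative cost increase when c_i is removed
    sig = {c: (cost(set(attrs) - {c}) - full) / full for c in attrs}
    reduct = set()
    for c in sorted(attrs, key=lambda c: sig[c], reverse=True):
        reduct.add(c)
        if cost(reduct) <= full:
            break
    return reduct

# Toy cost oracle (an assumption for illustration only): attribute 'a'
# saves 6 units of cost, 'b' saves 3, 'c' saves nothing.
toy_cost = lambda s: 10 - 6 * ('a' in s) - 3 * ('b' in s)
print(heuristic_reduct(['a', 'b', 'c'], toy_cost))  # the reduct is {'a', 'b'}
```

Because the loop stops at the first subset matching the full-set cost, the redundant attribute 'c' is never added, in line with the stopping rule COST_R ≤ COST_C.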
This can be seen as an approximate result of Definition 2. The heuristic approach is described in detail in Fig. 3. For the particle swarm optimization approach [28,32], the fitness function is defined as:

f = COST_R + θ (|R| / |C|).  (17)

The goal of the fitness function is to find an attribute subset which has minimal decision cost and few elements. Assume we have m condition attributes; each particle's position is represented as a binary string of length m. The value 1 in a dimension of the particle's position means the corresponding attribute is selected, while 0 means it is not selected. Each particle has a position represented by a position vector p_i and a velocity represented by a velocity vector v_i. The best position found by particle i is denoted p^b_i, and the best position found by the swarm so far is p^g. The jth components of p^b_i and p^g are p^b_ij and p^g_j. At each time step, each particle updates its velocity and moves to a new position according to the following equations:

v_ij(t) = w v_ij(t−1) + c_1 r_1 (p^b_ij − p_ij(t−1)) + c_2 r_2 (p^g_j(t−1) − p_ij(t−1)),  (18)

p_ij(t) = 1 if ρ < 1 / (1 + exp(−v_ij(t))), and 0 otherwise,  (19)

where c_1 is the coefficient of the self-recognition component, c_2 is the coefficient of the social component, r_1 and r_2 are random numbers in [0, 1], w is the inertia factor, and ρ is a random number in [0, 1]. Figure 4 shows the framework of the particle swarm optimization approach to the attribute reduction.
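One step of the binary-PSO update of Eqs. (18) and (19) can be sketched as follows (the parameter values w, c_1, c_2 below are common defaults, not taken from the paper):

```python
import math
import random

# One velocity/position update for a single particle over m dimensions.
# pos, vel, pbest are the particle's position, velocity and best position;
# gbest is the best position found by the swarm (all length-m lists).
def pso_step(pos, vel, pbest, gbest, w=0.7, c1=1.5, c2=1.5):
    new_pos, new_vel = [], []
    for j in range(len(pos)):
        r1, r2 = random.random(), random.random()
        v = (w * vel[j]
             + c1 * r1 * (pbest[j] - pos[j])     # self-recognition term
             + c2 * r2 * (gbest[j] - pos[j]))    # social term
        rho = random.random()
        # Eq. (19): a sigmoid of the velocity thresholds the binary position
        x = 1 if rho < 1.0 / (1.0 + math.exp(-v)) else 0
        new_vel.append(v)
        new_pos.append(x)
    return new_pos, new_vel

pos, vel = pso_step([0, 1, 0, 1], [0.1] * 4, [1, 1, 0, 0], [1, 0, 0, 1])
print(pos)  # a binary position such as [1, 0, 0, 1]
```

Each resulting position is decoded as an attribute subset R and scored with the fitness of Eq. (17).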
Fig. 4. A PSO approach to minimum cost attribute reduction.

Table 3
Brief description of the data sets (number of objects and number of condition attributes): Ionosphere, Monks1, Monks2, Monks3, Musk version 1 (musk), Blood transfusion service center (transfusion; 748 objects, 5 attributes), Wisconsin diagnostic breast cancer (wdbc), Wisconsin prognostic breast cancer (wpbc), and Congressional voting records (voting).

4. Experiments and remarks

4.1. Experimental result

In this section, we show the efficiency of the genetic approach to learning thresholds. Nine UCI data sets [54] are used in our experiments, each with two classes; information on the data sets is summarized in Table 3. The results learned by the genetic approach on the 9 data sets are summarized in Fig. 5. In each sub-figure, the X-axis represents the probabilities of all objects and the Y-axis represents the number of objects. We use Fig. 5(a) to explain how the learned thresholds work. From the distribution of objects we can see that many objects are mixed up with probabilities between 0.27 and 0.72. Objects with probabilities higher than 0.72 or lower than 0.27 can be classified into the corresponding classes directly. For objects in the boundary region, making a deferred decision is the better choice with minimum decision cost. Thus, the thresholds α = 0.72 and β = 0.27 learned by the genetic approach are an intuitive and efficient result. Experimental results on minimum cost attribute reduction were reported in detail in our previous work [9].

4.2. Remarks on optimization representation

From the definition of the decision cost, we know that the cost is composed of three parts: the cost of positive rules, the cost of boundary rules and the cost of negative rules. In Eq. (9), the weights of the three types of costs are the same.
In some practical applications this may not be suitable; for example, one user may prefer positive rules and negative rules, meaning he wants to make direct decisions, while another user may prefer positive rules and boundary rules. For this kind of situation, we propose a generalization of the optimization problem, which helps users obtain a proper result:

COST = ε_P Σ_{p_i ≥ α} (1 − p_i) λ_PN + ε_B Σ_{β < p_j < α} (p_j λ_BP + (1 − p_j) λ_BN) + ε_N Σ_{p_k ≤ β} p_k λ_NP,  (20)

where ε_P, ε_B and ε_N denote the penalties on the three kinds of rules, respectively. If we set ε_P = ε_N = 1 and ε_B > 1, the solution of the optimization problem prefers to generate fewer boundary rules. More settings can be discussed for the corresponding applications.
Fig. 5. Learned thresholds by the genetic approach on several data sets.

In this paper, we introduce two approaches to learning the thresholds and two other approaches to attribute reduction. The purpose of these approaches is to show the feasibility of solving the optimization problem. Many other algorithms can be applied to the optimization problem directly, such as simulated annealing algorithms, ant colony algorithms and other evolutionary algorithms.

5. Conclusion

In this paper, we propose an optimization representation of the decision-theoretic rough set model. The decision cost is the basis of the model, and all rules are generated based on the minimum decision cost. Based on the optimization representation, we build an optimization problem in the decision-theoretic rough set model with the objective of minimizing the decision cost. Through solving the optimization problem, we can learn cost functions and thresholds from data without any preliminary knowledge. This is an important attempt at learning cost functions automatically. An adaptive algorithm and a genetic approach are proposed. We also define an attribute reduction based on the optimization representation. A minimum cost attribute reduction is a process of finding a minimal attribute set which induces minimum costs. A heuristic approach and a particle swarm optimization approach are introduced to show the feasibility of the attribute reduction definition.

The most important contribution of this paper is the optimization viewpoint of the decision-theoretic rough set model. From this viewpoint, attribute reduction is generalized as an optimization problem and many optimization algorithms can be applied to this problem intuitively. In summary, the optimization representation brings new insights to the decision-theoretic rough set model and probabilistic rough set models.
Acknowledgements

This work is supported by the National Natural Science Foundation of China under Grant Nos. and.

References

[1] N. Azam, J.T. Yao, Analyzing uncertainties of probabilistic rough set regions with game-theoretic rough sets, International Journal of Approximate Reasoning, this issue.
[2] J.H. Dai, Y.X. Li, Heuristic genetic algorithm for minimal reduction decision system based on rough set theory, in: Proceedings of ICMLC2002, 2002, pp. 4–6.
[3] J.P. Herbert, J.T. Yao, Game-theoretic risk analysis in decision-theoretic rough sets, in: Proceedings of the 3rd International Conference on Rough Sets and Knowledge Technology, LNAI, vol. 5009, 2008.
[4] J.P. Herbert, J.T. Yao, Learning optimal parameters in decision-theoretic rough sets, in: Proceedings of the 4th International Conference on Rough Sets and Knowledge Technology, LNAI, vol. 5589, 2009.
[5] R. Jensen, Q. Shen, Finding rough set reducts with ant colony optimization, in: Proceedings of the 2003 UK Workshop on Computational Intelligence, 2003.
[6] R. Jensen, Q. Shen, Semantics-preserving dimensionality reduction: rough and fuzzy-rough-based approaches, IEEE Transactions on Knowledge and Data Engineering 16 (12) (2004).
[7] X.Y. Jia, W.W. Li, L. Shang, J.J. Chen, An optimization viewpoint of decision-theoretic rough set model, in: Proceedings of the 6th International Conference on Rough Sets and Knowledge Technology, LNCS, vol. 6954, 2011.
[8] X.Y. Jia, K. Zheng, W.W. Li, T.T. Liu, L. Shang, Three-way decisions solution to filter spam email: an empirical study, in: Proceedings of the 8th International Conference on Rough Sets and Current Trends in Computing, LNAI, vol. 7413, 2012.
[9] X.Y. Jia, W.H. Liao, Z.M. Tang, L. Shang, Minimum cost attribute reduction in decision-theoretic rough set models, Information Sciences 219 (2013).
[10] L.J. Ke, Z.R. Feng, Z.G. Ren, An efficient ant colony optimization approach to attribute reduction in rough set theory, Pattern Recognition Letters 29 (9) (2008).
[11] H.X. Li, X.Z. Zhou, J.B. Zhao, D. Liu, Attribute reduction in decision-theoretic rough set model: a further investigation, in: Proceedings of the 6th International Conference on Rough Sets and Knowledge Technology, LNCS, vol. 6954, 2011.
[12] H.X. Li, X.Z. Zhou, Risk decision making based on decision-theoretic rough set: a three-way view decision model, International Journal of Computational Intelligence Systems 4 (1) (2011).
[13] W. Li, D.Q. Miao, W.L. Wang, N. Zhang, Hierarchical rough decision theoretic framework for text classification, in: Proceedings of ICCI2010, 2010.
[14] F.T. Lin, C.T. Kao, C.C. Hsu, Applying the genetic approach to simulated annealing in solving some NP-hard problems, IEEE Transactions on Systems, Man and Cybernetics 23 (6) (1993).
[15] P. Lingras, M. Chen, D.Q. Miao, Rough multi-category decision theoretic framework, in: Proceedings of the 3rd International Conference on Rough Sets and Knowledge Technology, LNAI, vol. 5009, 2008.
[16] P. Lingras, M. Chen, D.Q. Miao, Rough cluster quality index based on decision theory, IEEE Transactions on Knowledge and Data Engineering 21 (2009).
[17] D. Liu, H.X. Li, X.Z. Zhou, Two decades research on decision-theoretic rough sets, in: Proceedings of ICCI2010, 2010.
[18] D. Liu, T.R. Li, D. Ruan, Probabilistic model criteria with decision-theoretic rough sets, Information Sciences 181 (17) (2011).
[19] D. Liu, T.R. Li, H.X. Li, A multiple-category classification approach with decision-theoretic rough sets, Fundamenta Informaticae 115 (2012).
[20] D. Liu, T.R. Li, D.C. Liang, Incorporating logistic regression to decision-theoretic rough sets for classifications, International Journal of Approximate Reasoning, this issue.
[21] J.S. Mi, W.Z. Wu, W.X. Zhang, Approaches to knowledge reduction based on variable precision rough set model, Information Sciences 159 (2004).
[22] D.Q. Miao, Y. Zhao, Y.Y. Yao, H.X. Li, F.F. Xu, Relative reducts in consistent and inconsistent decision tables of the Pawlak rough set model, Information Sciences 179 (2009).
[23] F. Min, H. He, Y. Qian, W. Zhu, Test-cost-sensitive attribute reduction, Information Sciences 181 (2011).
[24] F. Min, W. Zhu, Attribute reduction of data with error ranges and test costs, Information Sciences (2012).
[25] Z. Pawlak, S.K.M. Wong, W. Ziarko, Rough sets: probabilistic versus deterministic approach, International Journal of Man-Machine Studies 29 (1988).
[26] Z. Pawlak, Rough Sets: Theoretical Aspects of Reasoning About Data, Kluwer Academic Publishers, Dordrecht, 1991.
[27] Z. Pawlak, Rough sets and intelligent data analysis, Information Sciences 147 (2002).
[28] R. Poli, Analysis of the publications on the applications of particle swarm optimisation, Journal of Artificial Evolution and Applications 2008 (2008).
[29] Y.H. Qian, H. Zhang, Y.L. Sang, J.Y. Liang, Multigranulation decision-theoretic rough sets, International Journal of Approximate Reasoning, this issue.
[30] D. Slezak, W. Ziarko, The investigation of the Bayesian rough set model, International Journal of Approximate Reasoning 40 (2005).
[31] G.Y. Wang, J. Zhao, J. Wu, A comparative study of algebra viewpoint and information viewpoint in attribute reduction, Fundamenta Informaticae 68 (2005).
[32] X.Y. Wang, J. Yang, X.L. Teng, W.J. Xia, R. Jensen, Feature selection based on rough sets and particle swarm optimization, Pattern Recognition Letters 28 (4) (2007).
[33] W.Z. Wu, Attribute reduction based on evidence theory in incomplete decision systems, Information Sciences 178 (2008).
[34] J.T. Yao, A ten-year review of granular computing, in: Proceedings of IEEE GrC2007, Silicon Valley, CA, USA, 2007.
[35] Y.Y. Yao, S.K.M. Wong, P. Lingras, A decision-theoretic rough set model, in: Proceedings of the 5th International Symposium on Methodologies for Intelligent Systems, 1990.
[36] Y.Y. Yao, S.K.M. Wong, A decision theoretic framework for approximating concepts, International Journal of Man-Machine Studies 37 (6) (1992).
[37] Y.Y. Yao, Probabilistic approach to rough sets, Expert Systems 20 (2003).
[38] Y.Y. Yao, Probabilistic rough set approximations, International Journal of Approximate Reasoning 49 (2008).
[39] Y.Y. Yao, Y. Zhao, Attribute reduction in decision-theoretic rough set models, Information Sciences 178 (2008).
[40] Y.Y. Yao, Three-way decision: an interpretation of rules in rough set theory, in: Proceedings of the 4th International Conference on Rough Sets and Knowledge Technology, LNAI, vol. 5589, 2009.
[41] Y.Y. Yao, Three-way decisions with probabilistic rough sets, Information Sciences 180 (2010).
[42] Y.Y. Yao, B. Zhou, Naive Bayesian rough sets, in: Proceedings of the 5th International Conference on Rough Sets and Knowledge Technology, LNAI, vol. 6401, 2010.
[43] Y.Y. Yao, The superiority of three-way decisions in probabilistic rough set models, Information Sciences 181 (6) (2011).
[44] Y.Y. Yao, An outline of a theory of three-way decisions, in: Proceedings of the 8th International Conference on Rough Sets and Current Trends in Computing, LNAI, vol. 7413, 2012.
[45] H. Yu, S.S. Chu, D.C. Yang, Autonomous knowledge-oriented clustering using decision-theoretic rough set theory, Fundamenta Informaticae 115 (2012).
[46] H. Yu, Z.G. Liu, G.Y. Wang, An automatic method to determine the number of clusters using decision-theoretic rough set, International Journal of Approximate Reasoning, this issue.
[47] W.X. Zhang, J.S. Mi, W.Z. Wu, Approaches to knowledge reductions in inconsistent systems, International Journal of Intelligent Systems 18 (2003).
[48] Y. Zhao, S.K.M. Wong, Y.Y. Yao, A note on attribute reduction in the decision-theoretic rough set model, in: Proceedings of the 6th International Conference on Rough Sets and Current Trends in Computing, LNAI, vol. 5306, 2008.
[49] W.Q. Zhao, Y.L. Zhu, W. Gao, Information filtering model based on decision-theoretic rough set theory, Computer Engineering and Applications 43 (7) (2007) (in Chinese).
[50] B. Zhou, Y.Y. Yao, J.G. Luo, A three-way decision approach to email spam filtering, in: Proceedings of the 23rd Canadian Conference on Artificial Intelligence, LNAI, vol. 6085, 2010.
[51] B. Zhou, A new formulation of multi-category decision-theoretic rough sets, in: Proceedings of the 6th International Conference on Rough Sets and Knowledge Technology, LNAI, vol. 6954, 2011.
[52] X.Z. Zhou, H.X. Li, A multi-view decision model based on decision-theoretic rough set, in: Proceedings of the 4th International Conference on Rough Sets and Knowledge Technology, LNCS, vol. 5589, 2009.
[53] W. Ziarko, Variable precision rough set model, Journal of Computer and System Sciences 46 (1993).
[54] UCI Machine Learning Repository.
More informationClassification with Class Overlapping: A Systematic Study
Classification with Class Overlapping: A Systematic Study Haitao Xiong 1 Junjie Wu 1 Lu Liu 1 1 School of Economics and Management, Beihang University, Beijing 100191, China Abstract Class overlapping has
More informationAn Adaptive Threshold LBP Algorithm for Face Recognition
An Adaptive Threshold LBP Algorithm for Face Recognition Xiaoping Jiang 1, Chuyu Guo 1,*, Hua Zhang 1, and Chenghua Li 1 1 College of Electronics and Information Engineering, Hubei Key Laboratory of Intelligent
More informationVideo annotation based on adaptive annular spatial partition scheme
Video annotation based on adaptive annular spatial partition scheme Guiguang Ding a), Lu Zhang, and Xiaoxu Li Key Laboratory for Information System Security, Ministry of Education, Tsinghua National Laboratory
More informationRPKM: The Rough Possibilistic K-Modes
RPKM: The Rough Possibilistic K-Modes Asma Ammar 1, Zied Elouedi 1, and Pawan Lingras 2 1 LARODEC, Institut Supérieur de Gestion de Tunis, Université de Tunis 41 Avenue de la Liberté, 2000 Le Bardo, Tunisie
More informationA Method of Hyper-sphere Cover in Multidimensional Space for Human Mocap Data Retrieval
Journal of Human Kinetics volume 28/2011, 133-139 DOI: 10.2478/v10078-011-0030-0 133 Section III Sport, Physical Education & Recreation A Method of Hyper-sphere Cover in Multidimensional Space for Human
More informationSome Properties of Intuitionistic. (T, S)-Fuzzy Filters on. Lattice Implication Algebras
Theoretical Mathematics & Applications, vol.3, no.2, 2013, 79-89 ISSN: 1792-9687 (print), 1792-9709 (online) Scienpress Ltd, 2013 Some Properties of Intuitionistic (T, S)-Fuzzy Filters on Lattice Implication
More informationOptimization of fuzzy multi-company workers assignment problem with penalty using genetic algorithm
Optimization of fuzzy multi-company workers assignment problem with penalty using genetic algorithm N. Shahsavari Pour Department of Industrial Engineering, Science and Research Branch, Islamic Azad University,
More informationDELAY-CONSTRAINED MULTICAST ROUTING ALGORITHM BASED ON AVERAGE DISTANCE HEURISTIC
DELAY-CONSTRAINED MULTICAST ROUTING ALGORITHM BASED ON AVERAGE DISTANCE HEURISTIC Zhou Ling 1, 2, Ding Wei-xiong 2 and Zhu Yu-xi 2 1 Department of Information Science and Engineer, Central South University,
More informationGA is the most popular population based heuristic algorithm since it was developed by Holland in 1975 [1]. This algorithm runs faster and requires les
Chaotic Crossover Operator on Genetic Algorithm Hüseyin Demirci Computer Engineering, Sakarya University, Sakarya, 54187, Turkey Ahmet Turan Özcerit Computer Engineering, Sakarya University, Sakarya, 54187,
More informationarxiv: v1 [cs.cr] 31 Dec 2018
Security analysis of a self-embedding fragile image watermark scheme Xinhui Gong, Feng Yu, Xiaohong Zhao, Shihong Wang School of Science, Beijing University of Posts and Telecommunications, Beijing 100876,
More informationA Quick Judgment Method for the Infeasible Solution of Distribution Network Reconfiguration
2018 International Conference on Power, Energy and Environmental Engineering (ICPEEE 2018) ISBN: 978-1-60595-545-2 A Quick Judgment Method for the Infeasible Solution of Distribution Network Reconfiguration
More informationFeature Selection Algorithm with Discretization and PSO Search Methods for Continuous Attributes
Feature Selection Algorithm with Discretization and PSO Search Methods for Continuous Attributes Madhu.G 1, Rajinikanth.T.V 2, Govardhan.A 3 1 Dept of Information Technology, VNRVJIET, Hyderabad-90, INDIA,
More informationThe Comparative Study of Machine Learning Algorithms in Text Data Classification*
The Comparative Study of Machine Learning Algorithms in Text Data Classification* Wang Xin School of Science, Beijing Information Science and Technology University Beijing, China Abstract Classification
More informationInduction of Strong Feature Subsets
Induction of Strong Feature Subsets Mohamed Quafafou and Moussa Boussouf IRIN, University of Nantes, 2 rue de la Houssiniere, BP 92208-44322, Nantes Cedex 03, France. quafafou9 Abstract The problem of
More informationRough Set Approach to Unsupervised Neural Network based Pattern Classifier
Rough Set Approach to Unsupervised Neural based Pattern Classifier Ashwin Kothari, Member IAENG, Avinash Keskar, Shreesha Srinath, and Rakesh Chalsani Abstract Early Convergence, input feature space with
More informationCost-sensitive Boosting for Concept Drift
Cost-sensitive Boosting for Concept Drift Ashok Venkatesan, Narayanan C. Krishnan, Sethuraman Panchanathan Center for Cognitive Ubiquitous Computing, School of Computing, Informatics and Decision Systems
More informationREMOVAL OF REDUNDANT AND IRRELEVANT DATA FROM TRAINING DATASETS USING SPEEDY FEATURE SELECTION METHOD
Available Online at www.ijcsmc.com International Journal of Computer Science and Mobile Computing A Monthly Journal of Computer Science and Information Technology ISSN 2320 088X IMPACT FACTOR: 5.258 IJCSMC,
More informationFeeder Reconfiguration Using Binary Coding Particle Swarm Optimization
488 International Journal Wu-Chang of Control, Wu Automation, and Men-Shen and Systems, Tsai vol. 6, no. 4, pp. 488-494, August 2008 Feeder Reconfiguration Using Binary Coding Particle Swarm Optimization
More informationANN-Based Modeling for Load and Main Steam Pressure Characteristics of a 600MW Supercritical Power Generating Unit
ANN-Based Modeling for Load and Main Steam Pressure Characteristics of a 600MW Supercritical Power Generating Unit Liangyu Ma, Zhiyuan Gao Automation Department, School of Control and Computer Engineering
More informationGeneralized Infinitive Rough Sets Based on Reflexive Relations
2012 IEEE International Conference on Granular Computing Generalized Infinitive Rough Sets Based on Reflexive Relations Yu-Ru Syau Department of Information Management National Formosa University Huwei
More information