An Incremental Algorithm to Feature Selection in Decision Systems with the Variation of Feature Set

Chinese Journal of Electronics, Vol.24, No.1, Jan. 2015

QIAN Wenbin 1,2, SHU Wenhao 3, YANG Bingru 2 and ZHANG Changsheng 2
(1. School of Software, Jiangxi Agriculture University, Nanchang, China)
(2. Beijing Key Laboratory of Knowledge Engineering of Material Science, Beijing, China)
(3. School of Computer and Information Technology, Beijing Jiaotong University, Beijing, China)

Abstract. Feature selection is a challenging problem in pattern recognition and machine learning. In real-life applications, the feature set of a decision system may vary over time, yet there are few studies on feature selection under such variation. This paper focuses on this issue: an incremental feature selection algorithm for dynamic decision systems is developed based on the dependency function. The incremental algorithm avoids recomputation, rather than retraining the dynamic decision system as a new one and computing the feature subset from scratch. We first update the dependency function in an incremental manner, and then incorporate the updated dependency function into the incremental feature selection algorithm. Compared with the direct (non-incremental) algorithm, the computational efficiency of the proposed algorithm is improved. Experimental results on different UCI data sets show that the proposed algorithm is effective and efficient.

Key words: Rough sets, Feature selection, Attribute reduction, Incremental algorithm, Dynamic data set.

I. Introduction

In the early eighties, Pawlak [1] introduced the theory of rough sets as an extension of set theory to deal with uncertain and inconsistent information. In rough set theory, two definable subsets called the lower and upper approximations are used to describe a crisp subset of a universe. By using these two definable subsets, the knowledge hidden in decision systems can be discovered. One of the many successful applications of rough set theory has been to feature selection, also called attribute reduction [2,3]. The rough set methodology of using only the supplied data and no other information has many benefits in feature selection, where most other methods require supplementary knowledge. The main aim of feature subset selection in rough set theory is to determine a minimal feature subset (also called a reduct) of a problem domain while retaining a suitably high accuracy in representing the original features. Feature selection makes it possible to delete irrelevant features, and better performance may be achieved by discarding such features. A straightforward way is to exhaustively evaluate the quality of all feature subsets to find an optimal one. However, this is not feasible even for a moderate number of candidate features, because of the exponential complexity. Thus many heuristic feature selection algorithms have been designed for classification learning. Generally speaking, an efficient feature selection method involves two aspects: feature evaluation, which assesses the quality of candidate features [4], and a search strategy, which finds optimal solutions in terms of the evaluation function used [5]. So far, a number of heuristic feature selection algorithms have employed the dependency function, which is based on lower approximations, as an evaluation step in the feature selection process. The feature subset acquired by dependency function-based feature selection algorithms can induce certain rules [4].
These approaches have been adopted because the certainty embodied in the lower approximation is associated with greater importance in scientific analysis. In this paper, we also adopt the dependency function as the evaluation criterion for feature selection; the feature selection algorithm performs a greedy forward search, repeatedly selecting the feature with the highest significance until the dependency no longer increases. Many rough set-based approaches to feature selection can be found in Refs.[2, 6-11]. A hybrid feature selection approach based on feature weighting can be found in Ref.[8], where the feature subset is selected by feature ranking and greedy forward selection. An approach to computing the minimal set of features that functionally determine a decision attribute is proposed in Ref.[6], but the proposed algorithm is usually computationally expensive, especially when dealing with large-scale data sets.

(Manuscript Received May 2013; Accepted July. This work is supported in part by the National Natural Science Foundation of China (No. , No. ), the Key Project of Ministry of Science and Technology of China (No.2010IM020900), the 2012 Ladder Plan Project of Beijing Key Laboratory of Knowledge Engineering for Materials Science (No.Z ), and Zhejiang Provincial Natural Science Foundation of China (No.LY13F020024).)

Skowron [12] designed a feature selection algorithm that computes a disjunctive normal form to find all exact feature subsets of a given system, and proved that finding the minimal feature subset of a decision system is NP-hard. From the viewpoint of the indiscernibility and discernibility relations, a feature selection algorithm based on a hybrid relation is provided in Ref.[7]. From the views of algebra and information entropy, a comparative study of three feature selection methods is made in Ref.[13]. Feature selection algorithms for consistent and inconsistent decision tables are investigated in Ref.[9]. To accelerate the heuristic process of feature selection, a theoretic framework named positive approximation is proposed in Ref.[2]; based on this accelerator, the efficiency of several general feature selection algorithms is improved.

In real-world applications, data sets usually vary dynamically; correspondingly, the feature subset needs updating for knowledge discovery and other related tasks in a dynamic environment [14]. To improve computational efficiency, new analytic techniques are highly desirable in practice. To deal with dynamic data sets, there exist some studies on feature selection using incremental techniques. Such rough set-based feature selection methods can be classified into three types: those based on entropy [15,16], those based on the positive region [17], and those based on the discernibility matrix [18-20]. From the standpoint of information theory, an information-theoretic feature selection method was developed in an incremental manner for the case where multiple objects are added to a decision system [15]. The same authors also developed an incremental feature selection algorithm based on three representative information entropies for decision systems with dynamically increasing features [16], but the computation of entropy is not costless. Based on the positive region, an incremental feature selection algorithm was presented in Ref.[17] for the case where a single object varies in the decision system. From the idea of discernibility ability, an incremental feature selection algorithm based on the discernibility matrix was proposed in Ref.[19]. Based on 0-1 integer programming, an incremental feature selection algorithm for the case where multiple objects are added to an information system is presented in Ref.[20]. However, an incremental feature selection algorithm that updates the feature subset when the feature set varies dynamically in decision systems has not yet been developed. Therefore, this paper focuses on this issue.

The remainder of this paper is organized as follows: Section II reviews the basic concepts of rough sets. An incremental computation of the new dependency function under the variation of the feature set is presented in Section III. In Section IV, we develop an incremental feature selection algorithm for computing a new feature subset under the variation of the feature set in decision systems. In Section V, the performance of the proposed algorithm is evaluated on different UCI data sets. The paper ends with conclusions and future work in Section VI.

II. Preliminary Knowledge on Rough Sets

1. Basic concepts

Basic concepts of rough sets are reviewed in this section.
The theory of rough sets begins with the notion of an approximation space, which is a pair $\langle U, A\rangle$, where $U$ is a finite nonempty set of objects called the universe, and $A$ is a nonempty finite set of features (also called attributes) on the universe. $V_a$ is the value domain of feature $a$, $V=\bigcup_{a\in A}V_a$, and $f$ is an information function $f: U\times A\to V$. An approximation space is also called an information system. Any subset $P$ of the feature set $A$ defines a binary equivalence (also called indiscernibility) relation $IND(P)$ on $U$ as follows:

$IND(P)=\{(x,y)\in U\times U \mid \forall a\in P,\ f(x,a)=f(y,a)\}$

If $(x,y)\in IND(P)$, then $x$ and $y$ are indiscernible by features from $P$. The partition of $U$ generated by $IND(P)$ is denoted by $U/IND(P)$ (or simply $U/P$), i.e., $U/IND(P)=\{[x]_P \mid x\in U\}$, where $[x]_P$ is the equivalence class containing $x$ with respect to $P$. The elements in $[x]_P$ are indiscernible, or equivalent, with respect to $P$, i.e., $[x]_P=\{y\in U \mid (x,y)\in IND(P)\}$.

Given any subset $X\subseteq U$ and $P\subseteq A$, in general it may not be possible to describe $X$ precisely in $\langle U, A\rangle$. One may characterize $X$ by a pair of lower and upper approximations in terms of the indiscernibility relation $IND(P)$, defined respectively as

$\underline{P}(X)=\{x\in U \mid [x]_P\subseteq X\}$ and $\overline{P}(X)=\{x\in U \mid [x]_P\cap X\neq\emptyset\}$

where $\underline{P}(X)$ and $\overline{P}(X)$ are called the lower approximation and upper approximation with respect to $P$, respectively. The lower approximation $\underline{P}(X)$ is the union of all elementary sets which are subsets of $X$, and the upper approximation $\overline{P}(X)$ is the union of all elementary sets which have a nonempty intersection with $X$. The tuple $\langle\underline{P}(X), \overline{P}(X)\rangle$ is the representation of an ordinary set $X$ in the approximation space $\langle U, A\rangle$. The lower approximation can be interpreted as the collection of those elements that definitely belong to $X$, while the upper approximation can be interpreted as the collection of those elements that possibly belong to $X$. The lower approximation is also called the positive region, denoted by $POS_P(X)$.

An information system $\langle U, A\rangle$ is called a decision system if the feature set $A=C\cup D$ and $C\cap D=\emptyset$, where $C$ is the condition feature set and $D$ is the decision feature set. For any subset $P$ of the feature set $A$, the dependency function between $P$ and $D$ is defined as $\gamma_P(D)=|POS_P(D)|/|U|$, where $POS_P(D)=\bigcup_i \underline{P}(X_i)$, $X_i$ is the $i$th equivalence class induced by $D$, and $|\cdot|$ denotes the cardinality of a set. The $\gamma$ measure is used by many researchers because of its simple quantitative evaluation of the positive region; it measures the approximation power of a condition feature set with respect to the decision feature set $D$. In data mining, and especially in feature selection, it is important to find the dependence relations between feature sets.

2. Feature selection based on dependency function

In rough set theory, feature selection based on the dependency function is defined as follows.

Definition 1. Let $DS=(U, A=C\cup D)$ be a decision system and let $B\subseteq C$. $B$ is a selected feature subset of the decision system iff $\gamma_B(D)=\gamma_C(D)$ and $\gamma_B(D)>\gamma_{B\setminus\{a\}}(D)$ for any $a\in B$.

In this definition, the first condition guarantees that the feature subset has the same distinguishing ability as the whole set of features; the second guarantees that all of the features in the selected subset are indispensable, so that it contains no superfluous features.
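Before proceeding, a minimal sketch may help fix the notation. It is hypothetical Python (the paper's experiments use C++; the toy table and helper names are ours), computing $U/P$, $POS_P(D)$ and $\gamma_P(D)$ directly from their definitions.

```python
# A minimal sketch (not the authors' implementation) of the basic
# rough-set quantities: the partition U/P induced by IND(P), the
# positive region POS_P(D), and the dependency function gamma_P(D).
from collections import defaultdict

def partition(U, table, P):
    """Group object indices by their value vector on feature set P (U/IND(P))."""
    blocks = defaultdict(list)
    for x in U:
        blocks[tuple(table[x][a] for a in sorted(P))].append(x)
    return list(blocks.values())

def positive_region(U, table, P, d):
    """POS_P(D): union of the P-classes contained in a single decision class."""
    pos = set()
    for block in partition(U, table, P):
        decisions = {table[x][d] for x in block}
        if len(decisions) == 1:          # [x]_P lies inside one class of U/D
            pos.update(block)
    return pos

def gamma(U, table, P, d):
    """Dependency gamma_P(D) = |POS_P(D)| / |U|."""
    return len(positive_region(U, table, P, d)) / len(U)

# Hypothetical decision table: conditions a, b; decision d.
table = {0: {'a': 0, 'b': 0, 'd': 0}, 1: {'a': 0, 'b': 1, 'd': 1},
         2: {'a': 1, 'b': 0, 'd': 1}, 3: {'a': 1, 'b': 0, 'd': 0}}
U = list(table)
print(gamma(U, table, {'a', 'b'}, 'd'))  # 0.5: objects 2 and 3 conflict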

To accelerate the process of feature selection, the selection of surviving features can be achieved through the comparison of feature significance measures.

Definition 2. Let $DS=(U, A=C\cup D)$ be a decision system, with $B\subseteq C$ and $a\in B$. The significance measure of feature $a$ is defined by

$sig_1(a, B, D)=\gamma_B(D)-\gamma_{B\setminus\{a\}}(D)$

Obviously, $0\le sig_1(a,B,D)\le 1$. If $sig_1(a,B,D)=0$, then the feature $a$ is dispensable; otherwise it is indispensable. From another standpoint, we can also define the significance measure of a feature as follows.

Definition 3. Let $DS=(U, A=C\cup D)$ be a decision system, with $B\subseteq C$ and $a\notin B$. The significance of feature $a$ is defined by

$sig_2(a, B, D)=\gamma_{B\cup\{a\}}(D)-\gamma_B(D)$

The significance measure $sig_2(a,B,D)$ is monotonic: the greater the change in the positive region, the more significant the feature. Therefore, to speed up feature subset selection, we often use it to sort the features into an ordered sequence.

III. Updating Scheme of the Dependency Function Under the Variation of the Feature Set

In a dependency function-based feature selection algorithm, the key task is to compute the dependency function. If the feature set of a decision system varies over time, a naive approach is to calculate the new dependency function from scratch. To avoid this recomputation, we discuss how to compute the new dependency function in an incremental manner, so that efficiency is improved. Generally speaking, the variation of the feature set includes two cases: a new feature set is added, or a feature set is deleted. In the following, we give the computation of the new dependency function for both cases. We first present an equivalent way of calculating the dependency function, which will be used in the following proofs.

Lemma 1. Let $DS=(U, A=C\cup D)$ be a decision system and $B\subseteq C$. Then $\gamma_B(D)=|\{x\in U \mid |[x]_B/D|=1\}|/|U|$.

Proof. This follows directly from the definition of the dependency function: an object belongs to the positive region iff its equivalence class is contained in a single decision class, i.e., iff $[x]_B/D$ consists of exactly one block.

Lemma 2. Let $DS=(U, A=C\cup D)$ be a decision system and $P, Q\subseteq C$. Then every equivalence class in $U/(P\cup Q)$ is contained in some class of $U/P$, and likewise in some class of $U/Q$; that is, $U/(P\cup Q)$ refines both $U/P$ and $U/Q$.

Proof. Denote $U/(P\cup Q)=\{X_1, X_2, \ldots, X_m\}$ and $U/P=\{P_1, P_2, \ldots, P_s\}$. Since $P\subseteq P\cup Q$, for any $X_i=[x]_{P\cup Q}\in U/(P\cup Q)$ $(1\le i\le m)$ it holds that $X_i=\{y\in U \mid (x,y)\in IND(P\cup Q)\}\subseteq [x]_P=\{y\in U \mid (x,y)\in IND(P)\}$, where $[x]_P=P_j$ for some $1\le j\le s$. Hence $U/(P\cup Q)$ refines $U/P$. That $U/(P\cup Q)$ refines $U/Q$ is proved in the same way. This completes the proof.

Given the two lemmas above, we can prove the incremental computations of the new dependency function when a feature set is added to, or deleted from, a decision system. If a feature set is added to the decision system, the incremental computation of the new dependency function is given as Theorem 1.
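Before turning to the incremental theorems, the sketch below illustrates Definitions 2 and 3 on the same hypothetical table; $\gamma$ is re-derived via the consistent-class form of Lemma 1 so the block runs standalone.

```python
# A small sketch of the two significance measures (Definitions 2 and 3).
# The table and names are hypothetical, as in the previous sketch.
from collections import defaultdict

def gamma(U, table, P, d):
    blocks = defaultdict(list)
    for x in U:
        blocks[tuple(table[x][a] for a in sorted(P))].append(x)
    pos = sum(len(b) for b in blocks.values()
              if len({table[x][d] for x in b}) == 1)  # Lemma 1: |[x]_P / D| = 1
    return pos / len(U)

def sig1(a, B, U, table, d):
    """Inner significance: gamma_B(D) - gamma_{B - {a}}(D), for a in B."""
    return gamma(U, table, B, d) - gamma(U, table, B - {a}, d)

def sig2(a, B, U, table, d):
    """Outer significance: gamma_{B + {a}}(D) - gamma_B(D), for a not in B."""
    return gamma(U, table, B | {a}, d) - gamma(U, table, B, d)

table = {0: {'a': 0, 'b': 0, 'd': 0}, 1: {'a': 0, 'b': 1, 'd': 1},
         2: {'a': 1, 'b': 0, 'd': 1}, 3: {'a': 1, 'b': 0, 'd': 0}}
U = list(table)
print(sig1('b', {'a', 'b'}, U, table, 'd'))  # 0.5: 'b' is indispensable here
print(sig2('b', {'a'}, U, table, 'd'))       # 0.5: adding 'b' raises gamma
```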
Here are some notations used in Theorem 1. Given a decision system $DS=(U, A=C\cup D)$, with $P\subseteq C$, $U/P=\{X_1, X_2, \ldots, X_m\}$ and $U/D=\{Y_1, Y_2, \ldots, Y_n\}$, suppose a feature set $Q$ is added to the system, and $U/(P\cup Q)=\{X_1, X_2, \ldots, X_k, X^{k+1}_1, X^{k+1}_2, \ldots, X^{k+1}_{l_{k+1}}, X^{k+2}_1, X^{k+2}_2, \ldots, X^{k+2}_{l_{k+2}}, \ldots, X^m_1, X^m_2, \ldots, X^m_{l_m}\}$, where $X_i$ $(i=1,2,\ldots,k)$ denotes an unchanged equivalence class, and each changed equivalence class $X_i$ $(i=k+1, k+2, \ldots, m)$ is divided into $X^i_1, X^i_2, \ldots, X^i_{l_i}$, i.e., $X_i=\bigcup_{j=1}^{l_i}X^i_j$.

Theorem 1. Let $DS=(U, A=C\cup D)$ be a decision system with $U=\{x_1, x_2, \ldots, x_z\}$, $P\subseteq C$, $Q\cap C=\emptyset$, $U/P=\{X_1, X_2, \ldots, X_m\}$ and $U/D=\{Y_1, Y_2, \ldots, Y_n\}$. If the original dependency function of $DS$ is $\gamma_P(D)$ and $U/(P\cup Q)=\{X_1, \ldots, X_k, X^{k+1}_1, \ldots, X^{k+1}_{l_{k+1}}, \ldots, X^m_1, \ldots, X^m_{l_m}\}$, then the new dependency function obtained by adding $Q$ to $P$ is $\gamma_{P\cup Q}(D)=\gamma_P(D)+|\{x\in X^i_j \mid |X^i_j/D|=1\}|/|U|$ $(k+1\le i\le m,\ 1\le j\le l_i)$.

Proof. According to the definition of the dependency function, we first prove that $POS_{P\cup Q}(D)=POS_P(D)\cup\{x\in X^i_j \mid |X^i_j/D|=1\}$ $(k+1\le i\le m,\ 1\le j\le l_i)$. Suppose $x_s\in POS_P(D)$ $(1\le s\le z)$. By the definition of the positive region, $[x_s]_P\subseteq Y_l$ for some $Y_l\in U/D$ $(1\le l\le n)$; by Lemma 2, $[x_s]_{P\cup Q}\subseteq [x_s]_P$, so $[x_s]_{P\cup Q}\subseteq Y_l$ and thus $x_s\in POS_{P\cup Q}(D)$. If $x_s\notin POS_P(D)$ but $x_s\in\{x\in X^i_j \mid |X^i_j/D|=1\}$, then $[x_s]_{P\cup Q}=X^i_j\subseteq Y_l$ for some $Y_l\in U/D$, so again $x_s\in POS_{P\cup Q}(D)$. Therefore $POS_P(D)\cup\{x\in X^i_j \mid |X^i_j/D|=1\}\subseteq POS_{P\cup Q}(D)$. Conversely, let $x_s\in POS_{P\cup Q}(D)$, i.e., $[x_s]_{P\cup Q}\subseteq Y_l$ for some $Y_l\in U/D$. If $x_s\in POS_P(D)$, we are done. If $x_s\notin POS_P(D)$, then $[x_s]_P$ is not contained in any decision class, so $[x_s]_{P\cup Q}$ must be one of the new sub-classes, i.e., $x_s\in X^i_j$ $(k+1\le i\le m,\ 1\le j\le l_i)$ with $|X^i_j/D|=1$, where $X^i_j$ is an equivalence class in $U/(P\cup Q)$. Hence $POS_{P\cup Q}(D)\subseteq POS_P(D)\cup\{x\in X^i_j \mid |X^i_j/D|=1\}$. Combining the two inclusions, $POS_{P\cup Q}(D)=POS_P(D)\cup\{x\in X^i_j \mid |X^i_j/D|=1\}$, and by the definition of the dependency function, $\gamma_{P\cup Q}(D)=\gamma_P(D)+|\{x\in X^i_j \mid |X^i_j/D|=1\}|/|U|$. This completes the proof.

If a feature set is deleted from the decision system, the incremental computation of the new dependency function is given as Theorem 2.
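Theorem 1 suggests a direct update routine: keep the old positive region and re-examine only the $U/P$ classes that were inconsistent, refining them by the added features $Q$. A minimal sketch under the same assumptions as before (hypothetical helpers and toy table; Python rather than the authors' C++):

```python
# Sketch of the Theorem 1 update: when feature set Q is added, the old
# positive region is preserved (Lemma 2) and grows by the objects of
# newly consistent sub-classes of previously inconsistent classes.
from collections import defaultdict

def partition(U, table, P):
    blocks = defaultdict(list)
    for x in U:
        blocks[tuple(table[x][a] for a in sorted(P))].append(x)
    return list(blocks.values())

def is_consistent(block, table, d):
    return len({table[x][d] for x in block}) == 1    # |X / D| = 1

def add_features_update(U, table, P, Q, d, old_pos):
    """POS_{P+Q}(D) = POS_P(D) + objects of consistent new sub-classes."""
    new_pos = set(old_pos)                    # old region is preserved
    for block in partition(U, table, P):
        if is_consistent(block, table, d):
            continue                          # already counted in POS_P(D)
        # refine only the inconsistent class by the added features Q
        for sub in partition(block, table, Q):
            if is_consistent(sub, table, d):
                new_pos.update(sub)           # the term added in Theorem 1
    return new_pos, len(new_pos) / len(U)     # gamma_{P+Q}(D)

table = {0: {'a': 0, 'b': 0, 'd': 0}, 1: {'a': 0, 'b': 1, 'd': 1},
         2: {'a': 1, 'b': 0, 'd': 1}, 3: {'a': 1, 'b': 1, 'd': 0}}
U = list(table)
old_pos = {x for b in partition(U, table, {'a'})
           if is_consistent(b, table, 'd') for x in b}            # POS_{a}(D)
print(add_features_update(U, table, {'a'}, {'b'}, 'd', old_pos))  # gamma = 1.0
```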

Here are some notations used in Theorem 2. Given a decision system $DS=(U, A=C\cup D)$, $P\subseteq C$, $U/P=\{X_1, X_2, \ldots, X_m\}$ and $U/D=\{Y_1, Y_2, \ldots, Y_n\}$. Assume $Q$ is the deleted feature set, and $U/(P\setminus Q)=\{X_1, X_2, \ldots, X_t, X'_{t+1}, \ldots, X'_{m'}\}$, where $X_k$ $(1\le k\le t)$ denotes an unchanged equivalence class, and each $X'_k$ $(t+1\le k\le m')$ is a combination of equivalence classes in $U/P$, i.e., $X'_k=X_i\cup\cdots\cup X_j$ $(1\le i, j\le m)$.

Theorem 2. Let $DS=(U, A=C\cup D)$ be a decision system with $U=\{x_1, x_2, \ldots, x_z\}$, $Q\subseteq P\subseteq C$, $U/P=\{X_1, X_2, \ldots, X_m\}$ and $U/D=\{Y_1, Y_2, \ldots, Y_n\}$. If the original dependency function of $DS$ is $\gamma_P(D)$ and $U/(P\setminus Q)=\{X_1, X_2, \ldots, X_t, X'_{t+1}, \ldots, X'_{m'}\}$, then the new dependency function obtained by deleting $Q$ from $P$ is $\gamma_{P\setminus Q}(D)=\gamma_P(D)-|\{x\in X'_k \mid |X'_k/D|\neq 1\}|/|U|$ $(t+1\le k\le m')$.

Proof. According to the definition of the dependency function, we first prove that $POS_{P\setminus Q}(D)=POS_P(D)\setminus\{x\in X'_k \mid |X'_k/D|\neq 1\}$ $(t+1\le k\le m')$. Because $POS_P(D)=\{x\in U \mid [x]_P\subseteq Y_l,\ Y_l\in U/D\ (1\le l\le n)\}$, and since $P\setminus Q\subseteq P$, by Lemma 2 it holds that $[x]_P\subseteq [x]_{P\setminus Q}$; obviously $POS_{P\setminus Q}(D)\subseteq POS_P(D)$. In addition, since $POS_{P\setminus Q}(D)=\{x\in U \mid [x]_{P\setminus Q}\subseteq Y_l\}$, it holds that $POS_{P\setminus Q}(D)=POS_P(D)\setminus\{x\in U \mid |[x]_{P\setminus Q}/D|\neq 1\}$. Since $[x]_P\subseteq [x]_{P\setminus Q}$, each changed class of $U/(P\setminus Q)$ is a union of classes of $U/P$: there exist equivalence classes $X_i=[x_i]_P=\{y\in U \mid (x_i,y)\in IND(P)\}$ and $X_j=[x_j]_P=\{y\in U \mid (x_j,y)\in IND(P)\}$ such that $X_i\cup\cdots\cup X_j=X'_k$ $(t+1\le k\le m')$, where $X'_k\in U/(P\setminus Q)$, and an object of $X'_k$ leaves the positive region exactly when $|X'_k/D|\neq 1$. Therefore $POS_{P\setminus Q}(D)=POS_P(D)\setminus\{x\in X'_k \mid |X'_k/D|\neq 1\}$, and by the definition of the dependency function, $\gamma_{P\setminus Q}(D)=\gamma_P(D)-|\{x\in X'_k \mid |X'_k/D|\neq 1\}|/|U|$ $(t+1\le k\le m')$. This completes the proof.
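Symmetrically, Theorem 2 says the positive region can only shrink when features are deleted, by exactly the objects of merged classes that become inconsistent. A minimal sketch with the same caveats as above; for simplicity it re-partitions on $P\setminus Q$ but reuses the stored positive region, which is the reuse the theorem licenses:

```python
# Sketch of the Theorem 2 update: when feature set Q is deleted from P,
# classes of U/P merge; objects whose merged class X'_k is inconsistent
# (|X'_k / D| != 1) are removed from the old positive region.
from collections import defaultdict

def partition(U, table, P):
    blocks = defaultdict(list)
    for x in U:
        blocks[tuple(table[x][a] for a in sorted(P))].append(x)
    return list(blocks.values())

def delete_features_update(U, table, P, Q, d, old_pos):
    """POS_{P-Q}(D) = POS_P(D) - objects of inconsistent merged classes."""
    new_pos = set(old_pos)
    for block in partition(U, table, P - Q):          # classes of U/(P - Q)
        if len({table[x][d] for x in block}) != 1:    # merged class inconsistent
            new_pos.difference_update(block)          # term removed in Theorem 2
    return new_pos, len(new_pos) / len(U)             # gamma_{P-Q}(D)

table = {0: {'a': 0, 'b': 0, 'd': 0}, 1: {'a': 0, 'b': 1, 'd': 1},
         2: {'a': 1, 'b': 0, 'd': 1}, 3: {'a': 1, 'b': 1, 'd': 0}}
U = list(table)
old_pos = {0, 1, 2, 3}                                # POS_{a,b}(D), gamma = 1.0
print(delete_features_update(U, table, {'a', 'b'}, {'b'}, 'd', old_pos))
# (set(), 0.0): without b every merged class mixes decisions
```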
IV. Feature Selection in Decision Systems with the Variation of the Feature Set

When the feature set of a decision system varies dynamically, a direct approach is to retrain the system from scratch to acquire the new feature subset. Algorithm 1 (denoted DGFS) is a direct greedy feature selection algorithm. It starts from an empty set and adds indispensable features into the feature subset one by one according to the dependency-based significance measure, selecting the most significant feature in each round, until the dependency of the selected feature subset equals that of the full feature set.

To improve computational efficiency, we develop an incremental feature selection algorithm, which avoids a large amount of recomputation. Suppose $P$ is the original feature subset. We first compute the new dependency function in an incremental manner, and then judge whether the original feature subset is still a candidate feature subset: if the dependency function under $P$ equals that under the whole feature set, then $P$ is also the new feature subset; otherwise, a new feature subset is computed starting from $P$, and the features with the highest significance are selected from $C\setminus P$ and added to the subset gradually. Finally, a redundancy-removing step deletes redundant features from the obtained feature subset, so as to guarantee that it contains no superfluous features.

Algorithm 1. A direct greedy feature selection algorithm (DGFS) with the variation of the feature set in decision systems
Input: A decision system $DS=(U, A=C\cup D)$, the original feature subset $Red$ on $U$, and the added feature set $C_{ad}$ or the deleted feature set $C_{de}$, where $C_{ad}\cap C=\emptyset$ and $C_{de}\subseteq C$;
Output: A new feature subset $Red'$.
Begin
1) Initialize $C'\leftarrow C\cup C_{ad}$ or $C'\leftarrow C\setminus C_{de}$, $Red'\leftarrow\emptyset$;
2) Compute $\gamma_{C'}(D)$;
3) For $i=1$ to $|C'|$ do
4)   compute $sig_1(c_i, C', D)$;
5)   if $sig_1(c_i, C', D)>0$, then $Red'\leftarrow Red'\cup\{c_i\}$;
6) End for
7) While $\gamma_{Red'}(D)\neq\gamma_{C'}(D)$ do
8)   for $c\in C'\setminus Red'$, compute $sig_2(c, Red', D)$;
9)   select $Red'\leftarrow Red'\cup\{c_j\}$ and compute $\gamma_{Red'}(D)$, where $sig_2(c_j, Red', D)=\max\{sig_2(c, Red', D)\}$;
10) End while
11) Return $Red'$.
End

Algorithm 2. An incremental feature selection algorithm (IFSA) in decision systems with the variation of the feature set
Input: A decision system $DS=(U, A=C\cup D)$, the original feature subset $Red$, the original dependency function $\gamma_C(D)$, and the added feature set $C_{ad}$ or the deleted feature set $C_{de}$, where $C_{ad}\cap C=\emptyset$ and $C_{de}\subseteq C$;
Output: A new feature subset $Red'$.
Begin
1) Initialize $P\leftarrow Red$;
2) If a feature set $C_{ad}$ is added to the system $DS$:
3)   let $C'\leftarrow C\cup C_{ad}$;
4)   compute the equivalence classes $U/C'$ and $\gamma_{C'}(D)$; // according to Theorem 1
5)   for $i=1$ to $|C_{ad}|$ do
6)     compute $sig_1(c_i, C_{ad}, D)$;
7)     if $sig_1(c_i, C_{ad}, D)>0$, then $P\leftarrow P\cup\{c_i\}$;
8)   end for
9)   if $\gamma_P(D)=\gamma_{C'}(D)$, go to Step 25; else go to Step 16;
10) End if
11) If a feature set $C_{de}$ is deleted from the system $DS$:
12)   let $C'\leftarrow C\setminus C_{de}$;
13)   if $C_{de}\cap P=\emptyset$, go to Step 25; else $P\leftarrow P\setminus C_{de}$ and go to Step 14;
14)   compute the equivalence classes $U/C'$ and $\gamma_{C'}(D)$; // according to Theorem 2
15) End if
16) For $c\in C'\setminus P$, construct a descending sequence by $sig_2(c, P, D)$, and record the result as $\{c_1, c_2, \ldots, c_{|C'\setminus P|}\}$;
17) While $\gamma_P(D)\neq\gamma_{C'}(D)$ do
18)   for $j=1$ to $|C'\setminus P|$ do
19)     select $P\leftarrow P\cup\{c_j\}$ and compute $\gamma_P(D)$;
20) End while
21) For each $c_j\in P$ do
22)   compute $sig_1(c_j, P, D)$;
23)   if $sig_1(c_j, P, D)=0$, then $P\leftarrow P\setminus\{c_j\}$;
24) End for
25) $Red'\leftarrow P$; return $Red'$.
End
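The following condensed sketch (hypothetical Python, not the paper's C++ implementation) mirrors the feature-addition path of Algorithm 2 at a high level: keep the old subset if its dependency already matches, otherwise extend it greedily by outer significance (Algorithm 2 ranks the candidates once in Step 16; this sketch simply re-selects the current best), then run the redundancy-removing pass. For brevity it recomputes $\gamma$ directly each time; Algorithm 2 obtains the same quantities incrementally via Theorems 1 and 2.

```python
# Condensed, hypothetical sketch of the IFSA flow for the feature-addition
# case (roughly Steps 1-10 and 16-25 of Algorithm 2).
from collections import defaultdict

def gamma(U, table, P, d):
    blocks = defaultdict(list)
    for x in U:
        blocks[tuple(table[x][a] for a in sorted(P))].append(x)
    return sum(len(b) for b in blocks.values()
               if len({table[x][d] for x in b}) == 1) / len(U)

def ifsa_add(U, table, red, C_new, d):
    target = gamma(U, table, C_new, d)        # gamma over the enlarged C'
    P = set(red)
    while gamma(U, table, P, d) < target:     # extend by max outer significance
        best = max(C_new - P, key=lambda c: gamma(U, table, P | {c}, d))
        P.add(best)
    for c in sorted(P):                       # redundancy-removing step
        if gamma(U, table, P - {c}, d) == target:
            P.discard(c)                      # c is dispensable (sig_1 = 0)
    return P

table = {0: {'a': 0, 'b': 0, 'c': 1, 'd': 0}, 1: {'a': 0, 'b': 1, 'c': 0, 'd': 1},
         2: {'a': 1, 'b': 0, 'c': 1, 'd': 1}, 3: {'a': 1, 'b': 1, 'c': 0, 'd': 0}}
U = list(table)
print(ifsa_add(U, table, red={'a'}, C_new={'a', 'b', 'c'}, d='d'))  # e.g. {'a', 'b'}
```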

In Algorithm 2, when a new feature set $C_{ad}$ is added to the system, the time complexity of computing the new dependency function $\gamma_{C'}(D)$ is $O(|U||C\cup C_{ad}|+|U|)$, and when a feature set $C_{de}$ is deleted, it is $O(|U||C\setminus C_{de}|+|U|)$. The time complexity of Steps 5)-8) is $O(|C_{ad}|^2|U|)$, that of Steps 16)-20) is $O(|C\setminus P|^2|U|+|C\setminus P||P||U|)$, and that of Steps 21)-24) is $O(|P|^2|U|)$. By contrast, when a new feature set is added, the time complexity of algorithm DGFS is $O(|C\cup C_{ad}|^2|U|^2)$; when a feature set is deleted, it is $O(|C\setminus C_{de}|^2|U|^2)$. Comparing the time complexities of the two algorithms, we can easily see that algorithm IFSA is more efficient than DGFS, especially for large decision systems, where $|C|\ll|U|$.

V. Experimental Analysis

The objective of the following experiments is to test the efficiency of the proposed algorithm. The data sets used in the experiments are outlined in Table 1 and were downloaded from UCI [21]. For the data set Mushroom, we deleted the objects with missing feature values. All experiments were conducted on a PC with Windows XP, an Intel Core2 CPU E7400 and 2GB memory. The algorithms were coded in C++ using Microsoft Visual Studio.

Table 1. The detailed information of the data sets
ID  Data sets     Objects  No. of Features  Classes
1   Lymphography
2   Vote
3   Vehicle
4   Car
5   Satimage
6   Mushroom

The variation of the feature set includes two cases: a new feature set is added, or a feature set is deleted. In what follows, we illustrate the effectiveness and efficiency of algorithm IFSA for both cases. A comparative study of algorithm DGFS and algorithm IFSA in terms of reduct size and runtime is carried out on the six data sets. For convenience in comparing the computational time, the original features and added features of the six data sets are shown in Table 2; likewise, the original features and deleted features are shown in Table 3. When these features are added to or deleted from the decision systems, the new feature subset is computed by algorithms DGFS and IFSA, respectively. The experimental results, in terms of reduct size and the runtime taken to find a new feature subset, are shown in Table 2 and Table 3, respectively.

Table 2. Comparison of reduct size and runtimes on six data sets when adding features
ID  Data sets     Original features      Adding features      Reduct size (DGFS/IFSA)  Runtimes/s (DGFS/IFSA)
1   Lymphography  {c7, c8, ..., c18}     {c1, c2, ..., c6}
2   Vote          {c5, c6, ..., c16}     {c1, c2, ..., c4}
3   Vehicle       {c7, c8, ..., c18}     {c1, c2, ..., c6}
4   Car           {c3, c4, ..., c6}      {c1, c2}
5   Satimage      {c12, c13, ..., c36}   {c1, c2, ..., c11}
6   Mushroom      {c8, c9, ..., c22}     {c1, c2, ..., c7}
Table 3. Comparison of reduct size and runtimes on six data sets when deleting features
ID  Data sets     Original features      Deleting features     Reduct size (DGFS/IFSA)  Runtimes/s (DGFS/IFSA)
1   Lymphography  {c1, c2, ..., c18}     {c13, c14, ..., c18}
2   Vote          {c1, c2, ..., c16}     {c12, c13, ..., c16}
3   Vehicle       {c1, c2, ..., c18}     {c13, c14, ..., c18}
4   Car           {c1, c2, ..., c6}      {c5, c6}
5   Satimage      {c1, c2, ..., c36}     {c25, c26, ..., c36}
6   Mushroom      {c1, c2, ..., c22}     {c16, c17, ..., c22}

From the experimental results shown in Table 2 and Table 3, we see that the sizes of the feature subsets obtained by algorithm IFSA are equal to those of algorithm DGFS in most cases, which shows the effectiveness of the proposed algorithm IFSA. In some cases IFSA selects a smaller feature subset, for example on the data set Vehicle. The main reason is the redundancy-removing step at the end of algorithm IFSA, in which some redundant features are deleted from the obtained feature subset. From the runtimes of the two algorithms, it is easy to see that both runtimes increase with the size of the data sets; however, the runtime of algorithm IFSA is much less than that of algorithm DGFS. The reason is that IFSA avoids recalculation: it carries out the computation of the new dependency function and the feature subset selection in an incremental manner, reusing previous results, so that efficiency is improved. Therefore, algorithm IFSA is more efficient than DGFS at finding a new feature subset under the variation of the feature set in decision systems.

VI. Conclusion and Future Work

Feature selection in a dynamic environment is a challenging issue in pattern recognition and machine learning. In this paper, we developed incremental methods for feature selection under the variation of the feature set in decision systems. An incremental manner is first employed to efficiently update the dependency function. Then we incorporated the updated dependency

function into the computation of feature subset selection, and designed the corresponding feature selection algorithm for the cases where a feature set is added to or deleted from the decision system. Finally, we carried out extensive experiments to test the effectiveness of the proposed algorithm. The experimental results demonstrate that the incremental algorithm can effectively reduce the computational time needed to compute a new feature subset, compared with the direct algorithm. Our future work is to study how to design efficient feature selection algorithms based on other generalized rough set models. The incremental feature selection problem for the variation of feature values in decision systems may also be investigated further.

References
[1] Z. Pawlak and A. Skowron, "Rough sets and Boolean reasoning", Information Sciences, Vol.177, pp.41-73.
[2] Y.H. Qian, J.Y. Liang, W. Pedrycz, et al., "Positive approximation: An accelerator for attribute reduction in rough set theory", Artificial Intelligence, Vol.174.
[3] R.W. Swiniarski and A. Skowron, "Rough set methods in feature selection and recognition", Pattern Recognition Letters, Vol.24.
[4] D. Yamaguchi, "Attribute dependency functions considering data efficiency", International Journal of Approximate Reasoning, Vol.51, No.1, pp.89-98.
[5] S. Nakariyakul and D.P. Casasent, "An improvement on floating search algorithms for feature subset selection", Pattern Recognition, Vol.42, No.9.
[6] M. Kryszkiewicz and P. Lasek, "FUN: Fast discovery of minimal sets of attributes functionally determining a decision attribute", Transactions on Rough Sets, Vol.9, pp.76-95.
[7] J. Qian, D.Q. Miao, Z.H. Zhang, et al., "Hybrid approaches to attribute reduction based on indiscernibility and discernibility relation", International Journal of Approximate Reasoning, Vol.50, No.1.
[8] A. Radaideh, Q. Sulaiman and M. Selamat, "Feature selection by ordered rough set based feature weighting", Database and Expert Systems Applications, Springer, Berlin, Germany.
[9] D.Q. Miao, Y. Zhao, Y.Y. Yao, et al., "Relative reducts in consistent and inconsistent decision tables of the Pawlak rough set model", Information Sciences, Vol.179.
[10] J. Zhong, Q.G. Sun and X. Li, "A novel feature selection method based on probability latent semantic analysis for Chinese text classification", Chinese Journal of Electronics, Vol.20, No.2.
[11] J.B. Li, L.J. Yu and S.H. Sun, "Refined kernel principal component analysis based feature extraction", Chinese Journal of Electronics, Vol.20, No.3.
[12] A. Skowron and C. Rauszer, "The discernibility matrices and functions in information systems", Intelligent Decision Support, Springer, Netherlands.
[13] Z.Y. Xu, B.R. Yang and W. Song, "Comparative study of different attribute reduction based on decision table", Chinese Journal of Electronics, Vol.15, No.4.
[14] R.L. Lang, Z.P. Xu and F. Gao, "A knowledge acquisition method for fault diagnosis of airborne equipments based on support vector regression machine", Chinese Journal of Electronics, Vol.22, No.2.
[15] J.Y. Liang, F. Wang, C.Y. Dang, et al., "A group incremental approach to feature selection applying rough set technique", IEEE Transactions on Knowledge and Data Engineering, Vol.26, No.2.
[16] F. Wang, J.Y. Liang and Y.H. Qian, "Attribute reduction: A dimension incremental strategy", Knowledge-Based Systems, Vol.39.
[17] F. Hu, G.Y. Wang, H. Huang, et al., "Incremental attribute reduction based on elementary sets", Proc. of the 10th International Conference on Rough Sets, Fuzzy Sets, Data Mining and Granular Computing, Regina, Canada.
[18] Y.Y. Yao and Y. Zhao, "Discernibility matrix simplification for constructing attribute reducts", Information Sciences, Vol.179, No.5.
[19] M. Yang, "An incremental updating algorithm for attribute reduction based on improved discernibility matrix", Chinese Journal of Computers, Vol.30, No.5 (in Chinese).
[20] Y.T. Xu, L.S. Wang and R.Y. Zhang, "A dynamic attribute reduction algorithm based on 0-1 integer programming", Knowledge-Based Systems, Vol.24.
[21] A. Asuncion and D.J. Newman, "UCI machine learning repository", available at /ml/datasets.html.

QIAN Wenbin received the Ph.D. degree in computer science from University of Science and Technology Beijing. He is currently a lecturer in the School of Software, Jiangxi Agriculture University, China. His research interests include data mining, rough sets and machine learning. (Email: qianwenbin1027@126.com)

SHU Wenhao is currently a Ph.D. candidate in the School of Computer and Information Technology at Beijing Jiaotong University. Her research interests include granular computing, design and analysis of algorithms, and data mining.

YANG Bingru is a lifetime chief professor and Ph.D. supervisor in the School of Computer and Communication Engineering and the dean of the Institute of Knowledge Engineering at University of Science and Technology Beijing. His research focuses on knowledge discovery, knowledge engineering and intelligent systems. He serves on the editorial boards of several journals.

ZHANG Changsheng is currently a Ph.D. candidate in the School of Computer and Communication Engineering at University of Science and Technology Beijing. He is also a lecturer at Wenzhou University. His research interests include formal concept lattices and data mining.


More information

REDUCING GRAPH COLORING TO CLIQUE SEARCH

REDUCING GRAPH COLORING TO CLIQUE SEARCH Asia Pacific Journal of Mathematics, Vol. 3, No. 1 (2016), 64-85 ISSN 2357-2205 REDUCING GRAPH COLORING TO CLIQUE SEARCH SÁNDOR SZABÓ AND BOGDÁN ZAVÁLNIJ Institute of Mathematics and Informatics, University

More information

Granular Computing. Y. Y. Yao

Granular Computing. Y. Y. Yao Granular Computing Y. Y. Yao Department of Computer Science, University of Regina Regina, Saskatchewan, Canada S4S 0A2 E-mail: yyao@cs.uregina.ca, http://www.cs.uregina.ca/~yyao Abstract The basic ideas

More information

A Fast Method for Extracting all Minimal Siphons from Maximal Unmarked Siphons of a Petri Net

A Fast Method for Extracting all Minimal Siphons from Maximal Unmarked Siphons of a Petri Net 582 JOURNAL OF SOFTWARE, VOL. 9, NO. 3, MARCH 2014 A Fast Method for Extracting all Minimal Siphons from Maximal Unmarked Siphons of a Petri Net Qiaoli Zhuang School of Information Science and Technology,

More information

Mining Temporal Association Rules in Network Traffic Data

Mining Temporal Association Rules in Network Traffic Data Mining Temporal Association Rules in Network Traffic Data Guojun Mao Abstract Mining association rules is one of the most important and popular task in data mining. Current researches focus on discovering

More information

Han Liu, Alexander Gegov & Mihaela Cocea

Han Liu, Alexander Gegov & Mihaela Cocea Rule-based systems: a granular computing perspective Han Liu, Alexander Gegov & Mihaela Cocea Granular Computing ISSN 2364-4966 Granul. Comput. DOI 10.1007/s41066-016-0021-6 1 23 Your article is published

More information

Yiyu Yao University of Regina, Regina, Saskatchewan, Canada

Yiyu Yao University of Regina, Regina, Saskatchewan, Canada ROUGH SET APPROXIMATIONS: A CONCEPT ANALYSIS POINT OF VIEW Yiyu Yao University of Regina, Regina, Saskatchewan, Canada Keywords: Concept analysis, data processing and analysis, description language, form

More information

Multidirectional 2DPCA Based Face Recognition System

Multidirectional 2DPCA Based Face Recognition System Multidirectional 2DPCA Based Face Recognition System Shilpi Soni 1, Raj Kumar Sahu 2 1 M.E. Scholar, Department of E&Tc Engg, CSIT, Durg 2 Associate Professor, Department of E&Tc Engg, CSIT, Durg Email:

More information

Using Decision Boundary to Analyze Classifiers

Using Decision Boundary to Analyze Classifiers Using Decision Boundary to Analyze Classifiers Zhiyong Yan Congfu Xu College of Computer Science, Zhejiang University, Hangzhou, China yanzhiyong@zju.edu.cn Abstract In this paper we propose to use decision

More information

American International Journal of Research in Science, Technology, Engineering & Mathematics

American International Journal of Research in Science, Technology, Engineering & Mathematics American International Journal of Research in Science, Technology, Engineering & Mathematics Available online at http://www.iasir.net ISSN (Print): 2328-3491, ISSN (Online): 2328-3580, ISSN (CD-ROM): 2328-3629

More information

Cost-sensitive C4.5 with post-pruning and competition

Cost-sensitive C4.5 with post-pruning and competition Cost-sensitive C4.5 with post-pruning and competition Zilong Xu, Fan Min, William Zhu Lab of Granular Computing, Zhangzhou Normal University, Zhangzhou 363, China Abstract Decision tree is an effective

More information

S-APPROXIMATION SPACES: A FUZZY APPROACH

S-APPROXIMATION SPACES: A FUZZY APPROACH Iranian Journal of Fuzzy Systems Vol. 14, No.2, (2017) pp. 127-154 127 S-APPROXIMATION SPACES: A FUZZY APPROACH A. SHAKIBA, M. R. HOOSHMANDASL, B. DAVVAZ AND S. A. SHAHZADEH FAZELI Abstract. In this paper,

More information

Mining High Average-Utility Itemsets

Mining High Average-Utility Itemsets Proceedings of the 2009 IEEE International Conference on Systems, Man, and Cybernetics San Antonio, TX, USA - October 2009 Mining High Itemsets Tzung-Pei Hong Dept of Computer Science and Information Engineering

More information

Greedy Algorithms 1 {K(S) K(S) C} For large values of d, brute force search is not feasible because there are 2 d {1,..., d}.

Greedy Algorithms 1 {K(S) K(S) C} For large values of d, brute force search is not feasible because there are 2 d {1,..., d}. Greedy Algorithms 1 Simple Knapsack Problem Greedy Algorithms form an important class of algorithmic techniques. We illustrate the idea by applying it to a simplified version of the Knapsack Problem. Informally,

More information

Collaborative Rough Clustering

Collaborative Rough Clustering Collaborative Rough Clustering Sushmita Mitra, Haider Banka, and Witold Pedrycz Machine Intelligence Unit, Indian Statistical Institute, Kolkata, India {sushmita, hbanka r}@isical.ac.in Dept. of Electrical

More information

FUZZY METRIC SPACES ZUN-QUAN XIA AND FANG-FANG GUO

FUZZY METRIC SPACES ZUN-QUAN XIA AND FANG-FANG GUO J. Appl. Math. & Computing Vol. 16(2004), No. 1-2, pp. 371-381 FUZZY METRIC SPACES ZUN-QUAN XIA AND FANG-FANG GUO Abstract. In this paper, fuzzy metric spaces are redefined, different from the previous

More information

2386 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 52, NO. 6, JUNE 2006

2386 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 52, NO. 6, JUNE 2006 2386 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 52, NO. 6, JUNE 2006 The Encoding Complexity of Network Coding Michael Langberg, Member, IEEE, Alexander Sprintson, Member, IEEE, and Jehoshua Bruck,

More information

Scan Scheduling Specification and Analysis

Scan Scheduling Specification and Analysis Scan Scheduling Specification and Analysis Bruno Dutertre System Design Laboratory SRI International Menlo Park, CA 94025 May 24, 2000 This work was partially funded by DARPA/AFRL under BAE System subcontract

More information

Research on Design and Application of Computer Database Quality Evaluation Model

Research on Design and Application of Computer Database Quality Evaluation Model Research on Design and Application of Computer Database Quality Evaluation Model Abstract Hong Li, Hui Ge Shihezi Radio and TV University, Shihezi 832000, China Computer data quality evaluation is the

More information

Bipartite Graph Partitioning and Content-based Image Clustering

Bipartite Graph Partitioning and Content-based Image Clustering Bipartite Graph Partitioning and Content-based Image Clustering Guoping Qiu School of Computer Science The University of Nottingham qiu @ cs.nott.ac.uk Abstract This paper presents a method to model the

More information

Cse634 DATA MINING TEST REVIEW. Professor Anita Wasilewska Computer Science Department Stony Brook University

Cse634 DATA MINING TEST REVIEW. Professor Anita Wasilewska Computer Science Department Stony Brook University Cse634 DATA MINING TEST REVIEW Professor Anita Wasilewska Computer Science Department Stony Brook University Preprocessing stage Preprocessing: includes all the operations that have to be performed before

More information

ESSENTIALLY, system modeling is the task of building

ESSENTIALLY, system modeling is the task of building IEEE TRANSACTIONS ON INDUSTRIAL ELECTRONICS, VOL. 53, NO. 4, AUGUST 2006 1269 An Algorithm for Extracting Fuzzy Rules Based on RBF Neural Network Wen Li and Yoichi Hori, Fellow, IEEE Abstract A four-layer

More information