An Incremental Algorithm to Feature Selection in Decision Systems with the Variation of Feature Set

Chinese Journal of Electronics, Vol.24, No.1, Jan. 2015

QIAN Wenbin 1,2, SHU Wenhao 3, YANG Bingru 2 and ZHANG Changsheng 2
(1. School of Software, Jiangxi Agriculture University, Nanchang, China)
(2. Beijing Key Laboratory of Knowledge Engineering of Material Science, Beijing, China)
(3. School of Computer and Information Technology, Beijing Jiaotong University, Beijing, China)

Abstract. Feature selection is a challenging problem in pattern recognition and machine learning. In real-life applications, the feature set of a decision system may vary over time, yet there are few studies on feature selection under such variation. This paper focuses on this issue: an incremental feature selection algorithm for dynamic decision systems is developed based on the dependency function. The incremental algorithm avoids recomputation, rather than retraining the dynamic decision system as a new one and computing the feature subset from scratch. We first update the dependency function in an incremental manner, and then incorporate the updated dependency function into the incremental feature selection algorithm. Compared with the direct (non-incremental) algorithm, the computational efficiency of the proposed algorithm is improved. Experimental results on different UCI data sets show that the proposed algorithm is effective and efficient.

Key words: Rough sets, Feature selection, Attribute reduction, Incremental algorithm, Dynamic data set.

I. Introduction

In the early eighties, Pawlak [1] introduced the theory of rough sets as an extension of set theory to deal with uncertain and inconsistent information. In rough set theory, two definable subsets called the lower and upper approximations are used to describe a crisp subset of a universe. By using these two definable subsets, the knowledge hidden in decision systems can be discovered. One of the many successful applications of rough set theory has been to feature selection, also called attribute reduction [2,3]. The rough set methodology of using only the supplied data and no other information has many benefits in feature selection, where most other methods require supplementary knowledge. The main aim of feature subset selection in rough set theory is to determine a minimal feature subset (also called a reduct) of a problem domain while retaining a suitably high accuracy in representing the original features. Feature selection makes it possible to delete irrelevant features, and better performance may be achieved by discarding such features. A straightforward way is to exhaustively evaluate the quality of all feature subsets to find an optimal one. However, this is not feasible even for a moderate number of candidate features, because of the exponential complexity. Thus many heuristic feature selection algorithms have been designed for classification learning. Generally speaking, an efficient feature selection method involves two aspects: feature evaluation, which assesses the quality of candidate features [4], and a search strategy, which finds optimal solutions in terms of the evaluation function used [5]. So far, a number of heuristic feature selection algorithms have employed the dependency function, which is based on lower approximations, as an evaluation step in the feature selection process. The feature subset acquired by dependency function-based feature selection algorithms can induce certain rules [4].
These approaches have been adopted because the certainty embodied in the lower approximation is associated with greater importance in scientific analysis. In this paper, we also adopt the dependency function as the evaluation criterion for feature selection; the feature selection algorithm performs a greedy forward search, repeatedly selecting the feature with the highest significance until the dependency no longer increases. Many rough set-based approaches to feature selection can be found in Refs.[2, 6-11]. A hybrid feature selection approach based on feature weighting can be found in Ref.[8], where the feature subset is selected by feature ranking and greedy forward selection. An approach to computing the minimal set of features that functionally determine a decision attribute is proposed in Ref.[6], but the proposed algorithm is usually computationally expensive, especially when dealing with large-scale data sets.

(Manuscript Received May 2013; Accepted July. This work is supported in part by the National Natural Science Foundation of China (No. , No. ), the Key Project of Ministry of Science and Technology of China (No.2010IM020900), the 2012 Ladder Plan Project of Beijing Key Laboratory of Knowledge Engineering for Materials Science (No.Z ), and Zhejiang Provincial Natural Science Foundation of China (No.LY13F020024).)

Skowron [12] designed a feature selection algorithm that computes a disjunctive normal form to find all exact feature subsets of a given system, and proved that finding the minimal feature subset of a decision system is NP-hard. From the viewpoint of the indiscernibility and discernibility relations, a feature selection algorithm based on a hybrid relation is provided in Ref.[7]. From the views of algebra and information entropy, a comparative study of three feature selection methods is made in Ref.[13]. Feature selection algorithms for consistent and inconsistent decision tables are investigated in Ref.[9]. To accelerate the heuristic process of feature selection, a theoretic framework named positive approximation is proposed in Ref.[2]; based on this accelerator, the efficiency of several general feature selection algorithms is improved.

In real-world applications, data sets usually vary dynamically; correspondingly, the feature subset needs updating for knowledge discovery and other related tasks in a dynamic environment [14]. To improve computational efficiency, new analytic techniques are highly desirable in practice. To deal with dynamic data sets, there exist some studies on feature selection using incremental techniques. Such rough set-based feature selection methods can be classified into three types: those based on entropy [15,16], those based on the positive region [17], and those based on the discernibility matrix [18-20]. From the standpoint of information theory, an information-theoretic feature selection method was developed in an incremental manner for the case where multiple objects are added to a decision system [15]. The same authors also developed an incremental feature selection algorithm based on three representative information entropies for decision systems with dynamically increasing features [16], but the computation of entropy is not costless. Based on the positive region, an incremental feature selection algorithm was presented in Ref.[17] for the case where a single object varies in the decision system. From the idea of discernibility ability, an incremental feature selection algorithm based on the discernibility matrix was proposed in Ref.[19]. Based on 0-1 integer programming, an incremental feature selection algorithm for the case where multiple objects are added to an information system is presented in Ref.[20]. However, an incremental feature selection algorithm that updates the feature subset when the feature set varies dynamically in decision systems has not yet been developed. Therefore, this paper focuses on this issue.

The remainder of this paper is organized as follows: Section II reviews the basic concepts of rough sets. An incremental computation of the new dependency function under the variation of the feature set is presented in Section III. In Section IV, we develop an incremental feature selection algorithm for computing a new feature subset under the variation of the feature set in decision systems. In Section V, the performance of the proposed algorithm is evaluated on different UCI data sets. The paper ends with conclusions and future work in Section VI.

II. Preliminary Knowledge on Rough Sets

1. Basic concepts

Basic concepts of rough sets are reviewed in this section.
The theory of rough sets begins with the notion of an approximation space, which is a pair $\langle U, A\rangle$, where $U$ is a finite nonempty set of objects called the universe, and $A$ is a nonempty finite set of features (also called attributes) on the universe. $V_a$ is the value domain of feature $a$, $V=\bigcup_{a\in A}V_a$, and $f$ is an information function $f: U\times A\to V$. An approximation space is also called an information system. Any subset $P$ of the feature set $A$ defines a binary equivalence (also called indiscernibility) relation $IND(P)$ on $U$ as follows:

$IND(P)=\{(x,y)\in U\times U \mid \forall a\in P,\ f(x,a)=f(y,a)\}$

If $(x,y)\in IND(P)$, then $x$ and $y$ are indiscernible by features from $P$. The partition of $U$ generated by $IND(P)$ is denoted by $U/IND(P)$ (or simply $U/P$), i.e., $U/IND(P)=\{[x]_P \mid x\in U\}$, where $[x]_P$ is the equivalence class containing $x$ with respect to $P$. The elements in $[x]_P$ are indiscernible, or equivalent, with respect to $P$, i.e., $[x]_P=\{y\in U \mid (x,y)\in IND(P)\}$.

Given any subset $X\subseteq U$ and $P\subseteq A$, in general it may not be possible to describe $X$ precisely in $\langle U, A\rangle$. One may characterize $X$ by a pair of lower and upper approximations in terms of the indiscernibility relation $IND(P)$, defined respectively as

$\underline{P}(X)=\{x\in U \mid [x]_P\subseteq X\}$ and $\overline{P}(X)=\{x\in U \mid [x]_P\cap X\neq\emptyset\}$

where $\underline{P}(X)$ and $\overline{P}(X)$ are called the lower approximation and upper approximation with respect to $P$, respectively. The lower approximation $\underline{P}(X)$ is the union of all elementary sets which are subsets of $X$, and the upper approximation $\overline{P}(X)$ is the union of all elementary sets which have a nonempty intersection with $X$. The tuple $\langle\underline{P}(X), \overline{P}(X)\rangle$ is the representation of an ordinary set $X$ in the approximation space $\langle U, A\rangle$. The lower approximation can be interpreted as the collection of those elements that definitely belong to $X$, while the upper approximation can be interpreted as the collection of those elements that possibly belong to $X$. The lower approximation is also called the positive region, denoted by $POS_P(X)$.

An information system $\langle U, A\rangle$ is called a decision system if the feature set $A=C\cup D$ and $C\cap D=\emptyset$, where $C$ is the condition feature set and $D$ is the decision feature set. For any subset $P$ of the feature set $A$, the dependency function between $P$ and $D$ is defined as $\gamma_P(D)=|POS_P(D)|/|U|$, where $POS_P(D)=\bigcup_i \underline{P}(X_i)$, $X_i$ is the $i$th equivalence class induced by $D$, and $|\cdot|$ denotes the cardinality of a set. The $\gamma$ measure is used by many researchers because of its simple quantitative evaluation of the positive region; it measures the approximation power of a condition feature set with respect to the decision feature set $D$. In data mining, and especially in feature selection, it is important to find the dependence relations between feature sets.

2. Feature selection based on dependency function

In rough set theory, feature selection based on the dependency function is defined as follows.

Definition 1. Let $DS=(U, A=C\cup D)$ be a decision system and let $B\subseteq C$. $B$ is a selected feature subset of the decision system iff $\gamma_B(D)=\gamma_C(D)$ and $\gamma_B(D)>\gamma_{B\setminus\{a\}}(D)$ for any $a\in B$.

In this definition, the first condition guarantees that the feature subset has the same distinguishing ability as the whole set of features; the second guarantees that all of the features in the selected subset are indispensable, so that it contains no superfluous features.
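Before proceeding, a minimal sketch may help fix the notation. It is hypothetical Python (the paper's experiments use C++; the toy table and helper names are ours), computing $U/P$, $POS_P(D)$ and $\gamma_P(D)$ directly from their definitions.

```python
# A minimal sketch (not the authors' implementation) of the basic
# rough-set quantities: the partition U/P induced by IND(P), the
# positive region POS_P(D), and the dependency function gamma_P(D).
from collections import defaultdict

def partition(U, table, P):
    """Group object indices by their value vector on feature set P (U/IND(P))."""
    blocks = defaultdict(list)
    for x in U:
        blocks[tuple(table[x][a] for a in sorted(P))].append(x)
    return list(blocks.values())

def positive_region(U, table, P, d):
    """POS_P(D): union of the P-classes contained in a single decision class."""
    pos = set()
    for block in partition(U, table, P):
        decisions = {table[x][d] for x in block}
        if len(decisions) == 1:          # [x]_P lies inside one class of U/D
            pos.update(block)
    return pos

def gamma(U, table, P, d):
    """Dependency gamma_P(D) = |POS_P(D)| / |U|."""
    return len(positive_region(U, table, P, d)) / len(U)

# Hypothetical decision table: conditions a, b; decision d.
table = {0: {'a': 0, 'b': 0, 'd': 0}, 1: {'a': 0, 'b': 1, 'd': 1},
         2: {'a': 1, 'b': 0, 'd': 1}, 3: {'a': 1, 'b': 0, 'd': 0}}
U = list(table)
print(gamma(U, table, {'a', 'b'}, 'd'))  # 0.5: objects 2 and 3 conflict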

To accelerate the process of feature selection, the selection of surviving features can be achieved through the comparison of feature significance measures.

Definition 2. Let $DS=(U, A=C\cup D)$ be a decision system, with $B\subseteq C$ and $a\in B$. The significance measure of feature $a$ is defined by

$sig_1(a, B, D)=\gamma_B(D)-\gamma_{B\setminus\{a\}}(D)$

Obviously, $0\le sig_1(a,B,D)\le 1$. If $sig_1(a,B,D)=0$, then the feature $a$ is dispensable; otherwise it is indispensable. From another standpoint, we can also define the significance measure of a feature as follows.

Definition 3. Let $DS=(U, A=C\cup D)$ be a decision system, with $B\subseteq C$ and $a\notin B$. The significance of feature $a$ is defined by

$sig_2(a, B, D)=\gamma_{B\cup\{a\}}(D)-\gamma_B(D)$

The significance measure $sig_2(a,B,D)$ is monotonic: the greater the change in the positive region, the more significant the feature. Therefore, to speed up feature subset selection, we often use it to sort the features into an ordered sequence.

III. Updating Scheme of the Dependency Function Under the Variation of the Feature Set

In a dependency function-based feature selection algorithm, the key task is to compute the dependency function. If the feature set of a decision system varies over time, a naive approach is to calculate the new dependency function from scratch. To avoid this recomputation, we discuss how to compute the new dependency function in an incremental manner, so that efficiency is improved. Generally speaking, the variation of the feature set includes two cases: a new feature set is added, or a feature set is deleted. In the following, we give the computation of the new dependency function for both cases. We first present an equivalent way of calculating the dependency function, which will be used in the following proofs.

Lemma 1. Let $DS=(U, A=C\cup D)$ be a decision system and $B\subseteq C$. Then $\gamma_B(D)=|\{x\in U \mid |[x]_B/D|=1\}|/|U|$.

Proof. This follows directly from the definition of the dependency function: an object belongs to the positive region iff its equivalence class is contained in a single decision class, i.e., iff $[x]_B/D$ consists of exactly one block.

Lemma 2. Let $DS=(U, A=C\cup D)$ be a decision system and $P, Q\subseteq C$. Then every equivalence class in $U/(P\cup Q)$ is contained in some class of $U/P$, and likewise in some class of $U/Q$; that is, $U/(P\cup Q)$ refines both $U/P$ and $U/Q$.

Proof. Denote $U/(P\cup Q)=\{X_1, X_2, \ldots, X_m\}$ and $U/P=\{P_1, P_2, \ldots, P_s\}$. Since $P\subseteq P\cup Q$, for any $X_i=[x]_{P\cup Q}\in U/(P\cup Q)$ $(1\le i\le m)$ it holds that $X_i=\{y\in U \mid (x,y)\in IND(P\cup Q)\}\subseteq [x]_P=\{y\in U \mid (x,y)\in IND(P)\}$, where $[x]_P=P_j$ for some $1\le j\le s$. Hence $U/(P\cup Q)$ refines $U/P$. That $U/(P\cup Q)$ refines $U/Q$ is proved in the same way. This completes the proof.

Given the two lemmas above, we can prove the incremental computations of the new dependency function when a feature set is added to, or deleted from, a decision system. If a feature set is added to the decision system, the incremental computation of the new dependency function is given as Theorem 1.
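Before turning to the incremental theorems, the sketch below illustrates Definitions 2 and 3 on the same hypothetical table; $\gamma$ is re-derived via the consistent-class form of Lemma 1 so the block runs standalone.

```python
# A small sketch of the two significance measures (Definitions 2 and 3).
# The table and names are hypothetical, as in the previous sketch.
from collections import defaultdict

def gamma(U, table, P, d):
    blocks = defaultdict(list)
    for x in U:
        blocks[tuple(table[x][a] for a in sorted(P))].append(x)
    pos = sum(len(b) for b in blocks.values()
              if len({table[x][d] for x in b}) == 1)  # Lemma 1: |[x]_P / D| = 1
    return pos / len(U)

def sig1(a, B, U, table, d):
    """Inner significance: gamma_B(D) - gamma_{B - {a}}(D), for a in B."""
    return gamma(U, table, B, d) - gamma(U, table, B - {a}, d)

def sig2(a, B, U, table, d):
    """Outer significance: gamma_{B + {a}}(D) - gamma_B(D), for a not in B."""
    return gamma(U, table, B | {a}, d) - gamma(U, table, B, d)

table = {0: {'a': 0, 'b': 0, 'd': 0}, 1: {'a': 0, 'b': 1, 'd': 1},
         2: {'a': 1, 'b': 0, 'd': 1}, 3: {'a': 1, 'b': 0, 'd': 0}}
U = list(table)
print(sig1('b', {'a', 'b'}, U, table, 'd'))  # 0.5: 'b' is indispensable here
print(sig2('b', {'a'}, U, table, 'd'))       # 0.5: adding 'b' raises gamma
```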
Here are some notations used in Theorem 1. Given a decision system $DS=(U, A=C\cup D)$, with $P\subseteq C$, $U/P=\{X_1, X_2, \ldots, X_m\}$ and $U/D=\{Y_1, Y_2, \ldots, Y_n\}$, suppose a feature set $Q$ is added to the system, and $U/(P\cup Q)=\{X_1, X_2, \ldots, X_k, X^{k+1}_1, X^{k+1}_2, \ldots, X^{k+1}_{l_{k+1}}, X^{k+2}_1, X^{k+2}_2, \ldots, X^{k+2}_{l_{k+2}}, \ldots, X^m_1, X^m_2, \ldots, X^m_{l_m}\}$, where $X_i$ $(i=1,2,\ldots,k)$ denotes an unchanged equivalence class, and each changed equivalence class $X_i$ $(i=k+1, k+2, \ldots, m)$ is divided into $X^i_1, X^i_2, \ldots, X^i_{l_i}$, i.e., $X_i=\bigcup_{j=1}^{l_i}X^i_j$.

Theorem 1. Let $DS=(U, A=C\cup D)$ be a decision system with $U=\{x_1, x_2, \ldots, x_z\}$, $P\subseteq C$, $Q\cap C=\emptyset$, $U/P=\{X_1, X_2, \ldots, X_m\}$ and $U/D=\{Y_1, Y_2, \ldots, Y_n\}$. If the original dependency function of $DS$ is $\gamma_P(D)$ and $U/(P\cup Q)=\{X_1, \ldots, X_k, X^{k+1}_1, \ldots, X^{k+1}_{l_{k+1}}, \ldots, X^m_1, \ldots, X^m_{l_m}\}$, then the new dependency function obtained by adding $Q$ to $P$ is $\gamma_{P\cup Q}(D)=\gamma_P(D)+|\{x\in X^i_j \mid |X^i_j/D|=1\}|/|U|$ $(k+1\le i\le m,\ 1\le j\le l_i)$.

Proof. According to the definition of the dependency function, we first prove that $POS_{P\cup Q}(D)=POS_P(D)\cup\{x\in X^i_j \mid |X^i_j/D|=1\}$ $(k+1\le i\le m,\ 1\le j\le l_i)$. Suppose $x_s\in POS_P(D)$ $(1\le s\le z)$. By the definition of the positive region, $[x_s]_P\subseteq Y_l$ for some $Y_l\in U/D$ $(1\le l\le n)$; by Lemma 2, $[x_s]_{P\cup Q}\subseteq [x_s]_P$, so $[x_s]_{P\cup Q}\subseteq Y_l$ and thus $x_s\in POS_{P\cup Q}(D)$. If $x_s\notin POS_P(D)$ but $x_s\in\{x\in X^i_j \mid |X^i_j/D|=1\}$, then $[x_s]_{P\cup Q}=X^i_j\subseteq Y_l$ for some $Y_l\in U/D$, so again $x_s\in POS_{P\cup Q}(D)$. Therefore $POS_P(D)\cup\{x\in X^i_j \mid |X^i_j/D|=1\}\subseteq POS_{P\cup Q}(D)$. Conversely, let $x_s\in POS_{P\cup Q}(D)$, i.e., $[x_s]_{P\cup Q}\subseteq Y_l$ for some $Y_l\in U/D$. If $x_s\in POS_P(D)$, we are done. If $x_s\notin POS_P(D)$, then $[x_s]_P$ is not contained in any decision class, so $[x_s]_{P\cup Q}$ must be one of the new sub-classes, i.e., $x_s\in X^i_j$ $(k+1\le i\le m,\ 1\le j\le l_i)$ with $|X^i_j/D|=1$, where $X^i_j$ is an equivalence class in $U/(P\cup Q)$. Hence $POS_{P\cup Q}(D)\subseteq POS_P(D)\cup\{x\in X^i_j \mid |X^i_j/D|=1\}$. Combining the two inclusions, $POS_{P\cup Q}(D)=POS_P(D)\cup\{x\in X^i_j \mid |X^i_j/D|=1\}$, and by the definition of the dependency function, $\gamma_{P\cup Q}(D)=\gamma_P(D)+|\{x\in X^i_j \mid |X^i_j/D|=1\}|/|U|$. This completes the proof.

If a feature set is deleted from the decision system, the incremental computation of the new dependency function is given as Theorem 2.
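Theorem 1 suggests a direct update routine: keep the old positive region and re-examine only the $U/P$ classes that were inconsistent, refining them by the added features $Q$. A minimal sketch under the same assumptions as before (hypothetical helpers and toy table; Python rather than the authors' C++):

```python
# Sketch of the Theorem 1 update: when feature set Q is added, the old
# positive region is preserved (Lemma 2) and grows by the objects of
# newly consistent sub-classes of previously inconsistent classes.
from collections import defaultdict

def partition(U, table, P):
    blocks = defaultdict(list)
    for x in U:
        blocks[tuple(table[x][a] for a in sorted(P))].append(x)
    return list(blocks.values())

def is_consistent(block, table, d):
    return len({table[x][d] for x in block}) == 1    # |X / D| = 1

def add_features_update(U, table, P, Q, d, old_pos):
    """POS_{P+Q}(D) = POS_P(D) + objects of consistent new sub-classes."""
    new_pos = set(old_pos)                    # old region is preserved
    for block in partition(U, table, P):
        if is_consistent(block, table, d):
            continue                          # already counted in POS_P(D)
        # refine only the inconsistent class by the added features Q
        for sub in partition(block, table, Q):
            if is_consistent(sub, table, d):
                new_pos.update(sub)           # the term added in Theorem 1
    return new_pos, len(new_pos) / len(U)     # gamma_{P+Q}(D)

table = {0: {'a': 0, 'b': 0, 'd': 0}, 1: {'a': 0, 'b': 1, 'd': 1},
         2: {'a': 1, 'b': 0, 'd': 1}, 3: {'a': 1, 'b': 1, 'd': 0}}
U = list(table)
old_pos = {x for b in partition(U, table, {'a'})
           if is_consistent(b, table, 'd') for x in b}            # POS_{a}(D)
print(add_features_update(U, table, {'a'}, {'b'}, 'd', old_pos))  # gamma = 1.0
```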

Here are some notations used in Theorem 2. Given a decision system $DS=(U, A=C\cup D)$, $P\subseteq C$, $U/P=\{X_1, X_2, \ldots, X_m\}$ and $U/D=\{Y_1, Y_2, \ldots, Y_n\}$. Assume $Q$ is the deleted feature set, and $U/(P\setminus Q)=\{X_1, X_2, \ldots, X_t, X'_{t+1}, \ldots, X'_{m'}\}$, where $X_k$ $(1\le k\le t)$ denotes an unchanged equivalence class, and each $X'_k$ $(t+1\le k\le m')$ is a combination of equivalence classes in $U/P$, i.e., $X'_k=X_i\cup\cdots\cup X_j$ $(1\le i, j\le m)$.

Theorem 2. Let $DS=(U, A=C\cup D)$ be a decision system with $U=\{x_1, x_2, \ldots, x_z\}$, $Q\subseteq P\subseteq C$, $U/P=\{X_1, X_2, \ldots, X_m\}$ and $U/D=\{Y_1, Y_2, \ldots, Y_n\}$. If the original dependency function of $DS$ is $\gamma_P(D)$ and $U/(P\setminus Q)=\{X_1, X_2, \ldots, X_t, X'_{t+1}, \ldots, X'_{m'}\}$, then the new dependency function obtained by deleting $Q$ from $P$ is $\gamma_{P\setminus Q}(D)=\gamma_P(D)-|\{x\in X'_k \mid |X'_k/D|\neq 1\}|/|U|$ $(t+1\le k\le m')$.

Proof. According to the definition of the dependency function, we first prove that $POS_{P\setminus Q}(D)=POS_P(D)\setminus\{x\in X'_k \mid |X'_k/D|\neq 1\}$ $(t+1\le k\le m')$. Because $POS_P(D)=\{x\in U \mid [x]_P\subseteq Y_l,\ Y_l\in U/D\ (1\le l\le n)\}$, and since $P\setminus Q\subseteq P$, by Lemma 2 it holds that $[x]_P\subseteq [x]_{P\setminus Q}$; obviously $POS_{P\setminus Q}(D)\subseteq POS_P(D)$. In addition, since $POS_{P\setminus Q}(D)=\{x\in U \mid [x]_{P\setminus Q}\subseteq Y_l\}$, it holds that $POS_{P\setminus Q}(D)=POS_P(D)\setminus\{x\in U \mid |[x]_{P\setminus Q}/D|\neq 1\}$. Since $[x]_P\subseteq [x]_{P\setminus Q}$, each changed class of $U/(P\setminus Q)$ is a union of classes of $U/P$: there exist equivalence classes $X_i=[x_i]_P=\{y\in U \mid (x_i,y)\in IND(P)\}$ and $X_j=[x_j]_P=\{y\in U \mid (x_j,y)\in IND(P)\}$ such that $X_i\cup\cdots\cup X_j=X'_k$ $(t+1\le k\le m')$, where $X'_k\in U/(P\setminus Q)$, and an object of $X'_k$ leaves the positive region exactly when $|X'_k/D|\neq 1$. Therefore $POS_{P\setminus Q}(D)=POS_P(D)\setminus\{x\in X'_k \mid |X'_k/D|\neq 1\}$, and by the definition of the dependency function, $\gamma_{P\setminus Q}(D)=\gamma_P(D)-|\{x\in X'_k \mid |X'_k/D|\neq 1\}|/|U|$ $(t+1\le k\le m')$. This completes the proof.
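Symmetrically, Theorem 2 says the positive region can only shrink when features are deleted, by exactly the objects of merged classes that become inconsistent. A minimal sketch with the same caveats as above; for simplicity it re-partitions on $P\setminus Q$ but reuses the stored positive region, which is the reuse the theorem licenses:

```python
# Sketch of the Theorem 2 update: when feature set Q is deleted from P,
# classes of U/P merge; objects whose merged class X'_k is inconsistent
# (|X'_k / D| != 1) are removed from the old positive region.
from collections import defaultdict

def partition(U, table, P):
    blocks = defaultdict(list)
    for x in U:
        blocks[tuple(table[x][a] for a in sorted(P))].append(x)
    return list(blocks.values())

def delete_features_update(U, table, P, Q, d, old_pos):
    """POS_{P-Q}(D) = POS_P(D) - objects of inconsistent merged classes."""
    new_pos = set(old_pos)
    for block in partition(U, table, P - Q):          # classes of U/(P - Q)
        if len({table[x][d] for x in block}) != 1:    # merged class inconsistent
            new_pos.difference_update(block)          # term removed in Theorem 2
    return new_pos, len(new_pos) / len(U)             # gamma_{P-Q}(D)

table = {0: {'a': 0, 'b': 0, 'd': 0}, 1: {'a': 0, 'b': 1, 'd': 1},
         2: {'a': 1, 'b': 0, 'd': 1}, 3: {'a': 1, 'b': 1, 'd': 0}}
U = list(table)
old_pos = {0, 1, 2, 3}                                # POS_{a,b}(D), gamma = 1.0
print(delete_features_update(U, table, {'a', 'b'}, {'b'}, 'd', old_pos))
# (set(), 0.0): without b every merged class mixes decisions
```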
IV. Feature Selection in Decision Systems with the Variation of the Feature Set

When the feature set of a decision system varies dynamically, a direct approach is to retrain the system from scratch to acquire the new feature subset. Algorithm 1 (denoted DGFS) is a direct greedy feature selection algorithm. It starts from an empty set and adds indispensable features into the feature subset one by one according to the dependency-based significance measure, selecting the most significant feature in each round, until the dependency of the selected feature subset equals that of the full feature set.

To improve computational efficiency, we develop an incremental feature selection algorithm, which avoids a large amount of recomputation. Suppose $P$ is the original feature subset. We first compute the new dependency function in an incremental manner, and then judge whether the original feature subset is still a candidate feature subset: if the dependency function under $P$ equals that under the whole feature set, then $P$ is also the new feature subset; otherwise, a new feature subset is computed starting from $P$, and the features with the highest significance are selected from $C\setminus P$ and added to the subset gradually. Finally, a redundancy-removing step deletes redundant features from the obtained feature subset, so as to guarantee that it contains no superfluous features.

Algorithm 1. A direct greedy feature selection algorithm (DGFS) with the variation of the feature set in decision systems
Input: A decision system $DS=(U, A=C\cup D)$, the original feature subset $Red$ on $U$, and the added feature set $C_{ad}$ or the deleted feature set $C_{de}$, where $C_{ad}\cap C=\emptyset$ and $C_{de}\subseteq C$;
Output: A new feature subset $Red'$.
Begin
1) Initialize $C'\leftarrow C\cup C_{ad}$ or $C'\leftarrow C\setminus C_{de}$, $Red'\leftarrow\emptyset$;
2) Compute $\gamma_{C'}(D)$;
3) For $i=1$ to $|C'|$ do
4)   compute $sig_1(c_i, C', D)$;
5)   if $sig_1(c_i, C', D)>0$, then $Red'\leftarrow Red'\cup\{c_i\}$;
6) End for
7) While $\gamma_{Red'}(D)\neq\gamma_{C'}(D)$ do
8)   for $c\in C'\setminus Red'$, compute $sig_2(c, Red', D)$;
9)   select $Red'\leftarrow Red'\cup\{c_j\}$ and compute $\gamma_{Red'}(D)$, where $sig_2(c_j, Red', D)=\max\{sig_2(c, Red', D)\}$;
10) End while
11) Return $Red'$.
End

Algorithm 2. An incremental feature selection algorithm (IFSA) in decision systems with the variation of the feature set
Input: A decision system $DS=(U, A=C\cup D)$, the original feature subset $Red$, the original dependency function $\gamma_C(D)$, and the added feature set $C_{ad}$ or the deleted feature set $C_{de}$, where $C_{ad}\cap C=\emptyset$ and $C_{de}\subseteq C$;
Output: A new feature subset $Red'$.
Begin
1) Initialize $P\leftarrow Red$;
2) If a feature set $C_{ad}$ is added to the system $DS$:
3)   let $C'\leftarrow C\cup C_{ad}$;
4)   compute the equivalence classes $U/C'$ and $\gamma_{C'}(D)$; // according to Theorem 1
5)   for $i=1$ to $|C_{ad}|$ do
6)     compute $sig_1(c_i, C_{ad}, D)$;
7)     if $sig_1(c_i, C_{ad}, D)>0$, then $P\leftarrow P\cup\{c_i\}$;
8)   end for
9)   if $\gamma_P(D)=\gamma_{C'}(D)$, go to Step 25; else go to Step 16;
10) End if
11) If a feature set $C_{de}$ is deleted from the system $DS$:
12)   let $C'\leftarrow C\setminus C_{de}$;
13)   if $C_{de}\cap P=\emptyset$, go to Step 25; else $P\leftarrow P\setminus C_{de}$ and go to Step 14;
14)   compute the equivalence classes $U/C'$ and $\gamma_{C'}(D)$; // according to Theorem 2
15) End if
16) For $c\in C'\setminus P$, construct a descending sequence by $sig_2(c, P, D)$, and record the result as $\{c_1, c_2, \ldots, c_{|C'\setminus P|}\}$;
17) While $\gamma_P(D)\neq\gamma_{C'}(D)$ do
18)   for $j=1$ to $|C'\setminus P|$ do
19)     select $P\leftarrow P\cup\{c_j\}$ and compute $\gamma_P(D)$;
20) End while
21) For each $c_j\in P$ do
22)   compute $sig_1(c_j, P, D)$;
23)   if $sig_1(c_j, P, D)=0$, then $P\leftarrow P\setminus\{c_j\}$;
24) End for
25) $Red'\leftarrow P$; return $Red'$.
End
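The following condensed sketch (hypothetical Python, not the paper's C++ implementation) mirrors the feature-addition path of Algorithm 2 at a high level: keep the old subset if its dependency already matches, otherwise extend it greedily by outer significance (Algorithm 2 ranks the candidates once in Step 16; this sketch simply re-selects the current best), then run the redundancy-removing pass. For brevity it recomputes $\gamma$ directly each time; Algorithm 2 obtains the same quantities incrementally via Theorems 1 and 2.

```python
# Condensed, hypothetical sketch of the IFSA flow for the feature-addition
# case (roughly Steps 1-10 and 16-25 of Algorithm 2).
from collections import defaultdict

def gamma(U, table, P, d):
    blocks = defaultdict(list)
    for x in U:
        blocks[tuple(table[x][a] for a in sorted(P))].append(x)
    return sum(len(b) for b in blocks.values()
               if len({table[x][d] for x in b}) == 1) / len(U)

def ifsa_add(U, table, red, C_new, d):
    target = gamma(U, table, C_new, d)        # gamma over the enlarged C'
    P = set(red)
    while gamma(U, table, P, d) < target:     # extend by max outer significance
        best = max(C_new - P, key=lambda c: gamma(U, table, P | {c}, d))
        P.add(best)
    for c in sorted(P):                       # redundancy-removing step
        if gamma(U, table, P - {c}, d) == target:
            P.discard(c)                      # c is dispensable (sig_1 = 0)
    return P

table = {0: {'a': 0, 'b': 0, 'c': 1, 'd': 0}, 1: {'a': 0, 'b': 1, 'c': 0, 'd': 1},
         2: {'a': 1, 'b': 0, 'c': 1, 'd': 1}, 3: {'a': 1, 'b': 1, 'c': 0, 'd': 0}}
U = list(table)
print(ifsa_add(U, table, red={'a'}, C_new={'a', 'b', 'c'}, d='d'))  # e.g. {'a', 'b'}
```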

In Algorithm 2, when a new feature set $C_{ad}$ is added to the system, the time complexity of computing the new dependency function $\gamma_{C'}(D)$ is $O(|U||C\cup C_{ad}|+|U|)$, and when a feature set $C_{de}$ is deleted, it is $O(|U||C\setminus C_{de}|+|U|)$. The time complexity of Steps 5)-8) is $O(|C_{ad}|^2|U|)$, that of Steps 16)-20) is $O(|C\setminus P|^2|U|+|C\setminus P||P||U|)$, and that of Steps 21)-24) is $O(|P|^2|U|)$. By contrast, when a new feature set is added, the time complexity of algorithm DGFS is $O(|C\cup C_{ad}|^2|U|^2)$; when a feature set is deleted, it is $O(|C\setminus C_{de}|^2|U|^2)$. Comparing the time complexities of the two algorithms, we can easily see that algorithm IFSA is more efficient than DGFS, especially for large decision systems, where $|C|\ll|U|$.

V. Experimental Analysis

The objective of the following experiments is to test the efficiency of the proposed algorithm. The data sets used in the experiments are outlined in Table 1 and were downloaded from UCI [21]. For the data set Mushroom, we deleted the objects with missing feature values. All experiments were conducted on a PC with Windows XP, an Intel Core2 CPU E7400 and 2GB memory. The algorithms were coded in C++ using Microsoft Visual Studio.

Table 1. The detailed information of the data sets
ID  Data sets     Objects  No. of Features  Classes
1   Lymphography
2   Vote
3   Vehicle
4   Car
5   Satimage
6   Mushroom

The variation of the feature set includes two cases: a new feature set is added, or a feature set is deleted. In what follows, we illustrate the effectiveness and efficiency of algorithm IFSA for both cases. A comparative study of algorithm DGFS and algorithm IFSA in terms of reduct size and runtime is carried out on the six data sets. For convenience in comparing the computational time, the original features and added features of the six data sets are shown in Table 2; likewise, the original features and deleted features are shown in Table 3. When these features are added to or deleted from the decision systems, the new feature subset is computed by algorithms DGFS and IFSA, respectively. The experimental results, in terms of reduct size and the runtime taken to find a new feature subset, are shown in Table 2 and Table 3, respectively.

Table 2. Comparison of reduct size and runtimes on six data sets when adding features
ID  Data sets     Original features      Adding features      Reduct size (DGFS/IFSA)  Runtimes/s (DGFS/IFSA)
1   Lymphography  {c7, c8, ..., c18}     {c1, c2, ..., c6}
2   Vote          {c5, c6, ..., c16}     {c1, c2, ..., c4}
3   Vehicle       {c7, c8, ..., c18}     {c1, c2, ..., c6}
4   Car           {c3, c4, ..., c6}      {c1, c2}
5   Satimage      {c12, c13, ..., c36}   {c1, c2, ..., c11}
6   Mushroom      {c8, c9, ..., c22}     {c1, c2, ..., c7}
Table 3. Comparison of reduct size and runtimes on six data sets when deleting features
ID  Data sets     Original features      Deleting features     Reduct size (DGFS/IFSA)  Runtimes/s (DGFS/IFSA)
1   Lymphography  {c1, c2, ..., c18}     {c13, c14, ..., c18}
2   Vote          {c1, c2, ..., c16}     {c12, c13, ..., c16}
3   Vehicle       {c1, c2, ..., c18}     {c13, c14, ..., c18}
4   Car           {c1, c2, ..., c6}      {c5, c6}
5   Satimage      {c1, c2, ..., c36}     {c25, c26, ..., c36}
6   Mushroom      {c1, c2, ..., c22}     {c16, c17, ..., c22}

From the experimental results shown in Table 2 and Table 3, we see that the sizes of the feature subsets obtained by algorithm IFSA are equal to those of algorithm DGFS in most cases, which shows the effectiveness of the proposed algorithm IFSA. In some cases IFSA selects a smaller feature subset, for example on the data set Vehicle. The main reason is the redundancy-removing step at the end of algorithm IFSA, in which some redundant features are deleted from the obtained feature subset. From the runtimes of the two algorithms, it is easy to see that both runtimes increase with the size of the data sets; however, the runtime of algorithm IFSA is much less than that of algorithm DGFS. The reason is that IFSA avoids recalculation: it carries out the computation of the new dependency function and the feature subset selection in an incremental manner, reusing previous results, so that efficiency is improved. Therefore, algorithm IFSA is more efficient than DGFS at finding a new feature subset under the variation of the feature set in decision systems.

VI. Conclusion and Future Work

Feature selection in a dynamic environment is a challenging issue in pattern recognition and machine learning. In this paper, we developed incremental methods for feature selection under the variation of the feature set in decision systems. An incremental manner is first employed to efficiently update the dependency function. Then we incorporated the updated dependency

function into the computation of feature subset selection, and designed the corresponding feature selection algorithm for the cases where a feature set is added to or deleted from the decision system. Finally, we carried out extensive experiments to test the effectiveness of the proposed algorithm. The experimental results demonstrate that the incremental algorithm can effectively reduce the computational time needed to compute a new feature subset, compared with the direct algorithm. Our future work is to study how to design efficient feature selection algorithms based on other generalized rough set models. The incremental feature selection problem for the variation of feature values in decision systems may also be investigated further.

References
[1] Z. Pawlak and A. Skowron, "Rough sets and Boolean reasoning", Information Sciences, Vol.177, pp.41-73.
[2] Y.H. Qian, J.Y. Liang, W. Pedrycz, et al., "Positive approximation: An accelerator for attribute reduction in rough set theory", Artificial Intelligence, Vol.174.
[3] R.W. Swiniarski and A. Skowron, "Rough set methods in feature selection and recognition", Pattern Recognition Letters, Vol.24.
[4] D. Yamaguchi, "Attribute dependency functions considering data efficiency", International Journal of Approximate Reasoning, Vol.51, No.1, pp.89-98.
[5] S. Nakariyakul and D.P. Casasent, "An improvement on floating search algorithms for feature subset selection", Pattern Recognition, Vol.42, No.9.
[6] M. Kryszkiewicz and P. Lasek, "FUN: Fast discovery of minimal sets of attributes functionally determining a decision attribute", Transactions on Rough Sets, Vol.9, pp.76-95.
[7] J. Qian, D.Q. Miao, Z.H. Zhang, et al., "Hybrid approaches to attribute reduction based on indiscernibility and discernibility relation", International Journal of Approximate Reasoning, Vol.50, No.1.
[8] A. Radaideh, Q. Sulaiman and M. Selamat, "Feature selection by ordered rough set based feature weighting", Database and Expert Systems Applications, Springer, Berlin, Germany.
[9] D.Q. Miao, Y. Zhao, Y.Y. Yao, et al., "Relative reducts in consistent and inconsistent decision tables of the Pawlak rough set model", Information Sciences, Vol.179.
[10] J. Zhong, Q.G. Sun and X. Li, "A novel feature selection method based on probability latent semantic analysis for Chinese text classification", Chinese Journal of Electronics, Vol.20, No.2.
[11] J.B. Li, L.J. Yu and S.H. Sun, "Refined kernel principal component analysis based feature extraction", Chinese Journal of Electronics, Vol.20, No.3.
[12] A. Skowron and C. Rauszer, "The discernibility matrices and functions in information systems", Intelligent Decision Support, Springer, Netherlands.
[13] Z.Y. Xu, B.R. Yang and W. Song, "Comparative study of different attribute reduction based on decision table", Chinese Journal of Electronics, Vol.15, No.4.
[14] R.L. Lang, Z.P. Xu and F. Gao, "A knowledge acquisition method for fault diagnosis of airborne equipments based on support vector regression machine", Chinese Journal of Electronics, Vol.22, No.2.
[15] J.Y. Liang, F. Wang, C.Y. Dang, et al., "A group incremental approach to feature selection applying rough set technique", IEEE Transactions on Knowledge and Data Engineering, Vol.26, No.2.
[16] F. Wang, J.Y. Liang and Y.H. Qian, "Attribute reduction: A dimension incremental strategy", Knowledge-Based Systems, Vol.39.
[17] F. Hu, G.Y. Wang, H. Huang, et al., "Incremental attribute reduction based on elementary sets", Proc. of the 10th International Conference on Rough Sets, Fuzzy Sets, Data Mining and Granular Computing, Regina, Canada.
[18] Y.Y. Yao and Y. Zhao, "Discernibility matrix simplification for constructing attribute reducts", Information Sciences, Vol.179, No.5.
[19] M. Yang, "An incremental updating algorithm for attribute reduction based on improved discernibility matrix", Chinese Journal of Computers, Vol.30, No.5 (in Chinese).
[20] Y.T. Xu, L.S. Wang and R.Y. Zhang, "A dynamic attribute reduction algorithm based on 0-1 integer programming", Knowledge-Based Systems, Vol.24.
[21] A. Asuncion and D.J. Newman, "UCI machine learning repository", available at /ml/datasets.html.

QIAN Wenbin received the Ph.D. degree in computer science from University of Science and Technology Beijing. He is currently a lecturer in the School of Software, Jiangxi Agriculture University, China. His research interests include data mining, rough sets and machine learning. (Email: qianwenbin1027@126.com)

SHU Wenhao is currently a Ph.D. candidate in the School of Computer and Information Technology at Beijing Jiaotong University. Her research interests include granular computing, design and analysis of algorithms, and data mining.

YANG Bingru is a lifetime chief professor and Ph.D. supervisor in the School of Computer and Communication Engineering and the dean of the Institute of Knowledge Engineering at University of Science and Technology Beijing. His research focuses on knowledge discovery, knowledge engineering and intelligent systems. He serves on the editorial boards of several journals.

ZHANG Changsheng is currently a Ph.D. candidate in the School of Computer and Communication Engineering at University of Science and Technology Beijing. He is also a lecturer at Wenzhou University. His research interests include formal concept lattices and data mining.


More information

REDUCING GRAPH COLORING TO CLIQUE SEARCH

REDUCING GRAPH COLORING TO CLIQUE SEARCH Asia Pacific Journal of Mathematics, Vol. 3, No. 1 (2016), 64-85 ISSN 2357-2205 REDUCING GRAPH COLORING TO CLIQUE SEARCH SÁNDOR SZABÓ AND BOGDÁN ZAVÁLNIJ Institute of Mathematics and Informatics, University

More information

Granular Computing. Y. Y. Yao

Granular Computing. Y. Y. Yao Granular Computing Y. Y. Yao Department of Computer Science, University of Regina Regina, Saskatchewan, Canada S4S 0A2 E-mail: yyao@cs.uregina.ca, http://www.cs.uregina.ca/~yyao Abstract The basic ideas

More information

A Fast Method for Extracting all Minimal Siphons from Maximal Unmarked Siphons of a Petri Net

A Fast Method for Extracting all Minimal Siphons from Maximal Unmarked Siphons of a Petri Net 582 JOURNAL OF SOFTWARE, VOL. 9, NO. 3, MARCH 2014 A Fast Method for Extracting all Minimal Siphons from Maximal Unmarked Siphons of a Petri Net Qiaoli Zhuang School of Information Science and Technology,

More information

Mining Temporal Association Rules in Network Traffic Data

Mining Temporal Association Rules in Network Traffic Data Mining Temporal Association Rules in Network Traffic Data Guojun Mao Abstract Mining association rules is one of the most important and popular task in data mining. Current researches focus on discovering

More information

Han Liu, Alexander Gegov & Mihaela Cocea

Han Liu, Alexander Gegov & Mihaela Cocea Rule-based systems: a granular computing perspective Han Liu, Alexander Gegov & Mihaela Cocea Granular Computing ISSN 2364-4966 Granul. Comput. DOI 10.1007/s41066-016-0021-6 1 23 Your article is published

More information

Yiyu Yao University of Regina, Regina, Saskatchewan, Canada

Yiyu Yao University of Regina, Regina, Saskatchewan, Canada ROUGH SET APPROXIMATIONS: A CONCEPT ANALYSIS POINT OF VIEW Yiyu Yao University of Regina, Regina, Saskatchewan, Canada Keywords: Concept analysis, data processing and analysis, description language, form

More information

Multidirectional 2DPCA Based Face Recognition System

Multidirectional 2DPCA Based Face Recognition System Multidirectional 2DPCA Based Face Recognition System Shilpi Soni 1, Raj Kumar Sahu 2 1 M.E. Scholar, Department of E&Tc Engg, CSIT, Durg 2 Associate Professor, Department of E&Tc Engg, CSIT, Durg Email:

More information

Using Decision Boundary to Analyze Classifiers

Using Decision Boundary to Analyze Classifiers Using Decision Boundary to Analyze Classifiers Zhiyong Yan Congfu Xu College of Computer Science, Zhejiang University, Hangzhou, China yanzhiyong@zju.edu.cn Abstract In this paper we propose to use decision

More information

American International Journal of Research in Science, Technology, Engineering & Mathematics

American International Journal of Research in Science, Technology, Engineering & Mathematics American International Journal of Research in Science, Technology, Engineering & Mathematics Available online at http://www.iasir.net ISSN (Print): 2328-3491, ISSN (Online): 2328-3580, ISSN (CD-ROM): 2328-3629

More information

Cost-sensitive C4.5 with post-pruning and competition

Cost-sensitive C4.5 with post-pruning and competition Cost-sensitive C4.5 with post-pruning and competition Zilong Xu, Fan Min, William Zhu Lab of Granular Computing, Zhangzhou Normal University, Zhangzhou 363, China Abstract Decision tree is an effective

More information

S-APPROXIMATION SPACES: A FUZZY APPROACH

S-APPROXIMATION SPACES: A FUZZY APPROACH Iranian Journal of Fuzzy Systems Vol. 14, No.2, (2017) pp. 127-154 127 S-APPROXIMATION SPACES: A FUZZY APPROACH A. SHAKIBA, M. R. HOOSHMANDASL, B. DAVVAZ AND S. A. SHAHZADEH FAZELI Abstract. In this paper,

More information

Mining High Average-Utility Itemsets

Mining High Average-Utility Itemsets Proceedings of the 2009 IEEE International Conference on Systems, Man, and Cybernetics San Antonio, TX, USA - October 2009 Mining High Itemsets Tzung-Pei Hong Dept of Computer Science and Information Engineering

More information

Greedy Algorithms 1 {K(S) K(S) C} For large values of d, brute force search is not feasible because there are 2 d {1,..., d}.

Greedy Algorithms 1 {K(S) K(S) C} For large values of d, brute force search is not feasible because there are 2 d {1,..., d}. Greedy Algorithms 1 Simple Knapsack Problem Greedy Algorithms form an important class of algorithmic techniques. We illustrate the idea by applying it to a simplified version of the Knapsack Problem. Informally,

More information

Collaborative Rough Clustering

Collaborative Rough Clustering Collaborative Rough Clustering Sushmita Mitra, Haider Banka, and Witold Pedrycz Machine Intelligence Unit, Indian Statistical Institute, Kolkata, India {sushmita, hbanka r}@isical.ac.in Dept. of Electrical

More information

FUZZY METRIC SPACES ZUN-QUAN XIA AND FANG-FANG GUO

FUZZY METRIC SPACES ZUN-QUAN XIA AND FANG-FANG GUO J. Appl. Math. & Computing Vol. 16(2004), No. 1-2, pp. 371-381 FUZZY METRIC SPACES ZUN-QUAN XIA AND FANG-FANG GUO Abstract. In this paper, fuzzy metric spaces are redefined, different from the previous

More information

2386 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 52, NO. 6, JUNE 2006

2386 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 52, NO. 6, JUNE 2006 2386 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 52, NO. 6, JUNE 2006 The Encoding Complexity of Network Coding Michael Langberg, Member, IEEE, Alexander Sprintson, Member, IEEE, and Jehoshua Bruck,

More information

Scan Scheduling Specification and Analysis

Scan Scheduling Specification and Analysis Scan Scheduling Specification and Analysis Bruno Dutertre System Design Laboratory SRI International Menlo Park, CA 94025 May 24, 2000 This work was partially funded by DARPA/AFRL under BAE System subcontract

More information

Research on Design and Application of Computer Database Quality Evaluation Model

Research on Design and Application of Computer Database Quality Evaluation Model Research on Design and Application of Computer Database Quality Evaluation Model Abstract Hong Li, Hui Ge Shihezi Radio and TV University, Shihezi 832000, China Computer data quality evaluation is the

More information

Bipartite Graph Partitioning and Content-based Image Clustering

Bipartite Graph Partitioning and Content-based Image Clustering Bipartite Graph Partitioning and Content-based Image Clustering Guoping Qiu School of Computer Science The University of Nottingham qiu @ cs.nott.ac.uk Abstract This paper presents a method to model the

More information

Cse634 DATA MINING TEST REVIEW. Professor Anita Wasilewska Computer Science Department Stony Brook University

Cse634 DATA MINING TEST REVIEW. Professor Anita Wasilewska Computer Science Department Stony Brook University Cse634 DATA MINING TEST REVIEW Professor Anita Wasilewska Computer Science Department Stony Brook University Preprocessing stage Preprocessing: includes all the operations that have to be performed before

More information

ESSENTIALLY, system modeling is the task of building

ESSENTIALLY, system modeling is the task of building IEEE TRANSACTIONS ON INDUSTRIAL ELECTRONICS, VOL. 53, NO. 4, AUGUST 2006 1269 An Algorithm for Extracting Fuzzy Rules Based on RBF Neural Network Wen Li and Yoichi Hori, Fellow, IEEE Abstract A four-layer

More information