Minimal Test Cost Feature Selection with Positive Region Constraint

Jiabin Liu 1,2, Fan Min 2,*, Shujiao Liao 2, and William Zhu 2

1 Department of Computer Science, Sichuan University for Nationalities, Kangding 626001, China. liujiabin418@163.com
2 Lab of Granular Computing, Zhangzhou Normal University, Zhangzhou 363000, China. minfanphd@163.com

Abstract. Test cost is often required to obtain the feature values of an object. When this issue is involved, people are often interested in schemes that minimize it. In many data mining applications, due to economic, technological and legal reasons, it is neither possible nor necessary to obtain a classifier with 100% accuracy; there may instead be an industrial standard indicating the required classification accuracy. In this paper, we consider such a situation and propose a new constraint satisfaction problem to address it. The constraint is expressed through the positive region, whereas the objective is to minimize the total test cost. The new problem is essentially a dual of the test cost constraint attribute reduction problem, which has been addressed recently. We propose a heuristic algorithm based on the information gain, the test cost, and a user-specified parameter λ to deal with the new problem. Experimental results indicate that the rational setting of λ differs among datasets, and that the algorithm is especially stable when the test cost is subject to the Pareto distribution.

Keywords: cost-sensitive learning, positive region, test cost, constraint, heuristic algorithm.

1 Introduction

When industrial products are manufactured, they must be inspected strictly before delivery. Testing equipment is needed to classify each product as qualified, unqualified, etc. Each piece of equipment costs money, and this cost is averaged over the products. Generally, we should pay more to obtain better classification accuracy.
However, in real-world applications, due to economic, technological and legal reasons, it is neither possible nor necessary to obtain a classifier with 100% accuracy. There may be an industrial standard indicating the required classification accuracy, such as 95%. Consequently, we are interested in a set of equipment with minimal cost that meets the standard. In this scenario there are two issues: the equipment cost and the product classification accuracy, called the test cost and the classification accuracy, respectively. Since the classification accuracy only needs to meet the industrial standard, we can choose a subset of the testing equipment so as to make the total cost minimal.

* Corresponding author.

J.T. Yao et al. (Eds.): RSCTC 2012, LNAI 7413, pp. 259-266, 2012. © Springer-Verlag Berlin Heidelberg 2012

Feature selection plays an important role in machine learning and pattern recognition applications [1]. It has been defined by many authors, looking at it from various angles [11]. Minimal reducts have the best generalization ability; hence many existing rough-set-based feature selection algorithms, such as [15,16,18,20,23], are devoted to finding one of them. The positive region is a widely used concept in rough set theory [12]. We use this concept instead of the classification accuracy to specify the industrial standard. We formally define this problem and call it the minimal test cost feature selection with positive region constraint (MTPC) problem. The new problem is essentially a dual of the optimal sub-reduct with test cost constraint (OSRT) problem, which was defined in [7] and studied in [7,8,9]. The OSRT problem considers a test cost constraint, while the new problem considers a positive region constraint. As will be discussed in the following text, the classical reduct problem can be viewed as a special case of the MTPC problem. Since the classical reduct problem is NP-hard, the new problem is at least NP-hard. Consequently, we propose a heuristic algorithm to deal with it. The heuristic information function is based on both the information gain and the test cost. The algorithm is tested on four UCI datasets with various test cost settings. Experimental results indicate that the rational setting of λ differs among datasets, and that the algorithm is especially stable when the test cost is subject to the Pareto distribution.

The rest of this paper is structured as follows. Section 2 describes related concepts in rough set theory and defines the MTPC problem formally. In Section 3, a heuristic algorithm based on the λ-weighted information gain is presented. Section 4 illustrates some results on four UCI datasets with detailed analysis. Finally, Section 5 concludes.
2 Preliminaries

In this section, we define the MTPC problem. First, we revisit the data model on which the problem is defined. Then we review the concept of the positive region. Finally, we propose the minimal test cost feature selection with positive region constraint problem.

2.1 Test-Cost-Independent Decision Systems

Decision systems are fundamental in machine learning and data mining. A decision system is often denoted as S = (U, C, D, {V_a | a ∈ C ∪ D}, {I_a | a ∈ C ∪ D}), where U is a finite set of objects called the universe, C is the set of conditional attributes, D is the set of decision attributes, V_a is the set of values for each a ∈ C ∪ D, and I_a : U → V_a is an information function for each a ∈ C ∪ D. We often denote {V_a | a ∈ C ∪ D} and {I_a | a ∈ C ∪ D} by V and I, respectively. A decision system is often stored in a relational database or a text file.

A test-cost-independent decision system (TCI-DS) [6] is a decision system with test cost information represented by a vector. It is the simplest form of test-cost-sensitive decision system and is defined as follows.

Definition 1. [6] A test-cost-independent decision system (TCI-DS) S is the 6-tuple

S = (U, C, D, V, I, c), (1)

where U, C, D, V and I have the same meanings as in a decision system, and c : C → R⁺ ∪ {0} is the test cost function. It can easily be represented by a vector c = [c(a_1), c(a_2), ..., c(a_|C|)]. Test costs are independent of one another, that is, c(B) = Σ_{a ∈ B} c(a) for any B ⊆ C.

2.2 Positive Region

Let S = (U, C, D, V, I) be a decision system. Any B ⊆ C ∪ D determines an indiscernibility relation on U. The partition determined by B is denoted by U/B. Let B(X) denote the B-lower approximation of X.

Definition 2. [13] Let S = (U, C, D, V, I) be a decision system and B ⊆ C. The positive region of D with respect to B is defined as

POS_B(D) = ∪_{X ∈ U/D} B(X), (2)

where U, C, D, V and I have the same meanings as in a decision system. In other words, D is totally (partially) dependent on B if all (some) elements of the universe U can be uniquely classified into blocks of the partition U/D employing B [12].

2.3 Problem Definition

Attribute reduction is the process of choosing an appropriate subset of attributes from the original dataset [17]. Numerous reduct problems have been defined on the classical [14], covering-based [21,22,23], decision-theoretic [19], and dominance-based [2] rough set models. Respective definitions of relative reducts have also been studied in [3,15].

Definition 3. [13] Let S = (U, C, D, V, I) be a decision system. Any B ⊆ C is called a decision relative reduct (or a relative reduct for brevity) of S iff:
(1) POS_B(D) = POS_C(D), and
(2) ∀a ∈ B, POS_{B−{a}}(D) ≠ POS_B(D).

Definition 3 implies two issues: the reduct is jointly sufficient, and each attribute of the reduct is individually necessary, for preserving a particular property (the positive region in this context) of the decision system [4]. The set of all relative reducts of S is denoted by Red(S). The core of S is the intersection of these reducts, namely core(S) = ∩Red(S).
Core attributes are of great importance to the decision system and should never be removed, except when information loss is allowed [19]. In this paper, due to the positive region constraint, it is not necessary to construct a reduct. On the other hand, we never want to select any redundant test. Therefore we propose the following concept.

Definition 4. Let S = (U, C, D, V, I) be a decision system. Any B ⊆ C is a positive region sub-reduct of S iff ∀a ∈ B, POS_{B−{a}}(D) ≠ POS_B(D).
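To make Definitions 2 and 4 concrete, here is a small sketch (the decision table and attribute names are illustrative, not from the paper):

```python
from collections import defaultdict

def blocks(U, table, attrs):
    """Partition U into indiscernibility classes with respect to attrs (U/attrs)."""
    g = defaultdict(list)
    for x in U:
        g[tuple(table[x][a] for a in attrs)].append(x)
    return [set(b) for b in g.values()]

def positive_region(U, table, B, D):
    """POS_B(D): the union of the B-lower approximations of the blocks of U/D."""
    d_blocks = blocks(U, table, D)
    pos = set()
    for b in blocks(U, table, B):
        if any(b <= db for db in d_blocks):  # b lies wholly inside one D-block
            pos |= b
    return pos

def is_subreduct(U, table, B, D):
    """Definition 4: every attribute of B is individually necessary."""
    pos = positive_region(U, table, B, D)
    return all(positive_region(U, table, [x for x in B if x != a], D) != pos
               for a in B)

# Toy decision table: objects 0..3, conditional attributes a, b; decision d.
# Objects 2 and 3 agree on (a, b) but differ on d, so they fall outside POS.
table = {0: {"a": 0, "b": 0, "d": 0}, 1: {"a": 0, "b": 1, "d": 1},
         2: {"a": 1, "b": 0, "d": 1}, 3: {"a": 1, "b": 0, "d": 0}}
U = [0, 1, 2, 3]
print(positive_region(U, table, ["a", "b"], ["d"]))  # the set {0, 1}
```

Here {a, b} is a sub-reduct: dropping either attribute shrinks the positive region.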

According to Definition 4, we observe the following: (1) a reduct is also a sub-reduct, and (2) a core attribute may not be included in a sub-reduct. Here we are interested in those feature subsets satisfying the positive region constraint and, at the same time, having the minimal possible test cost. We adopt the style of [5] and propose the following problem.

Problem 1. The minimal test cost feature selection with positive region constraint (MTPC) problem.
Input: S = (U, C, D, V, I, c), the positive region lower bound pl;
Output: B ⊆ C;
Constraint: |POS_B(D)| / |POS_C(D)| ≥ pl;
Optimization objective: min c(B).

In fact, the MTPC problem is more general than the minimal test cost reduct problem defined in [4]; in the case where pl = 1, it coincides with the latter. The minimal test cost reduct problem is in turn more general than the classical reduct problem, which is NP-hard. Therefore the MTPC problem is at least NP-hard, and heuristic algorithms are needed to deal with it. Note that the MTPC problem differs from the variable precision rough set model: that model changes the lower approximation by varying the accuracy, whereas in our problem definition the lower approximation is unchanged.

3 The Algorithm

Similar to the heuristic algorithm for the OSRT problem [7], we design a heuristic algorithm to deal with the new problem. We first analyze the heuristic function, which is the key issue in the algorithm. Let B ⊆ C and a_i ∈ C − B. The information gain of a_i with respect to B is

f_e(B, a_i) = H({d} | B) − H({d} | B ∪ {a_i}), (3)

where d ∈ D is a decision attribute. The λ-weighted function is defined as

f(B, a_i, c, λ) = f_e(B, a_i) · c_i^λ, (4)

where λ is a non-positive number. Our algorithm is listed in Algorithm 1. It contains two main steps. The first step comprises lines 3 through 8: attributes are added to B one by one according to the heuristic function indicated in Equation (4). This step stops when the positive region reaches the lower bound.
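As a concrete sketch, the conditional entropy and the λ-weighted function of Equations (3) and (4) might be computed as follows (the toy table and attribute names are illustrative, not from the paper):

```python
import math
from collections import Counter, defaultdict

def cond_entropy(U, table, d, B):
    """H({d} | B): entropy of decision d within each B-block, block-weighted."""
    groups = defaultdict(list)
    for x in U:
        groups[tuple(table[x][a] for a in B)].append(table[x][d])
    h = 0.0
    for vals in groups.values():
        w = len(vals) / len(U)
        h -= w * sum((n / len(vals)) * math.log2(n / len(vals))
                     for n in Counter(vals).values())
    return h

def weighted_gain(U, table, d, B, a, cost, lam):
    """Equation (4): f(B, a, c, λ) = f_e(B, a) * c(a) ** λ, with λ ≤ 0."""
    gain = cond_entropy(U, table, d, B) - cond_entropy(U, table, d, B + [a])
    return gain * cost[a] ** lam

# Toy table (hypothetical): objects 0..3, attributes a, b; decision d.
table = {0: {"a": 0, "b": 0, "d": 0}, 1: {"a": 0, "b": 1, "d": 1},
         2: {"a": 1, "b": 0, "d": 1}, 3: {"a": 1, "b": 0, "d": 0}}
U = [0, 1, 2, 3]
```

With λ < 0, expensive tests are penalized in proportion to their cost; λ = 0 ignores test costs and reduces to the plain information gain.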
The second step comprises lines 9 through 15: redundant attributes are removed from B one by one until all redundancy has been eliminated. As discussed in Section 2.3, our algorithm has no core-computation stage.
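Putting the two steps together, a self-contained sketch of the whole procedure could look like this (the data layout, attribute names, and test costs are hypothetical; ties in the heuristic are broken by attribute order):

```python
import math
from collections import Counter, defaultdict

def blocks(U, table, attrs):
    g = defaultdict(list)
    for x in U:
        g[tuple(table[x][a] for a in attrs)].append(x)
    return [set(b) for b in g.values()]

def pos_size(U, table, B, d):
    """|POS_B({d})|: number of objects consistently classified by B."""
    d_blocks = blocks(U, table, [d])
    return sum(len(b) for b in blocks(U, table, list(B))
               if any(b <= db for db in d_blocks))

def cond_entropy(U, table, d, B):
    h = 0.0
    for b in blocks(U, table, list(B)):
        cnt = Counter(table[x][d] for x in b)
        w = len(b) / len(U)
        h -= w * sum((n / len(b)) * math.log2(n / len(b)) for n in cnt.values())
    return h

def mtpc(U, table, C, d, cost, pl, lam):
    """Heuristic sketch: add attributes by λ-weighted gain until
    |POS_B| / |POS_C| >= pl, then delete redundant tests, costliest first."""
    target = pl * pos_size(U, table, C, d)
    B, CA = [], list(C)
    while pos_size(U, table, B, d) < target:          # addition step
        h_B = cond_entropy(U, table, d, B)
        best = max(CA, key=lambda a: (h_B - cond_entropy(U, table, d, B + [a]))
                                     * cost[a] ** lam)
        B.append(best)
        CA.remove(best)
    for a in sorted(B, key=lambda a: cost[a], reverse=True):  # deletion step
        rest = [x for x in B if x != a]
        if pos_size(U, table, rest, d) == pos_size(U, table, B, d):
            B = rest
    return set(B)

# Toy run: attribute "c" duplicates "a" but is expensive; d = a XOR b.
table = {0: {"a": 0, "b": 0, "c": 0, "d": 0}, 1: {"a": 0, "b": 1, "c": 0, "d": 1},
         2: {"a": 1, "b": 0, "c": 1, "d": 1}, 3: {"a": 1, "b": 1, "c": 1, "d": 0}}
cost = {"a": 1, "b": 2, "c": 90}
print(mtpc([0, 1, 2, 3], table, ["a", "b", "c"], "d", cost, 1.0, -1))
# selects a and b; the expensive duplicate c is avoided
```

With pl = 1.0 this is the minimal test cost reduct setting; lowering pl lets the addition step stop earlier with a cheaper sub-reduct.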

Algorithm 1. A heuristic algorithm for the MTPC problem
Input: S = (U, C, D, V, I, c), p_con, λ
Output: A sub-reduct of S
Method: MTPC
1: B = ∅; //the sub-reduct
2: CA = C; //the unprocessed attributes
//Addition
3: while (|POS_B(D)| < p_con) do
4:   for any a ∈ CA, compute f(B, a, c, λ);
5:   select a* with maximal f(B, a*, c, λ);
6:   B = B ∪ {a*};
7:   CA = CA − {a*};
8: end while
//Deletion; B must be a sub-reduct
9: CD = B; //sort attributes in CD according to their test costs in descending order
10: while CD ≠ ∅ do
11:   CD = CD − {a*}; //where a* is the first element in CD
12:   if (POS_{B−{a*}}(D) = POS_B(D)) then
13:     B = B − {a*};
14:   end if
15: end while
16: return B

4 Experiments

To study the effectiveness of the algorithm, we undertook experiments using our open source software Coser [10] on 4 different datasets from the UCI library. To evaluate the performance of the algorithm, we need to study the quality of each sub-reduct it computes. Ideally, this would be done by comparing each sub-reduct to an optimal sub-reduct with the positive region constraint. Unfortunately, computing an optimal sub-reduct with the positive region constraint is more complex than computing a minimal reduct, or a minimal test cost reduct. In this paper, we therefore only study the influence of λ on the quality of the result. Because the four datasets lack predefined test costs, we specify them with the same settings as [4] to produce test costs within [1, 100]. Three distributions, namely Uniform, Normal, and bounded Pareto, are employed. To control the shapes of the Normal and bounded Pareto distributions, we must set the parameter α. In our experiments, for the Normal distribution, α = 8, and test costs as high as 70 and as low as 30 are often generated; for the bounded Pareto distribution, α = 2, and test costs higher than 50 are often generated.
We intentionally set pl = 0.8. This setting reflects that we need a sub-reduct rather than a reduct. The experimental results on the 4 datasets are illustrated in Fig. 1. By running our program with different λ values, the 3 test cost distributions are compared. We can observe the following.

Fig. 1. Optimal probability: (a) zoo; (b) iris; (c) voting; (d) tic-tac-toe

(1) The result is influenced by the user-specified λ. The probability of obtaining the best result differs with different λ values, where "the best" means the best among the solutions we obtained, not the optimal one.
(2) The algorithm's performance is related to the test cost distribution. It is best on datasets with the bounded Pareto distribution and worst on datasets with the Normal distribution. Consequently, if the real data has test costs subject to the Normal distribution, one may develop other heuristic algorithms for this problem.
(3) There is no setting of λ such that the algorithm always obtains the best result.

5 Conclusions

In this paper, we first proposed the MTPC problem, then designed a heuristic algorithm to deal with it. Experimental results indicate that the optimal solution is not easy to obtain. In the future, we will develop an exhaustive algorithm to evaluate the performance of heuristic algorithms, and we will develop more advanced heuristic algorithms to obtain better performance.

Acknowledgements. This work is partially supported by the Natural Science Youth Foundation of the Department of Education of Sichuan Province under Grant No. 2006C059, the National Science Foundation of China under Grant No. 61170128, the Natural Science Foundation of Fujian Province under Grant No. 2011J01374, and the Education Department of Fujian Province under Grant No. JA11176.

References
1. Chen, X.: An improved branch and bound algorithm for feature selection. Pattern Recognition Letters 24(12), 1925-1933
2. Greco, S., Matarazzo, B., Słowiński, R., Stefanowski, J.: Variable consistency model of dominance-based rough sets approach. In: Ziarko, W.P., Yao, Y. (eds.) RSCTC 2000. LNCS (LNAI), vol. 2005, pp. 170-181. Springer, Heidelberg (2001)
3. Hu, Q., Yu, D., Liu, J., Wu, C.: Neighborhood rough set based heterogeneous feature subset selection. Information Sciences 178(18), 3577-3594 (2008)
4. Min, F., He, H., Qian, Y., Zhu, W.: Test-cost-sensitive attribute reduction. Information Sciences 181, 4928-4942 (2011)
5. Min, F., Hu, Q., Zhu, W.: Feature selection with test cost constraint. Submitted to Knowledge-Based Systems (2012)
6. Min, F., Liu, Q.: A hierarchical model for test-cost-sensitive decision systems. Information Sciences 179, 2442-2452 (2009)
7. Min, F., Zhu, W.: Attribute reduction with test cost constraint. Journal of Electronic Science and Technology of China 9(2), 97-102 (2011)
8. Min, F., Zhu, W.: Optimal sub-reducts in the dynamic environment. In: IEEE GrC, pp. 457-462 (2011)
9. Min, F., Zhu, W.: Optimal sub-reducts with test cost constraint. In: Yao, J., Ramanna, S., Wang, G., Suraj, Z. (eds.) RSKT 2011. LNCS, vol. 6954, pp. 57-62. Springer, Heidelberg (2011)

10. Min, F., Zhu, W., Zhao, H., Pan, G., Liu, J.: Coser: Cost-sensitive rough sets (2011), http://grc.fjzs.edu.cn/~fmin/coser/
11. Dash, M., Liu, H.: Feature selection for classification. Intelligent Data Analysis 1, 131-156 (1997)
12. Pawlak, Z.: Rough set approach to knowledge-based decision support. European Journal of Operational Research 99, 48-57 (1997)
13. Pawlak, Z.: Rough sets. International Journal of Computer and Information Sciences 11, 341-356 (1982)
14. Pawlak, Z.: Rough sets and intelligent data analysis. Information Sciences 147, 1-12 (2002)
15. Qian, Y., Liang, J., Pedrycz, W., Dang, C.: Positive approximation: An accelerator for attribute reduction in rough set theory. Artificial Intelligence 174(9-10), 597-618 (2010)
16. Swiniarski, W., Skowron, A.: Rough set methods in feature selection and recognition. Pattern Recognition Letters 24(6), 833-849 (2003)
17. Wang, X., Yang, J., Teng, X., Xia, W., Jensen, R.: Feature selection based on rough sets and particle swarm optimization. Pattern Recognition Letters 28(4), 459-471 (2007)
18. Yao, J., Zhang, M.: Feature selection with adjustable criteria. In: Ślęzak, D., Wang, G., Szczuka, M.S., Düntsch, I., Yao, Y. (eds.) RSFDGrC 2005. LNCS (LNAI), vol. 3641, pp. 204-213. Springer, Heidelberg (2005)
19. Yao, Y., Zhao, Y.: Attribute reduction in decision-theoretic rough set models. Information Sciences 178(17), 3356-3373 (2008)
20. Zhang, W., Mi, J., Wu, W.: Knowledge reductions in inconsistent information systems. Chinese Journal of Computers 26(1), 12-18 (2003)
21. Zhu, W.: Topological approaches to covering rough sets. Information Sciences 177(6), 1499-1508 (2007)
22. Zhu, W.: Relationship between generalized rough sets based on binary relation and covering. Information Sciences 179(3), 210-225 (2009)
23. Zhu, W., Wang, F.: Reduction and axiomization of covering generalized rough sets. Information Sciences 152(1), 217-230 (2003)