Consistency Based Attribute Reduction

Consistency Based Attribute Reduction

Qinghua Hu, Hui Zhao, Zongxia Xie, and Daren Yu

Harbin Institute of Technology, Harbin, P.R. China
huqinghua@hcms.hit.edu.cn

Z.-H. Zhou, H. Li, and Q. Yang (Eds.): PAKDD 2007, LNAI 4426, 2007. Springer-Verlag Berlin Heidelberg 2007

Abstract. Rough sets are widely used in feature subset selection and attribute reduction. In most of the existing algorithms, the dependency function is employed to evaluate the quality of a feature subset. The disadvantages of using dependency are discussed in this paper, and the problem it causes for forward greedy search algorithms is presented. We introduce the consistency measure to deal with these problems. The relationship between dependency and consistency is analyzed. It is shown that the consistency measure reflects not only the size of the decision positive region, as dependency does, but also the sample distribution in the boundary region; therefore it describes the distinguishing power of an attribute set more finely. Based on consistency, we redefine redundancy and the reduct of a decision system, and we construct a forward greedy search algorithm to find reducts. Moreover, we employ cross validation to test the selected features and to remove overfitting features from a reduct. The experimental results with UCI data show that the proposed algorithm is effective and efficient.

1 Introduction

As the capability of gathering and storing data increases, there are many candidate features in some pattern recognition and machine learning tasks. Applications show that excessive features not only significantly slow down the learning process, but also decrease the generalization power of the learned classifiers. Attribute reduction, also called feature subset selection, is usually employed as a preprocessing step to select part of the features and focus the learning algorithm on the relevant information [1, 3, 4, 5, 7, 8].

In recent years, rough set theory has been widely discussed and used in attribute reduction and feature selection [6, 7, 8, 14, 16, 17]. Reduct is a proper term in rough set methodology: it means a minimal attribute subset with the same approximating power as the whole attribute set [14]. This definition shows that a reduct should contain the least redundant information while not losing the classification ability of the raw data. Thus the attributes in a reduct should not only be strongly relevant to the learning task, but also be non-redundant with each other. This property of reducts exactly accords with the objective of feature selection. Thereby, the process of searching for reducts, called attribute reduction, is a feature subset selection process.

So far, a series of approaches to searching for reducts have been published. Discernibility matrices [11, 14] were introduced to store the features which distinguish each pair of objects, and Boolean operations were then conducted on the matrices to find all of the reducts. The main problem of this method is its space and time cost.

We need a $10^4 \times 10^4$ matrix if there are $10^4$ samples, and it is also time-consuming to search for reducts from such a matrix with Boolean operations. With the dependency function, a heuristic search algorithm was constructed [1, 6, 7, 8, 16].

There are some problems in dependency based attribute reduction. The dependency function in rough set approaches is the ratio of the size of the positive region over the sample space. The positive region is the set of samples which can be undoubtedly classified into a certain class according to the existing attributes. From the definition of the dependency function, we can find that it ignores the influence of boundary samples, which may belong to more than one class. However, in classification learning, the boundary samples also exert an influence on the learned results. For example, in learning decision trees with CART or C4.5, the samples in leaf nodes sometimes belong to more than one class [2, 10]; in this case, the nodes are labeled with the class of the majority of samples. The dependency function does not take this kind of samples into account.

Moreover, there is another risk in using the dependency function in greedy feature subset search algorithms. In a forward greedy search, we usually start with an empty set of attributes and then add the selected features into the reduct one by one. In the first round, we need to compute the dependency of each single attribute and select the attribute with the greatest dependency value. We find that the greatest dependency of a single attribute is zero in some applications, because no single candidate feature can classify any of the samples beyond dispute. Therefore, according to the criterion that the dependency function should be greater than zero, none of the attributes can be selected, and the feature selection algorithm finds nothing. However, some combinations of the attributes are able to distinguish all of the samples, although no single attribute can distinguish any of them. As far as we know, no research has reported on this issue so far. These issues essentially result from the same problem: the dependency function completely neglects the boundary samples.

In this paper, we introduce a function proposed by Dash and Liu [3], called consistency, to evaluate the significance of attributes. We discuss the relationship between dependency and consistency, and employ the consistency function to construct a greedy search attribute reduction algorithm. The main difference between the two functions lies in how the boundary samples are considered. Consistency counts not only the positive region, but also the samples of the majority class in each boundary region. Therefore, even if the positive region is empty, we can still compare the distinguishing power of the features according to the sample distribution in the boundary regions. Consistency is the ratio of consistent samples; hence it is linear in the number of consistent samples, and it is easy to specify a stopping criterion in a consistency-based algorithm. With numerical experiments, we will show that such a specification is necessary for real-world applications.

In the next section, we review the basic concepts of rough sets. We then present the definition and properties of the consistency function, compare the dependency function with consistency, and construct consistency based attribute reduction in Section 3. We present the results of experiments in Section 4. Finally, the conclusions are presented in Section 5.
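To make the failure mode of the dependency function concrete, the following small hypothetical example (not from the paper; the toy table and the helper names `dependency` and `consistency` are ours) builds a four-sample XOR-style decision table in Python. Each single attribute has an empty positive region, so its dependency is zero and a dependency-driven forward search selects nothing, while the two attributes together distinguish every sample.

```python
from collections import Counter, defaultdict

# A toy XOR-style decision table: no single attribute separates the classes,
# but the pair {a1, a2} separates them perfectly.
U = [
    {"a1": 0, "a2": 0, "d": 0},
    {"a1": 0, "a2": 1, "d": 1},
    {"a1": 1, "a2": 0, "d": 1},
    {"a1": 1, "a2": 1, "d": 0},
]

def partition(samples, attrs):
    """Group samples into B-equivalence classes (identical values on attrs)."""
    blocks = defaultdict(list)
    for s in samples:
        blocks[tuple(s[a] for a in attrs)].append(s)
    return blocks.values()

def dependency(samples, attrs, d="d"):
    """gamma_B(D): fraction of samples whose equivalence class is pure in d."""
    pos = sum(len(b) for b in partition(samples, attrs)
              if len({s[d] for s in b}) == 1)
    return pos / len(samples)

def consistency(samples, attrs, d="d"):
    """delta_B(D): fraction of samples in the majority class of their block."""
    kept = sum(max(Counter(s[d] for s in b).values())
               for b in partition(samples, attrs))
    return kept / len(samples)

for B in (["a1"], ["a2"], ["a1", "a2"]):
    print(B, "dependency =", dependency(U, B), "consistency =", consistency(U, B))
# ['a1']         dependency = 0.0  consistency = 0.5
# ['a2']         dependency = 0.0  consistency = 0.5
# ['a1', 'a2']   dependency = 1.0  consistency = 1.0
```

A dependency-driven greedy search stops immediately here because every single-attribute significance is zero, whereas the nonzero consistency values let a consistency-driven search add an attribute and then reach the fully discerning pair.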

2 Basic Concepts on Rough Sets

Rough set theory, which was introduced to deal with imperfect and vague concepts, has attracted a lot of attention from both theoretical and applied research areas. Data sets are usually given in the form of tables; we call such a data table an information system, formulated as $S = <U, A, V, f>$, where $U = \{x_1, x_2, \ldots, x_n\}$ is a finite and nonempty set of objects, called the universe, $A$ is the set of attributes characterizing the objects, $V$ is the domain of attribute values, and $f: U \times A \to V$ is the information function. If the attribute set is divided into a condition attribute set $C$ and a decision attribute set $D$, the information system is also called a decision table.

With an arbitrary attribute subset $B \subseteq A$, there is an indiscernibility relation $IND(B)$:

$IND(B) = \{<x, y> \in U \times U \mid \forall a \in B,\ a(x) = a(y)\}$.

$<x, y> \in IND(B)$ means that objects $x$ and $y$ are indiscernible with respect to the attribute set $B$. Obviously, the indiscernibility relation is an equivalence relation, which satisfies reflexivity, symmetry and transitivity. The equivalence class induced by the attributes $B$ is denoted by $[x_i]_B = \{x \in U \mid <x_i, x> \in IND(B)\}$. The equivalence classes generated by $B$ are also called $B$-elemental granules or $B$-information granules. The set of elemental granules forms a concept system, which is used to characterize the imperfect concepts in the information system. Given an arbitrary concept $X$ in the information system, two unions of elemental granules are associated with it:

$\underline{B}X = \cup\{[x]_B \mid [x]_B \subseteq X, x \in U\}$, $\overline{B}X = \cup\{[x]_B \mid [x]_B \cap X \neq \emptyset, x \in U\}$.

The concept $X$ is approximated with these two sets of elemental granules. $\underline{B}X$ and $\overline{B}X$ are called the lower and upper approximations of $X$ in terms of the attributes $B$; $\underline{B}X$ is also called the positive region of $X$. $X$ is definable if $\underline{B}X = \overline{B}X$, which means the concept $X$ can be perfectly characterized with the knowledge $B$; otherwise, $X$ is indefinable. An indefinable set is called a rough set. $BN_B(X) = \overline{B}X - \underline{B}X$ is called the boundary of the approximations; for a definable set, the boundary is empty.

Given a decision table $<U, C \cup D, V, f>$, $C$ and $D$ generate two partitions of the universe. Machine learning is usually concerned with using the condition knowledge to approximate the decision, that is, with finding the mapping from the conditions to the decisions. Approximating $U/D$ with $U/C$, the positive and boundary regions are defined as

$POS_C(D) = \cup_{X \in U/D} \underline{C}X$, $BN_C(D) = \cup_{X \in U/D} \overline{C}X - \cup_{X \in U/D} \underline{C}X$.

The boundary region is the set of elemental granules which cannot be perfectly described by the knowledge $C$, while the positive region is the set of $C$-elemental granules which completely belong to one of the decision concepts. The size of the positive or boundary region reflects the approximation power of the condition attributes.
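As a concrete illustration of these constructions (not part of the original paper; the toy table and all function names are ours), the following Python sketch computes the $B$-equivalence classes, the lower and upper approximations of a concept, and the positive and boundary regions of a small decision table.

```python
from collections import defaultdict

# Each object is a dict of condition attribute values plus a decision "d".

def blocks(U, B):
    """U/IND(B): equivalence classes of objects agreeing on every attribute in B."""
    part = defaultdict(list)
    for i, x in enumerate(U):
        part[tuple(x[a] for a in B)].append(i)
    return list(part.values())

def lower_approx(U, B, X):
    """B-lower approximation of a concept X (a set of object indices)."""
    return {i for blk in blocks(U, B) if set(blk) <= X for i in blk}

def upper_approx(U, B, X):
    """B-upper approximation: union of blocks that intersect X."""
    return {i for blk in blocks(U, B) if set(blk) & X for i in blk}

def positive_region(U, C, d="d"):
    """POS_C(D): union of the C-lower approximations of all decision classes."""
    pos = set()
    for X in blocks(U, [d]):          # decision classes U/D
        pos |= lower_approx(U, C, set(X))
    return pos

def boundary_region(U, C, d="d"):
    """BN_C(D): objects whose C-equivalence class mixes several decision labels."""
    return set(range(len(U))) - positive_region(U, C, d)

# Tiny example table: objects 0 and 1 share condition values but differ in d.
U = [{"a": 0, "b": 0, "d": 0}, {"a": 0, "b": 0, "d": 1},
     {"a": 0, "b": 1, "d": 1}, {"a": 1, "b": 1, "d": 1}]
print(positive_region(U, ["a", "b"]))   # {2, 3}; objects 0 and 1 form the boundary
```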

Given a decision table, for any $B \subseteq C$, it is said that the decision attribute set $D$ depends on the condition attributes $B$ with degree $k$, denoted by $B \Rightarrow_k D$, where

$k = \gamma_B(D) = |POS_B(D)| / |U|$.

The dependency function $k$ measures the approximation power of a condition attribute set with respect to the decision $D$. In data mining, and especially in feature selection, it is important to find the dependence relations between attribute sets and to find a concise and efficient representation of the data.

Given a decision table $T = <U, C \cup D, V, f>$, if $P \subseteq Q \subseteq C$, we have $\gamma_P(D) \leq \gamma_Q(D)$.

Given a decision table $T = <U, C \cup D, V, f>$, $B \subseteq C$ and $a \in B$, we say that the condition attribute $a$ is indispensable in $B$ if $\gamma_{B-\{a\}}(D) < \gamma_B(D)$; otherwise we say $a$ is redundant. We say $B \subseteq C$ is independent if every $a$ in $B$ is indispensable.

Attribute subset $B$ is a reduct of the decision table if

1) $\gamma_B(D) = \gamma_C(D)$;
2) $\forall a \in B: \gamma_B(D) > \gamma_{B-\{a\}}(D)$.

A reduct of a decision table is thus an attribute subset which keeps the approximating capability of all the condition attributes and, at the same time, contains no redundant attribute. The term reduct presents a concise and complete way to define the objective of feature selection and attribute reduction.

3 Consistency Based Attribute Reduction

A binary classification problem in a discrete space is shown in Fig. 1, where the samples are divided into a finite set of equivalence classes $\{E_1, E_2, \ldots, E_K\}$ based on their feature values: the samples with the same feature values are grouped into one equivalence class. We find that some of the equivalence classes are pure, as their samples belong to only one of the decision classes, but there are also inconsistent equivalence classes, such as $E_3$ and $E_4$ in Fig. 1. According to rough set theory, the latter form the decision boundary region, and the set of consistent equivalence classes forms the decision positive region. The objective of feature selection is to find a feature subset which minimizes the inconsistent region, in either discrete or numerical cases, and accordingly minimizes the Bayesian decision error. It is therefore desirable to have a measure which reflects the size of the inconsistent region in discrete and numerical spaces.

Dependency reflects the ratio of consistent samples over the whole set of samples; therefore it does not take the boundary samples into account when computing the significance of attributes. Once there are inconsistent samples in an equivalence class, that equivalence class is simply ignored. However, the inconsistent samples can be divided into two groups: a subset of samples under the majority class and a subset under the minority classes. According to the Bayes rule, only the samples under the minority classes are misclassified.

Fig. 1. Classification complexity in a discrete feature space. Two cases, (1) and (2), are compared; each shows the class probabilities of $\omega_1$ and $\omega_2$ within the equivalence classes $E_1, \ldots, E_6$, where $E_3$ and $E_4$ are inconsistent.

For example, the samples in $E_3$ and $E_4$ are inconsistent in Fig. 1, but only the $\omega_2$ samples in $E_3$ and the $\omega_1$ samples in $E_4$ are misclassified. The classification power in this case can be given by

$f = 1 - [P(\omega_2 \mid E_3)P(E_3) + P(\omega_1 \mid E_4)P(E_4)]$.

Dependency cannot reflect the true classification complexity. In the discrete case, we can see from the comparison of cases (1) and (2) in Fig. 1 that, although the probabilities of inconsistent samples are identical, the probabilities of misclassification are different. The dependency function in rough sets cannot reflect this difference. In [3], Dash and Liu introduced the consistency function, which can measure the difference. Now we present the basic definitions of consistency. The consistency measure is defined via the inconsistency rate, computed as follows.

Definition 1. A pattern is considered to be inconsistent if there are at least two objects that match on the whole condition attribute set but carry different decision labels.

Definition 2. The inconsistency count $\xi_i$ for a pattern $p_i$ of a feature subset $B$ is the number of times the pattern appears in the data minus the largest number of these appearances sharing the same class label.

Definition 3. The inconsistency rate of a feature subset $B$ is the sum $\sum_i \xi_i$ of the inconsistency counts over all patterns of the feature subset that appear in the data, divided by $|U|$, the number of samples: $\sum_i \xi_i / |U|$. Correspondingly, consistency is computed as

$\delta_B(D) = (|U| - \sum_i \xi_i) / |U|$.

Based on the above analysis, we can understand that dependency is the ratio of samples undoubtedly correctly classified, while consistency is the ratio of samples probably correctly classified. There are two kinds of samples in $POS_B(D) \cup M$: $POS_B(D)$ is the set of consistent samples, while $M$ is the set of samples of the largest class within each inconsistent pattern of the boundary region. In this paper, we will call $M$ the pseudo-consistent samples.
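Definitions 1-3 translate directly into code. The following Python sketch is ours (not the authors' implementation) and assumes each sample is a dict holding the condition attribute values together with a decision key d, as in the earlier sketches.

```python
from collections import Counter, defaultdict

def inconsistency_counts(U, B, d="d"):
    """One count per pattern of B (Definition 2): pattern frequency minus the
    largest class frequency within that pattern."""
    patterns = defaultdict(Counter)
    for x in U:
        patterns[tuple(x[a] for a in B)][x[d]] += 1
    return [sum(c.values()) - max(c.values()) for c in patterns.values()]

def inconsistency_rate(U, B, d="d"):
    """Definition 3: sum of the inconsistency counts divided by |U|."""
    return sum(inconsistency_counts(U, B, d)) / len(U)

def consistency(U, B, d="d"):
    """delta_B(D) = (|U| - sum_i xi_i) / |U|: consistent plus pseudo-consistent samples."""
    return 1.0 - inconsistency_rate(U, B, d)
```

On the XOR-style table used in Section 1, this gives $\delta_{\{a_1\}}(D) = 0.5$ and $\delta_{\{a_1, a_2\}}(D) = 1$, while $\gamma$ is zero for each single attribute.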

Property 1. Given a decision table $<U, C \cup D, V, f>$ and $B \subseteq C$, we have $0 \leq \delta_B(D) \leq 1$ and $\gamma_B(D) \leq \delta_B(D)$.

Property 2 (monotonicity). Given a decision table $<U, C \cup D, V, f>$, if $B_1 \subseteq B_2 \subseteq C$, we have $\delta_{B_1}(D) \leq \delta_{B_2}(D)$.

Property 3. Given a decision table $<U, C \cup D, V, f>$, we have $\delta_C(D) = \gamma_C(D) = 1$ if and only if $U/C \preceq U/D$, namely, the table is consistent.

Definition 4. Given a decision table $T = <U, C \cup D, V, f>$, $B \subseteq C$ and $a \in B$, we say that the condition attribute $a$ is indispensable in $B$ if $\delta_{B-\{a\}}(D) < \delta_B(D)$; otherwise, we say $a$ is redundant. We say $B \subseteq C$ is independent if every attribute $a$ in $B$ is indispensable.

$\delta_B(D)$ reflects not only the size of the positive region, but also the distribution of the boundary samples. An attribute is said to be redundant if the consistency does not decrease when we delete it. Here the term redundant has two meanings: the first is relevant but redundant, the same as in the general literature [6, 7, 8, 14, 16, 17]; the second is irrelevant. So consistency can detect both kinds of superfluous attributes [3].

Definition 5. Attribute subset $B$ is a consistency-based reduct of the decision table if

(1) $\delta_B(D) = \delta_C(D)$;
(2) $\forall a \in B: \delta_B(D) > \delta_{B-\{a\}}(D)$.

In this definition, the first condition guarantees that the reduct has the same distinguishing ability as the whole set of features; the second guarantees that all of the attributes in the reduct are indispensable. Therefore, there is no superfluous attribute in the reduct.

Finding the optimal subset of features is an NP-hard problem: we would have to evaluate $2^N - 1$ combinations of features to find the optimal subset if there are $N$ features in the decision table. Considering the computational complexity, we construct a forward greedy search algorithm based on the consistency function. We start with an empty set of attributes and add one attribute to the reduct in each round; the selected attribute should maximize the increment of consistency. Given an attribute subset $B$, we evaluate the significance of an attribute $a$ as

$SIG(a, B, D) = \delta_{B \cup \{a\}}(D) - \delta_B(D)$.

$SIG(a, B, D)$ is the increment of consistency obtained by introducing the new attribute $a$ in the condition of $B$. The measure is linear in the number of new consistent and pseudo-consistent samples. Formally, the forward greedy reduction algorithm based on consistency can be formulated as follows.

Algorithm: Greedy Reduction Algorithm based on Consistency
Input: decision table $<U, C \cup D, V, f>$
Output: one reduct red
Step 1: red <- {}  // red is the pool to contain the selected attributes
Step 2: for each $a_i \in C - red$
            compute $SIG(a_i, red, D) = \delta_{red \cup \{a_i\}}(D) - \delta_{red}(D)$
        end
Step 3: select the attribute $a_k$ which satisfies $SIG(a_k, red, D) = \max_i SIG(a_i, red, D)$
Step 4: if $SIG(a_k, red, D) > 0$, then red <- red $\cup \{a_k\}$ and go to Step 2; else return red
Step 5: end

In the first round, we start with an empty set and specify $\delta_{\emptyset}(D) = 0$. In this algorithm, we generate attribute subsets with a semi-exhaustive search: we evaluate all of the remaining attributes in each round with the consistency function and select the feature producing the maximal significance. The algorithm stops when adding any of the remaining attributes no longer increases the consistency value. In real-world applications, we can stop the algorithm when the increment of consistency is less than a given threshold, in order to avoid the over-fitting problem; we discuss this in detail in Section 4. The output of the algorithm is a reduced decision table, from which the irrelevant attributes and the relevant but redundant attributes have been deleted. The output will be validated with two popular learning algorithms, CART and SVM, in Section 4.

By employing a hashing mechanism, we can compute the inconsistency rate approximately with a time complexity of $O(|U|)$ [3]. In the worst case, the whole computational complexity of the algorithm is

$|U||C| + |U|(|C| - 1) + \cdots + |U| = (|C| + 1)|C||U|/2$.
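Under the same sample representation as the earlier sketches (each sample a dict with a decision key d), the greedy procedure above can be written in Python as follows. This is our illustration, not the authors' implementation; the parameter eps corresponds to the optional stopping threshold mentioned above, and the dict-based pattern grouping plays the role of the hashing mechanism.

```python
from collections import Counter, defaultdict

def consistency(U, B, d="d"):
    """delta_B(D); the consistency of the empty attribute set is taken as 0."""
    if not B:
        return 0.0
    patterns = defaultdict(Counter)
    for x in U:
        patterns[tuple(x[a] for a in B)][x[d]] += 1
    kept = sum(max(c.values()) for c in patterns.values())
    return kept / len(U)

def greedy_consistency_reduct(U, C, d="d", eps=0.0):
    """Forward greedy reduction driven by consistency.

    In each round the attribute with the largest significance
    SIG(a, red, D) = delta_{red U {a}}(D) - delta_{red}(D) is added;
    the loop stops when the best increment is not larger than eps
    (eps > 0 gives the pre-pruning variant mentioned in the text)."""
    red, delta_red = [], 0.0
    while True:
        candidates = [a for a in C if a not in red]
        if not candidates:
            return red
        sig, best = max(
            (consistency(U, red + [a], d) - delta_red, a) for a in candidates
        )
        if sig <= eps:
            return red
        red.append(best)
        delta_red += sig

# Example: on the XOR-style table from Section 1 both attributes are selected
# (the tie in the first round is broken arbitrarily).
U = [{"a1": 0, "a2": 0, "d": 0}, {"a1": 0, "a2": 1, "d": 1},
     {"a1": 1, "a2": 0, "d": 1}, {"a1": 1, "a2": 1, "d": 0}]
print(greedy_consistency_reduct(U, ["a1", "a2"]))
```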

4 Experimental Analysis

There are two main objectives in conducting the experiments. First, we compare the proposed method with the dependency based algorithm. Second, we study the classification performance of the attributes selected with the proposed algorithm, in particular how the classification accuracy varies as new features are added; this tells us where the algorithm should be stopped.

We downloaded the data sets from the UCI Repository of machine learning databases; they are described in Table 1. There are some numerical attributes in the data sets, so we employ four discretization techniques to transform the numerical data into categorical data: equal-width, equal-frequency, FCM and entropy. Then we run the dependency based algorithm [8] and the proposed one on the discretized data sets. The numbers of the selected features are presented in Table 2, where P stands for the dependency based algorithm and C stands for the consistency based algorithm.

Table 1. Data description
Data set                               Abbreviation   Samples   Features   Classes
Australian Credit Approval             Crd
Ecoli                                  Ecoli
Heart disease                          Heart
Ionosphere                             Iono
Sonar, Mines vs. Rocks                 Sonar
Wisconsin Diagnostic Breast Cancer     WDBC
Wisconsin Prognostic Breast Cancer     WPBC
Wine recognition                       Wine

Table 2. The numbers of selected features with different methods (P: dependency based, C: consistency based); columns: raw data and the equal-width, equal-frequency, entropy and FCM discretizations; rows: the eight data sets and their average.

From Table 2, we can find a serious problem with the dependency based algorithm: for some data sets it selects too few features for classification learning. For the data discretized with the equal-width method, the dependency based algorithm selects only one attribute, while the consistency based one selects 7 attributes. For the equal-frequency method, the dependency based algorithm selects nothing for the data sets Heart, Sonar and WPBC, and similar cases occur with the entropy and FCM based discretization methods. Obviously, the results are unacceptable if a feature selection algorithm cannot find anything. By contrast, the consistency based attribute reduction algorithm finds feature subsets of moderate size for all of the data sets. Moreover, the sizes of the feature subsets selected by the two algorithms are comparable when the dependency based algorithm works well.

Why does the dependency based algorithm find nothing for some data sets? As we know, dependency just reflects the ratio of the positive region. The forward greedy algorithm starts with an empty set and adds, in turn, one attribute at a time, namely the attribute that produces the greatest increase in the dependency function, until this function reaches its maximum possible value for the data set. In the first round, we need to evaluate each single attribute. For some data sets, the dependency is zero for every single attribute; therefore no attribute can be added into the pool in the first round, and the algorithm stops there. Sometimes the algorithm can also stop in the second or the third round.

In such cases, however, the selected features are not enough for classification learning. Consistency overcomes this problem because it reflects the change in the distribution of the boundary samples.

Now we use the selected data to train classifiers with the CART and SVM learning algorithms, and we test the classification power of the selected data with 10-fold cross validation. The average classification accuracies with CART and SVM are presented in Tables 3 and 4, respectively.

Table 3. Classification accuracy with 10-fold cross validation (CART); columns: raw data and the equal-width, equal-frequency, entropy and FCM discretizations, each for the dependency based (P) and consistency based (C) algorithms; rows: the eight data sets and their average.

From Table 3, we can find that most of the reduced data sets keep, or even improve, the classification power when the numbers of selected attributes are appropriate, although most of the candidate features are deleted from the data. This shows that most of the features in these data sets are irrelevant or redundant for training decision trees and should therefore be deleted. However, the classification performance decreases greatly if the data are excessively reduced, such as Iono in the equal-width case and Ecoli in the entropy and FCM cases.

Table 4. Classification accuracy with 10-fold cross validation (SVM); same layout as Table 3.

We can also find from Table 4 that most of the classification accuracies of the reduced data decrease a little compared with the original data. Correspondingly, the average classification accuracies for all four discretization algorithms are a little lower than with the original data. This shows that both the dependency and the consistency based feature selection algorithms are not well suited to SVM learning, because both dependency and consistency compute the distinguishing power in discrete spaces.
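The 10-fold cross validation used to obtain Tables 3 and 4 can be reproduced, for instance, with scikit-learn. The following sketch is our illustration (not the authors' code); it assumes the reduced data are available as a numeric feature matrix X_red restricted to the selected attributes and a label vector y, with DecisionTreeClassifier standing in for CART and SVC for the SVM.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC

def evaluate_reduct(X_red, y, folds=10):
    """Mean and standard deviation of 10-fold CV accuracy on the reduced data."""
    scores = {
        "CART": cross_val_score(DecisionTreeClassifier(), X_red, y, cv=folds),
        "SVM": cross_val_score(SVC(), X_red, y, cv=folds),
    }
    return {name: (np.mean(s), np.std(s)) for name, s in scores.items()}
```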

Table 5 shows the features selected by the consistency based algorithm, and the rounds in which they were selected, for part of the data sets, using the FCM discretized data. The trends of consistency and of the classification accuracies with CART and SVM are shown in Fig. 4.

Table 5. The selected features with method FCM + consistency; columns: 1st to 10th selected feature; rows: Heart, Iono, Sonar, WDBC, WPBC.

Fig. 4. Trends of consistency and of the classification accuracies with CART and SVM against the number of selected features, for (1) Heart, (2) Iono, (3) Sonar, (4) WDBC and (5) WPBC.

In all five plots, the consistency monotonically increases with the number of selected attributes. The maximal value of consistency is 1, which shows that the corresponding decision table is consistent: with the selected attributes, all of the samples can be distinguished. Moreover, it is noticeable that the consistency rises rapidly at the beginning and then increases slowly, until it stops at 1.

This means that the majority of samples can be distinguished with a few features, while the remaining selected features are introduced only to discern a few samples. This may lead to the over-fitting problem; therefore the algorithm should be stopped earlier, or a pruning step is needed to delete the over-fitting features. The classification accuracy curves also show this problem: in Fig. 4, the accuracies with CART and SVM rise at first, reach a peak, and then stay unchanged or even decrease. In terms of classification learning, this shows that the features selected after the peak are useless, and they sometimes even deteriorate the learning performance.

Here we can take two measures to overcome the problem. The first is to stop the algorithm when the increment of consistency is less than a given threshold. The second is to employ some learning algorithm to validate the selected features and delete the features after the accuracy peak. However, the first measure, called pre-pruning, is sometimes not feasible because we usually cannot exactly predict where the algorithm should stop. The latter, called post-pruning, is widely employed. In this work, cross validation is introduced to test the selected features; a sketch of this post-pruning step is given at the end of this section.

Table 6 shows the numbers of selected features and the corresponding classification accuracies after post-pruning. We can find that the classification performance improves in most of the cases, while at the same time the features selected with consistency are further reduced. Especially for the data sets Heart and Iono, the improvement reaches 10% and 18% with the CART algorithm.

Table 6. Comparison of the numbers of selected features and the classification performance with post-pruning (CART and SVM); rows: Heart, Iono, Sonar, WDBC, WPBC.
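The post-pruning step can be sketched as follows (our illustration, not the authors' implementation): given the attributes in the order in which the greedy algorithm selected them, the cross-validated accuracy of each prefix is computed and the reduct is cut at the accuracy peak. Here X is assumed to be a numeric feature matrix, y the labels, and selected the list of selected column indices in selection order.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

def post_prune(X, y, selected, estimator=None, folds=10):
    """Keep only the leading features of `selected`, up to the accuracy peak.

    `selected` lists column indices of X in the order the greedy consistency
    algorithm added them; the prefix with the highest cross-validated accuracy
    is returned together with that accuracy."""
    estimator = estimator or DecisionTreeClassifier()
    best_k, best_acc = 1, -np.inf
    for k in range(1, len(selected) + 1):
        acc = cross_val_score(estimator, X[:, selected[:k]], y, cv=folds).mean()
        if acc > best_acc:
            best_k, best_acc = k, acc
    return selected[:best_k], best_acc
```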

5 Conclusions

In this paper, we introduce the consistency function to overcome the problems in dependency based algorithms. We discuss the relationship between dependency and consistency and analyze the properties of consistency. With this measure, redundancy and the reduct are redefined, and a forward greedy attribute reduction algorithm based on consistency is constructed. The numerical experiments show that the proposed method is effective. The main conclusions are as follows.

Compared with dependency, consistency reflects not only the size of the decision positive region, but also the sample distribution in the boundary region; therefore, the consistency measure is able to describe the distinguishing power of an attribute set more finely than the dependency function. Consistency is monotonic: its value increases or remains unchanged when a new attribute is added to the attribute set. Moreover, some attributes are introduced into the reduct just to distinguish a few samples; if we keep these attributes in the final result, they may overfit the data, so a pruning technique is required. We use 10-fold cross validation to test the results in the experiments and thereby find more effective and efficient feature subsets.

References

1. Bhatt R.B., Gopal M.: On fuzzy-rough sets approach to feature selection. Pattern Recognition Letters 26 (2005)
2. Breiman L., Friedman J., Olshen R., Stone C.: Classification and Regression Trees. Wadsworth International, California
3. Dash M., Liu H.: Consistency-based search in feature selection. Artificial Intelligence 151 (2003)
4. Guyon I., Weston J., Barnhill S., et al.: Gene selection for cancer classification using support vector machines. Machine Learning 46 (2002)
5. Guyon I., Elisseeff A.: An introduction to variable and feature selection. Journal of Machine Learning Research 3 (2003)
6. Hu Q.H., Li X., Yu D.R.: Analysis on classification performance of rough set based reducts. In: Yang Q., Webb G. (eds.): PRICAI 2006, LNAI 4099. Springer-Verlag, Berlin Heidelberg (2006)
7. Hu Q.H., Yu D.R., Xie Z.X.: Information-preserving hybrid data reduction based on fuzzy-rough techniques. Pattern Recognition Letters 27 (2006)
8. Jensen R., Shen Q.: Semantics-preserving dimensionality reduction: rough and fuzzy-rough-based approaches. IEEE Transactions on Knowledge and Data Engineering 16 (2004)
9. Liu H., Yu L.: Toward integrating feature selection algorithms for classification and clustering. IEEE Transactions on Knowledge and Data Engineering 17 (2005)
10. Quinlan J.R.: Induction of decision trees. Machine Learning 1 (1986)
11. Skowron A., Rauszer C.: The discernibility matrices and functions in information systems. In: Slowinski R. (ed.): Intelligent Decision Support - Handbook of Applications and Advances of the Rough Sets Theory (1991)
12. Slezak D.: Approximate decision reducts. Ph.D. Thesis, Warsaw University (2001)
13. Ślezak D.: Approximate entropy reducts. Fundamenta Informaticae 53 (2002)
14. Swiniarski R.W., Skowron A.: Rough set methods in feature selection and recognition. Pattern Recognition Letters 24 (2003)
15. Xie Z.X., Hu Q.H., Yu D.R.: Improved feature selection algorithm based on SVM and correlation. Lecture Notes in Computer Science 3971 (2006)
16. Zhong N., Dong J., Ohsuga S.: Using rough sets with heuristics for feature selection. Journal of Intelligent Information Systems 16 (2001)
17. Ziarko W.: Variable precision rough sets model. Journal of Computer and System Sciences 46 (1993) 39-59


More information

COMP61011 Foundations of Machine Learning. Feature Selection

COMP61011 Foundations of Machine Learning. Feature Selection OMP61011 Foundations of Machine Learning Feature Selection Pattern Recognition: The Early Days Only 200 papers in the world! I wish! Pattern Recognition: The Early Days Using eight very simple measurements

More information

Information Granulation and Approximation in a Decision-theoretic Model of Rough Sets

Information Granulation and Approximation in a Decision-theoretic Model of Rough Sets Information Granulation and Approximation in a Decision-theoretic Model of Rough Sets Y.Y. Yao Department of Computer Science University of Regina Regina, Saskatchewan Canada S4S 0A2 E-mail: yyao@cs.uregina.ca

More information

Data Cleaning and Prototyping Using K-Means to Enhance Classification Accuracy

Data Cleaning and Prototyping Using K-Means to Enhance Classification Accuracy Data Cleaning and Prototyping Using K-Means to Enhance Classification Accuracy Lutfi Fanani 1 and Nurizal Dwi Priandani 2 1 Department of Computer Science, Brawijaya University, Malang, Indonesia. 2 Department

More information

Comment Extraction from Blog Posts and Its Applications to Opinion Mining

Comment Extraction from Blog Posts and Its Applications to Opinion Mining Comment Extraction from Blog Posts and Its Applications to Opinion Mining Huan-An Kao, Hsin-Hsi Chen Department of Computer Science and Information Engineering National Taiwan University, Taipei, Taiwan

More information

Big Data Methods. Chapter 5: Machine learning. Big Data Methods, Chapter 5, Slide 1

Big Data Methods. Chapter 5: Machine learning. Big Data Methods, Chapter 5, Slide 1 Big Data Methods Chapter 5: Machine learning Big Data Methods, Chapter 5, Slide 1 5.1 Introduction to machine learning What is machine learning? Concerned with the study and development of algorithms that

More information

Variable Selection 6.783, Biomedical Decision Support

Variable Selection 6.783, Biomedical Decision Support 6.783, Biomedical Decision Support (lrosasco@mit.edu) Department of Brain and Cognitive Science- MIT November 2, 2009 About this class Why selecting variables Approaches to variable selection Sparsity-based

More information

Graph Matching: Fast Candidate Elimination Using Machine Learning Techniques

Graph Matching: Fast Candidate Elimination Using Machine Learning Techniques Graph Matching: Fast Candidate Elimination Using Machine Learning Techniques M. Lazarescu 1,2, H. Bunke 1, and S. Venkatesh 2 1 Computer Science Department, University of Bern, Switzerland 2 School of

More information

Weka ( )

Weka (  ) Weka ( http://www.cs.waikato.ac.nz/ml/weka/ ) The phases in which classifier s design can be divided are reflected in WEKA s Explorer structure: Data pre-processing (filtering) and representation Supervised

More information

Part I. Instructor: Wei Ding

Part I. Instructor: Wei Ding Classification Part I Instructor: Wei Ding Tan,Steinbach, Kumar Introduction to Data Mining 4/18/2004 1 Classification: Definition Given a collection of records (training set ) Each record contains a set

More information

An Effective Performance of Feature Selection with Classification of Data Mining Using SVM Algorithm

An Effective Performance of Feature Selection with Classification of Data Mining Using SVM Algorithm Proceedings of the National Conference on Recent Trends in Mathematical Computing NCRTMC 13 427 An Effective Performance of Feature Selection with Classification of Data Mining Using SVM Algorithm A.Veeraswamy

More information