Chapter 12 Feature Selection


Chapter 12 Feature Selection. Xiaogang Su, Department of Statistics, University of Central Florida.

Outline
Why Feature Selection?
Categorization of Feature Selection Methods
Filter Methods
Wrapper Methods
Embedded Methods
RELIEF

Why Feature Selection? It is cheaper to measure fewer variables. The resulting classifier is simpler and potentially faster. Prediction accuracy may improve by discarding irrelevant variables. Identifying the relevant variables gives more insight into the nature of the corresponding classification problem. It also alleviates the curse of dimensionality.

Categorization of Feature Selection Methods. Wrapper: feature selection takes into account the contribution of each feature subset to the performance of a given type of classifier. Filter: feature selection is based on an evaluation criterion that quantifies how well features (or feature subsets) discriminate between the classes. Embedded: feature selection is part of the training procedure of a classifier (e.g., decision trees).

Embedded Methods. These attempt to jointly (simultaneously) train both a classifier and a feature subset. They often optimize an objective function that rewards classification accuracy while penalizing the use of more features. Intuitively appealing. Example: tree-building algorithms (see the sketch below).
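As a concrete illustration of the embedded idea, a minimal sketch is given below, assuming NumPy and scikit-learn are available: a decision tree is trained, and the selected features are simply those the tree actually split on. The data and thresholding are illustrative assumptions, not part of the original slides.

```python
# A minimal sketch of embedded selection via a decision tree
# (illustrative setup; not code from the original slides).
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))                # 10 candidate features
y = (X[:, 0] + 2 * X[:, 3] > 0).astype(int)   # only features 0 and 3 matter

# Training the tree *is* the selection step: splits are chosen only on
# features that improve the impurity-based objective.
tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)
selected = np.flatnonzero(tree.feature_importances_ > 0)
print("features used by the tree:", selected)
```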

Filter and Wrapper Methods (comparison diagram from the original slide, not reproduced here).

Filter Methods. Features are scored independently, and the top S of them are used by the classifier. Possible scores: correlation, mutual information, t-statistic, F-statistic, p-value, tree variable-importance ranking, etc. A sketch of this scheme follows below.
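The sketch below, with illustrative names and data, scores each feature independently by its absolute correlation with the label and keeps the top S; any of the scores listed above could be substituted.

```python
# A minimal sketch of a filter method: per-feature scores computed
# independently of any classifier, then the top S features are kept.
import numpy as np

def filter_select(X, y, S):
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    # absolute correlation of each feature with the label
    scores = np.abs(Xc.T @ yc) / (np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc) + 1e-12)
    return np.argsort(scores)[::-1][:S], scores

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 8))
y = (X[:, 2] - X[:, 5] > 0).astype(float)
top, scores = filter_select(X, y, S=3)
print("top-S features:", top)
```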

Several Filter Methods for Variable Screening. Correlation matrix; VIF (variance inflation factor). RELIEF (Kira and Rendell, 1992) ranks variables by a distance-based relevance weight. FOCUS (Almuallim and Dietterich, 1991) searches for the smallest subset that completely discriminates between the target classes. In filter methods, variables are evaluated independently, not in the context of a learning algorithm.

Problems with Filter Methods. Redundancy in the selected features: features are considered independently and are not assessed on whether they contribute new information. Interactions among features generally cannot be explicitly incorporated (though some filter methods are smarter than others). The classifier has no say in which features should be used: some scores are more appropriate in conjunction with some classifiers than others.

Wrapper Methods. Done in an iterative manner: many feature subsets are scored by the classification performance of a classifier, and the best one is used. Problems: computationally expensive, since for each candidate feature subset a classifier must be built and evaluated; exhaustive search is generally infeasible (with p features there are 2^p subsets to consider), so greedy algorithms are typically used; risk of overfitting. A forward-selection sketch follows below.
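The sketch below wraps greedy forward selection around a k-NN classifier, scoring each candidate subset by cross-validated accuracy; the classifier, cross-validation setup, and data are illustrative assumptions rather than choices prescribed by the slides.

```python
# A minimal sketch of a wrapper method: greedy forward selection,
# scoring every one-feature extension by cross-validated accuracy.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score

def forward_select(X, y, max_features):
    remaining = list(range(X.shape[1]))
    selected, best = [], -np.inf
    while remaining and len(selected) < max_features:
        # build and evaluate a classifier for each candidate subset
        trials = [(cross_val_score(KNeighborsClassifier(),
                                   X[:, selected + [j]], y, cv=5).mean(), j)
                  for j in remaining]
        score, j = max(trials)
        if score <= best:            # stop when no extension improves the score
            break
        selected.append(j)
        remaining.remove(j)
        best = score
    return selected, best

rng = np.random.default_rng(2)
X = rng.normal(size=(150, 6))
y = (X[:, 0] + X[:, 4] > 0).astype(int)
print(forward_select(X, y, max_features=3))
```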

RELIEF. Introduced by Kira and Rendell (1992), Proc. Tenth National Conference on Artificial Intelligence, 129-134. The first version of RELIEF handles only binary responses; it has since been extended (e.g., RELIEFF). Idea: relevant features make (1) nearest examples of the same class closer and (2) nearest examples of opposite classes farther apart. RELIEF assigns weights to variables based on how well they separate samples from their nearest neighbors of the same and of the opposite class.

RELIEF - Steps. Normalize the data first. Set the relevance (weight) of every feature to zero. For each example (observation) in the training set: find the nearest example from the same class (the hit) and the nearest from the opposite class (the miss); update the weight of each feature by adding abs(example - miss) - abs(example - hit).


RELIEF - Algorithm
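The algorithm box from this slide is not reproduced; in its place, here is a minimal NumPy sketch of the basic binary-class RELIEF steps listed above (an illustrative implementation, not the slide's exact pseudocode).

```python
# Basic binary-class RELIEF: one relevance weight per feature,
# accumulated over sampled training examples.
import numpy as np

def relief(X, y, n_sample=None, seed=0):
    """X is assumed normalized to [0, 1]; y holds two class labels."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    idx = rng.choice(n, size=n_sample or n, replace=False)
    w = np.zeros(p)
    for i in idx:
        dist = np.abs(X - X[i]).sum(axis=1)                  # L1 distance to every example
        dist[i] = np.inf                                     # exclude the example itself
        hit  = np.where(y == y[i], dist, np.inf).argmin()    # nearest same-class example
        miss = np.where(y != y[i], dist, np.inf).argmin()    # nearest opposite-class example
        # relevant features push the miss away and keep the hit close
        w += np.abs(X[i] - X[miss]) - np.abs(X[i] - X[hit])
    return w / len(idx)

rng = np.random.default_rng(3)
X = rng.uniform(size=(100, 5))
y = (X[:, 1] > 0.5).astype(int)
print(relief(X, y).round(3))   # feature 1 should receive the largest weight
```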

Function relief{dprep}

Usage: relief(data, nosample, threshold, vnom)

Arguments:
data - the dataset for which feature selection will be carried out
nosample - number of instances drawn from the original dataset
threshold - the cutoff point used to select the features
vnom - a vector containing the indexes of the nominal features

References:
Kira, K. and Rendell, L. (1992). The Feature Selection Problem: Traditional Methods and a New Algorithm. Proc. Tenth National Conference on Artificial Intelligence, MIT Press, 129-134.
Kononenko, I., Simec, E., and Robnik-Sikonja, M. (1997). Overcoming the Myopia of Inductive Learning Algorithms with RELIEFF. Applied Intelligence 7: 39-55.