Feature Selection in Knowledge Discovery
1 Feature Selection in Knowledge Discovery. Susana Vieira, Technical University of Lisbon, Instituto Superior Técnico, Department of Mechanical Engineering, Center of Intelligent Systems, IDMEC-LAETA, Av. Rovisco Pais 1, Lisboa, Portugal. November 2010. The knowledge discovery process (diagram): data acquisition yields the target data; preprocessing yields the preprocessed data; feature selection yields the reduced data; modeling yields patterns; interpretation yields knowledge. Based on U. Fayyad, G. Piatetsky-Shapiro and P. Smyth. From data mining to knowledge discovery in databases. AI Magazine, 17(3):37-54, 1996.
2 Outline: Motivation: why feature selection; Basic definitions; Ranking methods; Feature subset selection; Optimization methods: tree search, ant feature selection. 3 Why feature selection? Why even think about feature selection? The information about the target class is inherent in the variables! Naive theoretical view: more features → more information → more discrimination power. In practice there are many reasons why this is not the case! Also: optimization is (usually) good, so why not try to optimize the input coding?
3 Practical problems: Many explored domains have hundreds to tens of thousands of variables/features, with many irrelevant and redundant ones! In domains with many features the underlying probability distribution can be very complex and very hard to estimate (e.g. dependencies between variables). Irrelevant and redundant features can confuse learners! Limited training data! Limited computational resources! Curse of dimensionality! The required number of samples (to achieve the same accuracy) grows exponentially with the number of variables! In practice the number of training examples is fixed, so the classifier's performance usually degrades for a large number of features! In many cases the information that is lost by discarding variables is made up for by a more accurate mapping/sampling in the lower-dimensional space!
4 Real world example: gene selection from microarray data. Variables: gene expression coefficients corresponding to the amount of mRNA in a patient's sample (e.g. tissue biopsy). Task: separate healthy patients from cancer patients. Usually there are only about 100 examples (patients) available for training and testing (!!!), while the number of variables in the raw data is far larger. Does this work? ([8]) [8] C. Ambroise, G.J. McLachlan: Selection bias in gene extraction on the basis of microarray gene-expression data. PNAS, 99(10):6562-6566 (2002). Feature selection: what is it? Remove features X(i) to improve (or at least not degrade) the prediction of Y. Advantages: feature selection identifies the most relevant features; fewer features and less data need to be collected and processed; less complex models run faster; models are easier to understand, verify and explain.
5 Feature selection: definition. Given a set of features $F = \{f_1, \ldots, f_i, \ldots, f_n\}$, the Feature Selection problem is to find a subset $F' \subseteq F$ that maximizes the learner's ability to classify patterns. Formally, $F'$ should maximize some scoring function $\Lambda: \Gamma \to \mathbb{R}$ (where $\Gamma$ is the space of all possible feature subsets of $F$), i.e. $F' = \arg\max_{G \in \Gamma} \Lambda(G)$. Feature extraction: definition. Given a set of features $F = \{f_1, \ldots, f_i, \ldots, f_n\}$, the Feature Extraction ("Construction") problem is to map $F$ to some feature set $F''$ that maximizes the learner's ability to classify patterns (again $F'' = \arg\max_{G \in \Gamma^*} \Lambda(G)$, where $\Gamma^*$ here is the set of all possible feature sets). This general definition subsumes feature selection (i.e. a feature selection algorithm also performs a mapping, but can only map to subsets of the input variables).
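To make the definition concrete, here is a minimal brute-force sketch of $F' = \arg\max_G \Lambda(G)$ (names are illustrative; $\Lambda$ is taken to be cross-validated k-NN accuracy, and enumerating all $2^n - 1$ subsets is only feasible for very small $n$):

```python
# Brute-force feature subset search: F' = argmax_G Lambda(G).
from itertools import combinations

import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

def best_subset(X, y):
    n = X.shape[1]
    best, best_score = None, -np.inf
    for k in range(1, n + 1):
        for G in combinations(range(n), k):
            # Lambda(G): mean cross-validated accuracy on the subset G
            score = cross_val_score(KNeighborsClassifier(),
                                    X[:, list(G)], y, cv=5).mean()
            if score > best_score:
                best, best_score = G, score
    return best, best_score
```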
6 Feature Selection: $F = \{f_1, \ldots, f_i, \ldots, f_n\} \xrightarrow{\text{f. selection}} F' = \{f_{i_1}, \ldots, f_{i_j}, \ldots, f_{i_m}\}$, with $i_j \in \{1, \ldots, n\}$ for $j = 1, \ldots, m$, and $i_a \neq i_b$ for $a, b \in \{1, \ldots, m\}$, $a \neq b$. Feature Extraction/Creation: $F = \{f_1, \ldots, f_i, \ldots, f_n\} \xrightarrow{\text{f. extraction}} F'' = \{g_1(f_1, \ldots, f_n), \ldots, g_j(f_1, \ldots, f_n), \ldots, g_m(f_1, \ldots, f_n)\}$. Feature selection optimality: in theory the goal is to find an optimal feature subset (one that maximizes the scoring function). In real world applications this is usually not possible: for most problems it is computationally intractable to search the whole space of possible feature subsets, so one usually has to settle for approximations of the optimal subset. Most of the research in this area is devoted to finding efficient search heuristics.
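Returning to the selection and extraction mappings above, a small sketch of the difference (toy data; PCA stands in here for the extraction functions $g_j$, which in general may be nonlinear):

```python
import numpy as np
from sklearn.decomposition import PCA

X = np.random.randn(100, 5)       # n = 5 original features f_1..f_5

# Feature selection: F' keeps a subset of the original columns
X_selected = X[:, [0, 2, 4]]      # {f_i1, f_i2, f_i3}, indices distinct

# Feature extraction: each new feature g_j is built from all inputs
X_extracted = PCA(n_components=3).fit_transform(X)  # g_j: linear combos
```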
7 Relevance of features: relevance vs optimality of the feature set. Classifiers induced from training data are likely to be suboptimal (no access to the real distribution of the data). Relevance does not imply that a feature is in the optimal feature subset, and even irrelevant features can improve a classifier's performance. Defining relevance in terms of a given classifier (and therefore a hypothesis space) would be better. Feature selection approaches: Filters: based on general characteristics of the data to be evaluated; no model is involved. Wrappers: use model performance to evaluate feature subsets; train one model for each feature subset. Hybrid and embedded methods: do not retrain the model at every step; search the feature selection space and the model parameter space simultaneously.
8 Filter methods: $\mathbb{R}^p \xrightarrow{\text{feature selection}} \mathbb{R}^s$ ($s \ll p$) $\to$ classifier design. Features are scored independently and the top $s$ are used by the classifier. Score: correlation, mutual information, t-statistic, F-statistic, p-value, etc. Easy to interpret. Usually fast. (Adapted from J. Fridlyand.) Feature ranking: given a set of features $F$, Variable Ranking is the process of ordering the features by the value of some scoring function $S: F \to \mathbb{R}$ (which usually measures feature relevance). The resulting set is a permutation of $F$: $F' = \{f_{i_1}, \ldots, f_{i_j}, \ldots, f_{i_n}\}$ with $S(f_{i_j}) \geq S(f_{i_{j+1}})$ for $j = 1, \ldots, n-1$. The score $S(f_i)$ is computed from the training data, measuring some criterion of feature $f_i$. By convention a high score is indicative of a valuable (relevant) feature.
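A minimal filter sketch (assuming scikit-learn; the ANOVA F-statistic stands in for any of the scores listed above):

```python
import numpy as np
from sklearn.feature_selection import f_classif

def filter_top_s(X, y, s):
    """Score each feature independently, then keep the s best."""
    scores, _ = f_classif(X, y)          # per-feature F-statistic
    ranking = np.argsort(scores)[::-1]   # permutation of F, best first
    return ranking[:s]                   # indices of the selected columns
```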
9 Feature ranking as feature selection: a simple method for feature selection using variable ranking is to select the $k$ highest ranked features according to $S$. This is usually not optimal, but it is often preferable to other, more complicated methods, and it is computationally efficient(!): it requires only the calculation and sorting of $n$ scores. Ranking criteria, questions: Can variables with a small score be automatically discarded? NO. Can a useless variable (i.e. one with a small score) be useful together with others? YES. Can two variables that are useless by themselves be useful together? YES.
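A two-feature XOR problem makes the last two answers tangible: each variable alone carries no information about the class, but the pair determines it exactly (a sketch using scikit-learn's mutual information estimator):

```python
import numpy as np
from sklearn.feature_selection import mutual_info_classif

rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(1000, 2)).astype(float)
y = np.logical_xor(X[:, 0], X[:, 1]).astype(int)   # y = x1 XOR x2

# Each feature alone: mutual information with y is ~0 (useless by itself)
print(mutual_info_classif(X, y, discrete_features=True))

# The two features combined (their XOR) determine y perfectly
joint = ((X[:, 0] + X[:, 1]) % 2).reshape(-1, 1)
print(mutual_info_classif(joint, y, discrete_features=True))  # ~log(2)
```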
10 Ranking criteria: correlation between a variable and the target is not enough to assess relevance! Correlation/covariance between pairs of variables has to be considered too (potentially difficult). Diversity of features: which one to choose? Problems with the filter method: Redundancy in the selected features: features are considered independently and are not measured on the basis of whether they contribute new information. Interactions among features generally cannot be explicitly incorporated (some filter methods are smarter than others). The classifier has no say in what features should be used: some scores may be more appropriate in conjunction with some classifiers than others. Sometimes used as a pre-processing step for other methods. (Adapted from J. Fridlyand.)
11 Dimension reduction, a variant on filter methods: rather than retain a subset of $s$ features, perform dimension reduction by projecting the features onto $s$ principal components of variation (e.g. PCA etc.). The problem is that we are no longer dealing with one feature at a time but rather a linear, or possibly more complicated, combination of all features. It may be good enough for a black box, but how does one build a diagnostic chip on a "supergene"? (Even though we don't want to confuse the tasks.) Those methods tend not to work better than simple filter methods. (Adapted from J. Fridlyand.) Wrapper methods: $\mathbb{R}^p \xrightarrow{\text{feature selection}} \mathbb{R}^s$ ($s \ll p$) $\to$ classifier design. Iterative approach: many feature subsets are scored based on classification performance and the best is used; a sketch of greedy forward selection follows below. Selection of subsets: forward selection, backward selection, forward-backward selection, ant colony optimization, genetic algorithms, particle swarm optimization, etc. By using the learner as a black box, wrappers are universal and simple! (Adapted from J. Fridlyand.)
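The greedy forward-selection wrapper mentioned above, as a minimal sketch (any scikit-learn estimator can play the black box; names are illustrative):

```python
import numpy as np
from sklearn.model_selection import cross_val_score

def forward_selection(X, y, estimator, max_features):
    """Greedily add the single feature that most improves CV accuracy."""
    selected, remaining = [], list(range(X.shape[1]))
    best_score = -np.inf
    while remaining and len(selected) < max_features:
        score, j = max((cross_val_score(estimator, X[:, selected + [j]],
                                        y, cv=5).mean(), j)
                       for j in remaining)
        if score <= best_score:      # stop once no candidate helps
            break
        best_score = score
        selected.append(j)
        remaining.remove(j)
    return selected
```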
12 Problems with wrapper methods: Computationally expensive: for each feature subset to be considered, a classifier must be built and evaluated. No exhaustive search is possible ($2^p$ subsets to consider): generally greedy algorithms only. Easy to overfit. (Adapted from J. Fridlyand.) Validation: Cross Validation 1 ($N$ samples, leave-one-out): train and test the feature selector and the classifier inside the loop, then count errors. Cross Validation 2 ($N$ samples, feature selection done once, then leave-one-out): train and test only the classifier inside the loop, then count errors. CV2 can yield an optimistic estimation of the true classification error.
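The bias in CV2 is exactly the selection bias Ambroise and McLachlan report: if the features are chosen on all $N$ samples before cross-validation, every held-out fold has already influenced the selector. Keeping selection inside the loop (CV1) is what a scikit-learn Pipeline does; a sketch, assuming X and y hold the data matrix and labels:

```python
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

# CV1: the selector is re-fit on each training fold, so no information
# about the held-out fold leaks into feature selection (honest estimate).
pipe = make_pipeline(SelectKBest(f_classif, k=10), SVC())
honest = cross_val_score(pipe, X, y, cv=10)

# CV2 (biased): selecting on the full X first and cross-validating only
# the classifier afterwards can be wildly optimistic.
X_reduced = SelectKBest(f_classif, k=10).fit_transform(X, y)
optimistic = cross_val_score(SVC(), X_reduced, y, cv=10)
```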
13 Taxonomy of feature selection: Saeys Y, Inza I, Larrañaga P. A review of feature selection techniques in bioinformatics. Bioinformatics, 2007 Oct 1; 23(19):2507-2517. Tree search methods: bottom-up (figure).
14 Tree search methods: top-down (figure). Tree search methods. Advantages: easy to use; reduce the number of iterations; bottom-up achieves a smaller number of features. Disadvantages: converge to local minima; computationally very heavy for more than about 50 features. Metaheuristic methods allow a global search.
15 Artificial ants: artificial ants move in graphs (nodes/arcs); the environment is discrete. Like real ants, they choose paths based on pheromone concentration and deposit pheromones on paths, and the environment updates the pheromones. Extra abilities of artificial ants: prior knowledge (a heuristic $\eta$) and memory (a feasible neighbourhood $N^k$). Proposed algorithm, a multicriteria algorithm (diagram): the features are ranked; one ant colony decides the cardinality of the feature subset (objective: minimize the number of features) and a second ant colony selects the features themselves (objective: minimize the classification error); after modeling and testing on (X test, Y test) the cost is evaluated, both pheromone trails are updated, and the procedure repeats for N cycles.
16 Ant Feature Selection (AFS). Choose node (diagram: an ant traverses the feature nodes $x_1, \ldots, x_n$, building e.g. the subset $\{x_3, x_6, x_7, x_1, x_4\}$): $p^k_{ij} = \frac{\tau_{ij}^{\alpha}\,\eta_{ij}^{\beta}}{\sum_{l \in N^k_i} \tau_{il}^{\alpha}\,\eta_{il}^{\beta}}$ if $j \in N^k_i$, and $p^k_{ij} = 0$ otherwise. Pheromone update: $\tau_{ij}(l+1) = (1-\rho)\,\tau_{ij}(l) + \Delta\tau^k_{ij}$. Heuristics in AFS. Heuristic for feature cardinality: Fisher's score of the features, $F(i) = \frac{\bigl(\mu^{(i)}_{c_1} - \mu^{(i)}_{c_2}\bigr)^2}{\bigl(\sigma^{(i)}_{c_1}\bigr)^2 + \bigl(\sigma^{(i)}_{c_2}\bigr)^2}$, where $\mu^{(i)}_c$ and $\bigl(\sigma^{(i)}_c\bigr)^2$ are the mean and variance of feature $i$ for the samples in classes $c_1$ and $c_2$. Heuristic for the selection of features: the classification error $e(i)$ of the individual features, $\eta_f^{(i)} = 1 - e(i)$.
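A minimal sketch of the two mechanics reconstructed above, the transition probability and the evaporation/deposit update (parameter names $\alpha$, $\beta$, $\rho$ and the deposit rule are illustrative, not the authors' full two-colony algorithm):

```python
import numpy as np

def transition_probs(tau, eta, feasible, alpha=1.0, beta=1.0):
    """p_ij ~ tau_j^alpha * eta_j^beta over the feasible neighbourhood."""
    w = np.zeros_like(tau)
    w[feasible] = tau[feasible] ** alpha * eta[feasible] ** beta
    return w / w.sum()

def update_pheromone(tau, chosen, rho=0.1, deposit=1.0):
    """tau(l+1) = (1 - rho) * tau(l), plus a deposit on chosen features."""
    tau = (1.0 - rho) * tau
    tau[chosen] += deposit
    return tau

# One ant picks one feature among n candidates:
n = 8
tau = np.ones(n)                        # uniform initial pheromone
eta = 1.0 - np.linspace(0.1, 0.5, n)    # e.g. eta_f = 1 - e(i)
j = np.random.choice(n, p=transition_probs(tau, eta, np.arange(n)))
tau = update_pheromone(tau, [j])
```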
17 Results: fuzzy models. Classification rates with 10-fold cross validation. [Table: classification accuracy, standard deviation and number of features, without feature selection (No FS) and with AFS, for the data sets WBCO, Wine, Vote, WDBC, WPBC, Sonar and Musk, plus the average; win/tie/loss (WTL) 0/0/7 and 0/1/; the numeric entries did not survive transcription.] Comparison with the state of the art: GAAR, a genetic algorithm-based approach; PSORSFS, a particle swarm optimization algorithm-based approach; GBML, multi-objective fuzzy genetics-based machine learning; MIFS, a classical filter method based on mutual information; HGA, a hybrid genetic algorithm wrapper approach based on mutual information.
18 Real world example: the MEDAN database. The MEDAN database contains the data of 382 patients, copied from intensive care unit records by medical documentation staff. All patients have septic shock of abdominal cause. Task: predict the patients' survival. There are several problems in the database. Sepsis patients database, MEDAN (figure: patient × variable data matrix): the matrix contains 387 patients and 59 variables.
19 MEDAN - Problems: different time samples (figure). MEDAN - Problems: missing data (figure).
20 MEDAN - Problems: variables that stopped being measured (figure). MEDAN - Problems (figure).
21 Test example. Problem definition: $x_1 = r\cos(t)$, $x_2 = r\sin(t)$, with $r \in [0.99, 1.01]$ and output $y$ indicating whether $r \geq 1$. Features: $F = \{x_1, x_2, x_1^2, x_2^2\}$. Output: $y$. Test example, correlation of the features with the output: [table of correlations between $x_1$, $x_2$, $x_1^2$, $x_2^2$ and $y$; the numeric values did not survive transcription].
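A sketch of the test problem as reconstructed above, generating the ring data and the candidate features (labelling the classes as inner ring, $r < 1$, versus outer ring, $r \geq 1$, is an assumption, since only "y", "r" and "1" survive in the transcription):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500
t = rng.uniform(0.0, 2.0 * np.pi, n)
r = rng.uniform(0.99, 1.01, n)
x1, x2 = r * np.cos(t), r * np.sin(t)
y = (r >= 1.0).astype(int)     # assumed labelling: inner vs outer ring

# Candidate features F = {x1, x2, x1^2, x2^2}. Individually x1 and x2 are
# useless; x1^2 and x2^2 inform only jointly, since x1^2 + x2^2 = r^2.
F = np.column_stack([x1, x2, x1 ** 2, x2 ** 2])
print(np.corrcoef(np.column_stack([F, y]).T)[-1, :-1])  # corr with y
```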
22 Test example (figure: pairwise scatter plots of the features $x_1, \ldots, x_4$). Test example (figure: fuzzy model performance, correct classification [%], plotted against subset cardinality for the all-combinations test using fuzzy models, with the Pareto solutions highlighted).
23 Test example (figure: pheromone concentration evolution over iterations and features; ant feature selection using fuzzy models, 5 ants, 20 iterations). Fuzzy objective function. Classic objective function: minimize $f = w_1\,n_e + w_2\,n_f$, a weighted sum of the classification error and the number of features. Fuzzy objective function: fuzzy decision $D(x) = C_1(x) \cap \ldots \cap C_n(x)$; optimal decision: maximize $D(x)$, here with $D(x) = C_{N_e}(x) \cap C_{N_f}(x)$, the intersection of the criteria on the classification error $N_e$ and the feature cardinality $N_f$.
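A sketch contrasting the two objective functions as reconstructed above (the weights and the shape of the memberships are illustrative; the next two slides define $C_{N_e}$ and $C_{N_f}$ graphically):

```python
def classic_objective(n_err, n_feat, w1=1.0, w2=0.1):
    """Weighted-sum objective f = w1*n_e + w2*n_f (to be minimized)."""
    return w1 * n_err + w2 * n_feat

def fuzzy_decision(n_err, n_feat, max_err=0.5, max_feat=30):
    """D(x) = C_Ne(x) AND C_Nf(x), with AND as min (to be maximized)."""
    c_ne = max(0.0, 1.0 - n_err / max_err)    # membership: low error
    c_nf = max(0.0, 1.0 - n_feat / max_feat)  # membership: few features
    return min(c_ne, c_nf)

# Example: 10% error with 5 features
print(classic_objective(0.10, 5))   # 0.6 (lower is better)
print(fuzzy_decision(0.10, 5))      # 0.8 (higher is better)
```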
24 Fuzzy criteria: classification error, $C_{N_e}$ (figure: membership function). Fuzzy criteria: feature cardinality, $C_{N_f}$ (figure: membership function).
25 Results: fuzzy models. Classification rates with 10-fold cross validation. [Table: classification accuracy, standard deviation and number of features for the data sets WBCO, Wine, Vote, WDBC, WPBC, Sonar and Musk, plus the average, comparing No FS, AFS and AFS with the fuzzy objective function (AFS FOF); win/tie/loss (WTL) 0/0/7 and 0/1/; the numeric entries did not survive transcription.]