Towards robust feature selection for high-dimensional, small sample settings
1 Towards robust feature selection for high-dimensional, small sample settings
Yvan Saeys, Bioinformatics and Evolutionary Genomics, Ghent University, Belgium
Marseille, January 14th, 2010
2 Background: biomarker discovery
- Common task in computational biology: find the entities that best explain phenotypic differences
- Challenges: many possible biomarkers (high dimensionality); only very few biomarkers are important for the specific phenotypic difference; very few samples
- Examples: microarray data, mass spectrometry data, SNP data
Yvan Saeys (UGent) Towards robust feature selection Marseille / 36
3 Dimensionality reduction techniques
- Feature selection techniques: subset selection, feature ranking, feature weighting
- Feature transformation techniques: projection (PCA, LDA), compression (Fourier transform, wavelet transform)
- Feature selection preserves the original semantics!
5 Casting the problem as a feature selection task
Feature selection is a way to avoid the curse of dimensionality:
- Improve model performance: classification (maximize accuracy, AUC), clustering (AIC, BIC, sum of squares, various indices), regression (sum of squared errors)
- Faster and more cost-effective models
- Improve generalization performance (avoid overfitting)
- Gain deeper insight into the processes that generated the data (especially important in bioinformatics)
6 The need for robust marker selection algorithms
Running the same marker selection algorithm on slightly different samples of the data can yield very different results:
- Ranked gene list 1: gene A, gene B, gene C, gene D, gene E
- Ranked gene list 2: gene X, gene A, gene W, gene Y, gene C
10 The need for robust marker selection algorithms: motivation
- Highly variable marker ranking algorithms decrease the confidence of a domain expert
- Need to quantify the stability of a ranking algorithm, and use it as an additional criterion next to predictive power
- More robust rankings have a higher chance of representing biologically relevant markers
- Focus here: quantifying/increasing marker stability within one data source
11 Formalizing feature selection robustness
Definition. Consider a dataset D = {x_1, ..., x_M}, x_i = (x_i^1, ..., x_i^N), with M instances and N features. A feature selection algorithm can then be defined as a mapping F : D → f from D to an N-dimensional vector f = (f_1, ..., f_N), where:
1. weighting: f_i = w_i denotes the weight of feature i
2. ranking: f_i ∈ {1, 2, ..., N} denotes the rank of feature i
3. subset selection: f_i ∈ {0, 1} denotes the exclusion/inclusion of feature i in the selected subset
12 Formalizing feature selection robustness
Research questions:
1. How stable are current feature selection techniques for high-dimensional, small-sample settings? Analyze the sensitivity of robustness to signature size and model parameters.
2. Can we increase the robustness of feature selection in this setting?
Definition. A feature selection algorithm is stable if small variations in the input (training data) result in small variations in the output (selected features): F is stable iff for any D′ ≈ D, the outputs f = F(D) and f′ = F(D′) differ by less than ε.
Methodological requirements:
1. A framework to generate small changes in the training data
2. Similarity measures for feature weightings/rankings/subsets
13 Generating training set variations
A subsampling approach: draw k subsamples of size xM (0 < x < 1) randomly without replacement from D, where the parameters k and x can be varied. In our experiments: k = 500, x = 0.9.
Algorithm:
1. Generate k subsamples of size xM: {D_1, ..., D_k}
2. Run the basic feature selector F on each of these k subsamples: F(D_i) = f_i
3. Perform all k(k−1)/2 pairwise comparisons and average over them:
Stab(F) = (2 / (k(k−1))) · Σ_{i=1}^{k} Σ_{j=i+1}^{k} S(f_i, f_j)
where S(·,·) denotes an appropriate similarity function between weightings/rankings/subsets.
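The subsampling procedure above can be sketched in a few lines (a minimal sketch: `select` and `sim` are placeholders for a concrete feature selector and similarity measure, and small k values are used for illustration; the experiments use k = 500, x = 0.9):

```python
import numpy as np

def stability(select, X, y, k=50, x=0.9, sim=None, rng=None):
    """Estimate Stab(F): run the selector on k random subsamples of size
    x*M (drawn without replacement) and average all pairwise similarities."""
    rng = np.random.default_rng(rng)
    M = X.shape[0]
    outputs = []
    for _ in range(k):
        idx = rng.choice(M, size=int(x * M), replace=False)  # one subsample D_i
        outputs.append(select(X[idx], y[idx]))               # f_i = F(D_i)
    # average over all k(k-1)/2 pairwise comparisons
    total = sum(sim(outputs[i], outputs[j])
                for i in range(k) for j in range(i + 1, k))
    return 2.0 * total / (k * (k - 1))
```

Any of the similarity measures of the next slide can be passed as `sim`, matching the type of output (weighting, ranking, or subset) the selector produces.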
14 Similarity measures for feature selection outputs
1. Weighting (Pearson correlation coefficient):
S(f_i, f_j) = Σ_l (f_i^l − μ_{f_i})(f_j^l − μ_{f_j}) / sqrt( Σ_l (f_i^l − μ_{f_i})² · Σ_l (f_j^l − μ_{f_j})² )
2. Ranking (Spearman rank correlation coefficient):
S(f_i, f_j) = 1 − 6 · Σ_l (f_i^l − f_j^l)² / (N(N² − 1))
3. Subset selection (Jaccard index):
S(f_i, f_j) = |f_i ∩ f_j| / |f_i ∪ f_j| = Σ_l I(f_i^l = f_j^l = 1) / Σ_l I(f_i^l + f_j^l > 0)
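The three measures translate directly into code (a straightforward sketch over NumPy arrays: `r1`, `r2` are full rankings with ranks 1..N, and `s1`, `s2` are 0/1 inclusion vectors):

```python
import numpy as np

def pearson_sim(w1, w2):
    """Pearson correlation between two feature weightings."""
    w1, w2 = np.asarray(w1, float), np.asarray(w2, float)
    a, b = w1 - w1.mean(), w2 - w2.mean()
    return float(a @ b / np.sqrt((a @ a) * (b @ b)))

def spearman_sim(r1, r2):
    """Spearman rank correlation between two full rankings (ranks 1..N)."""
    r1, r2 = np.asarray(r1, float), np.asarray(r2, float)
    N = len(r1)
    return float(1 - 6 * np.sum((r1 - r2) ** 2) / (N * (N ** 2 - 1)))

def jaccard_sim(s1, s2):
    """Jaccard index between two 0/1 feature-inclusion vectors."""
    s1, s2 = np.asarray(s1, bool), np.asarray(s2, bool)
    return float((s1 & s2).sum() / (s1 | s2).sum())
```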
15 Kuncheva's index for comparing feature subsets
Definition. Let A and B be subsets of features, both of the same cardinality s, and let r = |A ∩ B|.
Requirements for a desirable stability index for feature subsets:
1. Monotonicity: for a fixed subset size s and number of features N, the larger the intersection between the subsets, the higher the value of the consistency index.
2. Limits: the index should be bounded by constants that do not depend on N or s; the maximum should be attained when the subsets are identical (r = s).
3. Correction for chance: the index should have a constant expected value for independently drawn subsets of the same cardinality s.
16 Kuncheva's index for comparing feature subsets
General form of the index: (Observed r − Expected r) / (Maximum r − Expected r).
For randomly drawn A and B, the number of features from A also selected in B is a random variable Y with a hypergeometric distribution, with probability mass function
P(Y = r) = C(s, r) · C(N − s, s − r) / C(N, s)
The expected value of Y for given s and N is s²/N. Thus define
KI(A, B) = (r − s²/N) / (s − s²/N) = (rN − s²) / (s(N − s))
KI is bounded: −1 < KI ≤ 1 [Kuncheva (2007)].
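A direct implementation of the index (a minimal sketch operating on sets of feature indices):

```python
def kuncheva_index(A, B, N):
    """Kuncheva's consistency index KI(A, B) = (r*N - s^2) / (s*(N - s))
    for two feature subsets of equal cardinality s drawn from N features."""
    s = len(A)
    assert len(B) == s and 0 < s < N, "requires equal-size subsets, 0 < s < N"
    r = len(set(A) & set(B))  # observed overlap
    return (r * N - s * s) / (s * (N - s))
```

Identical subsets give 1, and independently drawn subsets score approximately 0 on average, which is exactly the correction-for-chance property required above.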
17 Improving feature selection robustness
Methodology based on ensemble methods for classification: can we transfer this to feature selection?
- Previous work (Cherkauer, Opitz, Tsymbal and Cunningham): use feature selection to construct an ensemble
- This work: use ensemble methods to perform feature selection
Research questions:
- Can we improve feature selection robustness/stability using ensembles of feature selectors?
- Are the statistical, computational and representational aspects of ensemble learning transferable to feature selection?
- How does it affect classification performance?
18 Components of ensemble feature selection
Training set → feature selection algorithms 1, 2, ..., T → ranked lists 1, 2, ..., T → aggregation operator → consensus ranked list C
23 Components of ensemble feature selection
Variation in the feature selectors:
- Choosing different feature selection techniques
- Dataset perturbation: instance-level or feature-level perturbation
- Stochasticity in the feature selector
- Bayesian model averaging
- Combinations of these techniques
Aggregation of the results into a single output:
- Rank aggregation
- Weighted rank aggregation
- Score aggregation
- Counting the most frequently selected features
25 Overview: two case studies
1. Bagging-based ensemble feature selection: microarray data sets, feature ranking approach, rank aggregation method
2. Ensemble feature selection using model stochasticity: mass spectrometry data sets, feature selection approach, subset aggregation approach
26 Case study 1: bagging-based ensemble feature selection
Generate feature selection diversity by instance perturbation (bootstrapping):
- Generate t datasets by sampling the training set with replacement
- For each dataset, apply a feature selection algorithm (e.g. a ranker): EFS = {F_1, F_2, ..., F_t}
- Each feature selector F_i results in a ranking f_i = (f_i^1, ..., f_i^N), where f_i^j denotes the rank of feature j in bootstrap i
27 Aggregation methods
Rank aggregation:
f = ( Σ_{i=1}^{t} w_i f_i^1, ..., Σ_{i=1}^{t} w_i f_i^N )
- Complete linear aggregation (CLA): w_i = 1
- Complete weighted aggregation (CWA): w_i = OO-AUC_i
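Putting the bootstrapping and rank-aggregation steps together (a sketch under the assumptions that `rank_features` is any base ranker returning ranks 1..N, and that uniform weights give CLA; a CWA variant would pass per-bootstrap performance weights instead):

```python
import numpy as np

def ensemble_rank(rank_features, X, y, t=40, weights=None, rng=None):
    """Bagging-based ensemble feature ranking: rank features on t bootstrap
    samples and aggregate the (weighted) rank sums into a consensus ranking."""
    rng = np.random.default_rng(rng)
    M, N = X.shape
    w = np.ones(t) if weights is None else np.asarray(weights, float)
    rank_sum = np.zeros(N)
    for i in range(t):
        idx = rng.choice(M, size=M, replace=True)         # bootstrap sample i
        rank_sum += w[i] * rank_features(X[idx], y[idx])  # add w_i * f_i
    # smallest aggregated rank sum -> best consensus rank 1
    return np.argsort(np.argsort(rank_sum)) + 1
```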
28 Overview of the methodology
- Full data set (100% of the samples) → subsampling → 90% subsamples → marker selection → ranked lists 1, ..., K (compared to measure stability)
- Baseline: each 90% subsample is fed to a single marker selection algorithm, producing one ranked list
- Ensemble: each 90% subsample is bootstrapped (bootstraps 1, ..., T), a marker selection algorithm is run on each bootstrap (ranked lists A, B, ..., T), and a consensus marker selection step aggregates these into a consensus ranked list
32 Experiments
Microarray datasets:
- Colon (Alon et al., 1999)
- Leukemia (Golub et al., 1999)
- Lymphoma (Alizadeh et al., 2000)
- Prostate (Singh et al., 2002)
Baseline classifier/feature selection algorithm: linear SVM with SVM Recursive Feature Elimination (RFE; Guyon et al., 2002):
1. Train a linear SVM on the full feature set
2. Rank features based on the weight vector w
3. Eliminate the 50% worst features
4. Retrain the SVM on the remaining features
5. Go to step 2
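The RFE loop can be sketched generically over any linear model's weight vector; here a least-squares fit stands in for the linear SVM (an assumption made to keep the example self-contained — the slide's baseline trains an actual SVM and ranks by its weights):

```python
import numpy as np

def rfe_ranking(X, y, fit_weights, drop=0.5):
    """Recursive feature elimination: repeatedly fit a linear model on the
    surviving features, rank them by |w|, and drop the worst fraction.
    Returns ranks 1 (eliminated last = best) .. N (eliminated first)."""
    N = X.shape[1]
    remaining = list(range(N))
    rank = np.zeros(N, dtype=int)
    while remaining:
        w = fit_weights(X[:, remaining], y)
        order = np.argsort(np.abs(w))        # ascending |w|: worst features first
        n_drop = max(1, int(drop * len(remaining)))
        next_rank = len(remaining)           # worst survivor gets the worst rank
        for pos in order[:n_drop]:
            rank[remaining[pos]] = next_rank
            next_rank -= 1
        remaining = [remaining[pos] for pos in order[n_drop:]]
    return rank

def ls_weights(Xs, ys):
    """Least-squares coefficients as a stand-in linear model."""
    return np.linalg.lstsq(Xs, ys, rcond=None)[0]
```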
33 Results: stability distributions
[Figure: histograms of pairwise similarity values for the colon, leukemia, lymphoma, and prostate datasets, comparing ensemble and baseline feature selection]
34 Results: stability
[Figure: Kuncheva index versus percentage of selected features for CLA, CWA, and the baseline on the Colon, Leukemia, Lymphoma, and Prostate datasets]
35 Results: classification performance
[Figure: AUC versus percentage of selected features for CLA, CWA, and the baseline on the Colon, Leukemia, Lymphoma, and Prostate datasets]
36 Bagging-based EFS: first conclusions
- Ensemble feature selection (EFS) increases model performance: more stable biomarker selection and increased predictive performance
- EFS is easy to parallelize
- As signature sizes get smaller, EFS progressively improves upon the baseline; robust, small signatures are interesting candidates for prognostic tests
- The linear aggregation method is preferred
37 Sensitivity analysis: number of bootstraps — effect on stability
[Figure: Kuncheva index versus percentage of selected features for different numbers of bootstraps versus the baseline, on the Colon, Leukemia, Lymphoma, and Prostate datasets]
38 Sensitivity analysis: number of bootstraps — effect on classification performance
[Figure: AUC versus percentage of selected features for different numbers of bootstraps versus the baseline, on the Colon, Leukemia, Lymphoma, and Prostate datasets]
39 Sensitivity analysis: RFE elimination percentage — effect on stability
[Figure: Kuncheva index versus percentage of selected features for CLA and the baseline with elimination percentages E = 20%, 50%, and 100%, on the Colon, Leukemia, Lymphoma, and Prostate datasets]
40 Sensitivity analysis: RFE elimination percentage — effect on classification performance
[Figure: AUC versus percentage of selected features for CLA and the baseline with elimination percentages E = 20%, 50%, and 100%, on the Colon, Leukemia, Lymphoma, and Prostate datasets]
41 Bagging-based EFS: final conclusions
- Ensemble feature selection (EFS) increases model performance: more stable biomarker selection and increased predictive performance
- The number of bootstraps only affects stability
- The RFE elimination percentage does not affect EFS
- The RFE elimination percentage has a strong impact on the baseline: a single-run SVM performs best in terms of stability, with a smaller impact on classification performance
42 Case study 2: ensemble FS using model stochasticity
Traditional approach:
- Run a stochastic FS method many times (e.g. MCMC, genetic algorithm, stochastic iterative sampling)
- Compare all feature subsets found
- Make a final selection: intersection of the results, or the most frequently selected features
Computationally more efficient approach:
- Don't use only the single best results of the sampling procedure
- Average over the whole distribution
44 Estimation of distribution algorithms (EDA)
Instead of working on one solution, work on a set of solutions (a distribution). Use stochastic iterative sampling, combined with probabilistic graphical models, to model good solutions:
1. Generate the initial solution set S_0
2. Select a number of samples
3. Estimate the probability distribution
4. Generate new samples by sampling the estimated distribution
5. Create the new solution set; if the termination criteria are not met, go to step 2
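The five steps above, specialized to the simplest EDA variant (UMDA, which appears on the next slide), can be sketched as follows. This is a toy sketch under stated assumptions: `fitness` would in practice be, e.g., the cross-validated accuracy of a classifier on the selected features, and the probability clipping is an assumption added to keep bits from fixating prematurely:

```python
import numpy as np

def umda(fitness, n_bits, pop_size=100, n_select=50, n_gen=30, rng=None):
    """Univariate Marginal Distribution Algorithm: iteratively fit
    independent Bernoulli marginals p(x_i) to the selected solutions and
    sample the next population from them."""
    rng = np.random.default_rng(rng)
    pop = rng.random((pop_size, n_bits)) < 0.5           # initial solution set S_0
    for _ in range(n_gen):
        scores = np.array([fitness(ind) for ind in pop])
        best = pop[np.argsort(scores)[-n_select:]]       # truncation selection
        p = best.mean(axis=0).clip(0.05, 0.95)           # estimate marginals p(x_i)
        pop = rng.random((pop_size, n_bits)) < p         # sample new solution set
    scores = np.array([fitness(ind) for ind in pop])
    # return the best final solution and the marginal selection frequencies
    return pop[np.argmax(scores)], pop.mean(axis=0)
```

Averaging the returned marginal frequencies over many multistarts corresponds to using the whole distribution (as in the peak frequency plots) rather than only the single best subsets.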
45 Estimating the probability distribution
[Figure: graphical models and factorized probability distributions over features X_1, ..., X_8 for three EDA variants: UMDA (independent univariate marginals p(x_i)), BMDA (bivariate dependencies, e.g. p(x_4 | x_1), p(x_5 | x_3)), and BOA/EBNA (Bayesian networks with multi-parent conditionals, e.g. p(x_5 | x_3, x_4))]
46 Experiments
Mass spectrometry datasets:
- Ovarian cancer profiling (Petricoin et al., 2002), SDR 0.0044
- Detection of drug-induced toxicity (Petricoin et al., 2004), SDR 0.00137
- Hepatocellular carcinoma (Ressom et al., 2006), SDR 0.0041
Estimation algorithms: UMDA, BMDA. Classifiers: naive Bayes, k-NN, SVM. All EDA results are averaged over 500 multistarts.
47 Results [preliminary]
Usage for knowledge discovery: peak frequency plots
48 Future challenges
- Better dealing with correlated features: first cluster correlated features, then choose representatives from each cluster and build a model with the representatives; adapt similarity measures to deal with correlated features
- Increasing stability by transfer learning: assume two related datasets D_1 and D_2, and use feature selection on D_1 as a prior for feature selection on D_2. Preliminary research shows that this transfer of feature selection information increases the stability of feature selection on D_2 [Helleputte and Dupont (2009)]
- A comparative evaluation of different ensemble FS techniques
51 Acknowledgements
Thomas Abeel (Ghent University), Yves Van de Peer (Ghent University), Thibault Helleputte (UC Louvain), Pierre Dupont (UC Louvain), Ruben Armañanzas (Universidad Politécnica de Madrid), Iñaki Inza (University of the Basque Country), Pedro Larrañaga (Universidad Politécnica de Madrid)
52 References
- Alizadeh, A., Eisen, M., Davis, R., Ma, C., Lossos, I., Rosenwald, A., Boldrick, J., Sabet, H., et al. (2000). Distinct types of diffuse large B-cell lymphoma identified by gene expression profiling. Nature, 403.
- Alon, U., Barkai, N., Notterman, D., Gish, K., Ybarra, S., Mack, D., & Levine, A. (1999). Broad patterns of gene expression revealed by clustering of tumor and normal colon tissues probed by oligonucleotide arrays. Proc. Natl. Acad. Sci. USA, 96.
- Golub, T., Slonim, D. K., Tamayo, P., Huard, C., Gaasenbeek, M., Mesirov, J. P., Coller, H., et al. (1999). Molecular classification of cancer: Class discovery and class prediction by gene expression monitoring. Science, 286.
- Guyon, I., Weston, J., Barnhill, S., & Vapnik, V. (2002). Gene selection for cancer classification using support vector machines. Machine Learning, 46.
- Helleputte, T., & Dupont, P. (2009). Feature selection by transfer learning with linear regularized models. Lecture Notes in Artificial Intelligence, 5781.
- Kuncheva, L. (2007). A stability index for feature selection. Proceedings of the 25th International Multi-Conference on Artificial Intelligence and Applications.
- Petricoin, E. F., Ardekani, A. M., Hitt, B. A., Levine, P. J., Fusaro, V. A., Steinberg, S. M., Mills, G. B., Simone, C., Fishman, D. A., Kohn, E. C., & Liotta, L. A. (2002). Use of proteomic patterns in serum to identify ovarian cancer. Lancet, 359.
- Petricoin, E. F., Rajapaske, V., Herman, E. H., Arekani, A. M., Ross, S., Johann, D., Knapton, A., Zhang, J., Hitt, B. A., Conrads, T. P., Veenstra, T. D., Liotta, L. A., & Sistare, F. D. (2004). Toxicoproteomics: Serum proteomic pattern diagnostics for early detection of drug induced cardiac toxicities and cardioprotection. Toxicologic Pathology, 32.
- Ressom, H. W., Varghese, R. S., Orvisky, E., Drake, S. K., Hortin, G. L., Abdel-Hamid, M., Loffredo, C. A., & Goldman, R. (2006). Ant colony optimization for biomarker identification from MALDI-TOF mass spectra. Proceedings of the 28th International Conference of the IEEE Engineering in Medicine and Biology Society.
- Singh, D., Febbo, P., Ross, K., Jackson, D., Manola, J., Ladd, C., Tamayo, P., Renshaw, A., D'Amico, A., Richie, J., Lander, E., Loda, M., Kantoff, P., & Sellers, W. (2002). Gene expression correlates of clinical prostate cancer behavior. Cancer Cell, 1.
More informationMulti-label classification using rule-based classifier systems
Multi-label classification using rule-based classifier systems Shabnam Nazmi (PhD candidate) Department of electrical and computer engineering North Carolina A&T state university Advisor: Dr. A. Homaifar
More informationCURSE-OF-DIMENSIONALITY is a major problem associated
24 Feature Article: An Embedded Two-Layer Feature Selection Approach for Microarray Data Analysis An Embedded Two-Layer Feature Selection Approach for Microarray Data Analysis Pengyi Yang and Zili Zhang
More informationAn integrated tool for microarray data clustering and cluster validity assessment
An integrated tool for microarray data clustering and cluster validity assessment Nadia Bolshakova Department of Computer Science Trinity College Dublin Ireland +353 1 608 3688 Nadia.Bolshakova@cs.tcd.ie
More informationRedundancy Based Feature Selection for Microarray Data
Redundancy Based Feature Selection for Microarray Data Lei Yu Department of Computer Science & Engineering Arizona State University Tempe, AZ 85287-8809 leiyu@asu.edu Huan Liu Department of Computer Science
More informationCANCER PREDICTION USING PATTERN CLASSIFICATION OF MICROARRAY DATA. By: Sudhir Madhav Rao &Vinod Jayakumar Instructor: Dr.
CANCER PREDICTION USING PATTERN CLASSIFICATION OF MICROARRAY DATA By: Sudhir Madhav Rao &Vinod Jayakumar Instructor: Dr. Michael Nechyba 1. Abstract The objective of this project is to apply well known
More informationStatistical Pattern Recognition
Statistical Pattern Recognition Features and Feature Selection Hamid R. Rabiee Jafar Muhammadi Spring 2012 http://ce.sharif.edu/courses/90-91/2/ce725-1/ Agenda Features and Patterns The Curse of Size and
More informationStatistical Pattern Recognition
Statistical Pattern Recognition Features and Feature Selection Hamid R. Rabiee Jafar Muhammadi Spring 2013 http://ce.sharif.edu/courses/91-92/2/ce725-1/ Agenda Features and Patterns The Curse of Size and
More informationStable Feature Selection with Minimal Independent Dominating Sets
Stable Feature Selection with Minimal Independent Dominating Sets Le Shu Computer and Information Science, Temple University 85 N. Broad St. Philadelphia, PA slevenshu@gmail.com Tianyang Ma Computer and
More informationPackage twilight. August 3, 2013
Package twilight August 3, 2013 Version 1.37.0 Title Estimation of local false discovery rate Author Stefanie Scheid In a typical microarray setting with gene expression data observed
More informationExploratory data analysis for microarrays
Exploratory data analysis for microarrays Jörg Rahnenführer Computational Biology and Applied Algorithmics Max Planck Institute for Informatics D-66123 Saarbrücken Germany NGFN - Courses in Practical DNA
More informationBiclustering Bioinformatics Data Sets. A Possibilistic Approach
Possibilistic algorithm Bioinformatics Data Sets: A Possibilistic Approach Dept Computer and Information Sciences, University of Genova ITALY EMFCSC Erice 20/4/2007 Bioinformatics Data Sets Outline Introduction
More informationStatistical Pattern Recognition
Statistical Pattern Recognition Features and Feature Selection Hamid R. Rabiee Jafar Muhammadi Spring 2014 http://ce.sharif.edu/courses/92-93/2/ce725-2/ Agenda Features and Patterns The Curse of Size and
More informationComparison of Optimization Methods for L1-regularized Logistic Regression
Comparison of Optimization Methods for L1-regularized Logistic Regression Aleksandar Jovanovich Department of Computer Science and Information Systems Youngstown State University Youngstown, OH 44555 aleksjovanovich@gmail.com
More informationCOSC160: Detection and Classification. Jeremy Bolton, PhD Assistant Teaching Professor
COSC160: Detection and Classification Jeremy Bolton, PhD Assistant Teaching Professor Outline I. Problem I. Strategies II. Features for training III. Using spatial information? IV. Reducing dimensionality
More informationDimensionality Reduction, including by Feature Selection.
Dimensionality Reduction, including by Feature Selection www.cs.wisc.edu/~dpage/cs760 Goals for the lecture you should understand the following concepts filtering-based feature selection information gain
More informationExploratory data analysis for microarrays
Exploratory data analysis for microarrays Adrian Alexa Computational Biology and Applied Algorithmics Max Planck Institute for Informatics D-66123 Saarbrücken slides by Jörg Rahnenführer NGFN - Courses
More informationClustering Jacques van Helden
Statistical Analysis of Microarray Data Clustering Jacques van Helden Jacques.van.Helden@ulb.ac.be Contents Data sets Distance and similarity metrics K-means clustering Hierarchical clustering Evaluation
More informationCenter for Genome Research, MIT Whitehead Institute, One Kendall Square, Cambridge, MA 02139, USA
BIOINFORMATICS Vol. 17 Suppl. 1 2001 Pages S316 S322 Molecular classification of multiple tumor types Chen-Hsiang Yeang, Sridhar Ramaswamy, Pablo Tamayo, Sayan Mukherjee, Ryan M. Rifkin, Michael Angelo,
More informationSUPPLEMENTARY INFORMATION
Supplementary Discussion 1: Rationale for Clustering Algorithm Selection Introduction: The process of machine learning is to design and employ computer programs that are capable to deduce patterns, regularities
More informationWeighted Heuristic Ensemble of Filters
Weighted Heuristic Ensemble of Filters Ghadah Aldehim School of Computing Sciences University of East Anglia Norwich, NR4 7TJ, UK G.Aldehim@uea.ac.uk Wenjia Wang School of Computing Sciences University
More informationPreface to the Second Edition. Preface to the First Edition. 1 Introduction 1
Preface to the Second Edition Preface to the First Edition vii xi 1 Introduction 1 2 Overview of Supervised Learning 9 2.1 Introduction... 9 2.2 Variable Types and Terminology... 9 2.3 Two Simple Approaches
More informationContents. ! Data sets. ! Distance and similarity metrics. ! K-means clustering. ! Hierarchical clustering. ! Evaluation of clustering results
Statistical Analysis of Microarray Data Contents Data sets Distance and similarity metrics K-means clustering Hierarchical clustering Evaluation of clustering results Clustering Jacques van Helden Jacques.van.Helden@ulb.ac.be
More information9/29/13. Outline Data mining tasks. Clustering algorithms. Applications of clustering in biology
9/9/ I9 Introduction to Bioinformatics, Clustering algorithms Yuzhen Ye (yye@indiana.edu) School of Informatics & Computing, IUB Outline Data mining tasks Predictive tasks vs descriptive tasks Example
More informationUsing a Clustering Similarity Measure for Feature Selection in High Dimensional Data Sets
Using a Clustering Similarity Measure for Feature Selection in High Dimensional Data Sets Jorge M. Santos ISEP - Instituto Superior de Engenharia do Porto INEB - Instituto de Engenharia Biomédica, Porto,
More informationFast or furious? - User analysis of SF Express Inc
CS 229 PROJECT, DEC. 2017 1 Fast or furious? - User analysis of SF Express Inc Gege Wen@gegewen, Yiyuan Zhang@yiyuan12, Kezhen Zhao@zkz I. MOTIVATION The motivation of this project is to predict the likelihood
More informationComputers in Biology and Medicine
Computers in Biology and Medicine 39 (29) 818 -- 823 Contents lists available at ScienceDirect Computers in Biology and Medicine journal homepage: www.elsevier.com/locate/cbm Feature extraction and dimensionality
More informationClustering. Mihaela van der Schaar. January 27, Department of Engineering Science University of Oxford
Department of Engineering Science University of Oxford January 27, 2017 Many datasets consist of multiple heterogeneous subsets. Cluster analysis: Given an unlabelled data, want algorithms that automatically
More informationVariable Selection using Random Forests
Author manuscript, published in "Pattern Recognition Letters 31, 14 (21) 222-2236" Variable Selection using Random Forests Robin Genuer a, Jean-Michel Poggi,a,b, Christine Tuleau-Malot c a Laboratoire
More informationClustering Techniques
Clustering Techniques Bioinformatics: Issues and Algorithms CSE 308-408 Fall 2007 Lecture 16 Lopresti Fall 2007 Lecture 16-1 - Administrative notes Your final project / paper proposal is due on Friday,
More informationMining Quantitative Maximal Hyperclique Patterns: A Summary of Results
Mining Quantitative Maximal Hyperclique Patterns: A Summary of Results Yaochun Huang, Hui Xiong, Weili Wu, and Sam Y. Sung 3 Computer Science Department, University of Texas - Dallas, USA, {yxh03800,wxw0000}@utdallas.edu
More informationA Regularized Method for Selecting Nested Groups of. Relevant Genes from Microarray Data
A Regularized Method for Selecting Nested Groups of Relevant Genes from Microarray Data Christine De Mol Université Libre de Bruxelles, Dept Math. and ECARES Campus Plaine CP217, 1050 Brussels, Belgium
More informationPrognosis of Lung Cancer Using Data Mining Techniques
Prognosis of Lung Cancer Using Data Mining Techniques 1 C. Saranya, M.Phil, Research Scholar, Dr.M.G.R.Chockalingam Arts College, Arni 2 K. R. Dillirani, Associate Professor, Department of Computer Science,
More informationFeature Selection for Support Vector Machines by Means of Genetic Algorithms
Feature Selection for Support Vector Machines by Means of Genetic Algorithms Holger Fröhlich holger.froehlich@informatik.uni-tuebingen.de ernhard Schölkopf bernhard.schoelkopf@tuebingen.mpg.de Olivier
More informationMIT 9.520/6.860 Project: Feature selection for SVM
MIT 9.520/6.860 Project: Feature selection for SVM Antoine Dedieu Operations Research Center Massachusetts Insitute of Technology adedieu@mit.edu Abstract We consider sparse learning binary classification
More informationarxiv: v1 [stat.ap] 9 Nov 2017
Dimension Reduction of High-Dimensional Datasets Based on Stepwise SVM arxiv:1711.03346v1 [stat.ap] 9 Nov 2017 Elizabeth P. Chou, Tzu-Wei Ko Department of Statistics, National Chengchi University, No.64,
More informationThe Curse of Dimensionality
The Curse of Dimensionality ACAS 2002 p1/66 Curse of Dimensionality The basic idea of the curse of dimensionality is that high dimensional data is difficult to work with for several reasons: Adding more
More informationVariable selection using Random Forests
Variable selection using Random Forests Robin Genuer, Jean-Michel Poggi, Christine Tuleau-Malot To cite this version: Robin Genuer, Jean-Michel Poggi, Christine Tuleau-Malot. Variable selection using Random
More informationForward Feature Selection Using Residual Mutual Information
Forward Feature Selection Using Residual Mutual Information Erik Schaffernicht, Christoph Möller, Klaus Debes and Horst-Michael Gross Ilmenau University of Technology - Neuroinformatics and Cognitive Robotics
More informationFilter versus Wrapper Feature Subset Selection in Large Dimensionality Micro array: A Review
Filter versus Wrapper Feature Subset Selection in Large Dimensionality Micro array: A Review Binita Kumari #1, Tripti Swarnkar *2 #1 Department of Computer Science - *2 Department of Computer Applications,
More informationTitle: Optimized multilayer perceptrons for molecular classification and diagnosis using genomic data
Supplementary material for Manuscript BIOINF-2005-1602 Title: Optimized multilayer perceptrons for molecular classification and diagnosis using genomic data Appendix A. Testing K-Nearest Neighbor and Support
More informationClustering and Classification. Basic principles of clustering. Clustering. Classification
Classification Clustering and Classification Task: assign objects to classes (groups) on the basis of measurements made on the objects Jean Yee Hwa Yang University of California, San Francisco http://www.biostat.ucsf.edu/jean/
More informationVariable Selection 6.783, Biomedical Decision Support
6.783, Biomedical Decision Support (lrosasco@mit.edu) Department of Brain and Cognitive Science- MIT November 2, 2009 About this class Why selecting variables Approaches to variable selection Sparsity-based
More informationarxiv: v1 [cs.cv] 18 Apr 2017
arxiv:1704.05409v1 [cs.cv] 18 Apr 2017 Ranking to Learn: Feature Ranking and Selection via Eigenvector Centrality Giorgio Roffo 1,2 Simone Melzi 2 Giorgio.Roffo@glasgow.ac.uk Simone.Melzi@univr.it 1 School
More informationFeature Selection and Classification of High Dimensional Mass Spectrometry Data: A Genetic Programming Approach
Feature Selection and Classification of High Dimensional Mass Spectrometry Data: A Genetic Programming Approach Soha Ahmed 1, Mengjie Zhang 1, and Lifeng Peng 2 1 School of Engineering and Computer Science
More informationCS249: ADVANCED DATA MINING
CS249: ADVANCED DATA MINING Classification Evaluation and Practical Issues Instructor: Yizhou Sun yzsun@cs.ucla.edu April 24, 2017 Homework 2 out Announcements Due May 3 rd (11:59pm) Course project proposal
More informationPattern Recognition Letters
Pattern Recognition Letters 31 (21) 2225 2236 Contents lists available at ScienceDirect Pattern Recognition Letters journal homepage: www.elsevier.com/locate/patrec Variable selection using random forests
More informationHigh throughput Data Analysis 2. Cluster Analysis
High throughput Data Analysis 2 Cluster Analysis Overview Why clustering? Hierarchical clustering K means clustering Issues with above two Other methods Quality of clustering results Introduction WHY DO
More informationMTTS1 Dimensionality Reduction and Visualization Spring 2014 Jaakko Peltonen
MTTS1 Dimensionality Reduction and Visualization Spring 2014 Jaakko Peltonen Lecture 2: Feature selection Feature Selection feature selection (also called variable selection): choosing k < d important
More informationKeywords: Linear dimensionality reduction, Locally Linear Embedding.
Orthogonal Neighborhood Preserving Projections E. Kokiopoulou and Y. Saad Computer Science and Engineering Department University of Minnesota Minneapolis, MN 4. Email: {kokiopou,saad}@cs.umn.edu Abstract
More informationTechnical Report. Feature Selection for High-Dimensional Data with RapidMiner. Sangkyun Lee, *Benjamin Schowe, *Viswanath Sivakumar, Katharina Morik
Feature Selection for High-Dimensional Data with RapidMiner Technical Report Sangkyun Lee, *Benjamin Schowe, *Viswanath Sivakumar, Katharina Morik 01/2011 technische universität dortmund Part of the work
More informationmirsig: a consensus-based network inference methodology to identify pan-cancer mirna-mirna interaction signatures
SUPPLEMENTARY FILE - S1 mirsig: a consensus-based network inference methodology to identify pan-cancer mirna-mirna interaction signatures Joseph J. Nalluri 1,*, Debmalya Barh 2, Vasco Azevedo 3 and Preetam
More informationFeature Selection Using Modified-MCA Based Scoring Metric for Classification
2011 International Conference on Information Communication and Management IPCSIT vol.16 (2011) (2011) IACSIT Press, Singapore Feature Selection Using Modified-MCA Based Scoring Metric for Classification
More informationCombining nearest neighbor classifiers versus cross-validation selection
Statistics Preprints Statistics 4-2004 Combining nearest neighbor classifiers versus cross-validation selection Minhui Paik Iowa State University, 100min@gmail.com Yuhong Yang Iowa State University Follow
More informationCPSC 340: Machine Learning and Data Mining
CPSC 340: Machine Learning and Data Mining Feature Selection Original version of these slides by Mark Schmidt, with modifications by Mike Gelbart. Admin Assignment 3: Due Friday Midterm: Feb 14 in class
More informationImproving Feature Selection Techniques for Machine Learning
Georgia State University ScholarWorks @ Georgia State University Computer Science Dissertations Department of Computer Science 11-27-2007 Improving Feature Selection Techniques for Machine Learning Feng
More informationA Study of Cross-Validation and Bootstrap for Accuracy Estimation and Model Selection (Kohavi, 1995)
A Study of Cross-Validation and Bootstrap for Accuracy Estimation and Model Selection (Kohavi, 1995) Department of Information, Operations and Management Sciences Stern School of Business, NYU padamopo@stern.nyu.edu
More informationOpen Access Computer Image Feature Extraction Algorithm for Robot Visual Equipment
Send Orders for Reprints to reprints@benthamscience.ae The Open Automation and Control Systems Journal, 5, 7, 993-993 Open Access Computer Image Feature Extraction Algorithm for Robot Visual Equipment
More informationDouble Self-Organizing Maps to Cluster Gene Expression Data
Double Self-Organizing Maps to Cluster Gene Expression Data Dali Wang, Habtom Ressom, Mohamad Musavi, Cristian Domnisoru University of Maine, Department of Electrical & Computer Engineering, Intelligent
More informationLecture 9: Support Vector Machines
Lecture 9: Support Vector Machines William Webber (william@williamwebber.com) COMP90042, 2014, Semester 1, Lecture 8 What we ll learn in this lecture Support Vector Machines (SVMs) a highly robust and
More information10601 Machine Learning. Model and feature selection
10601 Machine Learning Model and feature selection Model selection issues We have seen some of this before Selecting features (or basis functions) Logistic regression SVMs Selecting parameter value Prior
More informationA new approach to enhance the performance of decision tree for classifying gene expression data
PROCEEDINGS Open Access A new approach to enhance the performance of decision tree for classifying gene expression data Md Rafiul Hassan 1*, Ramamohanarao Kotagiri 2 From Great Lakes Bioinformatics Conference
More informationGene extraction for cancer diagnosis by support vector machines An improvement
Artificial Intelligence in Medicine (2005) 35, 185 194 http://www.intl.elsevierhealth.com/journals/aiim Gene extraction for cancer diagnosis by support vector machines An improvement Te Ming Huang, Vojislav
More informationStatistical Analysis of Metabolomics Data. Xiuxia Du Department of Bioinformatics & Genomics University of North Carolina at Charlotte
Statistical Analysis of Metabolomics Data Xiuxia Du Department of Bioinformatics & Genomics University of North Carolina at Charlotte Outline Introduction Data pre-treatment 1. Normalization 2. Centering,
More information