Data Mining Final Project on NIPS Competition 2003
Chun F. Hsu, Yao-Hui Yu, Yin-Tzu Lin

June 18, 2006

Abstract

Feature selection is an important issue in data mining; its main purpose is to reduce dimensionality and to filter out noise. In this project we perform classification on data sets preprocessed by feature selection. We focus on two feature selection methods, PCA and a univariate significance test, and combine them with SVM to make predictions. Both the balanced error rate (BER) and the area under the ROC curve (AUC) of the predictions improve when feature selection is used. We also briefly discuss the performance of other classifiers, such as KNN.

Contents

1 Introduction
  1.1 Competition Scoring Criteria and Analysis
    1.1.1 AUC
    1.1.2 BER
  1.2 Training Procedure
2 Feature Selection
  2.1 Method
    2.1.1 PCA
    2.1.2 Univariate Significance Test
  2.2 Result
3 Experiment Result
  3.1 Using Libsvm
  3.2 Result without F.S.
  3.3 Final Result
  3.4 Comparison with PCA+KNN
4 Summary
  4.1 Criticism on UST
  4.2 Criticism on PCA
  4.3 Group Contribution and Participation
List of Tables

1 First Observation After Data Feature Selection
2 Result without F.S.
3 Final Result
4 Madelon comparison between SVM and KNN
5 Gisette comparison between SVM and KNN

1 Introduction

1.1 Competition Scoring Criteria and Analysis

The data sets we used come from NIPS 2003, where a competition used five data sets for classification. Our goal is to figure out why the first-place entry won and, by tracing its authors' work, to produce an implementation of our own. In NIPS 2003 a series of tests was run on the submitted predictions and collected into a final score. However, we have neither the true labels of the testing set nor the positions where noise features (which the organizers call "probes") were inserted, so the only measures we can compute are BER and AUC. We apply them when judging our results.

1.1.1 AUC

AUC (Area Under Curve) has been proposed by many researchers for evaluating the predictive ability of learning algorithms; it is the area under the ROC curve. Our implementation of AUC is based on [1]. The formula is

    AUC = (S_0 - n_0(n_0 + 1)/2) / (n_0 n_1),    (1)

where n_0 and n_1 are the numbers of positive and negative examples, and S_0 = sum_i r_i, with r_i the rank of the i-th positive example in the ranked list.

1.1.2 BER

BER (Balanced Error Rate) is a measure that balances the importance of a large class against a small one: it gives the two classes equal weight, 50% each. The formula is

    BER = (FPR + FNR) / 2,    (2)

where FPR = FP/(FP + TN) and FNR = FN/(FN + TP), with FP, TP, FN, TN the false positive, true positive, false negative and true negative counts of the prediction result.

1.2 Training Procedure

Our training procedure can be separated into several steps.

1. Data preprocessing: Our classifier is LIBSVM [3], a commonly used SVM tool. We have to combine the class labels with the attributes (features) and transform them into LIBSVM's sparse format. In the R environment [4], the package e1071 provides this through the function write.matrix.csr.
2. Feature selection: In this stage we split the flow in two directions: one uses PCA, the other a permutation test. In both methods, once we have the data after selection, we use binary search to find the best number of attributes. For example, on Madelon this search shows that only 5 attributes are needed to achieve the best performance.

3. Run classifiers: After feature selection we use grid search to find the best LIBSVM parameters, and with those parameters in hand we train with probability output. Since we do not have the test data labels, we treat the validation data as test data and use cross-validation on the training data to find good parameters. Scaling to [0, 1] also needs to be considered; it may change the result significantly.

4. Compute BER and AUC: Using our implementations of these measures, we obtain a clean evaluation of our results and classifiers.

2 Feature Selection

2.1 Method

2.1.1 PCA

PCA (Principal Component Analysis) is a procedure that computes the most meaningful basis to re-express a noisy, garbled data set. It is useful in pattern recognition, image compression, and feature selection; here we use it to select useful features to improve our learning model. Below are the basic PCA steps used in this project [6][7].

1. Construct the covariance matrix M of the training data.
2. Compute the eigenvectors of M.
3. Use the first k eigenvectors as the new basis, and transform the training and validation data to this new basis. k must be preselected.
4. Use the transformed data for training.

2.1.2 Univariate Significance Test

A significance test sets up a null hypothesis (H_0) and then uses statistical methods to compute a probability estimate (p-value) for this hypothesis; loosely speaking, p = 0.5 means H_0 is 50% likely to be true.
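The four PCA steps of Section 2.1.1 can be sketched in a few lines of NumPy. This is our illustrative sketch, not the project's actual code; the function and variable names here are our own:

```python
import numpy as np

def pca_transform(train, valid, k):
    """Project train/valid data onto the first k principal components of train.

    train, valid: arrays of shape (n_samples, n_features); k is preselected.
    """
    mean = train.mean(axis=0)
    centered = train - mean
    # Step 1: covariance matrix M of the training data.
    M = np.cov(centered, rowvar=False)
    # Step 2: eigenvectors of M (eigh, since M is symmetric).
    eigvals, eigvecs = np.linalg.eigh(M)
    # Step 3: the k eigenvectors with the largest eigenvalues form the new basis.
    order = np.argsort(eigvals)[::-1][:k]
    basis = eigvecs[:, order]
    # Step 4: transform training and validation data to the new basis.
    return centered @ basis, (valid - mean) @ basis
```

In the procedure of Section 1.2, k would then be chosen by binary search over the number of kept components.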
The winner's method: They assume the relevant variables will be at least somewhat relevant on their own, and they apply the significance test only to features that are non-zero in at least 4 training instances. They compute the Pearson and Spearman correlation between each attribute and the class label, and finally use a permutation test to estimate the p-value. The correlation formulas are

    r_s = 1 - 6 sum (X - Y)^2 / (n(n^2 - 1)),    (3)

    r_p = (sum XY - sum X sum Y / n) / sqrt((sum X^2 - (sum X)^2/n)(sum Y^2 - (sum Y)^2/n)),    (4)

and the p-value is

    p = 2 min( (1/n!) sum_pi I(r_{x y_pi} >= r_{xy}), (1/n!) sum_pi I(r_{x y_pi} <= r_{xy}) ),    (5)

where x is the feature, y is the label, n is the total number of instances, there are n! possible permutations, and y_pi represents one of them, with

    I(.) = 1 if the condition holds, and I(.) = 0 otherwise.    (6)

Thus, for the p-value to be large, the counts of (r_{x y_pi} >= r_{xy}) and (r_{x y_pi} <= r_{xy}) should be nearly identical.

However, when we tried the method above we encountered a big problem: the number of permutations can be exhaustively large. In the case n = 6000 (the number of instances of the data set Gisette), there are about e^20065 permutations, far too many to enumerate. So we looked for an alternative [2]. In this paper the author takes another approach to the permutation test. Let X = [X_1, X_2, X_3, ..., X_k]^t be the features and Y the label, taking values 1 and -1. He assumes that the relevance of a feature X is measured by the difference between Pr[X = x | Y = 1] and Pr[X = x | Y = -1], and uses four methods to compute this difference: the difference in sample means (r_M), a symmetric variant of the Kullback-Leibler distance (the J-measure, r_J), information gain (r_IG), and a chi^2-statistic-based measure (r_CHI). He then defines D_j^1 = {x_ij | y_i = 1} and D_j^{-1} = {x_ij | y_i = -1} as the values of the j-th feature over all instances of class 1 and -1. With theta(D_j^1, D_j^{-1}) denoting one of the statistics above, pi_j the p-value, and the null hypothesis H_0: Pr[X_j | Y = 1] = Pr[X_j | Y = -1], the relevance of X_j is inversely proportional to the p-value pi_j; the permutation test is used to estimate pi_j for each feature j. Let U^(b) and V^(b) denote the shuffled values of the j-th feature over the instances of class 1 and -1, where b indexes one of the permutations. Then
    p = (1/(B + 1)) sum_b I( theta(U^(b), V^(b)) > theta(D_j^1, D_j^{-1}) ).

To reduce the heavy computation, only B permutations are performed instead of n!. The author notes that the count sum_b I(theta(U^(b), V^(b)) > theta(D_j^1, D_j^{-1})) is a binomially distributed random variable b(B, pi_j), so to keep the estimation error under 10% he sets the coefficient of variation CV = sqrt((1 - pi_j)/(pi_j B)) = 0.1; thus, to reduce the number of permutations to 2000, only p-values estimated to be larger than 0.05 can be accepted. Since a small pi_j is not reliable for ranking, a Z-score is used instead,

    Z-score = ( theta(D_j^1, D_j^{-1}) - mean(theta(U, V)) ) / std(theta(U, V)),    (7)

to rank the features whose estimated p-values fall under the threshold. In his conclusion, the difference in sample means (r_M) and the symmetric variant of the Kullback-Leibler distance (the J-measure, r_J) work better, so we adopted these two instead of all four.

For our implementation of the significance test, we therefore combined the winner's method with the paper's method:

1. We picked the features that are non-zero in at least 4 instances.
2. Following the paper's algorithm, we use r_M and r_J [5]; their formulas are

    r_M(X) = | E[X | Y = 1] - E[X | Y = -1] |,    (8)

    r_J(X) = sum_x ( Pr[X = x | Y = 1] - Pr[X = x | Y = -1] ) log_2( Pr[X = x | Y = 1] / Pr[X = x | Y = -1] ).    (9)

3. We apply shuffles to the class labels (the same as the winner).
4. We choose B = 150 to estimate the p-value (and for p-values under 0.4 we use the Z-score), because the running time of 2000 permutations is exhaustingly long (a week or more). Since a larger theta(D_j^1, D_j^{-1}) is better, we consider a bigger Z-score to indicate a more important feature.
5. After computing the p-values, we have two importance orders: one using r_M as theta(D_j^1, D_j^{-1}), the other using r_J.

2.2 Result

Table 1 lists the results after feature selection; ranking is based on the error rate. Sel stands for the feature selection algorithm; Ffeat is the percentage of features selected; Scale indicates whether the data are scaled into [0, 1]; Parameter stands for the parameters suggested by LIBSVM; UST abbreviates Univariate Significance Test.

Table 1: First Observation After Data Feature Selection

Data Set   Sel   Ffeat   Scale   Parameter   CV Error Rate
Arcene     UST   25%     Yes     2048, ...   ...%
Dexter     UST   1.09%   No      2, ...      ...%
Dorothea   UST   NA      NA      NA          NA
Gisette    PCA   1%      Yes     2, ...      ...%
Gisette    UST   NA      NA      NA          NA
Madelon    PCA   1%      Yes     0.5, ...    ...%
Madelon    UST   1%      Yes     32, ...     ...%
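The reduced permutation test of Section 2.1.2 can be sketched as follows, using the r_M statistic. This is a minimal illustration under our own naming, not the project's implementation; r_J and the p-value/Z-score threshold logic would be handled analogously:

```python
import numpy as np

def r_m(x, y):
    """Difference in sample means between class +1 and class -1 (Equation 8)."""
    return abs(x[y == 1].mean() - x[y == -1].mean())

def permutation_pvalue(x, y, B=150, rng=None):
    """Estimate pi_j by shuffling the class labels B times (instead of n! permutations)."""
    rng = np.random.default_rng() if rng is None else rng
    observed = r_m(x, y)
    # Count how often a statistic on shuffled labels exceeds the observed one.
    count = sum(r_m(x, rng.permutation(y)) > observed for _ in range(B))
    return count / (B + 1)

def z_score(x, y, B=150, rng=None):
    """Rank features with small p-values: distance of the observed statistic
    from the shuffled ones, in standard deviations (Equation 7)."""
    rng = np.random.default_rng() if rng is None else rng
    shuffled = np.array([r_m(x, rng.permutation(y)) for _ in range(B)])
    return (r_m(x, y) - shuffled.mean()) / shuffled.std()
```

A relevant feature yields a near-zero p-value and a large Z-score; an irrelevant one yields a p-value spread over the unit interval.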
Table 2: Result without F.S.

Data Set   Scale   Parameter   CV Err   Test Err.   AUC
Arcene     Yes     32, ...     ...%     16.0%       93.58%
Dexter     Yes     32, ...     ...%     7.0%        97.81%
Dorothea   No      8, ...      ...%     ...%        93.95%
Gisette    No      NA          19.55%   NA          NA
Madelon    Yes     32.0, ...   ...%     ...         ...

3 Experiment Result

3.1 Using Libsvm

We decided to use LIBSVM as our classification tool; one reason is that it provides almost all the strong and neat tools needed to run the experiments. Table 2 shows the plain results without any feature selection. LIBSVM actually includes a feature selection tool called F-score, but it is not part of our project plan; we are supposed to follow the methods we worked out ourselves. The results show that even without feature selection, LIBSVM performs very well on Arcene, Dexter, and Dorothea. On Madelon the result is somewhat worse compared with the data sets above. On Gisette, training takes almost a day to complete; as of today, the grid search on Gisette is still running.

3.2 Result without F.S.

First we ran all the data sets through LIBSVM without any feature selection; Table 2 shows the experimental results. CV Err stands for the cross-validation error on the training set; Parameter gives the SVM parameters c and gamma.

3.3 Final Result

The final results are in Table 3, to which we add the two performance measures, AUC and BER. We combined both selection methods and chose the better one for each data set. Arcene's and Dexter's BER and AUC are better under plain SVM, but we come close to those numbers. Without scaling, Gisette and Madelon would not achieve these results; yet under plain SVM, scaling Madelon makes the result even worse. This is tricky: whether scaling helps on a data set can change after the data set has undergone some transformation. Gisette's result is very good, reaching above 95% accuracy, and its Ffeat value shows the result is indeed inspiring. As of today, the UST run on Dorothea has not finished; otherwise our results might be even better with that feature selection factor added.
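The two measures added to the final results, AUC and BER, follow Equations (1) and (2) of Section 1.1. A minimal sketch of how they can be computed (tied scores are ignored for brevity; this illustrates the formulas rather than reproducing our actual tools):

```python
def auc(scores, labels):
    """Equation (1): AUC = (S_0 - n_0(n_0+1)/2) / (n_0 * n_1), where S_0 sums
    the ranks of the positive examples in the list ranked by score."""
    ranked = sorted(zip(scores, labels))  # ascending score -> ranks 1..n
    s0 = sum(rank for rank, (_, lab) in enumerate(ranked, start=1) if lab == 1)
    n0 = sum(1 for lab in labels if lab == 1)   # positives
    n1 = len(labels) - n0                       # negatives
    return (s0 - n0 * (n0 + 1) / 2) / (n0 * n1)

def ber(tp, fp, tn, fn):
    """Equation (2): average of the false positive and false negative rates,
    weighting both classes equally."""
    fpr = fp / (fp + tn)
    fnr = fn / (fn + tp)
    return (fpr + fnr) / 2
```

For instance, a predictor that ranks every positive example above every negative one reaches AUC = 1.0.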
3.4 Comparison with PCA+KNN

Our PCA procedure runs only on Madelon and Gisette, so we compare the performance of KNN and SVM on these two data sets.
Table 3: Final Result

Data Set   Selection   Ffeat   Testing Err.   AUC      BER
Arcene     UST         25%     19.00%         90.99%   19.23%
Dexter     UST         1.09%   21.33%         88.10%   19.9%
Dorothea   SVM         ...%    ...%           93.95%   13.69%
Gisette    PCA         0.01%   5.3%           99.6%    4.79%
Madelon    PCA         0.01%   10.5%          95.82%   10.49%

In Table 4 and Table 5 we can see that after feature selection both classifiers improve to a similar level of accuracy; they show nearly no difference from each other. But before feature selection, both give very poor results. This shows that our feature selection direction really improves things a lot compared with doing no feature selection work.

Table 4: Madelon comparison between SVM and KNN

Classifier                                Ffeat   Testing Err.
svm                                       ...     ...%
knn (k=5, one of the best in 1nn-100nn)   ...     ...%
knn (avg. of 1nn-100nn)                   ...     ...%

Table 5: Gisette comparison between SVM and KNN

Classifier                                Ffeat   Testing Err.
svm                                       ...     ...%
knn (k=5, one of the best in 1nn-100nn)   ...     ...%
knn (avg. of 1nn-100nn)                   ...     ...%

4 Summary

The whole project shows that applying a feature selection method to certain data sets improves performance and lowers the dimensionality, and thus the complexity, of the classification work. To complete this project properly, we will send our results to the competition website and wait to see how far we get.

4.1 Criticism on UST

Because we reduced the number of permutations to 150 to avoid exhaustive computation, a larger error rate is inevitable. Nevertheless, our experimental results still show some potential: with a more powerful computer to run more permutations, the results would be better.
4.2 Criticism on PCA

Although our PCA works on only two data sets, it shows a significant improvement on both: it helps decrease the error rate and raises AUC to an acceptable level.

4.3 Group Contribution and Participation

All three of us attended to this project diligently, and most ideas came from iterative discussion. Our individual contributions are listed below.

Chun F. Hsu: AUC and file-converter tool implementation, report design, plain SVM testing, data and result organization, and project coordination.

Yao-Hui Yu: PCA design, running the whole PCA process, the PCA+KNN experiment, and BER tool design.

Yin-Tzu Lin: Univariate Significance Test design, paper collection and surveying, and SVM testing on the UST data.

References

[1] Jin Huang and Charles X. Ling, "Using AUC and Accuracy in Evaluating Learning Algorithms," IEEE Transactions on Knowledge and Data Engineering, Vol. 17, No. 3, 2005.

[2] P. Radivojac, Z. Obradovic, A. K. Dunker, and S. Vucetic, "Feature Selection Filters Based on the Permutation Test," European Conference on Machine Learning (ECML 2004), Pisa, Italy, September 2004.

[3] Chih-Chung Chang and Chih-Jen Lin, LIBSVM: a library for support vector machines. Software available at cjlin/libsvm.

[4] R: the R environment for statistical computing.

[5] J. Lin, "Divergence Measures Based on the Shannon Entropy," IEEE Transactions on Information Theory, (1).

[6] Lindsay I. Smith, "A Tutorial on Principal Components Analysis," tutorials/principal_components.pdf.

[7] Jon Shlens, "A Tutorial on Principal Component Analysis: Derivation, Discussion and Singular Value Decomposition."
The Effects of Outliers on Support Vector Machines Josh Hoak jrhoak@gmail.com Portland State University Abstract. Many techniques have been developed for mitigating the effects of outliers on the results
More informationCS 229 Final Project - Using machine learning to enhance a collaborative filtering recommendation system for Yelp
CS 229 Final Project - Using machine learning to enhance a collaborative filtering recommendation system for Yelp Chris Guthrie Abstract In this paper I present my investigation of machine learning as
More informationDictionary learning Based on Laplacian Score in Sparse Coding
Dictionary learning Based on Laplacian Score in Sparse Coding Jin Xu and Hong Man Department of Electrical and Computer Engineering, Stevens institute of Technology, Hoboken, NJ 73 USA Abstract. Sparse
More informationVariable Selection 6.783, Biomedical Decision Support
6.783, Biomedical Decision Support (lrosasco@mit.edu) Department of Brain and Cognitive Science- MIT November 2, 2009 About this class Why selecting variables Approaches to variable selection Sparsity-based
More informationLinear Methods for Regression and Shrinkage Methods
Linear Methods for Regression and Shrinkage Methods Reference: The Elements of Statistical Learning, by T. Hastie, R. Tibshirani, J. Friedman, Springer 1 Linear Regression Models Least Squares Input vectors
More informationRobust Face Recognition via Sparse Representation
Robust Face Recognition via Sparse Representation Panqu Wang Department of Electrical and Computer Engineering University of California, San Diego La Jolla, CA 92092 pawang@ucsd.edu Can Xu Department of
More informationRobust Face Recognition via Sparse Representation Authors: John Wright, Allen Y. Yang, Arvind Ganesh, S. Shankar Sastry, and Yi Ma
Robust Face Recognition via Sparse Representation Authors: John Wright, Allen Y. Yang, Arvind Ganesh, S. Shankar Sastry, and Yi Ma Presented by Hu Han Jan. 30 2014 For CSE 902 by Prof. Anil K. Jain: Selected
More informationComparison of different preprocessing techniques and feature selection algorithms in cancer datasets
Comparison of different preprocessing techniques and feature selection algorithms in cancer datasets Konstantinos Sechidis School of Computer Science University of Manchester sechidik@cs.man.ac.uk Abstract
More informationClassification Algorithms in Data Mining
August 9th, 2016 Suhas Mallesh Yash Thakkar Ashok Choudhary CIS660 Data Mining and Big Data Processing -Dr. Sunnie S. Chung Classification Algorithms in Data Mining Deciding on the classification algorithms
More informationGaussian Processes for Robotics. McGill COMP 765 Oct 24 th, 2017
Gaussian Processes for Robotics McGill COMP 765 Oct 24 th, 2017 A robot must learn Modeling the environment is sometimes an end goal: Space exploration Disaster recovery Environmental monitoring Other
More informationECE 285 Class Project Report
ECE 285 Class Project Report Based on Source localization in an ocean waveguide using supervised machine learning Yiwen Gong ( yig122@eng.ucsd.edu), Yu Chai( yuc385@eng.ucsd.edu ), Yifeng Bu( ybu@eng.ucsd.edu
More informationBUAA AUDR at ImageCLEF 2012 Photo Annotation Task
BUAA AUDR at ImageCLEF 2012 Photo Annotation Task Lei Huang, Yang Liu State Key Laboratory of Software Development Enviroment, Beihang University, 100191 Beijing, China huanglei@nlsde.buaa.edu.cn liuyang@nlsde.buaa.edu.cn
More informationMachine Learning and Bioinformatics 機器學習與生物資訊學
Molecular Biomedical Informatics 分子生醫資訊實驗室 機器學習與生物資訊學 Machine Learning & Bioinformatics 1 Evaluation The key to success 2 Three datasets of which the answers must be known 3 Note on parameter tuning It
More informationClustering and Visualisation of Data
Clustering and Visualisation of Data Hiroshi Shimodaira January-March 28 Cluster analysis aims to partition a data set into meaningful or useful groups, based on distances between data points. In some
More informationArtificial Neural Networks (Feedforward Nets)
Artificial Neural Networks (Feedforward Nets) y w 03-1 w 13 y 1 w 23 y 2 w 01 w 21 w 22 w 02-1 w 11 w 12-1 x 1 x 2 6.034 - Spring 1 Single Perceptron Unit y w 0 w 1 w n w 2 w 3 x 0 =1 x 1 x 2 x 3... x
More informationCOMBINED METHOD TO VISUALISE AND REDUCE DIMENSIONALITY OF THE FINANCIAL DATA SETS
COMBINED METHOD TO VISUALISE AND REDUCE DIMENSIONALITY OF THE FINANCIAL DATA SETS Toomas Kirt Supervisor: Leo Võhandu Tallinn Technical University Toomas.Kirt@mail.ee Abstract: Key words: For the visualisation
More informationCS570: Introduction to Data Mining
CS570: Introduction to Data Mining Classification Advanced Reading: Chapter 8 & 9 Han, Chapters 4 & 5 Tan Anca Doloc-Mihu, Ph.D. Slides courtesy of Li Xiong, Ph.D., 2011 Han, Kamber & Pei. Data Mining.
More informationSubspace Clustering. Weiwei Feng. December 11, 2015
Subspace Clustering Weiwei Feng December 11, 2015 Abstract Data structure analysis is an important basis of machine learning and data science, which is now widely used in computational visualization problems,
More informationTanagra Tutorials. Let us consider an example to detail the approach. We have a collection of 3 documents (in French):
1 Introduction Processing the sparse data file format with Tanagra 1. The data to be processed with machine learning algorithms are increasing in size. Especially when we need to process unstructured data.
More informationSupplementary material: Strengthening the Effectiveness of Pedestrian Detection with Spatially Pooled Features
Supplementary material: Strengthening the Effectiveness of Pedestrian Detection with Spatially Pooled Features Sakrapee Paisitkriangkrai, Chunhua Shen, Anton van den Hengel The University of Adelaide,
More informationA Feature Selection Method to Handle Imbalanced Data in Text Classification
A Feature Selection Method to Handle Imbalanced Data in Text Classification Fengxiang Chang 1*, Jun Guo 1, Weiran Xu 1, Kejun Yao 2 1 School of Information and Communication Engineering Beijing University
More informationCS228: Project Report Boosted Decision Stumps for Object Recognition
CS228: Project Report Boosted Decision Stumps for Object Recognition Mark Woodward May 5, 2011 1 Introduction This project is in support of my primary research focus on human-robot interaction. In order
More informationKernel Principal Component Analysis: Applications and Implementation
Kernel Principal Component Analysis: Applications and Daniel Olsson Royal Institute of Technology Stockholm, Sweden Examiner: Prof. Ulf Jönsson Supervisor: Prof. Pando Georgiev Master s Thesis Presentation
More informationEvaluating Classifiers
Evaluating Classifiers Reading for this topic: T. Fawcett, An introduction to ROC analysis, Sections 1-4, 7 (linked from class website) Evaluating Classifiers What we want: Classifier that best predicts
More informationLecture 26: Missing data
Lecture 26: Missing data Reading: ESL 9.6 STATS 202: Data mining and analysis December 1, 2017 1 / 10 Missing data is everywhere Survey data: nonresponse. 2 / 10 Missing data is everywhere Survey data:
More informationFUZZY C-MEANS ALGORITHM BASED ON PRETREATMENT OF SIMILARITY RELATIONTP
Dynamics of Continuous, Discrete and Impulsive Systems Series B: Applications & Algorithms 14 (2007) 103-111 Copyright c 2007 Watam Press FUZZY C-MEANS ALGORITHM BASED ON PRETREATMENT OF SIMILARITY RELATIONTP
More informationProduct Catalog. AcaStat. Software
Product Catalog AcaStat Software AcaStat AcaStat is an inexpensive and easy-to-use data analysis tool. Easily create data files or import data from spreadsheets or delimited text files. Run crosstabulations,
More informationEvaluation Measures. Sebastian Pölsterl. April 28, Computer Aided Medical Procedures Technische Universität München
Evaluation Measures Sebastian Pölsterl Computer Aided Medical Procedures Technische Universität München April 28, 2015 Outline 1 Classification 1. Confusion Matrix 2. Receiver operating characteristics
More informationControlling False Alarms with Support Vector Machines
Controlling False Alarms with Support Vector Machines Mark Davenport Clayton Scott Rice University dsp.rice.edu Richard Baraniuk The Classification Problem Given some training data...... find a classifier
More informationLarge-scale visual recognition Efficient matching
Large-scale visual recognition Efficient matching Florent Perronnin, XRCE Hervé Jégou, INRIA CVPR tutorial June 16, 2012 Outline!! Preliminary!! Locality Sensitive Hashing: the two modes!! Hashing!! Embedding!!
More informationDiscriminate Analysis
Discriminate Analysis Outline Introduction Linear Discriminant Analysis Examples 1 Introduction What is Discriminant Analysis? Statistical technique to classify objects into mutually exclusive and exhaustive
More informationCredit card Fraud Detection using Predictive Modeling: a Review
February 207 IJIRT Volume 3 Issue 9 ISSN: 2396002 Credit card Fraud Detection using Predictive Modeling: a Review Varre.Perantalu, K. BhargavKiran 2 PG Scholar, CSE, Vishnu Institute of Technology, Bhimavaram,
More information