Data Mining Final Project on NIPS Competition 2003


Chun F. Hsu, Yao-Hui Yu, Yin-Tzu Lin

June 18, 2006

Abstract

Feature selection is an important issue in data mining; its main purpose is to reduce dimensionality and to filter out noise. In this project we perform classification on data sets preprocessed by feature selection. We focus on two feature selection methods, PCA and a univariate significance test (UST), and combine them with SVM to make predictions. Using feature selection improves both the balanced error rate (BER) and the area under the ROC curve (AUC) of the output predictions. We also briefly discuss the performance of other classifiers, such as KNN.

Contents

1 Introduction
  1.1 Competition Scoring Criteria and Analysis
    1.1.1 AUC
    1.1.2 BER
  1.2 Training Procedure
2 Feature Selection
  2.1 Method
    2.1.1 PCA
    2.1.2 Univariate Significance Test
  2.2 Result
3 Experiment Result
  3.1 Using Libsvm
  3.2 Result without F.S.
  3.3 Final Result
  3.4 Comparison with PCA+KNN
4 Summary
  4.1 Criticism on UST
  4.2 Criticism on PCA
  4.3 Group Contribution and Participation

List of Tables

1 First Observation After Data Feature Selection
2 Result without F.S.
3 Final Result
4 Madelon comparison between SVM and KNN
5 Gisette comparison between SVM and KNN

1 Introduction

1.1 Competition Scoring Criteria and Analysis

The data sets we use come from NIPS 2003, where a competition used five data sets for classification. Our goal is to figure out why the 1st-place entry won and, by tracing its work, to produce an implementation of our own. In NIPS 2003 the organizers performed a series of tests on the submitted results and combined them into a final score. However, we do not have the true labels of the testing set, and we do not know where the organizers inserted the noise features they call probes. So the only measures we can compute are BER and AUC, and we apply them when adjusting our results.

1.1.1 AUC

AUC (Area Under Curve) has been proposed by many researchers for evaluating the predictive ability of learning algorithms. It is the area under the ROC curve. Our implementation of AUC is based on [1]. The formula is

    AUC = (S_0 - n_0(n_0 + 1)/2) / (n_0 n_1),    (1)

where n_0 and n_1 are the numbers of positive and negative examples, S_0 = Σ_i r_i, and r_i is the rank of the i-th positive example in the ranked list.

1.1.2 BER

BER (Balanced Error Rate) is a measure that equalizes the importance of a large class and a small class by giving each a weight of 50%. The formula is

    BER = (FPR + FNR) / 2,    (2)

where FPR = FP/(FP + TN) is the false positive rate, FNR = FN/(FN + TP) is the false negative rate, and FP, TN, FN, TP are the false positive, true negative, false negative, and true positive counts of the prediction result.
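Both measures take only a few lines of code. Below is a minimal NumPy sketch of formulas (1) and (2); the function names are ours for illustration, not the tools we actually built, and labels are assumed to be +1/-1.

```python
import numpy as np

def auc(scores, labels):
    """Rank-based AUC, following Eq. (1). Ties in the scores are
    ignored for brevity; labels are +1/-1, scores are decision values."""
    scores, labels = np.asarray(scores, dtype=float), np.asarray(labels)
    order = scores.argsort()
    ranks = np.empty(len(scores))
    ranks[order] = np.arange(1, len(scores) + 1)  # rank 1 = smallest score
    pos = labels == 1
    n0, n1 = pos.sum(), (~pos).sum()  # numbers of positive/negative examples
    s0 = ranks[pos].sum()             # S_0: sum of ranks of the positives
    return (s0 - n0 * (n0 + 1) / 2) / (n0 * n1)

def ber(pred, labels):
    """Balanced error rate, Eq. (2): average of FPR and FNR."""
    pred, labels = np.asarray(pred), np.asarray(labels)
    fp = np.sum((pred == 1) & (labels == -1))
    tn = np.sum((pred == -1) & (labels == -1))
    fn = np.sum((pred == -1) & (labels == 1))
    tp = np.sum((pred == 1) & (labels == 1))
    return (fp / (fp + tn) + fn / (fn + tp)) / 2
```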

1.2 Training Procedure

Our training procedure can be separated into several steps:

1. Data preprocessing: Our classifier is libsvm [3], a commonly used SVM tool. We have to combine the class labels with the attributes (features) and transform them into libsvm's sparse format before feeding them to libsvm. In the R [4] environment, the package e1071 provides a function for this, write.matrix.csr. A sketch of the format itself appears after this list.

2. Feature selection: In this stage we split the flow in two directions, one using PCA and the other using the permutation test. In both methods, once we receive the selected features, we use binary search to find the best number of attributes. For example, on Madelon this method tells us that only 5 attributes are needed to achieve the best performance.

3. Run classifiers: After feature selection, we grid-search for the best libsvm parameters, and once we have the parameters in hand we train with probability output (a parameter-search sketch also follows this list). Since we do not have the test labels, we treat the validation data as the test data and use cross-validation on the training data to look for good parameters. Scaling the data into [0, 1] also needs to be considered; it can change the result significantly.

4. Compute BER and AUC: Using our own implementations of these measures, we get a clean evaluation of our results and classifiers.
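For reference, the sparse format itself is simple: one line per instance, of the form "<label> <index>:<value> ...", with 1-based feature indices and zero entries omitted. A minimal Python sketch of a writer for this format (an illustration of the format, not the R write.matrix.csr function; write_libsvm is our name):

```python
def write_libsvm(path, X, y):
    """Write instances in libsvm's sparse text format:
    '<label> <index>:<value> ...', 1-based indices, zeros skipped."""
    with open(path, "w") as f:
        for label, row in zip(y, X):
            feats = " ".join(f"{j + 1}:{v}" for j, v in enumerate(row) if v != 0)
            f.write(f"{label} {feats}\n")
```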

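Step 3 can be sketched as follows. As a stand-in for the libsvm command-line tools we use scikit-learn's SVC, which wraps the same libsvm; the (C, gamma) grid shown is an illustrative assumption, not the exact grid we ran.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV
from sklearn.preprocessing import MinMaxScaler

def tune_svm(X_train, y_train):
    # Scale every feature into [0, 1]; as noted above, this can
    # change the result significantly.
    scaler = MinMaxScaler().fit(X_train)
    X_scaled = scaler.transform(X_train)
    # Search (C, gamma) with cross-validation on the training set,
    # since the test labels are not available.
    grid = GridSearchCV(
        SVC(kernel="rbf", probability=True),
        param_grid={"C": 2.0 ** np.arange(-5, 13, 2),
                    "gamma": 2.0 ** np.arange(-15, 3, 2)},
        cv=5,
    )
    grid.fit(X_scaled, y_train)
    return grid.best_estimator_, grid.best_params_, scaler
```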
2 Feature Selection

2.1 Method

2.1.1 PCA

PCA (Principal Component Analysis) is a procedure that computes the most meaningful basis with which to re-express a noisy, garbled data set. It is useful in pattern recognition, image compression, and feature selection. Here we use it to select useful features to improve our learning model. Below are the basic PCA steps used in this project [6][7]; a sketch follows the list.

1. Construct the covariance matrix M of the training data.

2. Compute the eigenvectors of M.

3. Use the first k eigenvectors as the new basis, and transform the training and validation data into this new basis. k needs to be preselected.

4. Use the transformed data for training.
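A minimal NumPy sketch of these four steps (illustrative, assuming dense data; not our actual scripts):

```python
import numpy as np

def pca_transform(X_train, X_valid, k):
    """Steps 1-4 above: covariance eigendecomposition, keep the first
    k eigenvectors, project both data sets onto the new basis."""
    mean = X_train.mean(axis=0)
    M = np.cov(X_train - mean, rowvar=False)   # step 1: covariance matrix
    eigvals, eigvecs = np.linalg.eigh(M)       # step 2: eigenvectors of M
    # step 3: first k eigenvectors, ordered by decreasing eigenvalue
    top = eigvecs[:, np.argsort(eigvals)[::-1][:k]]
    # step 4: transform training and validation data to the new basis
    return (X_train - mean) @ top, (X_valid - mean) @ top
```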

2.1.2 Univariate Significance Test

A significance test sets up a null hypothesis H_0 and then uses statistical methods to compute a probability estimate (p-value) for H_0; e.g., p = 0.5 means it is 50% likely that H_0 is true.

The winner's method: They assume the relevant variables will be at least somewhat relevant on their own, and only apply the significance test to features that are non-zero in at least 4 training instances. They use the Pearson and Spearman correlations between each attribute and the class label and, finally, a permutation test to estimate the p-value. The formulas for the correlations are

    r_s = 1 - 6 Σ (X_i - Y_i)^2 / (n(n^2 - 1)),    (3)

    r_p = (Σ XY - Σ X Σ Y / n) / sqrt((Σ X^2 - (Σ X)^2/n)(Σ Y^2 - (Σ Y)^2/n)),    (4)

and the p-value is

    p = 2 min( (1/n!) Σ_π I(r_xyπ >= r_xy), (1/n!) Σ_π I(r_xyπ <= r_xy) ),    (5)

where x is a feature, y is the label, n is the total number of instances, and there are n! possible permutations; y_π represents one such permutation. The indicator function is

    I(.) = 1 if the condition is true, 0 otherwise.    (6)

Thus, for the p-value to be large, the counts of (r_xyπ >= r_xy) and (r_xyπ <= r_xy) should be nearly identical.

However, when we tried the method above we ran into a big problem: the number of permutations can be astronomically large. For n = 6000 (the number of instances in Gisette), the number of permutations is on the order of 10^20065, far too many to enumerate. So we looked for an alternative [2]. In that paper, the author uses another form of the permutation test. Let X = [X_1, X_2, X_3, ..., X_k]^T be the features and Y the label, 1 or -1. He assumes the relevance of a feature X is measured by the difference between Pr[X = x | Y = 1] and Pr[X = x | Y = -1], and uses the following four statistics to compute this difference: the difference in sample means (r_M), the symmetric variant of the Kullback-Leibler distance (the J-measure, r_J), information gain (r_IG), and a chi^2-statistic-based measure (r_CHI). He defines D_j^1 = {x_ij : y_i = 1} and D_j^-1 = {x_ij : y_i = -1} as the values of the j-th feature over all instances of class 1 and -1, θ(D_j^1, D_j^-1) as one of the statistics above, and π_j as the p-value under the null hypothesis H_0: Pr[X_j | Y = 1] = Pr[X_j | Y = -1]. The relevance of X_j is inversely proportional to the p-value π_j, which is estimated with a permutation test for each feature j.

Formula 1. Let U(b) and V(b) denote the shuffled values of the j-th feature over the instances of class 1 and -1, where b indexes a permutation. Then

    p = (1/(B + 1)) Σ_b I(θ(U(b), V(b)) > θ(D_j^1, D_j^-1)).

To reduce the heavy computation, only B permutations are performed instead of n!. The author notes that the count above is a binomial random variable b(B, π_j), so to keep the estimation error under 10% he sets the coefficient of variation

    CV = sqrt((1 - π_j) / (π_j B)) = 0.1;

thus, if the number of permutations is reduced to B = 2000, only p-values estimated to be larger than about 0.05 are obtained reliably. Since a small π_j is not reliable for ranking, a Z-score is used instead,

    Z-score = (θ(D_j^1, D_j^-1) - mean(θ(U, V))) / std(θ(U, V)),    (7)

to rank the features whose estimated p-values fall below the threshold. In his conclusion, the difference in sample means (r_M) and the symmetric variant of the Kullback-Leibler distance (the J-measure, r_J) are the better statistics, so we adopted these two instead of all four.

Our implementation of the significance test therefore slightly combines the winner's method and the paper's method:

1. We pick the features that are non-zero in at least 4 instances.

2. Based on the paper's algorithm, we use r_M and r_J [5], whose formulas are

       r_M(X) = |E[X | Y = 1] - E[X | Y = -1]|,    (8)

       r_J(X) = Σ_x (Pr[X = x | Y = 1] - Pr[X = x | Y = -1]) log_2(Pr[X = x | Y = 1] / Pr[X = x | Y = -1]).    (9)

3. We apply shuffles to the class label (the same as the winner).

4. We choose B = 150 to estimate the p-value (for p-values under 0.4, we use the Z-score), because the running time of 2000 permutations is exhaustingly long (a week or more). Since a larger θ(D_j^1, D_j^-1) is better, we consider features with bigger Z-scores to be more important.

5. After computing the p-values, we have two importance orders: one using r_M as θ(D_j^1, D_j^-1), the other using r_J.
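A minimal sketch of this per-feature permutation test, using r_M as the statistic θ (illustrative Python, not our original scripts; B defaults to the 150 we used):

```python
import numpy as np

rng = np.random.default_rng(0)

def r_m(x, y):
    """Difference in sample means, Eq. (8); y is +1/-1."""
    return abs(x[y == 1].mean() - x[y == -1].mean())

def permutation_score(x, y, B=150):
    """Estimate the p-value of Formula 1 with B label shuffles and
    return the Z-score of Eq. (7) used for ranking."""
    observed = r_m(x, y)
    shuffled = np.empty(B)
    for b in range(B):
        y_perm = rng.permutation(y)        # shuffle the class labels
        shuffled[b] = r_m(x, y_perm)
    p = np.sum(shuffled > observed) / (B + 1)              # Formula 1
    z = (observed - shuffled.mean()) / shuffled.std()      # Eq. (7)
    return p, z
```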

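With the features ranked this way, step 2 of the training procedure binary-searches the number of top-ranked features to keep. A sketch under the assumption that validation error is roughly unimodal in the number of features; the evaluate callback, which trains and scores a model on the k best features, is hypothetical.

```python
def best_feature_count(evaluate, lo=1, hi=500):
    """Ternary-style bisection for the number of top-ranked features
    minimizing validation error; evaluate(k) is assumed unimodal in k."""
    while hi - lo > 2:
        m1 = lo + (hi - lo) // 3
        m2 = hi - (hi - lo) // 3
        if evaluate(m1) <= evaluate(m2):
            hi = m2   # the minimum lies in [lo, m2]
        else:
            lo = m1   # the minimum lies in [m1, hi]
    # brute-force the few remaining candidates
    return min(range(lo, hi + 1), key=evaluate)
```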
2.2 Result

Table 1 lists the results after feature selection, ranked by error rate. "Sel" stands for the feature selection algorithm, "Ffeat" is the percentage of features selected, "Scale" indicates whether the data were scaled into [0, 1], "Parameter" gives the (c, gamma) suggested by libsvm's grid search, and UST abbreviates Univariate Significance Test.

Table 1: First Observation After Data Feature Selection

Data Set   Sel   Ffeat   Scale   Parameter   CV Error Rate
Arcene     UST   25%     Yes     2048, -     -
Dexter     UST   1.09%   No      2, -        -
Dorothea   UST   NA      NA      NA          NA
Gisette    PCA   1%      Yes     2, -        -
Gisette    UST   NA      NA      NA          NA
Madelon    PCA   1%      Yes     0.5, -      -
Madelon    UST   1%      Yes     32, -       -

3 Experiment Result

3.1 Using Libsvm

We decided to use libsvm as our classification tool; one reason is that it provides almost all the strong, neat utilities needed to run the experiments. Table 2 gives the plain results without any feature selection. libsvm actually ships with a feature selection tool called FScore, but it is not in our project plan; we are supposed to follow the methods we investigated ourselves. The results show that even without feature selection, libsvm performs very well on Arcene, Dexter, and Dorothea. On Madelon the result is somewhat worse than on the data sets above. On Gisette, training takes almost a day to complete; as of this writing, the grid search on Gisette is still running.

3.2 Result without F.S.

First we ran all data sets through libsvm without any feature selection; Table 2 shows the results. "CV Err" is the cross-validation error on the training set, and "Parameter" gives the (c, gamma) of the SVM.

Table 2: Result without F.S.

Data Set   Scale   Parameter   CV Err    Test Err.   AUC
Arcene     Yes     32, -       -         16.0%       93.58%
Dexter     Yes     32, -       -         7.0%        97.81%
Dorothea   No      8, -        -         -           93.95%
Gisette    No      NA          19.55%    NA          NA
Madelon    Yes     32.0, -     -         -           -

3.3 Final Result

The final results are in Table 3, to which we add the two performance measures, AUC and BER. We combined both selection methods and chose the better one for each data set. For Arcene and Dexter, the BER and AUC are better with plain SVM, but we come close to it. Without scaling, Gisette and Madelon would not achieve these results; yet with plain SVM, scaling Madelon makes the result even worse. This is tricky: whether scaling helps on a data set can change after the data set has gone through some transformation. Gisette's result is very good, reaching above 95% accuracy, and its Ffeat score shows the result is indeed inspiring. As of this writing, the UST results for Dorothea have not come out; otherwise our results might be even better with the feature selection factor added.

Table 3: Final Result

Data Set   Selection   Ffeat    Testing Err.   AUC      BER
Arcene     UST         25%      19.00%         90.99%   19.23%
Dexter     UST         1.09%    21.33%         88.10%   19.9%
Dorothea   SVM         -        -              93.95%   13.69%
Gisette    PCA         0.01%    5.3%           99.6%    4.79%
Madelon    PCA         0.01%    10.5%          95.82%   10.49%

3.4 Comparison with PCA+KNN

Our PCA procedure was only run on Madelon and Gisette, so we compare the performance of KNN and SVM on these two data sets.

Table 4: Madelon comparison between SVM and KNN

Classifier                                     Ffeat   Testing Err.
svm                                            -       -
knn (k=5, one of the best among 1NN-100NN)     -       -
knn (avg. of 1NN-100NN)                        -       -

Table 5: Gisette comparison between SVM and KNN

Classifier                                     Ffeat   Testing Err.
svm                                            -       -
knn (k=5, one of the best among 1NN-100NN)     -       -
knn (avg. of 1NN-100NN)                        -       -

In Tables 4 and 5 we can see that after feature selection, both classifiers improve to a similar level of accuracy; there is hardly any difference between them. Before feature selection, however, both have very poor results. This shows that our feature selection direction really improves things substantially compared with doing no feature selection work.
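The KNN side of this comparison amounts to sweeping k from 1 to 100 on the PCA-transformed data. A minimal scikit-learn sketch of that sweep (illustrative only; not our original tooling):

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

def knn_error_curve(X_train, y_train, X_valid, y_valid, ks=range(1, 101)):
    """Validation error of k-NN for k = 1..100 on the PCA-transformed
    data; errs.min() and errs.mean() correspond to the best and the
    average entries in Tables 4 and 5."""
    errs = []
    for k in ks:
        clf = KNeighborsClassifier(n_neighbors=k).fit(X_train, y_train)
        errs.append(1.0 - clf.score(X_valid, y_valid))
    return np.array(errs)
```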

4 Summary

The whole project shows that applying feature selection to certain data sets improves performance and lowers the dimensionality, and thus the complexity, of the classification work. To finish the project properly, we will send our results to the competition website and wait to see how far we got.

4.1 Criticism on UST

Because we reduced the number of permutations to 150 to avoid exhaustive computation, a larger error rate is inevitable. Nevertheless, our experimental results suggest that with a more powerful computer and more permutations, the results would be better.

4.2 Criticism on PCA

Although our PCA was only run on two data sets, it showed a significant improvement on both: it helps decrease the error rate and raises the AUC to an acceptable level.

4.3 Group Contribution and Participation

All three of us attended to this project diligently; most ideas came from iterative discussion. Our individual contributions are as follows.

Chun F. Hsu: AUC and file-converter tool implementation, report design, plain SVM testing, data and result organizing, and project coordination.

Yao-Hui Yu: PCA design and execution of the whole PCA process, the PCA+KNN experiment, and the BER tool.

Yin-Tzu Lin: Univariate Significance Test design, paper collecting and surveying, and SVM testing on the UST data.

References

[1] Jin Huang and Charles X. Ling, "Using AUC and Accuracy in Evaluating Learning Algorithms," IEEE Transactions on Knowledge and Data Engineering, Vol. 17, No. 3, 2005.

[2] P. Radivojac, Z. Obradovic, A. K. Dunker, and S. Vucetic, "Feature selection filters based on the permutation test," European Conference on Machine Learning (ECML 2004), Pisa, Italy, September 2004.

[3] Chih-Chung Chang and Chih-Jen Lin, "LIBSVM: a library for support vector machines," software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm.

[4] R: A Language and Environment for Statistical Computing, http://www.r-project.org/.

[5] J. Lin, "Divergence measures based on the Shannon entropy," IEEE Transactions on Information Theory, Vol. 37, No. 1, 1991, pp. 145-151.

[6] Lindsay I. Smith, "A tutorial on Principal Components Analysis," tutorials/principal_components.pdf.

[7] Jon Shlens, "A Tutorial on Principal Component Analysis: Derivation, Discussion and Singular Value Decomposition."
