Classification of Hand-Written Numeric Digits

Nyssa Aragon, William Lane, Fan Zhang
December 12, 2013

1 Objective

The specific hand-written recognition application that this project emphasizes is reading numeric digits written on a tablet or mobile device. We were interested in two goals. First, we wanted to build a model that can classify a hand-written digit on a tablet with high accuracy. Second, we were interested in how training on a specific set of users improves the predictions of digits written by those users.

2 Data

The primary dataset used is the Pen-Based Recognition of Handwritten Digits Data Set from the UCI Machine Learning Repository [1]. This dataset was produced by collecting roughly 250 digit samples from each of 44 writers, written on a pressure-sensitive tablet that reported the location of the pen at fixed time intervals of 100 milliseconds [2]. Each digit was written inside a 500 x 500 pixel box, and the coordinates were then scaled to integer values between 0 and 100 to create consistent scaling across observations. Spatial resampling was used to obtain 8 regularly spaced points along the arc of the trajectory. The data can be visualized by plotting the 8 sampled points based on their (x, y) coordinates, along with lines from point to point. Because the tablet sensed the location of the pen over time, the order of the points gives the direction the pen was moving. The data set contains the x and y coordinates of each of the 8 points, for a total of 16 integer variables ranging from 0 to 100. There are 7494 training data points and 3498 test data points.

Table 1: Number of Digits Used

Digit       0    1    2    3    4    5    6    7    8    9    Total
Training    780  779  780  719  780  720  720  778  718  719  7493
Test        363  364  364  336  364  335  336  364  335  336  3497

The training set consists of samples written by 30 writers and the testing set consists of samples written by 14 writers. The models were evaluated based on their performance on the testing dataset after being trained on the training dataset. We also explore the difference between errors when the training and testing sets are sourced from separate writers and when the training and testing sets have some writers in common.
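As a minimal illustration (not part of the original project code), the R sketch below loads the UCI training and test files and plots the 8 resampled points of one digit in stroke order. The file names pendigits.tra and pendigits.tes and the comma-separated layout of 16 coordinates followed by the class label follow the UCI repository's description of the dataset; they are assumptions here, not details stated in this paper.

cols <- c(paste0(c("x", "y"), rep(1:8, each = 2)), "digit")   # x1, y1, ..., x8, y8, digit
train <- read.csv("pendigits.tra", header = FALSE, col.names = cols)
test  <- read.csv("pendigits.tes", header = FALSE, col.names = cols)
train$digit <- factor(train$digit)
test$digit  <- factor(test$digit)

# Plot the i-th sample: the 8 (x, y) points joined in the order the pen visited them.
plot_digit <- function(df, i) {
  coords <- unlist(df[i, 1:16])
  xs <- coords[seq(1, 15, by = 2)]
  ys <- coords[seq(2, 16, by = 2)]
  plot(xs, ys, type = "o", xlim = c(0, 100), ylim = c(0, 100),
       xlab = "x", ylab = "y", main = paste("Digit:", df$digit[i]))
}
plot_digit(train, 1)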

Figure 1: Sample points with ordered line segments for the digits 3 (a) and 8 (b).

3 Model

After applying various supervised learning techniques, the optimized test error for each approach is shown in Table 2.

Table 2: Error Rates Over Diverse Classifiers

             SVM     KNN     Random Forest   Softmax   K-Means
Test Error   0.0172  0.0215  0.0349          0.18      0.25

The Support Vector Machine and K-Nearest Neighbor models perform the best. The data, which consist of 8 (x, y) coordinates, are perhaps not surprisingly most responsive to Euclidean-distance-based classifiers.

The SVM model is implemented using the e1071 package in R [3]. The one-against-one approach is used to extend binary SVM classifiers to a multiclass predictor, meaning an SVM is trained for each pair of digits. The class probabilities for each test observation are computed using quadratic optimization, and the digit with the highest probability is chosen as the prediction. Following the idea that a prediction made with low probability may be a weaker prediction, the test data are separated into two groups: one group whose predicted digit has probability above the threshold of 0.525, and another whose predicted digit has probability below the threshold.

Table 3: Error Rates Grouped By Probability Threshold

                   SVM Test Error   KNN Test Error
SVM Pr < 0.525     0.4722           0.2777
SVM Pr >= 0.525    0.0124           0.0194
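To make the procedure behind Tables 2 and 3 concrete, the sketch below (a reconstruction, not the authors' code) fits the multiclass SVM with e1071 and groups the test predictions by the 0.525 probability threshold. e1071's svm() uses the one-against-one scheme internally, and probability = TRUE enables its per-class probability estimates. The KNN implementation (class::knn) and the choice k = 3 are assumptions on our part; the paper does not report them.

library(e1071)   # SVM with one-against-one multiclass handling
library(class)   # knn()

svm_fit  <- svm(digit ~ ., data = train, cost = 3.75, probability = TRUE)
svm_pred <- predict(svm_fit, newdata = test, probability = TRUE)
svm_prob <- attr(svm_pred, "probabilities")            # one column of probabilities per digit
svm_lab  <- colnames(svm_prob)[max.col(svm_prob)]      # digit with the highest probability
max_prob <- apply(svm_prob, 1, max)                    # confidence of that prediction

knn_pred <- knn(train[, 1:16], test[, 1:16], cl = train$digit, k = 3)

threshold <- 0.525
low  <- max_prob < threshold
high <- !low

# Error rates within each probability group (the rows of Table 3)
c(svm_low  = mean(svm_lab[low]   != test$digit[low]),
  knn_low  = mean(knn_pred[low]  != test$digit[low]),
  svm_high = mean(svm_lab[high]  != test$digit[high]),
  knn_high = mean(knn_pred[high] != test$digit[high]))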

3.1 Primary Model

The main model capitalizes on this relationship. Predictions from the SVM model are used where the probability of the SVM prediction is above the threshold, while predictions from the KNN model are used where the probability of the SVM prediction is below the threshold. The value of the threshold and the cost parameter for the SVM regularization were chosen to give the lowest error, with the threshold at 0.525 and the cost parameter at 3.75. The combined SVM and KNN model gives a final test error rate of 0.0152. Error rates in the range of 1-2% are considered competitive for this dataset [5]. A sketch of this decision rule, together with the experiment of Section 3.2, is given at the end of this section.

3.2 Sampling From the Whole Dataset

The previous models were evaluated based on the error classifying the test set of 14 writers, after being trained on the training set of 30 independent writers. Because an individual may exhibit unique patterns when writing a digit, it is supposed that the correlation between two samples of a specific digit written by one individual is higher than the correlation between two samples of that digit written by two separate individuals. One of our motivations was to learn how much continuous learning on a tablet would improve the predictions. To gain insight into this question, we remove samples from the test data, add them to the training set, and then find the error on classifying the remaining test data. In order to train on a representative sample of each test writer's digits, we randomly selected a portion of the test data to migrate to the training set, increasing this proportion by 5% each time until we were training on half of the test data and predicting on the other half. This simulates a situation where specific users gradually feed the learning system a number of their own inputs with the correct labels. The results (Figure 2) show that as the model receives more and more input from the set of test users, it performs better and better, with a minimum error rate of just under 1%. On the whole, SVM still outperforms KNN.

Additionally, we compared the error when the training and testing data come from separate sources to the error from a 3-fold cross-validation. Using the primary model from Section 3.1, the error is found using a 3-fold cross-validation where the three folds are randomly sampled from the combined test and training datasets. A 3-fold cross-validation gives test and training sets of approximately the same size as the original sets. When the training and testing sets do not come from separate writing sources, the error reduces from 1.52% to 0.56%.
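As referenced in Section 3.1, the sketch below outlines both the combined decision rule and the incremental migration experiment of Section 3.2, reusing the packages and data frames from the earlier sketches. The fallback rule, cost, and threshold follow the text; the value of k, the random seed, and the exact retraining schedule are illustrative assumptions rather than details reported in the paper.

# Primary model: use the SVM digit when its probability clears the threshold,
# otherwise fall back to the KNN prediction.
hybrid_predict <- function(tr, te, cost = 3.75, threshold = 0.525, k = 3) {
  fit  <- svm(digit ~ ., data = tr, cost = cost, probability = TRUE)
  prob <- attr(predict(fit, newdata = te, probability = TRUE), "probabilities")
  pred <- colnames(prob)[max.col(prob)]
  conf <- apply(prob, 1, max)
  fallback <- as.character(knn(tr[, 1:16], te[, 1:16], cl = tr$digit, k = k))
  pred[conf < threshold] <- fallback[conf < threshold]
  pred
}

# Final test error of the combined model (reported as 0.0152 in the paper)
mean(hybrid_predict(train, test) != test$digit)

# Section 3.2: migrate 0%, 5%, ..., 50% of the test data into the training set
# and re-evaluate on what remains, simulating continuous learning from the test users.
set.seed(1)
errors <- sapply(seq(0, 0.5, by = 0.05), function(p) {
  moved <- sample(nrow(test), size = round(p * nrow(test)))
  tr <- if (length(moved) > 0) rbind(train, test[moved, ]) else train
  te <- if (length(moved) > 0) test[-moved, ]              else test
  mean(hybrid_predict(tr, te) != te$digit)
})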

Figure 2: Error (Percent Misclassified) by Test Users' Input

Using the same model that produced the optimal error on the isolated test dataset, the cross-validation error is 0.0056.

4 Conclusion

The supervised learning techniques that performed best on this data were Support Vector Machines and K-Nearest Neighbor. Our final model used SVM predictions when the SVM was confident and KNN predictions when the SVM was less confident. The model performed better when it was trained on samples from the same writers who wrote the test samples, indicating that a model that continues to learn from a specific tablet user will better predict that user's writing.

References

[1] Bache, K. & Lichman, M. (2013). UCI Machine Learning Repository [http://archive.ics.uci.edu/ml]. Irvine, CA: University of California, School of Information and Computer Science.

[2] I. S. Dhillon, D. S. Modha, and W. S. Spangler, "Class Visualization of High-Dimensional Data with Applications," August 1999, pg. 18.

[3] David Meyer, Evgenia Dimitriadou, Kurt Hornik, Andreas Weingessel, and Friedrich Leisch (2012). e1071: Misc Functions of the Department of Statistics (e1071), TU Wien. R package version 1.6-1. http://cran.r-project.org/package=e1071

[4] Venables, W. N. & Ripley, B. D. (2002). Modern Applied Statistics with S. Fourth Edition. Springer, New York. ISBN 0-387-95457-0.

[5] Keysers et al., "Deformation Models for Image Recognition," IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 29, No. 8, August 2007, pg. 1430.