Recognizing Handwritten Digits Using the LLE Algorithm with Back Propagation

Lori Cillo, Attebury Honors Program
Dr. Rajan Alex, Mentor
West Texas A&M University, Canyon, Texas

ABSTRACT

This work explores a methodology for recognizing patterns such as handwritten digits with a computer program, just as we humans recognize them. The Locally Linear Embedding (LLE) algorithm is a simple yet powerful algorithm for recognizing handwritten characters: it finds a set of linearly independent components for a given set of characters and then expresses an arbitrary character image as a linear combination of that linearly independent set. That is, the algorithm discovers mathematically meaningful features from a large set of example input patterns and then uses these features to recognize a new input character pattern. We use simulation in Matlab with an input of handwritten digits to understand how the LLE algorithm mathematically reduces the dimensionality of the inputs and finds these features, and how the back-propagation algorithm then learns from them and minimizes the errors in sorting them into particular categories. The experiment demonstrates that LLE combined with back propagation can learn, adjusting its results when it inaccurately categorizes an arbitrary input and eventually predicting the correct category to which it belongs, and it establishes the method's viability for far more complex pattern recognition problems.

Our goal in this experiment was to find features that identify each character during the training phase and then to recognize the digit represented by an arbitrary input image from those previously identified features. We study the LLE algorithm with back propagation in Matlab, using the training image set of handwritten digits from the benchmark MNIST database as input. Each digit is a 28 x 28 pixel image stored as a row of 784 cells in a matrix, with 100 images for each digit from 0-9. A handwritten digit can be written in a multitude of ways, but each has certain recognizable features: a 0 is a circle, a 1 is a straight line, and a 2 is a half circle and a line. Each digit is made up of many data points, but it can be defined by what makes it unique from the others. The LLE algorithm reduces the dimensionality of a series of inputs and defines them as features computed in a mathematically meaningful way, and back propagation adjusts the results when they inaccurately categorize an input digit, eventually predicting the correct category to which it belongs.

Some of the digits: [figure of sample handwritten digit images]

Reducing the dimensionality of an input is the act of taking each image, which has many data points, and simplifying it into attributes or features. Two of the most common ways to do this are PCA (principal component analysis) and MDS (multidimensional scaling). MDS takes an input matrix giving dissimilarities between pairs of items and outputs a coordinate matrix whose configuration minimizes a loss function. PCA computes a linear transformation that maps data from a high-dimensional space to a lower-dimensional subspace.
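As a concrete illustration of the linear case, the following Matlab sketch projects the image matrix onto its first nine principal components. The matrix X (1000 x 784, one image per row) and the variable names are assumptions made for illustration; this is not the program used in the experiment.

    % Illustrative PCA baseline (not the method used in this experiment).
    % Assumes X is a 1000 x 784 matrix with one image per row.
    Xc = bsxfun(@minus, X, mean(X, 1));   % center each pixel column
    [~, ~, V] = svd(Xc, 'econ');          % principal directions are the columns of V
    Ypca = Xc * V(:, 1:9);                % 1000 x 9 matrix of linear features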

Both methods are equivalent if the distances correspond to Euclidean distances, both are simple to implement, and neither involves local minima. However, neither is capable of generating highly nonlinear embeddings. The LLE algorithm can do that, and it also maps the data into a single global coordinate system: every digit you input is defined as a linear combination of the same features. In our experiment the reduced representation of a digit is nine features, because the dimensionality of the reduced space should be one less than the number of classes, of which there are 10 (the digits 0 through 9).

The first step of LLE is to choose neighbors, and there are two ways of doing this. One way is to fix the radius of a hypersphere and take the data points within it as neighbors; in this case the number of neighbors is not fixed. The other way is to fix the number of neighbors (K) for each data point. The local geometry of each image is then modeled as a linear combination of its K nearest neighbors. Using K = 18, we store references to the nearest neighbors in a K x N proximity matrix, where every column holds the 18 references that will form an approximation of one image. Each image can then be represented as a linear combination of its neighbors:

    x_i ≈ Σ_j w_ij x_j

If x_j is not a neighbor of x_i, then w_ij is 0 and does not contribute to the sum that reconstructs x_i, so we only need to find the value of w_ij for our 18 neighbors. To find the reconstruction weights for each image, we create a matrix that uses the indexes from the proximity matrix to put each neighbor into its own row of 784 cells. Next, we stack 18 repeated copies of the original image into a matrix of the same size, call it Z, and subtract each neighbor from it. We compute the local covariance with C = Z * Z' and finally solve C * w = 1 for w. The weights are constrained by the following equation:

    Σ_j w_ij = 1
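A minimal Matlab sketch of this neighbor-and-weights step follows. It assumes X is the 1000 x 784 matrix of training images; the variable names and the small regularization term are illustrative choices, not taken from the original program.

    % Sketch of the LLE reconstruction-weight step (K = 18 neighbors).
    % Assumes X is an N x 784 matrix of images, one image per row (N = 1000).
    K = 18;
    N = size(X, 1);
    W = zeros(N, N);                                  % full here for clarity; sparse in practice
    sq = sum(X.^2, 2);
    dist2 = bsxfun(@plus, sq, sq') - 2 * (X * X');    % squared pairwise distances
    for i = 1:N
        [~, order] = sort(dist2(i, :));
        nbrs = order(2:K+1);                          % the 18 nearest neighbors of image i
        Z = repmat(X(i, :), K, 1) - X(nbrs, :);       % 18 x 784: image minus each neighbor
        C = Z * Z';                                   % 18 x 18 local covariance
        C = C + eye(K) * 1e-3 * trace(C);             % small regularization in case C is singular
        w = C \ ones(K, 1);                           % solve C * w = 1
        W(i, nbrs) = (w / sum(w))';                   % rescale so the weights sum to one
    end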

This equation states that the weights for each image sum to one; note that although the full weight matrix is 1000 x 1000, only the 18 active weights in each row actually contain values. Next, we map the coordinates for each image. To do this, create a sparse 1000 x 1000 matrix W that has 0s for all weights except those of the neighbors, and form the matrix M = (I - W)' * (I - W). Perform an eigen-analysis on M and choose the d eigenvectors corresponding to the lowest d non-zero eigenvalues. The matrix thus obtained is a d x N matrix; each column is the d-dimensional vector obtained by reducing the dimensionality of the corresponding original vector. Using these eigenvectors and the weights, we create a 1000 x 9 matrix that expresses each image as a linear combination of these components. The discovered weights times the neighbors, added together, reconstruct the original image; the closer a weight is to one, the closer that neighbor is to the image.
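A short Matlab sketch of this embedding step, assuming W is the 1000 x 1000 weight matrix computed above and d = 9 as in the experiment; the variable names are again illustrative.

    % Sketch of the LLE embedding step (d = 9 output features).
    % Assumes W is the N x N weight matrix from the previous step.
    d = 9;
    N = size(W, 1);
    M = (eye(N) - W)' * (eye(N) - W);      % M = (I - W)'(I - W)
    [V, E] = eig(M);                       % eigen-analysis of M
    [~, order] = sort(diag(E));            % eigenvalues in ascending order
    Y = V(:, order(2:d+1));                % skip the bottom (constant) eigenvector
    % Y is 1000 x 9: row i holds the nine features that describe image i.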

We use the dimensionality-reduced vector of an image obtained from the LLE algorithm as input to the hidden layer of the back-propagation algorithm, and the results of the hidden layer are then used as input to find the digit represented by the nine features. The hidden layer in our experiment consists of 30 nodes, each of which is a linear combination of the nine features. The output layer represents each digit uniquely as a linear combination of the 30 hidden-layer nodes, with one node for each of the 10 digits. The goal is to use the training images to let the neural network learn to accurately categorize an image as a digit. To do this, we first define two weight matrices, v and w, consisting of random numbers between -1 and 1: v holds the weights that combine the initial 9 features into the 30 hidden-layer nodes, and w holds the weights that combine the 30 hidden-layer nodes to define each digit uniquely. The deltav matrix, which is 30 x 784, holds the adjustments to the hidden-layer weights, and deltaw, which is 10 x 30, holds the adjustments to the output-layer weights. The algorithm iterates 4000 times or until it reaches weights that find the correct digit for an input image; each iteration slightly adjusts the weights, each time improving the accuracy of predicting a digit. Each iteration looks at all 1000 images in the training set and divides them by the maximum value to normalize the input to between -1 and 1. For each image within an iteration, the algorithm calls getoutput, which returns the values of the hidden-layer nodes found by using the weights in v and the normalized matrix V containing the images defined as nine features. It then calls getoutput again, this time with the first output and the weights in w, to define each digit as a linear combination of the 30 hidden-layer nodes. Each iteration adjusts the weights that incorrectly categorized images to increase accuracy, reducing the mean error; as the iterations increase, the error tends to 0.
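Since the original getoutput routine is not shown, the following is a minimal Matlab sketch of a 9-30-10 network trained with back propagation under the setup described above. Y (the 1000 x 9 LLE feature matrix), labels, the tanh activation, and the learning rate eta are all assumptions made for illustration, not the original program.

    % Sketch of the 9-30-10 back-propagation network described above.
    % Assumes Y is the 1000 x 9 LLE feature matrix and labels is a 1000 x 1
    % vector of digits 0-9; activation and learning rate are illustrative.
    Yn = Y / max(abs(Y(:)));                                   % normalize inputs to roughly [-1, 1]
    T = full(sparse((1:1000)', labels(:) + 1, 1, 1000, 10));   % 1000 x 10 one-hot targets
    v = 2 * rand(30, 9) - 1;                                   % feature-to-hidden weights in [-1, 1]
    w = 2 * rand(10, 30) - 1;                                  % hidden-to-output weights in [-1, 1]
    eta = 0.1;
    for it = 1:4000
        H = tanh(v * Yn');                            % 30 x 1000 hidden-layer outputs
        O = tanh(w * H);                              % 10 x 1000 network outputs
        err = T' - O;                                 % output error for every image
        gradO = err .* (1 - O.^2);                    % back-propagated output gradient
        deltaw = gradO * H';                          % 10 x 30 output-weight adjustments
        deltav = ((w' * gradO) .* (1 - H.^2)) * Yn;   % 30 x 9 hidden-weight adjustments
        w = w + eta * deltaw / 1000;                  % average the updates over the images
        v = v + eta * deltav / 1000;
    end
    [~, pred] = max(O);                               % predicted digit is pred - 1

Note that with the nine LLE features as inputs, the hidden-weight adjustment matrix in this sketch is 30 x 9 rather than the 30 x 784 deltav used by the original program.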

Each number has a variety of inputs with different thicknesses, orientations, and shapes, and features that overlap with one another. Because the LLE algorithm classifies based on a small set of components that express the differences between digits, automatic classification can be difficult. Numbers with 90 percent or greater recognition (0, 1, 6, 7, 8) have less variety in how they can be drawn: 0 is a circle, 1 is a straight line, 7 is a horizontal and a vertical line, and 8 is two circles on top of one another. Numbers with 50 percent or less recognition (2, 3, 4, 5, 9) can be drawn in such a way that they closely resemble another digit. The following pairs of digits are very similar to one another: 2 and 3, 5 and 6, 4 and 9. [Figure: example images of these similar-looking digit pairs]

Works Cited

Saul, Lawrence K., and Sam T. Roweis. 2001. An Introduction to Locally Linear Embedding.

Kouropteva, O., O. Okun, and M. Pietikäinen. 2003. Classification of Handwritten Digits Using Supervised Locally Linear Embedding Algorithm and Support Vector Machine.