Character Recognition from Google Street View Images

Indian Institute of Technology Kanpur
Course Project Report, CS365A

By Ritesh Kumar (11602) and Srikant Singh (12729)
Under the guidance of Professor Amitabha Mukerjee

Abstract

Character recognition is the conversion of printed or handwritten text present in an image into a machine-encoded format, so that it can be stored efficiently, searched and edited quickly, or used as input to text mining and similar applications. Our project aims to recognize characters in natural images obtained from Google Street View. We tried several algorithms to find the most suitable one for this problem, using MATLAB and Python as our implementation tools.

Acknowledgement

We would like to express our sincere thanks to our instructor-in-charge, Professor Amitabha Mukerjee, for giving us the opportunity to undertake a project in his course. It was a very good learning experience: in a subject like Artificial Intelligence not everything can be covered in class, so a project gives us the opportunity to explore many interesting concepts.

Sincerely,
Ritesh Kumar (11602), Srikant Singh (12729)
Indian Institute of Technology, Kanpur

Table of Contents

Introduction
Motivation
Related Works
Dataset
Rejected Methods
Methodology
    Feature Extraction
        Vector of Pixel Values
        Histogram of Oriented Gradients (HOG)
    Learning on Feature Vectors
        Random Forests
        K-Nearest Neighbours with LOOF-CV
        Support Vector Machine
Results and Analysis
Future Prospects
References

Introduction

Our project aims at identifying characters of the English character set (the English alphabet and the Hindu-Arabic numerals) in natural images obtained from Google Street View, including images taken with handheld devices. The dataset contains images of different sizes, taken at different camera angles, under different lighting conditions and at different image qualities. We reduce all images to 20x20 px before further processing. Our approach involves two steps: first we extract features from the images, and then we use the obtained features to train a model that predicts the class of a given image.

[Figure 1: Some of the images we use in our project. Source: www.ee.surrey.ac.uk]

Motivation

In recent years Google Street View has grown enormously in popularity. With this increased usage, it becomes important to have a reliable method for recognising characters in Street View images. Such a method would have many benefits: if we can recognize and tag these images, a great deal of useful information can be added to Google Maps. Good character recognition can also act as a precursor to many applications, such as an image-to-speech app that helps visually impaired people navigate and recognise their destinations, or systems that track signs, hoardings and advertisements by recognising their images and keeping a count of them.

Related Works

Though this is a fairly new avenue of research, several interesting works already exist in the field. One approach used text detection with a sliding window followed by an SVM, and text recognition with Tesseract (an open-source OCR engine), achieving a modest success rate [1]. Another used the DistBelief [2] implementation of deep neural networks operating directly on the image pixels [3]. A particularly interesting work used two types of neural network, a thin deep network (GoogleNet) and a flat shallow network (AlexNet), with quite good results and high accuracies [4].

Dataset

We used the widely available and popular Chars74K dataset [5], which contains characters from both the English character set and the Kannada script. We based our project solely on the English character set. Furthermore, as we are only interested in images obtained from Google Street View, our work mainly revolves around the set of 7,705 characters obtained from natural images. The dataset is divided into 62 classes (0-9, A-Z and a-z).

[Figure 2: Images from the Chars74K dataset]
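As a concrete illustration of the 62-class label set (0-9, A-Z, a-z), the following Python sketch loads the character images and resizes them to 20x20 greyscale arrays. The root path and the Sample001-Sample062 folder naming are assumptions made for illustration, not a claim about the exact Chars74K layout; the later sketches reuse the images and labels arrays defined here.

    # Rough sketch of loading the 62-class English character set (0-9, A-Z, a-z).
    # The root path and per-class folder naming below are assumptions for
    # illustration, not necessarily the actual Chars74K directory structure.
    import os
    import string

    import numpy as np
    from PIL import Image

    CLASS_LABELS = string.digits + string.ascii_uppercase + string.ascii_lowercase  # 62 classes

    def load_dataset(root="Chars74K/English/Img", size=(20, 20)):
        """Load images as 20x20 greyscale arrays with integer class labels 0..61."""
        images, labels = [], []
        for class_index in range(len(CLASS_LABELS)):
            class_dir = os.path.join(root, "Sample%03d" % (class_index + 1))  # assumed naming
            if not os.path.isdir(class_dir):
                continue
            for filename in sorted(os.listdir(class_dir)):
                img = Image.open(os.path.join(class_dir, filename)).convert("L")  # greyscale
                images.append(np.asarray(img.resize(size), dtype=np.float32) / 255.0)
                labels.append(class_index)
        return np.stack(images), np.array(labels)

    images, labels = load_dataset()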

Rejected Methods

Decision tree as a classifier. A single decision tree was considered for classification but rejected: as decision trees grow deeper they start learning irregular patterns and overfit, a problem that arises from their tendency towards low bias and high variance.

SIFT for feature extraction. We also tried SIFT with an SVM for recognition, but even on a very constrained, small subset of the dataset we only reached an accuracy of around 50%, and the accuracy dropped sharply as the subset grew. We also ran into memory errors and other issues when running the algorithm on the full dataset. Hence SIFT with SVM was rejected. [6]

Methodology

Feature Extraction

1. Vector of pixel values. The simplest way to extract image features is to use the image pixels themselves. We used the intensities of the pixels of the greyscale image as the feature vector for training the learning models. This method is quite fast compared to other feature extraction methods, but it is also one of the lowest-accuracy representations.

2. Histogram of Oriented Gradients (HOG). HOG is a feature descriptor in which the image is divided into cells and local histograms of gradient orientations are computed over the pixels of each cell; after normalization, the descriptors are fed to the learning model. We used MATLAB's extractHOGFeatures function, which returns a visualization that helps in choosing the right cell size. We tried cell sizes of 2x2, 4x4 and 8x8. As Figure 3 shows, the intuitively best cell size is 2x2, but decreasing the cell size increases the computational cost, while larger cells do not encode enough information for training. [7] We therefore chose a cell size of 4x4. A hedged Python sketch of both feature extractors follows Figure 3.

[Figure 3: Image extraction using HOG]
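The report uses MATLAB's extractHOGFeatures; purely as a hedged Python analogue, the sketch below builds both feature representations with scikit-image's hog function, using the 4x4 cell size chosen above. The helper names are ours, and images refers to the array from the loading sketch in the Dataset section.

    # Sketch of the two feature representations on 20x20 greyscale images:
    # (1) raw pixel-intensity vectors, (2) HOG descriptors with 4x4 cells.
    # skimage.feature.hog stands in here for MATLAB's extractHOGFeatures.
    import numpy as np
    from skimage.feature import hog

    def pixel_features(images):
        """Flatten each 20x20 image into a 400-dimensional intensity vector."""
        return images.reshape(len(images), -1)

    def hog_features(images, cell_size=(4, 4)):
        """One HOG descriptor per image; smaller cells cost more but capture finer detail."""
        return np.array([
            hog(img,
                orientations=9,              # gradient orientation bins per histogram
                pixels_per_cell=cell_size,   # 4x4 cells, as chosen above
                cells_per_block=(2, 2),
                block_norm="L2-Hys")
            for img in images
        ])

    X_pixels = pixel_features(images)   # images from the loading sketch above
    X_hog = hog_features(images)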

Learning on Feature Vectors

We used three different techniques to train models on the features obtained by the feature extraction methods above.

1. Random Forests. Random forests are extensions of decision trees that average over multiple trees to give better results. They show lower variance than a single decision tree, and adding more trees does not cause overfitting. We implemented the random forest using the sklearn.ensemble module in Python. Empirically, random forests have been reported to give better accuracies than k-NN and SVM classifiers trained on the same features [10]. We used n_estimators = 100 and the entropy criterion, as these gave better accuracy. [8] A minimal sketch of this setup follows Figure 4.

[Figure 4: Learning using a random forest. Source: www.iis.ee.ic.ac.uk/~tkkim/iccv09_tutorial]
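A minimal sketch of the random-forest setup described above, using scikit-learn's RandomForestClassifier with n_estimators = 100 and the entropy criterion. The 80/20 train/test split and the random seed are assumptions, since the report does not state the exact split; X_pixels and labels come from the earlier sketches.

    # Sketch: random forest on the pixel-value features, with the settings given
    # above (100 trees, entropy criterion). The 80/20 split is an assumption.
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import train_test_split

    X_train, X_test, y_train, y_test = train_test_split(
        X_pixels, labels, test_size=0.2, random_state=0)

    forest = RandomForestClassifier(n_estimators=100, criterion="entropy", random_state=0)
    forest.fit(X_train, y_train)
    print("Pixel value vector + random forest accuracy:", forest.score(X_test, y_test))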

2. K-Nearest Neighbours with LOOF-CV. Leave-one-out cross-validation (LOOF-CV) is available in many standard machine learning libraries, and it is particularly cheap to run with k-NN. In LOOF-CV we hold out a single data point for testing and train on all the rest, repeating this for every point; it is k-fold cross-validation with k equal to the number of data points. The k-nearest-neighbour algorithm assigns to the test point the most common label among its k nearest training points. We tried different values of k, and k = 1 produced the best results. We used the Euclidean distance function in k-NN. [8]

[Figure 5: k-NN classifier. Source: http://www.statistics4u.com/fundstat_eng/cc_classif_knn.html]

3. Support Vector Machine. Support vector machines are supervised learning models that learn from data, recognise patterns in it, and classify on that basis. The basic SVM is a non-probabilistic binary linear classifier, but SVMs can also perform non-linear classification by using kernel methods to implicitly map their inputs into high-dimensional feature spaces. [9] We used the fitcecoc function from MATLAB's Statistics Toolbox to build a multiclass classifier out of binary SVMs, and predict to predict the classes of the test images. [7] We obtained an accuracy of around 55% on a constrained subset of the dataset, and the accuracy increased as we extended the subset towards the full dataset. A hedged Python sketch of both classifiers follows Figure 6.

[Figure 6: Scatterplot showing a linear SVM's decision boundary. Source: http://en.wikipedia.org/wiki/Support_vector_machine#/media/File:Linear-svm-scatterplot.svg]
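Again as a hedged Python analogue of the MATLAB workflow, the sketch below evaluates 1-NN with leave-one-out cross-validation on the pixel features and trains a multiclass linear SVM on the HOG features; scikit-learn's SVC uses a built-in one-vs-one multiclass scheme, which serves here only as a rough stand-in for fitcecoc's combination of binary SVMs.

    # Sketch: (1) 1-nearest-neighbour evaluated with leave-one-out cross-validation,
    # (2) a multiclass linear SVM on HOG descriptors (SVC's one-vs-one multiclass
    # scheme stands in for MATLAB's fitcecoc). X_pixels, X_hog and labels come
    # from the earlier sketches.
    from sklearn.model_selection import LeaveOneOut, cross_val_score, train_test_split
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.svm import SVC

    # Leave-one-out: every sample is held out once and predicted from all the others.
    knn = KNeighborsClassifier(n_neighbors=1, metric="euclidean")
    loo_scores = cross_val_score(knn, X_pixels, labels, cv=LeaveOneOut())
    print("Pixel value vector + 1-NN LOO-CV accuracy:", loo_scores.mean())

    # Multiclass SVM on the HOG features, evaluated on an assumed 80/20 hold-out split.
    X_train, X_test, y_train, y_test = train_test_split(
        X_hog, labels, test_size=0.2, random_state=0)
    svm = SVC(kernel="linear")
    svm.fit(X_train, y_train)
    print("HOG + SVM accuracy:", svm.score(X_test, y_test))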

Results and Analysis

Feature extraction      Classifier       Accuracy
Pixel value vector      Random forest    46.368%
HOG                     SVM              77.029%
Pixel value vector      k-NN             43.586%

Table 1: Methods vs. accuracies

Though random forests have been reported to outperform SVMs in an empirical comparison of classifiers [10], HOG + SVM gives considerably better results here than pixel value vector + random forest. From this we can safely conclude that HOG is a better feature extractor than the raw pixel value vector, and that feature extraction plays a significant role in the final recognition accuracy.

Our dataset carries no contextual information, since we process single characters rather than whole text regions. As a result, accuracy suffers for easily confused characters such as 0, O and D, or 1, I (uppercase i) and l (lowercase L).

[Figure 7: Examples of conflicting characters, from [4]]

In some cases, manual inspection of misclassified images showed that cursive characters were especially prone to misclassification; for instance, the classifier may confuse a curly, cursive 4 with a similar-looking 2, as shown in the figure below. A hedged sketch of how such confusions can be tabulated with a confusion matrix follows Figure 8.

[Figure 8: A cursive 4 confused with a similar-looking 2]
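One way to make the observation about conflicting characters systematic is to tabulate a confusion matrix over held-out predictions. The sketch below reuses the fitted HOG + SVM model and split from the previous sketch (an assumption about how one would wire this up) and lists the most frequently confused class pairs.

    # Sketch: list the most frequently confused class pairs (e.g. 0/O, 1/I/l)
    # from the held-out predictions of the HOG + SVM model above.
    import string

    import numpy as np
    from sklearn.metrics import confusion_matrix

    CLASS_LABELS = string.digits + string.ascii_uppercase + string.ascii_lowercase

    y_pred = svm.predict(X_test)
    cm = confusion_matrix(y_test, y_pred, labels=np.arange(len(CLASS_LABELS)))

    np.fill_diagonal(cm, 0)  # keep only the misclassifications
    top = np.argsort(cm, axis=None)[::-1][:10]
    for flat_index in top:
        true_idx, pred_idx = np.unravel_index(flat_index, cm.shape)
        if cm[true_idx, pred_idx] > 0:
            print("'%s' misread as '%s': %d times"
                  % (CLASS_LABELS[true_idx], CLASS_LABELS[pred_idx], cm[true_idx, pred_idx]))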

Future Prospects

To address the declining accuracies and the conflicting characters, we can implement boosting. Boosting allows the use of more than one classifier; in case of a conflict, each classifier provides its own probability of an image belonging to each character class, and a weighted sum across the classifiers can help resolve the conflict.

We also intend to use neural networks as classifiers, since many character recognition systems have been built with neural nets, among which deep convolutional nets deserve special mention. With a proper feature extractor, very good accuracies (about 97.28% using GoogleNet and 91.22% using AlexNet) have been reported [4].

References

[1] Lintern, James. "Recognizing Text in Google Street View Images." Statistics 6 (2008).
[2] Dean, Jeffrey, et al. "Large Scale Distributed Deep Networks." Advances in Neural Information Processing Systems. 2012.
[3] Goodfellow, Ian J., et al. "Multi-digit Number Recognition from Street View Imagery Using Deep Convolutional Neural Networks." arXiv preprint arXiv:1312.6082 (2013).
[4] Wang, Guan, and Jingrui Zhang. "Recognizing Characters From Google Street View Images."
[5] http://www.ee.surrey.ac.uk/cvssp/demos/chars74k/
[6] https://github.com/shackenberg/Minimal-Bag-of-Visual-Words-Image-Classifier
[7] http://in.mathworks.com/help/vision/examples/digit-classification-using-hog-features.html
[8] https://www.kaggle.com/c/street-view-getting-started-with-julia
[9] http://en.wikipedia.org/wiki/Support_vector_machine
[10] Caruana, Rich, and Alexandru Niculescu-Mizil. "An Empirical Comparison of Supervised Learning Algorithms." Proceedings of the 23rd International Conference on Machine Learning. ACM, 2006.