OBJECT RECOGNITION ALGORITHM FOR MOBILE DEVICES

Image Processing & Communication, vol. 18, no. 1, pp. 31-36
DOI: 10.2478/v10248-012-0088-x

RAFAŁ KOZIK, ADAM MARCHEWKA
Institute of Telecommunications, University of Technology & Life Sciences in Bydgoszcz, ul. Kaliskiego 7, 85-789 Bydgoszcz, Poland
rkozik@utp.edu.pl

Abstract. In this paper an object recognition algorithm for mobile devices is presented. The algorithm is based on a hierarchical approach to visual information coding proposed by Riesenhuber and Poggio [1] and later extended by Serre et al. [2]. The proposed method adapts an efficient algorithm to extract information about local gradients. This allows the algorithm to approximate the behaviour of the simple cells layer of the Riesenhuber and Poggio model. Moreover, it is demonstrated that the proposed algorithm can be successfully deployed on a low-cost Android smartphone.

1 Introduction

This paper is a continuation of the author's recent research on object detection algorithms dedicated to mobile devices. The tentative results have been presented in [4]. In contrast to the previous work, modifications to the overall architecture of the method are proposed. Moreover, the experiments described in this paper have been conducted on an extended database of images.

The proposed approach follows the idea of the HMAX model proposed by Riesenhuber and Poggio [1]. It adopts the hierarchical feed-forward processing approach. However, modifications are introduced in order to reduce the computational complexity of the original algorithm. Moreover, in contrast to the Mutch and Lowe [10] model, an additional layer that mimics the "retina coding" mechanism is added. The results showed that this step increases the robustness of the proposed method. However, it is considered an optional step of visual information processing, because it introduces an additional computational burden that can increase the response time on low-cost mobile devices (the frame rate drops significantly).
In contrast to the original HMAX model, a different method for calculating the S1 layer response is proposed. The responses of the 4 Gabor filters are approximated with an algorithm that mimics the behaviour of HOG descriptors. HOG descriptors [3] are commonly used for object recognition problems. The image is decomposed into small cells, and in each cell a histogram of oriented gradients is computed. Typically for HOG features the result is normalised and the descriptor is returned for each cell. However, in contrast to the original HOG descriptor proposed by Dalal and Triggs, in this method (in order to simplify the computations) the blocks of cells do not overlap, and the input image is not normalised.

Moreover, in previous work [7] the S2 and C2 layers were replaced with a machine-learned classifier. Such an approach significantly simplifies the learning process and decreases the amount of computation. However, it reduces the invariance to shape changes of a recognised object, and the feature vectors presented to the classifier are of relatively greater dimensionality.

2 The HMAX model overview

The HMAX visual cortex model proposed by Riesenhuber and Poggio [1] exploits a hierarchical structure for image processing and coding. It is arranged in several layers that process information in a bottom-up manner. The lowest layer is fed with a grayscale image. The higher layers of the model are called either "S" or "C". These names correspond to the simple (S) and complex (C) cells discovered by Hubel and Wiesel [9]. Both types of cells are located in the striate cortex (called V1), which is the part of the visual cortex that lies in the most posterior area of the occipital lobe.

The simple cells located in "S" layers apply local filters whose responses form a vector of texture features. As noted by Hubel and Wiesel, individual cells in the cortex respond to the presence of edges. They also discovered that these cells are sensitive to edge orientation (some cells fire only when an edge of a given orientation is observed). The complex cells located in "C" layers calculate, within a limited range of the previous layer, the strongest responses of a given type (orientation). In this way, more complex features are obtained, each combined from several (e.g. three) simple features.

The hierarchical HMAX model proposed by Mutch and Lowe [10] is a modification of the model presented by Serre et al. in [2]. It introduces two layers of simple cells (S1 and S2) and two layers of complex cells (C1 and C2, see Fig. 1). It uses a set of filters designed to emulate V1 simple cells. The complex layers are computed using a hard max filter [11], which means that the "C" cell responses are the maximum values of the associated "S" cells.

Fig. 1: The structure of the hierarchical model proposed by Mutch and Lowe [10] (symbols used: I - image, S - simple cells, C - complex cells, GF - Gabor filters, F - prototype feature vectors; the remaining symbol denotes the convolution operation).

As shown in Fig. 1, the images are processed by the subsequent simple and complex cell layers and reduced to feature vectors, which are further used in the classification process. The set of features (F) is shared across all images and object categories. Features are computed hierarchically in subsequent layers, each built from the previous one by alternating template matching and max pooling operations.

The S1 layer in the Mutch and Lowe [10] model uses 2D Gabor filters computed for four orientations (horizontal, vertical, and two diagonals) at each possible position and scale. The Gabor filters are 11 × 11 in size, and are described by:

$$G(x, y) = \exp\left(-\frac{X^2 + \gamma Y^2}{2\sigma^2}\right) \cos\left(\frac{2\pi}{\lambda} X\right) \qquad (1)$$

where $X = x\cos\varphi - y\sin\varphi$ and $Y = x\sin\varphi + y\cos\varphi$, with $x, y \in \langle -5; 5 \rangle$ and $\varphi \in \langle 0; \pi \rangle$. The aspect ratio (γ), effective width (σ), and wavelength (λ) are set to 0.3, 4.5 and 5.6 respectively. The response of a patch of pixels X to a particular filter G is computed using formula (2):

$$R(X, G) = \left| \frac{\sum_i X_i G_i}{\sqrt{\sum_i X_i^2}} \right| \qquad (2)$$

The complex cells located in the C1 layer pool the associated units in the S1 layer. For each orientation, the S1 responses are convolved with a max filter that is 10 × 10 in size in the x, y dimensions (position) and 2 units deep in scale.

As shown in Fig. 1, the intermediate S2 layer is formed by convolving the C1 layer response with a set of intermediate-level features (depicted as F in Fig. 1). The set of intermediate-level features is established during the learning phase. For a given set of learning images the C1 responses are computed. The most meaningful features are selected using SVM weighting. Mutch and Lowe [10] suggested sub-sampling the C1 responses before feature selection. Therefore, the authors select C1 layer patches of size 4 × 4, 8 × 8, 12 × 12, and 16 × 16 at random positions and scales. The selected and weighted features compose so-called prototypes, which are used in the Mutch and Lowe model as filters whose responses create the S2 layer.

The C2 layer composes a global feature vector in which each element corresponds to the maximum response to a given prototype patch. In order to identify the visual object on the basis of the feature vector, a classifier (e.g. an SVM) is trained.

3 The proposed method

In order to approximate the behaviour of the S1 layer of the HMAX model, an algorithm that mimics HOG descriptors is used. However, to make this step as simple as possible (due to the low computational power of mobile devices), only the gradient orientation binning scheme is adopted. In fact, the intention here is not to compute the HOG descriptor, but to approximate the response of the four Gabor filters used in the original HMAX model.

In the S1 layer there are N × M × 4 simple cells arranged in a grid of N × M blocks. In each block there are 4 cells. Each cell is assigned a receptive field (the pixels inside the block). The activation of each cell depends on the stimulus. In this case there are four possible stimuli, namely vertical, horizontal, left diagonal, and right diagonal edges.
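For reference, the four Gabor filters that this layer approximates can be generated directly from Eq. (1), and the patch response from Eq. (2). The following is only a minimal sketch in plain Python: the function names `gabor_filter` and `response` are hypothetical, the four orientation angles are assumed to be 0, π/4, π/2 and 3π/4, and no filter normalisation is applied because none is described in the text.

```python
import math

def gabor_filter(phi, size=11, gamma=0.3, sigma=4.5, lam=5.6):
    """Return a size x size Gabor filter (list of rows) for orientation phi.

    Follows Eq. (1) as given in the text, where Y^2 is weighted by gamma
    (some formulations of the Gabor filter use gamma^2 instead).
    """
    half = size // 2
    rows = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            # Rotate the coordinates by the filter orientation.
            X = x * math.cos(phi) - y * math.sin(phi)
            Y = x * math.sin(phi) + y * math.cos(phi)
            envelope = math.exp(-(X * X + gamma * Y * Y) / (2.0 * sigma ** 2))
            row.append(envelope * math.cos(2.0 * math.pi * X / lam))
        rows.append(row)
    return rows

def response(patch, filt):
    """Eq. (2): absolute value of the normalised dot product of patch and filter."""
    dot = sum(p * g for prow, grow in zip(patch, filt) for p, g in zip(prow, grow))
    norm = math.sqrt(sum(p * p for prow in patch for p in prow))
    return abs(dot / norm) if norm > 0 else 0.0

# Assumed orientation angles for the four filters.
ORIENTATIONS = [0.0, math.pi / 4, math.pi / 2, 3 * math.pi / 4]
FILTER_BANK = [gabor_filter(phi) for phi in ORIENTATIONS]
```

Each 11 × 11 image patch needs 121 multiplications per orientation with this filter bank, which gives a sense of the cost that the gradient-based approximation is meant to avoid.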
The output of the S1 simple-cell layer therefore has four dimensions (x, y, scale and cell type). In order to compute the responses of all four cells inside a given block (receptive field), Algorithm 1 is applied.

Algorithm 1: Calculating the S1 response
Require: grayscale image I of size W × H
Ensure: S1 layer of size N × M × 4
  G_min ← low threshold for the gradient magnitude
  for each pixel (x, y) in image I do
    compute the horizontal (G_x) and vertical (G_y) gradients using the Prewitt operator
    compute the gradient magnitude G_{x,y} at point (x, y)
    n ← ⌊x · N / W⌋; m ← ⌊y · M / H⌋
    if G_{x,y} < G_min then
      go to the next pixel
    else
      active ← get_cell_type(G_x, G_y)
      S1[n, m, active] ← S1[n, m, active] + G_{x,y}
    end if
  end for

The algorithm computes the responses of all cells in a single pass over the input image I. For each pixel at position (x, y) the horizontal and vertical gradients (G_x and G_y) are computed. Given the pixel position (x, y) and the gradient vector [G_x, G_y], the algorithm determines the block position (n, m) and the type of cell (active) whose response has to be incremented by the gradient magnitude G_{x,y}.

In order to classify a given gradient vector [G_x, G_y] as horizontal, diagonal or vertical, the get_cell_type(·, ·) function uses a simple voting scheme. If the point (|G_x|, |G_y|) is located between the lines y = 0.3x and y = 3.3x, the vector is classified as diagonal. If G_y is positive, the vector is classified as a right diagonal (otherwise it is a left diagonal). If the point (|G_x|, |G_y|) is located above the line y = 3.3x, the gradient vector is classified as vertical, and it is classified as horizontal when the point lies below the line y = 0.3x.

The processing in the S1 layer is applied at three scales (the original image and two images scaled by factors of 0.7 and 1.5). This operation is necessary to obtain the response of the C1 layer. The C1 complex layer response is computed using a max-pooling filter applied to the S1 layer. This layer allows the algorithm to achieve scale invariance.

The output of the C1 layer cells is used to select features that are meaningful for a given object detection task. Firstly, randomly selected images of an object are reduced to feature vectors by the consecutive layers of the proposed model. Afterwards, PCA is applied to calculate the eigenvectors. Each eigenvector is used as a single feature in a feature vector. The response of the S2 layer is obtained by computing the dot product of the eigenvectors with the C1 layer. Finally, a classifier is trained to recognise an object using the S2 layer response. For this paper different machine-learning algorithms are investigated.

4 Experiments and Results

In contrast to the previous work [4], the database of images has been extended and new experiments have been conducted to evaluate the effectiveness of different classifiers. In this work tree-based and rule-based classification approaches have been compared, namely N (5, 10, 15) Random Trees, J48 and PART (previously used by the author in [5, 6, 4]). The comparison is presented in Fig. 2.

For the experiments the dataset was randomly split into training and testing samples (70% and 30% respectively). For the training samples the C1 layer responses were calculated and the eigenvectors were extracted. The eigenvectors were used to calculate the S2 layer responses, which were then used to train the classifier. During the testing phase the same eigenvectors were used to extract the S2 layer responses for the testing samples. The whole procedure was repeated 10 times (the dataset was randomly reshuffled before each split) and the results were averaged.

Fig. 2: Effectiveness of different classifiers (false positive rate versus detection rate for 5, 10 and 15 Random Trees, J48 and PART, each with an exponential trend line).

The best results were obtained for 10 Random Trees. It was possible to achieve a detection rate of 95.6% with 2.3% of false positives. Adding more trees does not improve the effectiveness.

An early prototype of the proposed algorithm was deployed on an Android device. The code was written in pure Java and tested on a Samsung Galaxy Ace. This device is equipped with an 800 MHz CPU, 278 MB of RAM and the Android 2.3 operating system. The current version implements a brute-force Nearest Neighbour classifier, operates at only one scale (the PC version scans three scales: the original image and two images scaled by factors of 0.7 and 1.5), and achieves about 10 FPS when fewer than 10 training samples are provided. Some examples of object detection are shown in Fig. 3. During testing it was noticed that the algorithm is able to recognise an object on a cluttered background even if only a few learning samples are provided.

Fig. 3: Examples of object detection (mug and doors) with the proposed algorithm deployed on an Android device.

5 Conclusions

In this paper an object recognition algorithm dedicated to mobile devices was presented. The algorithm is based on a hierarchical approach to visual information coding proposed by Riesenhuber and Poggio [1] and later extended by Serre et al. [2]. The experiments show that the introduced algorithm allows for efficient feature extraction and visual information coding. Moreover, it was shown that the proposed approach can be deployed on a low-cost mobile device. The results are promising and show that the algorithm described in this paper can be further investigated as one of the assistive solutions dedicated to visually impaired people.

References

[1] M. Riesenhuber, T. Poggio, Hierarchical models of object recognition in cortex, Nat Neurosci, Vol. 2, pp. 1019-1025, 1999
[2] T. Serre, G. Kreiman, M. Kouh, C. Cadieu, U. Knoblich, T. Poggio, A quantitative theory of immediate visual recognition, In: Progress in Brain Research, Computational Neuroscience: Theoretical Insights into Brain Function, Vol. 165, pp. 33-56, 2007
[3] N. Dalal, B. Triggs, Histograms of oriented gradients for human detection, Computer Vision and Pattern Recognition, CVPR 2005, IEEE Computer Society Conference on, Vol. 1, pp. 886-893, June 2005
[4] R. Kozik, A Simplified Visual Cortex Model for Efficient Image Codding and Object Recognition, Image Processing and Communications Challenges 5, Advances in Intelligent Systems and Computing, pp. 271-278, 2013
[5] R. Kozik, Rapid Threat Detection for Stereovision Mobility Aid System, In: T. Czachorski et al. (Eds.): Man-Machine Interactions 2, AISC 103, pp. 115-123, 2011
[6] R. Kozik, Stereovision system for visually impaired, In: R. Burduk et al. (Eds.): Computer Recognition Systems 4, Advances in Intelligent and Soft Computing 95, pp. 459-468, 2011
[7] R. Kozik, A Proposal of Biologically Inspired Hierarchical Approach to Object Recognition, Journal of Medical Informatics & Technologies, Vol. 22/2013, ISSN 1642-6037, pp. 171-176, 2013
[8] R. Kozik, A Simplified Visual Cortex Model for Efficient Image Codding and Object Recognition, Image Processing and Communications Challenges 5, Advances in Intelligent Systems and Computing, R. S. Choras (Ed.), pp. 271-278, 2013
[9] D.H. Hubel, T.N. Wiesel, Receptive fields, binocular interaction and functional architecture in the cat's visual cortex, J Physiol, Vol. 160, pp. 106-154, 1962
[10] J. Mutch, D.G. Lowe, Object class recognition and localization using sparse features with limited receptive fields, International Journal of Computer Vision (IJCV), Vol. 80, No. 1, pp. 45-57, October 2008
[11] MAX pooling, http://ufldl.stanford.edu/wiki/index.php/pooling
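
As an illustration of Algorithm 1 and the get_cell_type voting scheme from Section 3, the S1 computation can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the authors' Java implementation: the names `prewitt` and `s1_response` are hypothetical, the gradient magnitude is assumed to be the Euclidean norm, border pixels are skipped, and the default threshold `g_min=1.0` is an arbitrary placeholder.

```python
import math

# Cell types inside each block (the index order is an assumption).
VERTICAL, HORIZONTAL, LEFT_DIAGONAL, RIGHT_DIAGONAL = range(4)

def get_cell_type(gx, gy):
    """Vote for a cell type using the lines y = 0.3x and y = 3.3x."""
    ax, ay = abs(gx), abs(gy)
    if ay > 3.3 * ax:
        return VERTICAL
    if ay < 0.3 * ax:
        return HORIZONTAL
    # Diagonal: the sign of gy selects right or left, as described in the text.
    return RIGHT_DIAGONAL if gy > 0 else LEFT_DIAGONAL

def prewitt(image, x, y):
    """Prewitt gradients at an interior pixel; image is a list of rows."""
    gx = sum(image[y + d][x + 1] - image[y + d][x - 1] for d in (-1, 0, 1))
    gy = sum(image[y + 1][x + d] - image[y - 1][x + d] for d in (-1, 0, 1))
    return gx, gy

def s1_response(image, n_blocks, m_blocks, g_min=1.0):
    """Single pass over the image, accumulating magnitudes per block and cell."""
    height, width = len(image), len(image[0])
    s1 = [[[0.0] * 4 for _ in range(m_blocks)] for _ in range(n_blocks)]
    for y in range(1, height - 1):        # border pixels skipped (assumption)
        for x in range(1, width - 1):
            gx, gy = prewitt(image, x, y)
            magnitude = math.hypot(gx, gy)
            if magnitude < g_min:
                continue                  # weak gradient: go to next pixel
            n = x * n_blocks // width     # block column index
            m = y * m_blocks // height    # block row index
            s1[n][m][get_cell_type(gx, gy)] += magnitude
    return s1
```

Calling `s1_response` on the original image and on copies rescaled by 0.7 and 1.5 would give the three S1 maps that the C1 max-pooling step combines. Because each pixel needs only a handful of additions and one comparison chain, the cost per pixel stays far below that of four 11 × 11 convolutions.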