Pattern Recognition 46 (2013) 2220-2227. Contents lists available at SciVerse ScienceDirect. Journal homepage: www.elsevier.com/locate/pr. doi:10.1016/j.patcog.2013.01.007

Novel and efficient pedestrian detection using bidirectional PCA

Thi-Hai-Binh Nguyen, Hakil Kim
School of Information & Communication Engineering, INHA University, #525, Hitech Center, Yonghyeon 4-dong, Nam-gu, Incheon 402-751, Republic of Korea
Corresponding author: Hakil Kim. Tel.: +82 32 860 7385. E-mail addresses: calmseahn@gmail.com (T.-H.-B. Nguyen), hikim@inha.ac.kr (H. Kim).

Article history: Received 28 February 2012; received in revised form 12 November 2012; accepted 7 January 2013; available online 31 January 2013.
Keywords: Pedestrian detection; Object detection; Bidirectional PCA

Abstract

Pedestrian detection has attracted much research over the past decade because of the essential role it plays in intelligent video surveillance and vehicle vision systems. However, existing algorithms still do not meet the requirements of real applications in terms of detection performance. This paper proposes a new, robust algorithm for pedestrian detection based on image reconstruction using bidirectional PCA (BDPCA). Unlike PCA, BDPCA is a straightforward image projection technique: it preserves the shape structure of objects and is computationally efficient. These advantages make BDPCA a promising tool for object detection and recognition. The algorithm was tested on two datasets, INRIA and PennFudanPed. Our experiments show that applying BDPCA to vertical edge images is the most suitable configuration for pedestrian detection. A comparison between BDPCA-, PCA-, and histogram of oriented gradients (HOG)-based methods demonstrates the superior accuracy and robustness of the proposed algorithm. (c) 2013 Elsevier Ltd. All rights reserved.

1. Introduction

Over the past decade, many applications in which detecting people plays an essential role have been developed, among them video surveillance, airport security, driving assistance systems, autonomous cars, smart homes, and robotics. The importance of these applications makes pedestrian detection a topic worth studying. Although much effort has gone into improving pedestrian detection algorithms, the accuracy of existing algorithms is still far from the requirements of real applications. The reasons why pedestrian detection is difficult can be summarized as follows:

1. Diversity in appearance: The appearance of a human can vary greatly with pose, clothing, carried objects, and camera viewpoint. In addition, people vary widely in size. A pedestrian detection algorithm therefore has to cope with all of these variations.

2. Environment diversity: In this work, the environment includes the background against which people are detected, the illumination, and the weather conditions. Because pedestrian detection is used in a wide range of applications, the background can be as diverse and complex as the inside of a building, a campus, an airport, a road, or an urban scene. This complexity is one of the biggest challenges in pedestrian detection. Owing to this wide range of applications, pedestrian detection also suffers from changes in illumination and weather.

3. Partial occlusion: Since people appear against dynamic and uncontrolled backgrounds, partial occlusions can happen at any time. As in any other object detection problem, partial occlusion therefore needs to be considered when developing pedestrian detection algorithms.

4. Camera motion: In some applications, such as surveillance systems or airport security, the cameras are fixed and the backgrounds are static; in this case, motion can be used as an efficient cue for pedestrian detection. In other applications, such as driving assistance systems, autonomous cars, or robots, both the camera and the objects in the scene are moving, which makes it difficult to extract pedestrian motion.

5. Real-time processing: The major applications that require pedestrian detection also demand real-time processing. Pedestrian detection is an essential part of these systems, yet it is only a single step in the whole pipeline; it therefore needs to run as fast as possible to preserve the real-time behavior of the overall system.

Fig. 1 shows difficult cases of pedestrian detection caused by occlusion, pose variation, camera viewpoint, and illumination change.

Fig. 1. Illustration of difficulties in human detection.

In the literature, numerous methods have been proposed for detecting pedestrians in images. This paper provides a brief summary and analysis of the existing methods in Section 2. In addition, it proposes a new method for automatically detecting pedestrians in still images based on bidirectional PCA (BDPCA), which originated from [1] for image recognition, mainly the face recognition problem.

Unlike classical PCA, BDPCA is a straightforward image projection technique; it does not require converting an image into a high-dimensional vector. BDPCA extracts features from an image by reducing its dimension in both the column and row directions, and thus requires less computation than PCA [2].

The proposed method comprises two steps: training and classification. In the training step, BDPCA is applied to a set of pedestrian images and to another set of non-pedestrian images. For each space (pedestrian or non-pedestrian), this step produces a descriptor consisting of the mean image and the row and column projectors, called the 2D eigen-descriptor. In the classification step, an input image is scanned by a window of fixed size (in this work, 64 x 180 pixels) defining a region of interest (ROI). Each ROI is reconstructed using the 2D eigen-descriptors of the pedestrian and non-pedestrian spaces, and is then assigned to the pedestrian space or not on the basis of the reconstruction errors.

This paper implements BDPCA on different source images, namely the grayscale, edge, and vertical-edge images derived from the original images, and a complete performance analysis was carried out. The analysis shows that the 2D eigen-descriptor of the vertical-edge image is the most suitable for pedestrian detection. Furthermore, a comparison between BDPCA on vertical-edge images, PCA, and HOG-based methods demonstrates the superiority of the proposed method in both accuracy and robustness for detecting pedestrians in unconstrained environments.

The rest of this paper is organized as follows. Section 2 provides a brief summary and analysis of existing methods for automatic pedestrian detection. Section 3 describes the BDPCA-based pedestrian detection, and Section 4 compares the performance of the proposed method with existing methods such as PCA and HOG. Finally, Section 5 states the conclusions and future work.

2. Literature review

In general, the process of pedestrian detection is divided into two subsequent steps: ROI selection and classification. There are several approaches to generating candidate ROIs for the classification step. The simplest is brute-force window sliding, which uses a fixed-size detector to scan the image at multiple scales and locations; this approach usually suffers from high processing time. With static cameras, background subtraction can be used to extract ROIs, which yields a significant speedup compared with exhaustive scanning; however, illumination changes or color similarity may cause it to miss some regions. Besides these approaches, 3D information obtained from a laser range finder or a stereo camera is also used for ROI selection. Although 3D-based extraction partly solves the problems of the two approaches above, it is only suitable for applications such as robots or cars, where the detection area is relatively small. The classification step then receives the extracted ROIs and classifies them as either pedestrian or non-pedestrian. According to the detection cues, the existing approaches can be categorized into two groups: image-based and feature-based approaches.
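
As an illustration of the exhaustive scanning strategy described above, the following sketch (Python/NumPy) enumerates fixed-size candidate windows at a single scale. The function name is illustrative, and the 128 x 64 window with an 8-pixel stride mirrors the settings used later in Section 4.1; it is not taken from any cited system.

```python
import numpy as np

def sliding_windows(image, win_h=128, win_w=64, stride=8):
    """Brute-force single-scale window scan.

    Yields (top, left, patch) for every window position; in a multi-scale
    detector the same scan is repeated on resized copies of the image.
    """
    H, W = image.shape[:2]
    for top in range(0, H - win_h + 1, stride):
        for left in range(0, W - win_w + 1, stride):
            yield top, left, image[top:top + win_h, left:left + win_w]

# Example: count candidate ROIs in a 480 x 640 grayscale image.
img = np.zeros((480, 640), dtype=np.float32)
n_windows = sum(1 for _ in sliding_windows(img))
print(n_windows)  # several thousand windows even at a single scale
```
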
Image-based approach: This approach uses pixel intensities directly as classification features. Common learning algorithms in this category are the support vector machine (SVM), neural networks, and principal component analysis (PCA). In 2005, Tian et al. [3] developed a nighttime pedestrian detection system using only a normal camera. Assuming that the intensity values of foreground pixels are larger than a threshold, they used thresholding for ROI selection. For comparison, they designed four classification approaches: three used a single SVM classifier fed with the intensity image, the binary image, or the intensity-gradient image, respectively, and the fourth used 12 SVM classifiers, each corresponding to one pedestrian pose. Their experiments revealed that the single classifier taking the intensity image as input gave the highest performance. Instead of SVMs, Gavrila [4], Oh et al. [5], and Szarvas et al. [6] used neural networks in their systems. Gavrila's system consists of two stages: the first stage extracts contour features and applies hierarchical template matching (Fig. 2) to efficiently lock onto candidate ROIs; the second stage uses intensity features and a radial basis function (RBF) neural network to validate the candidates. Both Szarvas et al. and Oh et al. used convolutional neural networks (CNNs). Comparing the computational demand of the SVM- and CNN-based methods, Szarvas et al. demonstrated that the CNN requires about 40 times less computation than the SVM while providing superior accuracy [6].

Fig. 2. (a) A hierarchy for pedestrian shapes. (b) Some detection results [4].

Unlike the above methods, Malagon-Borja and Fuentes [7] adopted PCA-based reconstruction to detect pedestrians. They first applied PCA separately to the pedestrian and non-pedestrian samples. Given a test image, classification is performed by comparing the reconstructions produced by the two sets of principal components. Fig. 3 displays examples of PCA-based image reconstruction: the first column contains the original images, Fig. 3(b) shows reconstructions using 100 principal components obtained from pedestrian images, and Fig. 3(c) shows reconstructions using 100 principal components obtained from non-pedestrian images.

Fig. 3. Examples of PCA-based image reconstruction [7].

Using pixel intensities as features is convenient because it requires no feature extraction. However, the performance analysis of the algorithms above shows that pixel intensities are inefficient features for pedestrian detection: they are not invariant to illumination changes and cannot cope with partial occlusion.

Feature-based approach: The feature-based approach is more popular than the image-based one in previous studies. Like the image-based approach, it relies on learning algorithms such as SVMs, AdaBoost, and neural networks for classification; however, instead of intensity values, it uses gradients, Haar-like features, histogram of oriented gradients (HOG) descriptors, or shape features. Papageorgiou and Poggio [8] developed a trainable system for object detection that uses three types of Haar wavelets (vertical, horizontal, and diagonal) with an SVM. Fig. 4(a) shows the three types of 2D Haar wavelets and Fig. 4(b) the average wavelet coefficient of each type. Analysis of the average wavelet coefficients shows that the vertical wavelets respond to the sides of the body, the horizontal wavelets to the top of the head and the shoulders, and the diagonal wavelets to the head, shoulders, hands, and feet.

Fig. 4. Haar wavelets in 2D and average wavelet coefficients [8].

Following a similar idea, Mohan et al. [9] used Haar wavelets and quadratic SVMs to detect frontal, rear, slightly rotated, and partially occluded people. The hierarchical classifier in their system uses four distinct component detectors at the first level, trained to independently find body parts, i.e., the head, the legs, and the left and right arms. The four component detectors are combined at the next level by another SVM. For real-time processing, Haar-like features and AdaBoost learning have been used by several researchers [10,11], and to improve the detection performance of Haar-like feature-based methods, additional features such as edge orientation histograms and motion have been added [11,12].

HOG descriptors were first proposed by Dalal and Triggs [13]. They rapidly gained attention from other researchers owing to their excellent performance compared with other existing feature sets. Fig. 5(b) displays the HOG descriptor corresponding to the image in Fig. 5(a), while Fig. 5(c) and (d) show the HOG descriptor weighted by the positive and negative SVM weights, respectively. These weighted HOG descriptors show that the HOG-based classifier cues mainly on silhouette contours.

Fig. 5. (a) A test image, (b) HOG descriptor, (c, d) HOG descriptor weighted by the positive and negative SVM weights [13].
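
For concreteness, descriptors of this kind can be computed with off-the-shelf tools. The sketch below uses scikit-image with the 9-orientation, 8 x 8-cell, 2 x 2-block parameters commonly associated with the Dalal-Triggs detector; the library choice and parameter values are illustrative assumptions, not the implementation used in [13] or in this paper.

```python
import numpy as np
from skimage.feature import hog

# A 128 x 64 grayscale detection window (random values stand in for real data).
window = np.random.rand(128, 64).astype(np.float32)

descriptor = hog(
    window,
    orientations=9,          # 9 orientation bins
    pixels_per_cell=(8, 8),  # 8 x 8-pixel cells
    cells_per_block=(2, 2),  # 2 x 2-cell blocks
    block_norm="L2-Hys",
    feature_vector=True,
)
print(descriptor.shape)  # (3780,) for a 128 x 64 window with these settings
```
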
For simplicity and speed, Dalal and Triggs used a linear SVM in their study, although they reported that a Gaussian-kernel SVM improves performance at the cost of a much higher run time. One year later, in 2006, Dalal et al. [14] combined their original HOG (termed appearance HOG) with a motion HOG, which encodes motion information using oriented histograms of differential optical flow. The combined detector reduces the false alarm rate by a factor of 10 compared with the best appearance-HOG detector, but its processing time is much higher. To make HOG descriptors usable in real-time systems, Zhu et al. [15] proposed fast pedestrian detection using a cascade of HOGs: they use the Viola-Jones object detection framework to select the best HOG features and to construct the cascade of classifiers. Their system is comparable to Dalal's in detection performance while being up to 70 times faster. Felzenszwalb et al. introduced discriminatively trained part-based models for object detection [16], which have been successfully applied to detecting people in still images. Their models use low-dimensional HOG features, obtained by applying principal component analysis to the original HOG features, and are trained with a latent SVM, which allows discriminative learning with latent variables. To speed up object detection with deformable part models, Felzenszwalb et al. proposed building cascade classifiers from part-based deformable models [17]; their algorithm speeds up object detection by more than an order of magnitude without sacrificing accuracy.

For systems with a static camera, where pedestrian candidates can be obtained by background subtraction, shape features can be used with a learning algorithm such as k-means clustering, an SVM, or a neural network. Background subtraction yields object blobs from which shape features can be extracted. Commonly adopted shape features are the Hu invariant moments, height-to-width ratio, fill ratio, segment area, perimeter, compactness, segment convexity, convex deviation, and projection histogram [18-20]. Even though shape feature-based methods allow real-time processing, a robust and accurate background subtraction algorithm is essential to guarantee high classification performance.

3. Pedestrian detection using bidirectional PCA

3.1. Bidirectional PCA

Bidirectional PCA (BDPCA) was proposed in [1] as a generalization of Yang's 2DPCA [21]. Unlike classical PCA, BDPCA does not require mapping an image to a high-dimensional vector; it adopts the concept of row and column eigenvectors to compute the feature matrix directly from an image.

Let X_1, X_2, ..., X_N be a training set of N images, where each X_i (i = 1, ..., N) has size m x n, and let \bar{X} be the mean of all training images. The row total scatter matrix S_t^{row} and the column total scatter matrix S_t^{col} are defined as follows:

S_t^{row} = \frac{1}{Nm} \sum_{i=1}^{N} (X_i - \bar{X})^T (X_i - \bar{X}), \qquad S_t^{col} = \frac{1}{Nn} \sum_{i=1}^{N} (X_i - \bar{X}) (X_i - \bar{X})^T        (1)

The row projector W_{row} contains the row eigenvectors corresponding to the first k_{row} (k_{row} \ll n) largest eigenvalues of S_t^{row}, and the column projector W_{col} contains the column eigenvectors corresponding to the first k_{col} (k_{col} \ll m) largest eigenvalues of S_t^{col}:

W_{row} = [w_1^{row}, \ldots, w_{k_{row}}^{row}], \qquad W_{col} = [w_1^{col}, \ldots, w_{k_{col}}^{col}]        (2)
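
For clarity, the training-side computation of Eqs. (1)-(2) can be written in a few lines of NumPy. This is a minimal sketch under the assumptions that the training images are stacked in an array of shape (N, m, n) and that the projectors are taken from a symmetric eigendecomposition; the function and variable names are illustrative.

```python
import numpy as np

def bdpca_train(images, k_row, k_col):
    """Compute a BDPCA 2D eigen-descriptor: mean image and row/column projectors.

    images: array of shape (N, m, n); k_row, k_col: numbers of retained
    row/column eigenvectors, as in Eqs. (1)-(2).
    """
    N, m, n = images.shape
    mean = images.mean(axis=0)            # \bar{X}
    centered = images - mean              # X_i - \bar{X}

    # Row total scatter matrix (n x n) and column total scatter matrix (m x m).
    S_row = np.einsum("imj,imk->jk", centered, centered) / (N * m)
    S_col = np.einsum("ijn,ikn->jk", centered, centered) / (N * n)

    # Eigenvectors of the symmetric scatter matrices, largest eigenvalues first.
    _, vec_row = np.linalg.eigh(S_row)
    _, vec_col = np.linalg.eigh(S_col)
    W_row = vec_row[:, ::-1][:, :k_row]   # n x k_row
    W_col = vec_col[:, ::-1][:, :k_col]   # m x k_col
    return mean, W_row, W_col
```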

The feature matrix Y of an m x n image X is obtained with the row projector W_{row} and the column projector W_{col} as in (3a), and the reconstructed image \tilde{X} is given by (3b):

Y = W_{col}^T (X - \bar{X}) W_{row}        (3a)

\tilde{X} = \bar{X} + W_{col} Y W_{row}^T        (3b)

As described in [1], PCA-based feature extraction requires m n d_{PCA} multiplications, where d_{PCA} (d_{PCA} \ll mn) is the number of principal components, whereas BDPCA requires m n k_{row} + m k_{col} k_{row}. Since, in general, k_{row} is less than or equal to d_{PCA}, BDPCA-based feature extraction is computationally more efficient than that of PCA.

3.2. BDPCA-based pedestrian classification

BDPCA-based pedestrian classification comprises two steps: training and classification (Fig. 6). The training step applies BDPCA to the positive and negative sets separately. After this step, two 2D eigen-descriptors, each consisting of a mean image and the row and column projectors, are obtained for the pedestrian and non-pedestrian spaces. In this paper, \bar{X}_P, W_{row}^P, and W_{col}^P denote the mean image, row projector, and column projector of the pedestrian space, and \bar{X}_N, W_{row}^N, and W_{col}^N denote those of the non-pedestrian space.

Fig. 6. Process of BDPCA-based pedestrian classification.

Classification is based on image reconstruction using BDPCA. Given an image, ROIs are selected by exhaustive scanning. For each ROI X, two reconstructions are computed with (3b): \tilde{X}_P, reconstructed with the descriptor of the pedestrian space, and \tilde{X}_N, reconstructed with the descriptor of the non-pedestrian space. The difference of reconstruction errors, defined in (4), is then used to determine whether or not X is an image of a pedestrian:

E_P = \lVert X - \tilde{X}_P \rVert, \qquad E_N = \lVert X - \tilde{X}_N \rVert, \qquad d = E_N - \varepsilon E_P        (4)

where E_P and E_N are the reconstruction errors using the descriptors of the pedestrian and non-pedestrian spaces, respectively, \lVert \cdot \rVert denotes the L2 norm, and \varepsilon is a constant.
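
The classification rule of Eq. (4) is easy to express in code. The sketch below (NumPy; function and variable names are illustrative, and bdpca_train refers to the sketch given after Eq. (2)) reconstructs a window with both 2D eigen-descriptors and compares the reconstruction errors.

```python
import numpy as np

def bdpca_reconstruct(X, mean, W_row, W_col):
    """Eqs. (3a)-(3b): project onto the eigen-descriptor and reconstruct."""
    Y = W_col.T @ (X - mean) @ W_row
    return mean + W_col @ Y @ W_row.T

def classify_window(X, ped_desc, nonped_desc, eps=1.0):
    """Eq. (4): d = E_N - eps * E_P.

    ped_desc / nonped_desc are (mean, W_row, W_col) tuples, e.g. outputs of the
    bdpca_train sketch. eps is the constant in Eq. (4); its value is not given
    here and must be tuned. d >= 0 is taken as 'pedestrian', mirroring the sign
    convention of Eq. (6) below.
    """
    E_P = np.linalg.norm(X - bdpca_reconstruct(X, *ped_desc))
    E_N = np.linalg.norm(X - bdpca_reconstruct(X, *nonped_desc))
    d = E_N - eps * E_P
    return d, d >= 0
```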

3.3. Implementation

BDPCA-based pedestrian detection involves a reconstruction process, and the detection performance naturally depends on the number of row and column eigenvectors used for reconstruction. This paper therefore analyzes the performance as a function of the sizes of the row and column projectors. In addition, five classification schemes were implemented to find the best feature for pedestrian detection. The first scheme uses the pixel values of the grayscale image as the input to BDPCA. The next two schemes use the edge and vertical edge images as input, with the training and classification processes as presented above. The edge and vertical edge images are obtained by convolving the grayscale image I with the edge filters defined in (5a), as in (5b)-(5d):

F_x = \begin{pmatrix} 1 & 1 & 2 & 1 & 1 \\ 0 & 0 & 0 & 0 & 0 \\ -1 & -1 & -2 & -1 & -1 \end{pmatrix}, \qquad F_y = F_x^T        (5a)

G_x = I * F_x        (5b)

G_y = I * F_y        (5c)

G = \sqrt{G_x^2 + G_y^2}        (5d)

The last two schemes use combined features of the grayscale and edge images, or of the grayscale and vertical edge images. The combination is performed as described below.

Training step:
- Apply BDPCA to the training set of grayscale images. The outputs are the grayscale descriptors of the pedestrian and non-pedestrian spaces.
- Compute the edge (vertical edge) images of the grayscale images and apply BDPCA to them. The outputs are the edge (vertical edge) descriptors of the pedestrian and non-pedestrian spaces.

Classification step:
- Given an image, convert it to grayscale if it is a color image, and compute the edge (vertical edge) image of the grayscale version.
- Reconstruct the grayscale image using the grayscale descriptors of the pedestrian and non-pedestrian spaces; the resulting difference of reconstruction errors, denoted d_{grayscale}, is computed with (4).
- Similarly, reconstruct the edge (vertical edge) image using the edge (vertical edge) descriptors of the pedestrian and non-pedestrian spaces; the resulting difference of reconstruction errors, denoted d_{edge} (d_{Vedge}), is computed with (4).

The total difference of reconstruction errors (DRE) is calculated as in (6), where d_x is d_{edge} or d_{Vedge}:

d_{total} = d_{grayscale} + d_x        (6)

If the total DRE is greater than or equal to zero, the input is classified as a pedestrian image; otherwise, it is classified as non-pedestrian.

4. Experimental results

4.1. Database preparation

For a fair evaluation, one training set and two test sets were prepared from the INRIA [13] and PennFudanPed [22] databases. The training set consists of 1237 positive samples and 3891 negative samples, obtained from the training set of the INRIA database. The first test set contains 589 positive samples and 453 negative images, all taken from the test set of the INRIA database. The second test set has 423 positive images, constructed from the PennFudanPed database, and the 453 negative samples of the first test set; its purpose is to demonstrate the independence of the proposed method from the training database. The positive images are resized so that the pedestrians are approximately 128 x 64 pixels. The test images are scanned with a detection window of 128 x 64 pixels, shifted by 8 pixels across the image. Table 1 summarizes the specifications of the training and test sets, and some sample images are shown in Fig. 7.

Table 1. Specifications of training and test sets.

                                   Training set   Test set #1   Test set #2
Number of positive samples         1237           589           423
Number of negative samples         3891           453           453
Minimum pedestrian size (pixels)   58 x 29        71 x 29       31 x 11
Maximum pedestrian size (pixels)   794 x 395      775 x 248     369 x 211

Fig. 7. Sample images in (a) the training set and (b) the test sets.
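
Putting Sections 3.3 and 4.1 together, the per-window decision can be sketched as follows (illustrative Python; classify_window is the sketch from Section 3.2, SciPy is assumed for the convolution, and which of the two directional gradients serves as the "vertical edge" image, the use of the absolute response, and eps are assumptions). In the experiments, a decision like this is evaluated for every 128 x 64 window produced by a scan such as the sliding_windows sketch in Section 2, with an 8-pixel stride.

```python
import numpy as np
from scipy.signal import convolve2d

# Edge filters of Eq. (5a); F_y = F_x^T.
F_x = np.array([[ 1,  1,  2,  1,  1],
                [ 0,  0,  0,  0,  0],
                [-1, -1, -2, -1, -1]], dtype=np.float32)
F_y = F_x.T

def gradient_images(gray):
    """Eqs. (5b)-(5d): directional gradients and gradient magnitude (edge image)."""
    G_x = convolve2d(gray, F_x, mode="same", boundary="symm")
    G_y = convolve2d(gray, F_y, mode="same", boundary="symm")
    G = np.sqrt(G_x ** 2 + G_y ** 2)
    return G_x, G_y, G

def combined_decision(gray_window, gray_descs, vedge_descs, eps=1.0):
    """Eq. (6): d_total = d_grayscale + d_x; d_total >= 0 means 'pedestrian'.

    gray_descs / vedge_descs are (pedestrian, non-pedestrian) descriptor pairs
    from BDPCA training on grayscale and vertical edge images, respectively.
    Treating |G_y| as the vertical edge input is an assumption of this sketch.
    """
    _, G_y, _ = gradient_images(gray_window)
    d_gray, _ = classify_window(gray_window, *gray_descs, eps)
    d_vedge, _ = classify_window(np.abs(G_y), *vedge_descs, eps)
    return (d_gray + d_vedge) >= 0
```
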
4.2. Analysis

For each scheme described in Section 3.3, the performance was analyzed with different numbers of row and column eigenvectors: the number of row eigenvectors ranges from 3 to 64 and the number of column eigenvectors from 3 to 128. The analysis shows that the algorithm achieves its highest performance with the vertical edge descriptors when the size of the row projector is 64 and that of the column projector is 90. Fig. 8 displays the ROC curves of the different BDPCA descriptor types; only the best result for each descriptor type is shown. The vertical edge descriptor eliminates information about clothing color and keeps only information about the pedestrian contours, and hence gives the best performance. In the experiments, combining grayscale with vertical edge decreases the performance compared with using the vertical edge alone. This result is understandable, since gray levels are sensitive to noise, clothing color, and illumination.

Fig. 8. ROC curves of different BDPCA descriptors.

This section also compares PCA-, HOG-, and BDPCA-based pedestrian detection, using the training and test databases described above. As reported in [7], the highest performance of PCA is obtained when combining grayscale and edge images; Fig. 9 shows its result with 200 principal components, while the HOG result is obtained with a linear kernel. The experiments on the two test sets demonstrate the accuracy and robustness of the proposed algorithm.

Fig. 9. Performance comparison between BDPCA, PCA, and HOG.

Compared with PCA, BDPCA has advantages in both training and classification time. Since BDPCA does not require mapping an image X to a high-dimensional vector x, its training is fast and does not run out of memory when a large training set is used. Because no mapping is required, the classification time remains nearly constant even as the numbers of row and column eigenvectors used for reconstruction increase. Fig. 10 compares the processing times of BDPCA and PCA; the classification time is measured over a set of 1000 images, each of size 128 x 64 in both the training and test sets. These advantages make BDPCA a promising tool for the object detection problem.

Fig. 10. Processing time comparison between BDPCA and PCA.

Table 2 compares BDPCA, PCA, and HOG. Detection performance is compared using the true positive rate at a false positive rate of 0.01. At this false positive rate, the performance of BDPCA is higher than that of HOG by about 5% and higher than that of PCA by about 52%. The processing time of BDPCA is taken as the unit. Although BDPCA requires more time than HOG because the reconstruction involves matrix multiplications, this can be addressed by code optimization and parallel processing.

Table 2. Comparison between BDPCA, PCA and HOG.

                              BDPCA                         PCA                                                   HOG
Training                      Fast                          Slow; runs out of memory with a large training set    Fast
Feature                       Vertical gradient magnitude   Intensity                                             Histogram of gradients
TP rate at FP rate of 0.01    92.02%                        40.2%                                                 86.9%
Processing time (relative)    1                             18                                                    0.31
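
As a small worked example of the comparison metric (the true positive rate read off at a fixed false positive rate of 0.01), the following sketch computes such an operating point from decision scores. The score arrays, the function name, and the simple thresholding rule are illustrative assumptions, not the paper's evaluation code.

```python
import numpy as np

def tp_rate_at_fp_rate(pos_scores, neg_scores, target_fp=0.01):
    """True positive rate at a given false positive rate.

    pos_scores / neg_scores: decision values d (Eq. (4) or (6)) for positive
    and negative test samples; larger d means 'more pedestrian-like'.
    The threshold is set so that at most target_fp of the negatives exceed it.
    """
    neg_sorted = np.sort(neg_scores)[::-1]            # descending
    k = int(np.floor(target_fp * len(neg_sorted)))    # allowed false positives
    threshold = neg_sorted[k] if k < len(neg_sorted) else -np.inf
    return np.mean(pos_scores > threshold)

# Toy usage with synthetic scores (placeholders, not the paper's data).
rng = np.random.default_rng(0)
pos = rng.normal(1.0, 1.0, 589)   # e.g. the size of test set #1 positives
neg = rng.normal(-1.0, 1.0, 453)  # e.g. the size of test set #1 negatives
print(tp_rate_at_fp_rate(pos, neg))
```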

5. Conclusions and future work

This paper proposes a new, efficient algorithm for pedestrian detection based on image reconstruction using BDPCA. A complete performance analysis was carried out and showed that the BDPCA vertical-edge descriptor is the most suitable feature for pedestrian detection. A comparison between BDPCA-, PCA-, and HOG-based methods demonstrated the accuracy and robustness of the BDPCA-based method; using the vertical edge with BDPCA improves pedestrian detection performance by about 5%. Although the proposed algorithm is limited by its processing time, this problem can be addressed by code optimization and parallel processing. In the future, in addition to improving the processing time, the algorithm will be extended to other object detection problems, such as vehicle and animal detection. Beyond the detection problem, the algorithm is also expected to be suitable for human action analysis.

Conflict of interest statement

None declared.

Acknowledgement

This work was supported by the Industrial Strategic Technology Development Program, 10039149, funded by MKE, Republic of Korea.

References

[1] W. Zuo, K. Wang, D. Zhang, Bi-directional PCA with assembled matrix distance metric, in: IEEE International Conference on Image Processing, vol. 2, 2005, pp. II-958-961.
[2] W. Yang, C. Sun, L. Zhang, K. Ricanek, Laplacian bidirectional PCA for face recognition, Neurocomputing 74 (1-3) (2010) 487-493.
[3] Q. Tian, H. Sun, Y. Luo, D. Hu, Nighttime pedestrian detection with a normal camera using SVM classifier, in: Advances in Neural Networks - ISNN 2005, vol. 3497, 2005, pp. 189-194.
[4] D. Gavrila, Pedestrian detection from a moving vehicle, in: Sixth European Conference on Computer Vision - Part II, 2000, pp. 2241-2248.
[5] S. Oh, S. Kim, S. Oh, Pedestrian collision warning systems using neural networks based on a single camera, in: Fifteenth World Congress on Intelligent Transport Systems and ITS America's 2008 Annual Meeting, 2008, p. 12.
[6] M. Szarvas, A. Yoshizawa, M. Yamamoto, J. Ogata, Pedestrian detection with convolutional neural networks, in: IEEE Intelligent Vehicles Symposium, 2005, pp. 224-229.
[7] L. Malagon-Borja, O. Fuentes, Object detection using image reconstruction with PCA, Image and Vision Computing 27 (1-2) (2007) 2-9.
[8] C. Papageorgiou, T. Poggio, A trainable system for object detection, International Journal of Computer Vision 38 (2000) 15-33.
[9] A. Mohan, C. Papageorgiou, T. Poggio, Example-based object detection in images by components, IEEE Transactions on Pattern Analysis and Machine Intelligence 23 (2001) 349-361.
[10] M. Enzweiler, D. Gavrila, Monocular pedestrian detection: survey and experiments, IEEE Transactions on Pattern Analysis and Machine Intelligence 31 (12) (2009) 2179-2195.
[11] M. Jones, P. Viola, Detecting pedestrians using patterns of motion and appearance, in: IEEE International Conference on Computer Vision, 2003, pp. 734-741.
[12] D. Geronimo, A.D. Sappa, A. Lopez, D. Ponsa, Pedestrian detection using AdaBoost learning of features and vehicle pitch estimation, in: Sixth IASTED International Conference on Visualization, 2006, pp. 400-405.
[13] N. Dalal, B. Triggs, Histograms of oriented gradients for human detection, in: IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2005, pp. 886-893.
[14] N. Dalal, B. Triggs, C. Schmid, Human detection using oriented histograms of flow and appearance, in: European Conference on Computer Vision, 2006.
[15] Q. Zhu, M.-C. Yeh, K.-T. Cheng, S. Avidan, Fast human detection using a cascade of histograms of oriented gradients, in: IEEE Computer Society Conference on Computer Vision and Pattern Recognition, vol. 2, 2006, pp. 1491-1498.
[16] P. Felzenszwalb, R. Girshick, D. McAllester, D. Ramanan, Object detection with discriminatively trained part based models, IEEE Transactions on Pattern Analysis and Machine Intelligence 32 (9) (2010) 1627-1645.
[17] P. Felzenszwalb, R. Girshick, D. McAllester, Cascade object detection with deformable part models, in: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2010, pp. 37-49.
[18] Q. Zhang, R. Klette, Object classification and tracking in video surveillance, in: Computer Analysis of Images and Patterns, vol. 2756, 2003, pp. 198-205.
[19] V.V.R.N. Hota, A. Rajagopal, Shape based object classification for automated video surveillance with feature selection, in: Tenth International Conference on Information Technology (ICIT), 2007, pp. 97-99.
[20] S. Boragno, B. Boghossian, D. Makris, S. Velastin, Object classification for real-time video surveillance applications, in: Fifth International Conference on Visual Information Engineering, 2008, pp. 192-197.
[21] J. Yang, D. Zhang, A. Frangi, J.-Y. Yang, Two-dimensional PCA: a new approach to appearance-based face representation and recognition, IEEE Transactions on Pattern Analysis and Machine Intelligence 26 (1) (2004) 131-137.
[22] L. Wang, J. Shi, G. Song, I. Shen, Object detection combining recognition and segmentation, in: The Asian Conference on Computer Vision, 2007, pp. 189-199.

Thi-Hai-Binh Nguyen received her BS in Applied Mathematics from Hanoi National University, Vietnam, and her MS degree in Information Engineering from Inha University, Korea, where she is currently pursuing a PhD in Information Engineering. Her research interests include biometrics, pattern recognition, and video surveillance systems.

Hakil Kim received his BS degree in Control and Instrumentation Engineering from Seoul National University, Korea, in 1983, and his MS and PhD degrees in Electrical and Computer Engineering from Purdue University in 1985 and 1990, respectively. He is currently a professor in the School of Information & Communication Engineering at Inha University, Incheon, Korea, and a member of the Biometrics Engineering Research Center (BERC) at Yonsei University, Seoul, Korea. He has been actively participating in WG5 (Testing and Reporting) of ISO/IEC JTC1-SC37 and in ITU-T/SG17 WP2/Q.9 Telebiometrics as a Rapporteur.