A New Strategy of Pedestrian Detection Based on Pseudo-Wavelet Transform and SVM
M. Ranjbarikoohi, M. Menhaj and M. Sarikhani
Abstract: Pedestrian detection is important in automotive vision systems and is difficult because of the extreme variability of targets, lighting conditions, occlusion, and high-speed vehicle motion. In this paper, we propose a simple and efficient strategy to accelerate and improve the existing pedestrian detection process. For this purpose, we use features inspired by the pseudo-wavelet transform to handle pedestrian detection even at night or in other difficult conditions. In our work, pedestrian and non-pedestrian candidates are collected and used to extract fundamental features. These fundamental features play the main role in the performance of the proposed method. They are extracted with pseudo-wavelets to capture edge and texture information on each object. Our experimental results show that applying an SVM to these features leads to better performance than other methods.
Keywords: feature detection, pseudo wavelet, SVM, pedestrian detection
1. Introduction
Detecting pedestrians in images is an interesting topic that has been investigated by many researchers [1]. In spite of its simple definition, it involves several complexities because of random influences such as scene structure, lighting, or people's choice of clothing. Therefore, pedestrian detection remains a challenging problem and continues to attract research. Although pedestrian detection has many applications, advanced driver assistance systems (ADASs) are the most prominent one. The overarching goal is to equip vehicles with sensing capabilities to detect and act on pedestrians in dangerous situations where the driver would not be able to avoid a collision. A full ADAS for pedestrians would therefore include not only detection but also tracking, orientation estimation, intent analysis, and collision prediction.
The main issues in pedestrian detection are high variability in appearance among pedestrians, cluttered backgrounds, highly dynamic scenes with both pedestrian and camera motion, and strict requirements on both speed and reliability. Part-based detection systems intuitively cope well with occlusion, since they do not require the full body to be visible to make a detection. In addition, many existing systems suffer from a high number of false positives per frame (FPPF), which a part-based system can reduce by requiring several body parts to be detected [2,3]. In this paper, we propose a new feature extraction strategy that improves efficiency. In other words, we argue that by incorporating fundamental information about the edges and textures of each object, we can design more accurate features to detect pedestrians.
Copyright © JEET IU. This is an Open-Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/3.0/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
Our method is motivated by visual perception: pedestrians form a class with high intra-class similarity owing to the strong regularities of upright body shapes. In this work, we were inspired by the original wavelet transform as used for object detection. For instance, cascaded Haar-like features [4] have become the de facto method of choice in this area. The corresponding features are determined either by exhaustive search over all possible variations [5] or by less exhaustive random sampling [6].
2. Existing methods
Many methods for pedestrian detection have been presented so far. They consist of two parts: feature extraction and classification. The most popular features for visual pedestrian detection are based on Histograms of Oriented Gradients (HOG) as introduced in [3]. HOG features brought significant improvements and have therefore established an important baseline. Felzenszwalb [6, 7] successfully employed HOG features in a part-based model for object detection; Walk [3] combined HOG features with self-similarity features on color channels as well as motion features to better integrate spatial and temporal information. Deviating from the popular HOG+SVM framework, Dollár et al. [10] proposed integral channel features. An extension of this approach, called the Fastest Pedestrian Detector in the West [7], was shown to enable particularly fast multi-scale detection. Due to its efficiency and reasonable performance, many newer detectors [3, 6] use it as a baseline, and several authors have obtained even better performance by extending the feature pool in various ways. The first attempts at using wavelets for pedestrian detection are found in [7], where it was demonstrated that wavelet templates can be used to define the shape of an object. Later, Papageorgiou et al.
[8] proposed a similar yet more general system for object detection, and Haar-like features subsequently became popular in the object detection community. The epitome of such approaches is the work of Viola and Jones [7], who used Haar-like features in combination with boosting algorithms to build a successful face detector. However, Haar-like features are often discarded in pedestrian detection, as they do not seem to improve performance when combined with first-order channel features. On closer analysis of the possible reasons for this behavior, we found that Haar-like templates that perform well for face detection are not necessarily suited to pedestrian detection, as they may fail to capture the visual characteristics of the human body. As a remedy, we propose to design templates particularly tailored to upright body shapes.
3. Proposed method
This section presents the methods used in the detection system. The overall processing pipeline is shown in Fig. 1.
Fig. 1: Overall processing pipeline
Besides preprocessing, our proposed method contains two major parts: coarse detection and fine verification. Our
preprocessing converts the color image to a grayscale image and applies histogram equalization. Our feature extractor employs multi-modal Haar-like features built on channel features as in [6], but it interprets local differences between rectangular regions over multiple channels rather than the channel values themselves. Specifically, we use a pseudo-wavelet-based feature extraction technique with the following family of two-dimensional kernels:
$$W(x, y, \lambda, \theta, \psi, \sigma, \gamma) = \exp\left(-\frac{x'^2 + \gamma^2 y'^2}{2\sigma^2}\right)\cos\left(2\pi\frac{x'}{\lambda} + \psi\right), \quad x' = x\cos\theta + y\sin\theta, \quad y' = -x\sin\theta + y\cos\theta \qquad (1)$$
where x and y specify the position of a light impulse in the visual field and λ, θ, ψ, σ and γ are the parameters of the pseudo-wavelet. We have chosen these parameters as follows:
Parameter | Symbol | Values
Orientation | θ | 0, π/8, 2π/8, 3π/8, 4π/8, 5π/8, 6π/8, 7π/8
Wavelength | λ | 4, 4√2, 8, 8√2, 16
Phase | ψ | 0, π/2
Gaussian radius | σ |
Aspect ratio | γ | 1
A set of kernels with 5 spatial frequencies and 8 distinct orientations is used, which yields the 40 different pseudo-wavelets shown in Fig. 2.
Fig. 2: The pseudo-wavelets
We then convolve these filters with the test image to obtain the filter responses. We found that these representations display desirable locality and orientation selectivity. We selected 5 sets of pseudo-wavelets with different orientations, as follows:
Pseudo-wavelets | Orientations
5 | 0
10 | 0, π/8, 7π/8
15 | 0, 2π/8, 4π/8
20 | 0, 2π/8, 4π/8, 6π/8
25 | 0, π/8, 2π/8, 4π/8, 6π/8
We then apply an SVM for learning, since it offers a convenient and fast approach to selecting from a large number of candidate features. Initial negative training samples are generated randomly; afterwards, hard negative samples are searched for in three rounds over all negative example images to collect the final negative set. This multi-round training strategy is pivotal, as it leads to better performance than a simple one-round training procedure with the same
number of negative samples. In our experiments, two rounds of retraining yielded the best performance; additional rounds did not bring significant improvements. For training, we recorded several video sequences with the camera and cropped thousands of samples from them. Our training dataset consists of positive and negative samples (Fig. 3), and Table 1 gives the details of the dataset used for training and testing.
Table 1: Training and testing details.
Fig. 3: Training samples
From a set of labeled training images, we extract features and use them to train linear SVMs. We used MATLAB's svmtrain and svmclassify functions with their default settings for training and for binary classification of the test data, respectively. Details of the training and testing methodology are presented in the following sections. For both feature vector descriptors, we employed a training methodology similar to that used in [4]. For the initial training of the SVM, we used positive and negative sample windows. Retraining the SVM with hard examples reduces the false positive rate by almost 10%. The training methodology consists of the following steps, shown in Fig. 4:
1. Take initial positive and negative window examples from the training dataset and generate the label vector.
2. Generate a feature vector set by encoding all positive and negative windows with the selected feature vector descriptor.
3. Generate a linear SVM model using the feature vector set and the label vector.
4. Using the selected descriptor and the SVM model, exhaustively search 15 negative training images for false positives ("hard examples").
5. Augment the initial training data with the collected hard examples and retrain the SVM.
Fig. 4: Training and testing methodology
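Step 2 encodes every window with the pseudo-wavelet descriptor. The NumPy sketch below samples the kernel of Eq. (1) on a square grid, builds a bank of 5 wavelengths × 8 orientations (40 kernels), convolves a window with each kernel, and pools each response into one number. Several choices here are our assumptions, not the authors' settings: the wavelengths 4, 4√2, 8, 8√2, 16 as we read them from the parameter table, σ = 0.56·λ (the Gaussian radius is not legible in the source), a fixed 31×31 kernel support, ψ = 0, γ = 1, a 128×64 window, and pooling by mean absolute response. All function names are hypothetical.

```python
import numpy as np

def pseudo_wavelet_kernel(lam, theta, psi, sigma, gamma, half_size):
    """Sample W(x, y, lambda, theta, psi, sigma, gamma) of Eq. (1) on a (2h+1)x(2h+1) grid."""
    coords = np.arange(-half_size, half_size + 1, dtype=float)
    x, y = np.meshgrid(coords, coords)  # x varies along columns, y along rows
    # Rotated coordinates x' and y' from Eq. (1)
    x_r = x * np.cos(theta) + y * np.sin(theta)
    y_r = -x * np.sin(theta) + y * np.cos(theta)
    envelope = np.exp(-(x_r ** 2 + gamma ** 2 * y_r ** 2) / (2.0 * sigma ** 2))
    carrier = np.cos(2.0 * np.pi * x_r / lam + psi)
    return envelope * carrier

def build_bank(wavelengths, orientations, psi=0.0, gamma=1.0,
               sigma_ratio=0.56, half_size=15):
    """One kernel per (wavelength, orientation) pair.

    sigma = sigma_ratio * lambda is an assumption; the source value is not legible.
    """
    return [pseudo_wavelet_kernel(lam, theta, psi, sigma_ratio * lam, gamma, half_size)
            for lam in wavelengths for theta in orientations]

def convolve_same(image, kernel):
    """Linear 2-D convolution via zero-padded FFT, cropped to the image size."""
    h, w = image.shape
    kh, kw = kernel.shape
    shape = (h + kh - 1, w + kw - 1)
    full = np.fft.irfft2(np.fft.rfft2(image, shape) * np.fft.rfft2(kernel, shape), shape)
    top, left = kh // 2, kw // 2  # centre offset for odd-sized kernels
    return full[top:top + h, left:left + w]

def encode_window(window, bank):
    """Hypothetical pooling: mean absolute response of each filter, one value per kernel."""
    window = np.asarray(window, dtype=float)
    return np.array([np.abs(convolve_same(window, k)).mean() for k in bank])

wavelengths = [4, 4 * np.sqrt(2), 8, 8 * np.sqrt(2), 16]
orientations = [k * np.pi / 8 for k in range(8)]  # 0, pi/8, ..., 7pi/8
bank = build_bank(wavelengths, orientations)
window = np.random.default_rng(0).random((128, 64))  # stand-in for a preprocessed window
features = encode_window(window, bank)
print(len(bank), features.shape)  # 40 (40,)
```

Feature vectors produced this way for positive and negative windows would then feed the linear SVM of step 3. Passing a smaller orientation subset (for example `[0.0]`, which gives 5 kernels) builds the smaller pseudo-wavelet sets listed in the selection table.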
In camera-based pedestrian detection systems, a pedestrian can be detected in a scene by brute-force search and testing over scale space. For example, all sliding-window-based models involve feature extraction, dense multi-scale scanning of detection windows, and binary classification, followed by non-maximum suppression [4]. An alternative is to use simple tests to generate candidate locations and then verify them with more sophisticated methods [5], [6]. Depending on the technique used, the number and values of the parameters for detecting pedestrians in a scene vary widely, and choosing them poorly can make even a well-trained state-of-the-art detector perform badly.
4. Experimental results
To test the proposed method, we adopted a per-window evaluation methodology. We measure performance on cropped positive and negative image windows using identically trained binary linear SVM classifiers. To quantify detector performance, we plot Detection Error Tradeoff (DET) curves on a log-log scale, i.e., the miss rate (false negative rate) versus the false positive rate (FPR); lower values are better. DET curves present the same information as Receiver Operating Characteristic (ROC) curves but allow small probabilities to be distinguished more easily. Fig. 5 compares the training dependencies of the variants and their average value. They were trained at a fixed scale and tested on the multi-scale dataset. The average appears to be more resilient to scale variations than the proposed method and its variants. This also highlights the significance of the training methodology and dataset for the performance of a detector.
Fig. 5: Proposed method results
The detection results of the proposed method under different conditions are detailed in Table 2. In this table, Correct is the number of correctly detected pedestrians and Missed is the number of missed pedestrians.
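The per-window DET evaluation described above needs, for each decision threshold, the miss rate and the false positive rate over the cropped windows. The sketch below computes these pairs from classifier scores. It assumes a window is declared a pedestrian when its SVM score is at or above the threshold; `det_points` is a hypothetical helper, not a MATLAB or library function.

```python
import numpy as np

def det_points(scores, labels):
    """Miss rate and false positive rate at every distinct score threshold.

    A window is declared a pedestrian when its score >= threshold.
    labels: 1 for pedestrian windows, 0 for non-pedestrian windows.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels).astype(bool)
    thresholds = np.unique(scores)[::-1]  # strictest threshold first
    n_pos, n_neg = labels.sum(), (~labels).sum()
    miss, fpr = [], []
    for t in thresholds:
        predicted = scores >= t
        tp = np.sum(predicted & labels)
        fp = np.sum(predicted & ~labels)
        miss.append(1.0 - tp / n_pos)  # false negative (miss) rate
        fpr.append(fp / n_neg)         # false positive rate
    return thresholds, np.array(miss), np.array(fpr)

# Toy example: two pedestrian windows and three non-pedestrian windows
scores = [0.9, 0.8, 0.4, 0.3, 0.2]
labels = [1, 0, 1, 0, 0]
thr, miss, fpr = det_points(scores, labels)
print(miss.tolist())  # [0.5, 0.5, 0.0, 0.0, 0.0]
```

The (FPR, miss rate) pairs can then be drawn on log-log axes, for instance with matplotlib's `loglog`; points where either rate is exactly zero cannot be placed on a log scale and are usually dropped.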
Table 2: Detection rate of the proposed method
Category | Correct | Missed | Detection rate (%)
Normal | 450 | 5 | 98.9
Dusty | 654 | 152 | 81.1
Vibrated camera | 245 | 62 | 79.8
Noisy | 236 | 47 | 83.4
As Table 2 shows, the detection rate of the proposed method reaches 98.9% under normal conditions, but it degrades as conditions become harder (79.8% to 83.4%). Because our strategy for handling such undesirable conditions is new, we also compared it with other detection methods based on [7, 8]. These algorithms are well known and widely used by other researchers in similar work. Table 3 lists their detection rates under the same conditions.
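The detection-rate column of Table 2 is consistent with detection rate = Correct / (Correct + Missed), i.e. the fraction of ground-truth pedestrians that were found. The short check below recomputes it from the Correct and Missed counts; this definition is our reading of the table rather than a formula stated in the text.

```python
# Correct and Missed counts per condition, copied from Table 2
table2 = {
    "Normal": (450, 5),
    "Dusty": (654, 152),
    "Vibrated camera": (245, 62),
    "Noisy": (236, 47),
}

def detection_rate(correct, missed):
    """Percentage of ground-truth pedestrians that were detected."""
    return 100.0 * correct / (correct + missed)

for category, (correct, missed) in table2.items():
    print(f"{category}: {detection_rate(correct, missed):.1f}%")
# Normal: 98.9%, Dusty: 81.1%, Vibrated camera: 79.8%, Noisy: 83.4%
```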
Table 3: Detection rate (%) of other methods
Category | Kalman based [8] | HOG based [8] | SIFT based [7]
Normal | 96.6 | 92.4 | 94.2
Dusty | 70.4 | 60.1 | 64.1
Vibrated camera | 58.4 | 48.7 | 55.3
Noisy | 60.8 | 54.4 | 57.8
Some results of our algorithm are depicted below.
5. Conclusion
In this paper, we have implemented a new pseudo-wavelet-based pedestrian detection algorithm and trained a linear SVM classifier on the resulting features. An experimental comparison of detection rates against traditional average-based feature descriptors was carried out. The comparative analysis shows that the proposed method achieves better detection accuracy than the other methods.
REFERENCES
[1] Wu, J. et al., 2011. Real-time human detection using contour cues. In Proc. ICRA, Shanghai, China, pp. 860-867.
[2] Riaz, I., Piao, J. and Shin, H., 2013. Human detection by using CENTRIST features for thermal images. In International Conference Computer Graphics, Visualization, Computer Vision and Image Processing.
[3] Bertozzi, M., 2003. Pedestrian detection in infrared images. In Proc. IEEE Intelligent Vehicles Symp., Columbus, OH, pp. 662-667.
[4] Dalal, N. and Triggs, B., 2005. Histograms of oriented gradients for human detection. In CVPR, USA, pp. 886-893.
[5] Felzenszwalb, P.F. et al., 2008. A discriminatively trained, multiscale, deformable part model. In CVPR, Anchorage, Alaska, USA, pp. 1-8.
[6] Maji, S. and Berg, A.C., 2009. Max-margin additive classifiers for detection. In ICCV, Kyoto, Japan, pp. 40-47.
[7] Schwartz, W.R. et al., 2009. Human detection using partial least squares analysis. In ICCV, Kyoto, Japan, pp. 24-31.
[8] Wang, X. et al., 2009. An HOG-LBP human detector with partial occlusion handling. In ICCV, Japan, pp. 32-39.
[9] Mu, Y. et al., 2008. Discriminative local binary patterns for human detection in personal album. In CVPR, Anchorage, Alaska, USA, pp. 1-8.
[10] Dollár, P. et al., 2009. Integral channel features. In BMVC, London, England, pp. 1-11.