A New Strategy of Pedestrian Detection Based on Pseudo-Wavelet Transform and SVM

M. Ranjbarikoohi, M. Menhaj and M. Sarikhani

Abstract: Pedestrian detection is of great importance in automotive vision systems, yet it is difficult because of the extreme variability of targets, lighting conditions, occlusion, and high-speed vehicle motion. In this paper, we propose a simple and efficient strategy to accelerate and improve the existing pedestrian detection process. To this end, we use features inspired by the pseudo-wavelet transform so that pedestrians can be detected even at night or under difficult conditions. Pedestrian and non-pedestrian candidates are collected and used to extract fundamental features. These features, which are central to the performance of the proposed method, are extracted with pseudo-wavelets to capture the edge and texture information of each object. Our experimental results show that applying an SVM to these features leads to better performance than the other methods compared.

Keywords: feature detection, pseudo-wavelet, SVM, pedestrian detection

1. Introduction

Detecting pedestrians in images is a topic that has been investigated by many researchers [1]. Despite its simple definition, the problem is complicated by random influences such as scene structure, lighting, and people's choice of clothing. Pedestrian detection therefore remains a challenging issue and continues to attract research. Although pedestrian detection has many applications, advanced driver assistance systems (ADASs) are the most prominent one. The overarching goal is to equip vehicles with sensing capabilities to detect and act on pedestrians in dangerous situations where the driver would not be able to avoid a collision. A full ADAS with regard to pedestrians would therefore include not only detection but also tracking, orientation, intent analysis, and collision prediction.
The main issues in pedestrian detection are high variability in appearance among pedestrians, cluttered backgrounds, highly dynamic scenes with both pedestrian and camera motion, and strict requirements on both speed and reliability. Part-based detection systems seem intuitively suited to cope with occlusion, since they do not require the full body to be present to make a detection. In addition, many existing systems suffer from a high number of false positives per frame (FPPF), which a part-based system can reduce by requiring several body parts to be detected [2,3]. In this paper, we propose a new feature extraction strategy that improves efficiency. In other words, we argue that by incorporating fundamental information about the edges and textures of each object, we can design more accurate features to detect pedestrians. Our proposed method takes the point of view of visual perception, in which pedestrians form a class of high intra-class similarity due to the strong regularities of upright body shapes. In this work, we were inspired by the original wavelet transform, which has been used for detecting objects. For instance, cascaded Haar-like features [4] have become the de facto method of choice in this area. The corresponding features are determined either by exhaustive search over all possible variations [5] or by less exhaustive random sampling [6].

Copyright © JEET IU. This is an Open-Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/3.0/), which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.

2. Existing methods

Many methods for pedestrian detection have been presented so far. They consist of two parts: feature extraction and classification. The most popular features for visual pedestrian detection are based on Histograms of Oriented Gradients (HOGs) as introduced in [3]. HOG features brought about significant improvements and therefore establish an important baseline. Felzenszwalb [6, 7] successfully employed HOG features in a part-based model for object detection; Walk [3] combined HOG features with self-similarity features related to color channels as well as motion features in order to better integrate spatial and temporal information. Deviating from the popular framework of HOG+SVM computations, Dollár et al. [...] An extension of this approach has been called the Fastest Pedestrian Detector in the West [7] and was shown to enable particularly fast multi-scale detection. Due to its efficiency and reasonable performance, many new detectors [3, 6] consider it as a baseline, and several authors obtained even better performance by extending the feature pool in various ways. The first attempts at using wavelets for pedestrian detection are found in [7], where it was demonstrated that wavelet templates can be used to define the shape of an object. Later, Papageorgiou et al.
[8] proposed a similar yet more general system for object detection and, subsequently, Haar-like features became popular in the object detection community. The epitome of such approaches is the work by Viola and Jones [7], who used Haar-like features in combination with boosting algorithms to build a successful face detector. However, Haar-like features are often discarded in pedestrian detection, as they seem not to improve performance when combined with first-order channel features. On closer analysis of the possible reasons for this behavior, we found that Haar-like templates that perform well for face detection are not necessarily suited to pedestrian detection, as they may fail to capture the visual characteristics of the human body. As a remedy, we propose to design templates specifically tailored to upright body shapes.

3. Proposed method

This section presents the methods used in the detection system. The overall processing pipeline is shown in Figure 1.

Fig. 1: Overall processing pipeline

Besides preprocessing, our proposed method contains two major parts: coarse detection and fine verification. Preprocessing consists of converting the color image to a grayscale image and applying histogram equalization. Our feature extractor employs multi-modal Haar-like features, which are built on channel features as in [6] but interpret local differences between rectangular regions over multiple channels rather than over the channel values themselves. We use a pseudo-wavelet based feature extraction technique; that is, we use the following family of two-dimensional kernels:

\[
W(x, y, \theta, \lambda, \varphi, \sigma, \gamma) = \exp\left(-\frac{x'^2 + \gamma^2 y'^2}{2\sigma^2}\right)\cos\left(2\pi\frac{x'}{\lambda} + \varphi\right), \qquad x' = x\cos\theta + y\sin\theta, \quad y' = -x\sin\theta + y\cos\theta \tag{1}
\]

where x and y specify the position of a light impulse in the visual field and θ, λ, φ, σ, γ are the parameters of the pseudo-wavelet. We have chosen these parameters as follows:

Parameter         Symbol   Values
Orientation       θ        0, π/8, 2π/8, 3π/8, 4π/8, 5π/8, 6π/8, 7π/8
Wavelength        λ        4, 4√2, 8, 8√2, 16
Phase             φ        0, π/2
Gaussian radius   σ        [...]
Aspect ratio      γ        1

A set of kernels with 5 spatial frequencies and 8 distinct orientations is used, which gives the 40 different pseudo-wavelets shown in Fig. 2.

Fig. 2: The pseudo-wavelets

Convolving these filters with a test image yields the filter responses. We found that these representations display desirable locality and orientation properties. We selected 5 sets of pseudo-wavelets with different orientations, as follows:

Pseudo-wavelets   Orientations
5                 0
10                [...]
15                0, 2π/8, 4π/8
20                0, 2π/8, 4π/8, 6π/8
25                0, 2π/8, 4π/8, 6π/8, [...]

Then, we apply an SVM for learning, since it offers a convenient and fast approach to selecting from a large number of candidate features. Initial negative training samples are randomly generated; afterwards, hard negative samples are searched for over three rounds across all negative example images so as to collect [...] negative samples in total. This multi-round training strategy is pivotal, as it leads to better performance than a simple one-round training procedure with the same number of negative samples. In our experiments, two rounds of retraining yielded the best performance; additional rounds did not bring significant improvements.

For training, we split the videos taken from the camera into several parts and clipped thousands of samples from them. We use [...] positive samples and more than [...] negative samples as our training dataset (Fig. 3).

Fig. 3: Training samples

Table 1: Training and testing details

From a set of labeled training images, we extract features and use them to train linear SVMs. We used MATLAB's svmtrain and svmclassify functions with their default settings for training and for binary classification of the test data, respectively. Details regarding the training and testing methodology are presented in the following sections. For both feature vector descriptors, we employed a training methodology similar to that used in [4]. For the initial training of the SVM, we used [...] positive and [...] negative sample windows. Retraining the SVM with hard examples reduces the false positive rate by almost 10%. Table 1 shows details of the dataset used for training and testing. The training methodology consists of the following steps, shown in Fig. 4.

1. Take initial positive and negative window examples from the training dataset and generate the label vector.
2. Generate a feature vector set by encoding all positive and negative windows with the selected feature vector descriptor.
3. Generate a linear SVM model using the feature vector set and the label vector.
4. Using the selected descriptor and the SVM model, search 15 negative training images exhaustively for false positives ("hard examples").
5. Augment the initial training data with the collected hard examples and retrain the SVM.

Fig. 4: Training and testing methodology
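As a rough illustration of Eq. (1) and of training steps 1 to 5, the following Python sketch builds a 40-kernel bank and runs a hard-negative retraining loop on synthetic data. It is not the authors' MATLAB implementation. The 7x7 support, the Gaussian radius `sigma=2.0`, the wavelength values, the inner-product descriptor `encode`, the sub-gradient SVM trainer, and the toy windows are all assumptions made for the example.

```python
import math
import random

SIZE = 7  # kernel and window side length (assumed for this toy example)

def pseudo_wavelet(theta, lam, phi=0.0, sigma=2.0, gamma=1.0):
    """Sample the kernel of Eq. (1) on a SIZE x SIZE grid centred at the origin."""
    half = SIZE // 2
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            xr = x * math.cos(theta) + y * math.sin(theta)     # x'
            yr = -x * math.sin(theta) + y * math.cos(theta)    # y'
            envelope = math.exp(-(xr ** 2 + (gamma * yr) ** 2) / (2 * sigma ** 2))
            row.append(envelope * math.cos(2 * math.pi * xr / lam + phi))
        kernel.append(row)
    return kernel

# 5 wavelengths x 8 orientations = 40 kernels (wavelength values are assumed).
BANK = [pseudo_wavelet(k * math.pi / 8, lam)
        for lam in (4, 4 * math.sqrt(2), 8, 8 * math.sqrt(2), 16)
        for k in range(8)]

def encode(window):
    """Toy descriptor: absolute inner product of the window with each kernel."""
    return [abs(sum(kern[i][j] * window[i][j]
                    for i in range(SIZE) for j in range(SIZE)))
            for kern in BANK]

def score(model, feature):
    """Signed distance-like score of a feature vector under a linear model."""
    weights, bias = model
    return sum(w * f for w, f in zip(weights, feature)) + bias

def train_linear_svm(features, labels, epochs=50, lr=0.01, reg=0.01):
    """Linear SVM fitted by sub-gradient descent on the regularised hinge loss."""
    weights, bias = [0.0] * len(features[0]), 0.0
    for _ in range(epochs):
        for f, t in zip(features, labels):
            violated = t * score((weights, bias), f) < 1
            weights = [w - lr * (reg * w - (t * x if violated else 0.0))
                       for w, x in zip(weights, f)]
            if violated:
                bias += lr * t
    return weights, bias

def train_with_hard_negatives(pos, neg, negative_pool, rounds=2):
    """Steps 1-5: initial training, then retraining on mined false positives."""
    features = [encode(w) for w in pos + neg]              # steps 1-2
    labels = [1] * len(pos) + [-1] * len(neg)
    pool = [encode(w) for w in negative_pool]
    model = train_linear_svm(features, labels)             # step 3
    for _ in range(rounds):
        hard = [f for f in pool if score(model, f) > 0]    # step 4
        if not hard:
            break
        pool = [f for f in pool if score(model, f) <= 0]   # do not mine twice
        features += hard                                   # step 5
        labels += [-1] * len(hard)
        model = train_linear_svm(features, labels)
    return model, len(features)

def toy_window(rng, pedestrian):
    """Noise window; 'pedestrians' get a bright vertical bar in the middle column."""
    win = [[rng.random() * 0.2 for _ in range(SIZE)] for _ in range(SIZE)]
    if pedestrian:
        for row in win:
            row[SIZE // 2] += 1.0
    return win

rng = random.Random(0)
pos = [toy_window(rng, True) for _ in range(20)]
neg = [toy_window(rng, False) for _ in range(20)]
pool = [toy_window(rng, False) for _ in range(40)]
model, n_train = train_with_hard_negatives(pos, neg, pool)
print(len(BANK), n_train)
```

In a real detector, `encode` would pool filter responses over a full detection window, and the negative pool would come from exhaustively scanning pedestrian-free images rather than from random noise.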

In camera-based pedestrian detection systems, a pedestrian can be detected in a scene by brute-force search over scale space. For example, all sliding-window based models involve feature extraction, dense multi-scale scanning of detection windows, and binary classification, followed by non-maximum suppression [4]. An alternative is to use simple tests to generate candidate locations and then verify them with more sophisticated methods [5], [6]. Depending on the technique used, the number and values of the parameters involved in detecting pedestrians vary widely, and a poor choice can make even a well-trained state-of-the-art detector perform badly.

4. Experimental result

To test our proposed method, we adopted a per-window evaluation methodology. We measure performance on cropped positive and negative image windows using equally trained binary linear SVM classifiers. To quantify detector performance, we plot Detection Error Tradeoff (DET) curves on a log-log scale, that is, miss rate (false negative rate) versus false positive rate (FPR). Lower values are better. DET curves present the same information as Receiver Operating Characteristic (ROC) curves but allow small probabilities to be distinguished more easily. Figure 5 compares the training dependencies of the variants and their average. They were trained on a fixed scale and tested on the multi-scale dataset. The average appears to be more resilient to scale variations than the proposed method and its variants. The comparison also highlights the significance of the training methodology and dataset for the performance of a detector.

Fig. 5: Proposed method results

Details of the detection results of our proposed method under different conditions are shown in Table 2. In this table, Correct is the number of pedestrians detected correctly and Missed is the number of missed pedestrians.
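The miss rate and false positive rate that make up a DET curve can be obtained by sweeping a decision threshold over per-window classifier scores. The sketch below shows one way to do this in Python; the function name `det_points` and the score values are hypothetical and are not taken from the paper's experiments.

```python
def det_points(pos_scores, neg_scores):
    """Return (false_positive_rate, miss_rate) pairs, one per decision threshold.

    A window is accepted as a pedestrian when its score is >= the threshold.
    """
    points = []
    for thr in sorted(set(pos_scores) | set(neg_scores)):
        false_neg = sum(s < thr for s in pos_scores)    # pedestrians rejected
        false_pos = sum(s >= thr for s in neg_scores)   # background accepted
        points.append((false_pos / len(neg_scores), false_neg / len(pos_scores)))
    return points

# Hypothetical SVM scores for cropped pedestrian and background windows.
pos = [2.1, 1.4, 0.3, -0.2]
neg = [0.5, -0.7, -1.1, -1.8, -2.4]

for fpr, miss in det_points(pos, neg):
    print(f"FPR={fpr:.2f}  miss rate={miss:.2f}")
```

Raising the threshold lowers the false positive rate and raises the miss rate. Points where either rate is exactly zero cannot be drawn on log-log axes and are usually clipped to a small positive value when plotting.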
Table 2: Detection rate of our proposed method

Category          Correct   Missed   Detection rate (%)
Normal            450       5        98.9
Dusty             654       152      81.1
Vibrated camera   245       62       79.8
Noisy             236       47       83.4

According to Table 2, the detection rate of the proposed method reaches 98.9% under normal conditions, but performance degrades when conditions vary. Although our strategy for handling adverse conditions is new, we also ran other detection methods based on [7,8] for a further comparison with our proposed method. These algorithms are well known and are cited by many researchers in related work. Detection rates for these methods under the same conditions are given in Table 3.
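The detection rates in Table 2 are consistent with the ratio correct / (correct + missed), which is assumed here to be the definition the authors used. The short Python check below reproduces the last column from the Correct and Missed counts; the names `TABLE_2` and `detection_rate` are introduced only for this example.

```python
# (correct, missed) counts per condition, copied from Table 2.
TABLE_2 = {
    "Normal": (450, 5),
    "Dusty": (654, 152),
    "Vibrated camera": (245, 62),
    "Noisy": (236, 47),
}

def detection_rate(correct, missed):
    """Percentage of pedestrians that were detected."""
    return 100.0 * correct / (correct + missed)

rates = {name: round(detection_rate(c, m), 1) for name, (c, m) in TABLE_2.items()}
for name, rate in rates.items():
    print(f"{name:<16} {rate:.1f}")
```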

Table 3: Detection rate (%) of other methods

Category          Kalman based [8]   HOG based [8]   SIFT based [7]
Normal            96.6               92.4            94.2
Dusty             70.4               60.1            64.1
Vibrated camera   58.4               48.7            55.3
Noisy             60.8               54.4            57.8

Some results of our algorithm are depicted below.

5. Conclusion

In this paper, we have implemented a new pseudo-wavelet based pedestrian detection algorithm. We trained a linear SVM classifier on the resulting features. An experimental comparison with the traditional average-based feature descriptor was carried out with regard to detection rate. The comparative analysis shows that the proposed method achieves better detection accuracy than the other methods.

REFERENCES

[1] Wu, J. et al, 2011. Real-Time Human Detection Using Contour Cues. In Proc. ICRA, Shanghai, China, pp. 860-867.
[2] Riaz, I., Piao, J. and Shin, H., 2013. Human Detection by Using CENTRIST Features for Thermal Images. In International Conference Computer Graphics, Visualization, Computer Vision and Image Processing.
[3] Bertozzi, M., 2003. Pedestrian detection in infrared images. In Proc. IEEE Intelligent Vehicles Symp., Columbus, OH, pp. 662-667.
[4] Dalal, N. and Triggs, B., 2005. Histograms of oriented gradients for human detection. In CVPR, USA, pp. 886-893.
[5] Felzenszwalb, P.F. et al, 2008. A discriminatively trained, multiscale, deformable part model. In CVPR, Anchorage, Alaska, USA, pp. 1-8.
[6] Maji, S. and Berg, A.C., 2009. Max-margin additive classifiers for detection. In ICCV, Kyoto, Japan, pp. 40-47.
[7] Schwartz, W.R. et al, 2009. Human detection using partial least squares analysis. In ICCV, Kyoto, Japan, pp. 24-31.
[8] Wang, X. et al, 2009. An HOG-LBP human detector with partial occlusion handling. In ICCV, Japan, pp. 32-39.
[9] Mu, Y. et al, 2008. Discriminative local binary patterns for human detection in personal album. In CVPR, Anchorage, Alaska, USA, pp. 1-8.
[10] Dollár, P. et al, 2009. Integral channel features. In BMVC, London, England, pp. 1-11.