A Hybrid Approach for Real-Time Object Detection and Tracking to Cover Background Turbulence Problem

Indian Journal of Science and Technology, Vol 9(45), DOI: 10.17485/ijst/2016/v9i45/106346, December 2016
ISSN (Print): 0974-6846, ISSN (Online): 0974-5645
www.indjst.org

Pushkar Protik Goswami* and Dushyant Kumar Singh
MNNIT Allahabad, Allahabad - 211004, Uttar Pradesh, India; pushkarprotik@gmail.com, dushyant@mnnit.ac.in
* Author for correspondence

Abstract

Objectives: Many techniques in the literature do not address the problem of incorrect object detection in scenes with an unstable or moving background. This paper proposes a novel hybrid technique to address this problem.
Methods/Statistical Analysis: The proposed approach is a hybrid of two well-known object detection techniques: frame differencing and skin colour modelling. It exploits the fact that the weakness of each technique is covered by the strength of the other, so the hybrid resolves the problem of a moving or turbulent background. A real-time video with turbulence in the background is used to test the approach.
Findings: With an accuracy of 0.97, the proposed approach outperforms the individual approaches, i.e. frame differencing and skin colour modelling. Its False Positive Rate (FPR) is much lower than that of the other approaches, which confirms that it makes the fewest incorrect detections. Its high True Positive Rate (TPR) indicates that it misses the object in the fewest frames. The results show that the proposed approach is better suited to backgrounds with turbulence.
Application/Improvements: Automatic object detection and tracking in applications such as surveillance becomes tractable with the proposed hybrid approach.

Keywords: Frame-Differencing, Object Detection, Object Tracking, Skin Colour, Thresholding

1. Introduction

The problem of moving object detection is becoming important in many application areas such as traffic monitoring, visual surveillance, driver assistance, automated driving and human-computer interaction. These applications have some object of interest that must be detected and then tracked for its activity. In surveillance and traffic monitoring, moving objects are tracked to obtain information such as object position, momentum, speed and direction of movement.

Researchers have proposed a number of approaches to detect and track moving objects in a scene. One approach is object classification based on features matched for an object or a class of objects. Distinguishing features are defined for every distinct object, and a classification algorithm separates the object from the non-object (background) part of the image. This technique detects both moving and static objects well, but it has a higher time complexity because it involves feature computation and n-stage classification.

Since the object of interest is the moving object, less complex approaches have also been proposed in the literature, such as background subtraction and three-frame differencing [1,2]. In the background subtraction approach, the background, i.e. the static part of the image, is modelled. To detect the object in subsequent frames, the modelled background is subtracted from the current frame. Pixels belonging to the static portion become zero in the resulting frame, while pixels of the moving object remain non-zero, so the object pixels can be distinguished fairly easily.

In the three-frame differencing approach, no initial computation for background modelling is done. It exploits the fact that the object of interest is the moving object [3]: the difference of two consecutive frames gives non-zero values at the boundary of the moving object. This approach sometimes suffers from fallout in detection, i.e. the difference frame is completely black even though the object is in the scene. This happens when the object is momentarily static or when the frame sampling rate is not adjusted accordingly.

The approaches discussed above face other problems as well. Detection accuracy suffers, i.e. TPR is low or FPR is high, because of noise, shadows of objects, changing illumination and background clutter. Modified versions of the background subtraction and frame differencing approaches cover some of these issues [4,5], but none of them covers the problem of a non-static or turbulent background. In this paper we propose a solution to the problem of background turbulence in object detection. The experiments detect a human object in an outdoor surveillance video.

The rest of the paper is organized as follows: the remainder of this section surveys the literature, Section 2 presents the proposed method, Section 3 reports the experimental results, and Section 4 concludes the paper with directions for future work.

Detection of an object in a single frame can be achieved by characterizing the object with features. A feature is encoded information that characterizes a class of object. Colour, texture and shape are the primary information used to create features [6]. The features popularly used for human objects are Haar-like features and HoG features. Haar-like features were introduced by Viola and Jones in their research on a robust human face detection technique [7].
Haar features are used specifically for face detection and use the texture information of the face. HoG (Histogram of Oriented Gradients) features are used for faces and pedestrians [8]. Classifiers are trained for a particular object class with features extracted from a training dataset. The classifiers used include cascade classifiers, SVMs and neural networks. An SVM is used with HoG features in [9], and cascaded classifiers with Haar-like features in [10,11]. Computing these features and then classifying increases the complexity of this approach.

A number of researchers have adopted the background subtraction method for object detection. Background modelling is the first step of this approach and can be achieved in various ways [12]. Temporal filtering [13], the single Gaussian model [14], the Gaussian Mixture Model (GMM) [15] and Local Binary Patterns (LBP) [16] are some of those methods. A metrically trimmed mean is used as the estimate of the background model in [17]. The problem of shadow removal for accurate detection is covered in [5].

Frame differencing is a recursive variant of background subtraction: the previous frame is treated as the background, and differencing identifies change at the boundary of the moving object [12]. A simple pixel-wise difference of two consecutive frames is used in [18] for moving object detection. Problems due to illumination and slight background movements (e.g. of tree leaves) are resolved in [19] by reducing the spatial resolution of the image to be processed.

Detection becomes harder when the background movement grows larger. The binary image of the difference frame then contains a large number of white spots that do not belong to the actual object, which is noise for detection. To overcome this problem and achieve accurate detection, a hybrid approach is proposed here. It combines three-frame differencing with skin colour modelling for humans.

2. Proposed Method

Object detection and tracking, whether by three-frame differencing or by background subtraction, requires some pre-processing of the input and post-processing of the output to achieve higher detection accuracy. The complete process followed here for real-time object detection and tracking, which also covers the background movement problem, is presented in Figure 1.

Figure 1. Data flow with processing blocks for object detection and tracking.

2.1 Pre-processing

The frames captured from the real-time streamed video are first sampled. Frame sampling speeds up the complete process. It also avoids a null difference between consecutive frames, which occurs when the moving object is momentarily static or moves very slowly. The frame rate of the video captured in our experiment is 15 FPS, and 1 frame is sampled out of every 3 frames of video.

The sampled frames are then filtered with a 3 x 3 median filter and histogram-equalized. Filtering reduces the effect of noise, which is common in images from outdoor surveillance. Histogram equalization enhances contrast, i.e. it normalizes the effect of ambient light. The sampled frame is first passed through the median filter, and the filtered frame is then histogram-equalized. Histogram equalization is a standard operation with a ready-made operator available in MATLAB and other image processing platforms.

2.2 Algorithmic Processing

After pre-processing, the frames are passed to three-frame differencing and skin colour modelling in parallel.

2.2.1 Three-Frame Differencing

Suppose that at any time the three frames received after pre-processing are $I_{k-1}$, $I_k$ and $I_{k+1}$. Then

$$D_1 = I_k - I_{k-1} \tag{1}$$
$$D_2 = I_{k+1} - I_k \tag{2}$$
$$D = D_1 \cup D_2 \tag{3}$$

Here $D_1$ and $D_2$ are the first order differences of two consecutive frames and $D$ is the union of these two differences. In the block diagram, $D$ is actually shown as a second order difference $\Delta^2 x$. The final $D$ is the binary image in which all intensities other than black are made white.

2.2.2 Skin Colour Modelling

The second parallel operation is skin colour modelling. In our experiment no model is actually fitted; instead, we use skin thresholds already obtained through skin colour modelling.
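Looking back at Sections 2.1 and 2.2.1 before continuing with the thresholds, the frame sampling and the three-frame differencing of equations (1) to (3) can be sketched as follows. This is a minimal illustration, not the implementation used in the experiment: the helper names are hypothetical, frames are small grayscale images stored as nested lists, the difference is taken as an absolute difference, and the union is modelled as a pixel-wise OR of the non-black pixels. The last two points are assumptions.

```python
# Illustrative sketch only; helper names are hypothetical and frames are
# small grayscale images stored as nested lists of integers.

def sample_frames(frames, step=3):
    """Keep one frame out of every `step` frames (1 in 3 in the experiment)."""
    return frames[::step]


def abs_diff(frame_a, frame_b):
    """Pixel-wise absolute difference (an assumption; it makes the sign irrelevant)."""
    return [[abs(a - b) for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(frame_a, frame_b)]


def three_frame_difference(prev, curr, nxt):
    """Binary union of the two first-order differences of three frames."""
    d1 = abs_diff(curr, prev)   # first-order difference, equation (1)
    d2 = abs_diff(nxt, curr)    # first-order difference, equation (2)
    # Union, equation (3): a pixel is white (1) if either difference is non-black.
    return [[1 if (a != 0 or b != 0) else 0 for a, b in zip(row_1, row_2)]
            for row_1, row_2 in zip(d1, d2)]


# A bright one-pixel "object" moving right across a dark 1 x 5 frame.
prev_frame = [[0, 200, 0, 0, 0]]
curr_frame = [[0, 0, 200, 0, 0]]
next_frame = [[0, 0, 0, 200, 0]]

moving = three_frame_difference(prev_frame, curr_frame, next_frame)
static = three_frame_difference(curr_frame, curr_frame, curr_frame)
print(moving)  # [[0, 1, 1, 1, 0]]
print(static)  # [[0, 0, 0, 0, 0]]
```

The all-black result for the static case is the fallout described in the Introduction, which is one reason the frames are sampled before differencing.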
Skin colour modelling is an approach for generating the best threshold limits which, when applied to any image, distinguish the skin and non-skin regions of the image [20]. This can be done in any colour space, but the choice of an appropriate colour space is itself an open question. The objective in choosing a colour space is to achieve the best classification accuracy for skin colour. Most research in skin colour modelling has used the YCbCr colour space, and more specifically its $C_b$ and $C_r$ planes [21]. This is because $C_b$ and $C_r$ are the chrominance components; they are barely affected by ambient lighting, which helps accurate thresholding. The $C_b$ and $C_r$ thresholds for skin colour are defined in [22] by the following inequalities:

$$77 < C_b < 127 \tag{4}$$
$$133 < C_r < 173 \tag{5}$$

A binary image is obtained by applying these $C_b$ and $C_r$ threshold limits.

2.2.3 Intersection

The binary images received from the two blocks are intersected. The final image retains only the information common to both, which removes information irrelevant to object detection.
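The thresholding of inequalities (4) and (5) and the intersection step can be illustrated with a small sketch. It rests on several assumptions: the RGB to YCbCr conversion shown is the full-range JPEG/JFIF-style transform, which the paper does not specify; the function names and the toy pixel values are hypothetical; and the motion mask is a hand-written stand-in for the output of the three-frame differencing block.

```python
# Illustrative sketch only. The RGB -> YCbCr conversion is an assumed
# full-range (JPEG/JFIF-style) transform; the paper does not state which
# conversion was used.

CB_RANGE = (77, 127)    # inequality (4)
CR_RANGE = (133, 173)   # inequality (5)


def rgb_to_cbcr(r, g, b):
    """Return the (Cb, Cr) chrominance of an 8-bit RGB pixel."""
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return cb, cr


def skin_mask(rgb_frame):
    """Binary mask: 1 where both chrominance values lie inside the skin limits."""
    mask = []
    for row in rgb_frame:
        mask_row = []
        for r, g, b in row:
            cb, cr = rgb_to_cbcr(r, g, b)
            is_skin = (CB_RANGE[0] < cb < CB_RANGE[1]
                       and CR_RANGE[0] < cr < CR_RANGE[1])
            mask_row.append(1 if is_skin else 0)
        mask.append(mask_row)
    return mask


def intersect(mask_a, mask_b):
    """Pixel-wise AND of two binary masks of equal size."""
    return [[a & b for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(mask_a, mask_b)]


# Toy 2 x 2 frame (hypothetical colours): a moving face pixel, a static
# skin-coloured wall pixel, a moving leaf pixel and a static sky pixel.
SKIN, LEAF, SKY = (224, 172, 138), (40, 160, 40), (100, 150, 230)
frame = [[SKIN, SKIN],
         [LEAF, SKY]]
motion = [[1, 0],
          [1, 0]]   # stand-in for the three-frame differencing output

skin = skin_mask(frame)
detected = intersect(motion, skin)
print(skin)      # [[1, 1], [0, 0]]
print(detected)  # [[1, 0], [0, 0]]
```

In this toy frame the moving leaf is rejected by the skin mask and the static skin-coloured wall is rejected by the motion mask, so only the moving skin pixel survives the intersection.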

Skin colour modelling is used to nullify the effect of background movement. However, skin colour thresholding introduces a problem of its own: background regions whose colour nearly matches skin colour are also identified as object. This problem is in turn corrected by frame differencing, since such regions are static. Figure 2 illustrates this. The hybrid approach is therefore better suited to object detection and tracking under various environmental and processing constraints.

Figure 2. (a) Original frame, (b) after three-frame differencing, (c) after skin threshold and (d) intersection of (b) and (c).

2.3 Post Processing

Post-processing involves filtering, dilation and then geometry fitting to mark the detected object. Filtering removes any very fine patches of white pixels other than the object. Dilation expands the detected blobs of white pixels, which are the candidates for the object in the frame. The geometry of each blob is then calculated and matched against the geometry of a face, measured by the aspect ratio of face width to height. The matching blob is marked with a yellow rectangle.

3. Experimental Results

The results are shown on a video captured at our own location by our students. The video is 46 seconds long and its frame rate is 15 FPS. Figure 3 shows the detection results together with the intermediate results for some selected frames of the video.

Figure 3. Image results of the proposed approach: row 1 shows the original frames, row 2 the three-frame difference, row 3 the skin colour threshold, row 4 the intersection of the row 2 and row 3 frames, and row 5 the final detected object marked with a yellow rectangle.

The detection accuracy of the proposed method for the moving object in the video is also measured. These results are compared with those obtained when detection is performed exclusively with the three-frame differencing approach or with the skin colour modelling approach. The measures used are True Positive Rate (TPR), False Positive Rate (FPR) and Accuracy. Table 1 compares the approaches on TPR, FPR and Accuracy.

$$\mathrm{TPR} = \frac{TP}{TP + FN} \tag{6}$$
$$\mathrm{FPR} = \frac{FP}{FP + TN} \tag{7}$$
$$\mathrm{Accuracy} = \frac{TP + TN}{TP + FN + TN + FP} \tag{8}$$

where TP, FP, TN and FN are defined for this problem as follows:
TP is the number of frames in which the object is detected when it is actually present.
FP is the number of frames in which the object is detected when it is actually not present.
TN is the number of frames in which the object is not detected when it is actually not present.
FN is the number of frames in which the object is not detected when it is actually present.

Table 1. Comparison results of the three methods for object detection

Methods                     TPR     FPR     Accuracy
Three-frame differencing    0.66    0.9     0.46
Skin Colour Modelling       0.81    0.78    0.64
Proposed Method             0.99    0.25    0.97

The same result is also shown by the graphs in Figure 4 and Figure 5.

Figure 4. Graph of TPR vs. FPR and accuracy.

Figure 5. Graph showing accuracy for the three approaches.

The TPR of 0.99 denotes a high rate of correct detection, and the FPR of 0.25, far lower than that of the other two approaches, shows that the object is rarely reported when it is absent. The accuracy of 0.97 points to the same conclusion: the proposed approach is more accurate than the other two. The image results also show that the effect of the moving background, i.e. trees and leaves, is nullified, giving accurate detection of the object and higher accuracy.

4. Conclusion

The problem of wrong detection in three-frame differencing, namely the low TPR and high FPR caused by background movement, is resolved by fusing it with the skin colour modelling approach. The approach performs well on the test video and could plausibly also work for video streams with considerably more background movement. Another important fact is that skin colour modelling alone cannot do the job either, because background regions whose colour matches skin colour are detected as well. The fusion of the two approaches therefore gives better results than either alone.

5. References

1. Power PW, Schoonees JA. Understanding background mixture models for foreground segmentation. Proceedings Image and Vision Computing; New Zealand. 2002. p. 267-71.
2. Xiong W. Moving object detection algorithm based on background subtraction and frame differencing. Proceedings of IEEE 30th Chinese Control Conference (CCC); China. 2011. p. 3273-6.
3. Rita C. Improving shadow suppression in moving object detection with HSV color information. Proceedings of Intelligent Transportation Systems; Oakland Calif. 2001. p. 334-9.

4. Marko H, Pietikainen M. A texture-based method for modelling the background and detecting moving objects. IEEE Transactions on Pattern Analysis and Machine Intelligence. USA. 2006; 28(4):657-62.
5. Jianhua Y, Gao T, Zhang J. Moving object detection with background subtraction and shadow removal. Proceedings of 9th International Conference on Fuzzy Systems and Knowledge Discovery; China. 2012.
6. Jianxin W. C4: A real-time object detection framework. IEEE Transactions on Image Processing. 2013; 22(10):4096-107.
7. Paul V, Jones M. Robust real-time object detection. International Journal of Computer Vision. 2001; 4:51-2.
8. Navneet D, Triggs B. Histograms of oriented gradients for human detection. Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition. USA. 2005; 1(1):886-93.
9. Piotr D. Pedestrian detection: A benchmark. Proceedings of IEEE Conference on Computer Vision and Pattern Recognition; USA. 2009. p. 1-8.
10. Jianxin W. Fast asymmetric learning for cascade face detection. IEEE Transactions on Pattern Analysis and Machine Intelligence. 2008; 30(3):369-82.
11. Jianxin W, Geyer C, Rehg JM. Real-time human detection using contour cues. Proceedings of IEEE Conference on Robotics and Automation; Shanghai, China. 2011. p. 860-7.
12. Intan K, Mohamed SS. Frame differencing with post-processing techniques for moving object detection in outdoor environment. Proceedings of IEEE 7th International Colloquium on Signal Processing and its Applications. 2011; 2(1):263-6.
13. Ramprasad P, Nelson R. Low level recognition of human motion (or how to get your man without finding his body parts). Proceedings of the IEEE Workshop on Motion of Non-Rigid and Articulated Objects; USA. 1994.
14. Richard WC. Pfinder: Real-time tracking of the human body. IEEE Transactions on Pattern Analysis and Machine Intelligence. 1997; 19(7):780-5.
15. Chris S, Grimson ELW. Adaptive background mixture models for real-time tracking. Proceedings of IEEE Conference on Computer Vision and Pattern Recognition; Cambridge. 1999. p. 1-7.
16. Marko H, Pietikainen M. A texture-based method for modelling the background and detecting moving objects. IEEE Transactions on Pattern Analysis and Machine Intelligence. 2006; 28(4):657-62.
17. Rosito JC. Efficient background subtraction and shadow removal for monochromatic video sequences. IEEE Transactions on Multimedia. 2009; 11(3):571-7.
18. Alan JL, Fujiyoshi H, Patil RS. Moving target classification and tracking from real-time video. Proceedings of Fourth IEEE Workshop on Applications of Computer Vision WACV 98; 1998.
19. Budi S. Tracking of moving objects by using a low resolution image. Proceedings of Second IEEE International Conference on Innovative Computing, Information and Control, ICICIC 07; 2007.
20. Vezhnevets V, Sazonov V, Andreeva A. A survey on pixel-based skin color detection techniques. Proceedings of Graphicon; 2003.
21. Lam PS, Bouzerdoum A, Chai D. A novel skin color model in YCbCr color space and its application to human face detection. Proceedings of IEEE International Conference on Image Processing; 2002.
22. Douglas C, Ngan KN. Face segmentation using skin-color map in videophone applications. IEEE Transactions on Circuits and Systems for Video Technology; 1999. p. 551-64.