Pairwise Threshold for Gaussian Mixture Classification and its Application on Human Tracking Enhancement

Pairwise Threshold for Gaussian Mixture Classification and its Application on Human Tracking Enhancement

Daegeon Kim, Sung Chun Lee
Institute for Robotics and Intelligent Systems, University of Southern California, Los Angeles, CA 90089
{daegeonk, SungChun.Lee}@usc.edu

Abstract

In this paper, we describe Object Pixel Mixture Classifiers (OPMCs), which classify an object not only against the background but also against other objects, based on Gaussian Mixture Model (GMM) classification. The proposed OPMC differs from general GMM-based classifiers in that a novel pairwise threshold is applied for the final classification. Pairwise thresholds are distinct thresholds that depend on the combination of mixture component indices predicted by a positive and a negative GMM. We train the pairwise thresholds with a discriminative model so that the generative GMMs can benefit from it. We demonstrate that OPMCs are robust to noise in the training data and can resume tracking objects after tracks are lost, even under occlusion. We also show that OPMCs can generate meaningful object blobs and can separate the regions of individual objects from merged blobs.

Keywords: Gaussian Mixture Classification, Pairwise Threshold, Human Tracking

I. INTRODUCTION

Human tracking is important for many applications such as video surveillance and human-computer interaction, and tracking information can support the development and performance of higher-level intelligent systems. Extracting reliable human tracks is often challenging due to unpredictable environments, human articulation, and self- or inter-occlusions of humans. Recently, detection-based tracking approaches have been widely used for human tracking [1], [2], [3]. However, these approaches often generate broken tracklets because the human detector performs poorly on noisy or low-contrast images, under inter- or intra-occlusions, or on human poses very different from the learned examples.
When a surveillance camera is statically installed, we can exploit scene knowledge about the entrances and exits of the scene, such as the left/right or top/bottom fringes of the image. Doors through which humans can enter or exit sometimes exist in the middle of the image, but it is still possible to locate them manually beforehand. Under this assumption, when a human track stops in a non-exit area, we consider the track lost due to a lack of reliable detection responses caused by the issues mentioned above. In this paper, we propose a novel method to recover broken tracklets when human tracks are lost in non-exit areas. Many data association mechanisms have been suggested to address the broken-tracklets problem [1], [2], [3]. For example, Breitenstein et al. [1] employed particle filtering with an online-trained, object-specific classifier for data association. Li et al. [2] associated tracklets using a hybrid boosting algorithm that ranks the priority of tracklets to be connected and classifies false associations. Similarly, Yang et al. [3] trained a Conditional Random Field (CRF) using track segments (tracklets) as features. We propose Gaussian Mixture Model (GMM)-based Object Pixel Mixture Classifiers (OPMCs) that classify a specific object not only against the background but also against other objects, in order to continue tracking a human after the input track is broken (or stopped). Note that this method does not use on-line learning of the GMM: a GMM is trained once, at the moment a track gets lost. The proposed OPMCs differ from general GMM classifiers in that a novel pairwise adaptive threshold is applied for the final classification of object pixels. Much GMM research has addressed how to extract features [4], [5] and how to train parameters [6], [7], [8], but less attention has been paid to how to select the classification threshold. Shi et al. [9] and Stauffer et al.
[8] proposed time-varying thresholds for Background Subtraction (BS) and tracking, to cope with gradually changing pixel-wise intensities. They used the notion "adaptive" (or "dynamic") in the sense that the threshold keeps changing so as to classify pixels into foreground or background as their appearance changes. However, it is still one fixed threshold at any given time, and this is the significant difference from our pairwise threshold. When the feature vector of a pixel is given, the human GMM and the background GMM each assign it to the closest mixture component in the respective model. A general GMM approach would use a single threshold value to decide whether the given pixel belongs to the human or the background region, simply based on the ratio between the marginal probabilities of the two GMMs (positive and negative). The human model covers a set of relatively small regions in the image, while the background model takes up a large portion of the image. Low-weight mixture components of the human GMM can therefore be discarded by a high

weight, high-covariance mixture component of the background GMM under a fixed threshold, even though the posterior probability of the human-part mixture component is higher. To overcome this problem, we derive a threshold value for each pair of a positive (human) mixture component and a negative (background) mixture component. In other words, we apply a different threshold value for each combination of mixture component indices predicted by the human and background GMMs. We adapt this concept to inter-object classification through the proposed OPMCs.

The main contributions of our work are: 1) the novel concept of the OPMC, which classifies an individual human after a certain amount of tracking; 2) the application of OPMCs to enhance human tracking by addressing the broken-tracklets issue; and 3) the novel concept of the pairwise threshold.

The rest of this paper is organized as follows. General GMM classification and the problem of using a fixed threshold are discussed in Section II, followed by the pairwise threshold in Section III. Section IV describes the OPMC, and experimental results are presented in Section V. Finally, Section VI concludes the paper.

II. GAUSSIAN MIXTURE MODEL CLASSIFICATION

A. Basics

GMM classification is widely used for Background Subtraction (BS), moving object detection, skin pixel detection, etc. A GMM classifier classifies a datum using a positive and a negative GMM defined as:

\[ p(x|+) = \sum_{i=1}^{N} \omega_i^{+} \,\mathcal{N}(x;\, \mu_i^{+}, \Sigma_i^{+}), \qquad \sum_{i=1}^{N} \omega_i^{+} = 1 \tag{1} \]

\[ p(x|-) = \sum_{j=1}^{M} \omega_j^{-} \,\mathcal{N}(x;\, \mu_j^{-}, \Sigma_j^{-}), \qquad \sum_{j=1}^{M} \omega_j^{-} = 1 \tag{2} \]

where \(p(x|+)\) and \(p(x|-)\) are the positive and negative marginal probabilities of a datum \(x\), \(N\) and \(M\) are the numbers of mixture components in each model, \(\omega_i^{+}\) and \(\omega_j^{-}\) are the \(i\)-th and \(j\)-th mixture weights (equal to the prior probabilities of the components), and \(\mathcal{N}(x; \mu_i^{+}, \Sigma_i^{+})\) and \(\mathcal{N}(x; \mu_j^{-}, \Sigma_j^{-})\) are the probabilities of the data given the mixture component.
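To make Eqs. (1)-(2) concrete, the following is a minimal numpy sketch (with toy 1-D parameters of our own choosing, not the models learned in the paper) that evaluates the two marginal probabilities. The toy values are chosen so that, at the test point, the positive GMM's best single component is strong but its marginal still loses to the background, previewing the fixed-threshold problem discussed next.

```python
import numpy as np

def gauss(x, mu, var):
    """1-D Gaussian density N(x; mu, var)."""
    return np.exp(-0.5 * (x - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)

def gmm_marginal(x, weights, means, variances):
    """Marginal p(x) = sum_i w_i N(x; mu_i, var_i), with the w_i summing to 1."""
    return sum(w * gauss(x, m, v) for w, m, v in zip(weights, means, variances))

# Positive (object) GMM with N = 3 components; negative (background) GMM with M = 2.
pos = dict(weights=[0.6, 0.3, 0.1], means=[0.2, 0.4, 0.8], variances=[0.01, 0.01, 0.01])
neg = dict(weights=[0.7, 0.3], means=[0.6, 0.9], variances=[0.04, 0.09])

x = 0.8                          # sits right on the low-weight 3rd positive component
p_pos = gmm_marginal(x, **pos)   # p(x|+): small, because the winning component has low prior
p_neg = gmm_marginal(x, **neg)   # p(x|-): larger, thanks to broad high-weight components
print(p_pos, p_neg)
```

With these toy numbers the likelihood ratio \(p(x|+)/p(x|-)\) is below 1 even though the pixel lies exactly on a positive component mean, which is the situation the pairwise threshold is designed to handle.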
A datum \(x\) is classified into one of the two classes by the Bayes rule:

\[ x \in \begin{cases} + & \text{if } \dfrac{p(+|x)}{p(-|x)} \propto \dfrac{p(x|+)}{p(x|-)} \ge \epsilon \\ - & \text{otherwise} \end{cases} \tag{3} \]

where \(\epsilon\) is a predefined threshold implying a loss function. Often only a positive GMM (Eq. 1) is modeled, and the decision is made from its marginal probability alone.

B. The problem of a fixed threshold

When a multicolor object and its background are modeled by GMMs with color intensities as features, some parts of the object can be classified as negative even though their posterior probabilities are high in the positive GMM. Consider a simple example. The graphs in Figure 1 represent the positive (blue) and the negative (red) GMMs trained from their respective training data. For the test vector marked by the arrow in Figure 1, the posterior probability of the 3rd positive mixture component, \(\mathcal{N}(x; \mu_3^{+}, \Sigma_3^{+})\), is higher than those of the others. Nevertheless, since the prior probability of that component, \(\omega_3^{+}\), is low, the marginal probability of the negative class, \(p(x|-)\), is higher than that of the positive class, \(p(x|+)\). With a fixed threshold, it is hard to discriminate between a high posterior with a low prior and a low posterior with a high prior: decreasing the threshold to classify such a vector as positive causes a high false positive rate, while increasing it to capture the opposite case introduces a high false negative rate.

Figure 1. The positive and the negative GMMs for the toy example.

III. PAIRWISE THRESHOLD

To overcome the problem of a fixed threshold, we propose the novel notion of the pairwise threshold, which assigns one threshold to every index combination of positive and negative mixture components; therefore, \(N \times M\) thresholds are trained when the positive and negative GMMs have \(N\) and \(M\) components, respectively. Our new GMM classification decision rule is:

\[ x \in \begin{cases} + & \text{if } \dfrac{p(+|x)}{p(-|x)} \propto \dfrac{p(x|+)}{p(x|-)} \ge \epsilon_{l^{+}(x),\, l^{-}(x)} \\ - & \text{otherwise} \end{cases} \tag{4} \]

where \(l^{+}(x)\) and \(l^{-}(x)\) are the mixture component index prediction functions of the positive and negative GMMs, and \(\epsilon_{l^{+}(x), l^{-}(x)}\) is a pairwise threshold.
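The decision rule of Eq. (4) can be sketched as follows, continuing the toy 1-D GMMs from before (the threshold table values are illustrative assumptions, not trained values): the likelihood ratio is compared against a threshold selected by the pair of winning component indices.

```python
import numpy as np

def component_scores(x, weights, means, variances):
    """w_i * N(x; mu_i, var_i) for every mixture component (1-D case)."""
    w, m, v = map(np.asarray, (weights, means, variances))
    return w * np.exp(-0.5 * (x - m) ** 2 / v) / np.sqrt(2 * np.pi * v)

def classify_pairwise(x, pos, neg, eps):
    """Eq. (4): threshold the likelihood ratio with eps[l+(x), l-(x)]."""
    s_pos = component_scores(x, **pos)
    s_neg = component_scores(x, **neg)
    l_pos, l_neg = int(np.argmax(s_pos)), int(np.argmax(s_neg))  # l+(x), l-(x)
    ratio = s_pos.sum() / s_neg.sum()                            # p(x|+)/p(x|-)
    return bool(ratio >= eps[l_pos, l_neg])

pos = dict(weights=[0.6, 0.3, 0.1], means=[0.2, 0.4, 0.8], variances=[0.01, 0.01, 0.01])
neg = dict(weights=[0.7, 0.3], means=[0.6, 0.9], variances=[0.04, 0.09])

eps = np.ones((3, 2))   # a fixed threshold of 1 for every (l+, l-) pair ...
eps[2, :] = 0.2         # ... except a generous one where the low-weight 3rd
                        # positive component wins (cf. the Figure 1 case)
print(classify_pairwise(0.8, pos, neg, eps))
```

With the uniform table `np.ones((3, 2))` the same point is rejected, which is exactly the fixed-threshold failure; the per-pair entry recovers it without loosening the threshold everywhere else.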
The pairwise thresholds are
trained using the following discriminative model:

\[ \epsilon_{l_a, l_b} = \begin{cases} \min_{x \in D_{l_a,l_b}} \left( \dfrac{p(+|x)}{p(-|x)} \right) & \text{if } \Upsilon(l_a, l_b) \ge \delta \\[6pt] \max_{x \in D_{l_a,l_b}} \left( \dfrac{p(+|x)}{p(-|x)} \right) & \text{otherwise} \end{cases} \tag{5} \]

\[ \Upsilon(l_a, l_b) = \frac{p(+|l_a, l_b)}{p(-|l_a, l_b)} \propto \frac{p(l_a, l_b|+)}{p(l_a, l_b|-)} \]

\[ p(l_a, l_b|+) = \frac{\sum_{x \in D} \mathbf{1}\{(l^{+}(x) = l_a) \wedge (l^{-}(x) = l_b) \wedge (x \in +)\}}{\sum_{x \in D} \mathbf{1}(x \in +)} \]

\[ p(l_a, l_b|-) = \frac{\sum_{x \in D} \mathbf{1}\{(l^{+}(x) = l_a) \wedge (l^{-}(x) = l_b) \wedge (x \in -)\}}{\sum_{x \in D} \mathbf{1}(x \in -)} \]

where \(D\) is the training dataset, \(D_{l_a,l_b} \subset D\) is the subset of training data whose predicted component index pair is \((l_a, l_b)\), \(l_a\) and \(l_b\) denote mixture component indices predicted by the positive and negative GMMs, \(\Upsilon(l_a, l_b)\) measures discriminative strength, and \(\delta\) is a control parameter for the pairwise threshold. When \(\Upsilon(l_a, l_b)\) is high, data predicted as \((l_a, l_b)\) are more likely to be positive, so a generous threshold is assigned to this index combination. With this pairwise threshold training model, we overcome the strong dependency on the model prior in the generative GMM by examining the training data discriminatively.

IV. OBJECT PIXEL MIXTURE CLASSIFIERS (OPMCS)

In common video surveillance environments, a human can be occluded by another human. There are two types of OPMCs: type-1 classifies a human against the background (no occlusion), and type-2 classifies a human against another human (with occlusion). We assume that track segments and BS data are available for collecting training data. Track segments need not be complete, but several frames of non-occluded human tracks are required. The BS data may also be cluttered by noise.

A. Type-1 OPMC

We sample several frames along the trajectory so that diversity in the location and size of the bounding boxes is captured. We choose foreground pixels inside the detection bounding box as positive data and background pixels within a certain range outside the box as negative data. We train the positive and negative GMMs on the collected data using the Expectation Maximization (EM) algorithm [6]. Once the EM process is done, noise in the data can be removed by applying a lower control parameter \(\delta' < \delta\) in Eq. (5). A refined OPMC is then acquired by retraining on the de-noised data.
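The training rule of Eq. (5) can be sketched as below, under the assumption that the min/max is taken over the training samples whose predicted index pair is \((l_a, l_b)\); the function and variable names are ours, and the toy data are illustrative.

```python
import numpy as np

def train_pairwise_thresholds(ratios, pairs, labels, N, M, delta):
    """Eq. (5) sketch.
    ratios[k] : p(x_k|+)/p(x_k|-) for training sample x_k
    pairs[k]  : (l+(x_k), l-(x_k)) predicted component indices
    labels[k] : True for positive samples, False for negative"""
    eps = np.ones((N, M))                    # default for pairs never observed
    n_pos, n_neg = labels.sum(), (~labels).sum()
    for a in range(N):
        for b in range(M):
            hit = np.array([p == (a, b) for p in pairs])
            if not hit.any():
                continue
            # Upsilon(l_a, l_b) ~ p(l_a, l_b | +) / p(l_a, l_b | -)
            p_ab_pos = (hit & labels).sum() / max(n_pos, 1)
            p_ab_neg = (hit & ~labels).sum() / max(n_neg, 1)
            upsilon = p_ab_pos / max(p_ab_neg, 1e-9)
            r = ratios[hit]
            eps[a, b] = r.min() if upsilon >= delta else r.max()
    return eps

ratios = np.array([2.0, 3.0, 0.5, 0.4])        # likelihood ratio per sample
pairs  = [(0, 0), (0, 0), (1, 1), (1, 1)]      # predicted (l+, l-) per sample
labels = np.array([True, True, False, False])  # ground-truth class per sample
eps = train_pairwise_thresholds(ratios, pairs, labels, N=2, M=2, delta=1.0)
print(eps)
```

Here the pair (0, 0) is dominated by positives, so it receives the minimum ratio seen there (a generous threshold), while (1, 1) is dominated by negatives and receives the maximum (a strict one).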
The procedure for training a type-1 OPMC and the noise removal algorithm are provided in Figure 2 and Algorithm 1.

Figure 2. OPMC training procedure.

Algorithm 1 Noise Removal
Input: noisy D
Output: de-noised D
  Compute Υ(i, j) from D (i = 1..N, j = 1..M)
  for x ∈ D do
    l_a ← l⁺(x), l_b ← l⁻(x)
    if Υ(l_a, l_b) < δ′ (where δ′ < δ) then
      remove x from D
    end if
  end for

B. Type-2 OPMC

We train a type-2 OPMC for a human object only when its track becomes occluded. It uses two sets of positive data from type-1 OPMCs: the positive data used to train the type-1 OPMC of the human itself, and, as negative data, the positive data from the type-1 OPMC of the other, occluding human. The training procedure is the same as for the type-1 OPMC. Figure 3 shows the two types of noise in positive data, caused by cluttered BS and by inter-occlusion between objects during training data collection. We show that plausible prediction remains possible thanks to the pairwise threshold and noise removal.

Figure 3. Noise in the training data. The shaded regions inside the red box are the positive data; the unshaded regions outside the red box and inside the white box are the negative data.
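As a minimal sketch, Algorithm 1 can be rendered in Python as follows; the Υ table and the index-prediction functions are assumed to be precomputed as in Eq. (5), and all names are ours.

```python
def remove_noise(samples, l_pos, l_neg, upsilon, delta_prime):
    """Algorithm 1 sketch: drop training samples whose predicted index pair
    (l+(x), l-(x)) has low discriminative strength Upsilon.
    samples     : list of training feature vectors
    l_pos/l_neg : component-index prediction functions of the two GMMs
    upsilon     : dict mapping (l_a, l_b) -> discriminative strength
    delta_prime : removal threshold, chosen lower than delta in Eq. (5)"""
    kept = []
    for x in samples:
        pair = (l_pos(x), l_neg(x))
        if upsilon.get(pair, 0.0) >= delta_prime:
            kept.append(x)  # keep only samples from discriminative pairs
    return kept

# Toy usage: samples landing on the non-discriminative pair (0, 0) are removed.
samples = [1, 2, 3]
keep = remove_noise(samples,
                    l_pos=lambda x: x % 2, l_neg=lambda x: 0,
                    upsilon={(1, 0): 5.0, (0, 0): 0.1},
                    delta_prime=1.0)
print(keep)
```

The refined OPMC is then obtained by re-running EM on the `kept` samples, per Section IV-A.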

          Type-1 OPMC   Type-2 OPMC
  N           17            15
  M           25            17
  δ′          1.5           60
  δ           70            70

Table I. Parameter settings.

V. EXPERIMENTAL RESULTS

We used the type-1 OPMC to generate blobs of an object of interest, and the type-1 and type-2 OPMCs together to separate merged blobs. Note again that this is a post-processing method rather than on-line learning, since OPMCs are not updated after training.

Feature extraction is highly important for GMM-based classification. We use color intensities as features. The following channels are used together with the RGB channels: the H channel of the HSV color space, the a and b channels of the Lab color space, and the u and v channels of the Luv color space. We collect training data from at least 5 frames, but from at most 5 percent of the entire length of the track fragment. After extracting the color intensity features, the feature dimension is reduced by linear PCA until 98 percent of the variance is captured, which leaves only 3 to 5 feature dimensions.

The parameter settings determined by our experiments are shown in Table I. δ′ for the type-2 OPMC is far higher than for the type-1, so as to retain positive data that are discriminative against the negative objects. We observed that the maximum pairwise threshold is about 10 times higher than the lowest; a fixed threshold cannot handle this much variation.

We tested the proposed OPMCs on three tasks: object blob generation, merged blob separation, and human tracking enhancement.

A. Object Blob Generation

Given an input image, the type-1 OPMC classifies image pixels into an object class and the background class. Figure 4 illustrates the improvement in blob generation by the proposed type-1 OPMC. A large part of the BS blob in the top row of Figure 4 is missed since the object is not in motion, and the object blob in the bottom row is severely cluttered by disturbing blobs. The type-1 OPMC generates improved object regions, as shown in Figure 4.
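The feature pipeline described above (an 8-channel color feature per pixel, reduced by linear PCA until 98% of the variance is retained) can be sketched as follows. The color conversion itself is stubbed out with synthetic correlated features; in practice the extra channels would come from color-space conversions (e.g. OpenCV's `cvtColor`), and the helper name is ours.

```python
import numpy as np

def pca_reduce(features, var_ratio=0.98):
    """Project row-vector features onto the leading principal components
    that together explain at least `var_ratio` of the total variance."""
    centered = features - features.mean(axis=0)
    evals, evecs = np.linalg.eigh(np.cov(centered, rowvar=False))
    order = np.argsort(evals)[::-1]            # eigenvalues, descending
    evals = np.clip(evals[order], 0.0, None)   # guard tiny negative eigenvalues
    evecs = evecs[:, order]
    explained = np.cumsum(evals) / evals.sum()
    k = int(np.searchsorted(explained, var_ratio)) + 1
    return centered @ evecs[:, :k]

rng = np.random.default_rng(0)
pixels = rng.normal(size=(1000, 8))  # stand-in for per-pixel (R,G,B,H,a,b,u,v) features
# color channels are strongly correlated, so make columns 3..7 mixtures of 0..2:
pixels[:, 3:] = (pixels[:, :3] @ rng.normal(size=(3, 5))) * 0.5
reduced = pca_reduce(pixels)         # only a few dimensions survive
print(reduced.shape)
```

Because the synthetic channels are linear mixtures of three underlying signals, PCA at 98% variance keeps at most three dimensions, mirroring the 3-to-5 dimensions reported above for real color features.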
Figure 4. Object blobs generated by the type-1 OPMC, compared with a fixed threshold and the BS blob. Columns, left to right: input image, BS blob, prediction with a fixed threshold, and prediction by the type-1 OPMC.

B. Merged Blob Separation

Figure 5 shows the separated region of the human indicated by the arrow. The region of a human object is separated by: 1) applying its type-1 OPMC; 2) applying the type-2 OPMCs that classify the human against the others in the blob; and 3) extracting the region that strictly belongs to the object by AND-ing the results. The extracted region may or may not cover all visible parts of an object, but this does not hinder our objective, since a portion of the visible region of an object is enough to estimate the association between fragmented tracks or to approximate the object location. In Figure 5, the first column is the input image, the second shows the merged blobs, the third shows the predictions of the type-1 OPMCs of the indicated humans, the fourth shows the predictions of their type-2 OPMCs, and the last shows the AND of the two classifiers' results.

Figure 5. Classification of a merged blob region containing two humans.

C. Human Tracking Enhancement

To evaluate the tracking enhancement, we applied OPMCs to tracking results from a previous method [10] that contain fragmented tracks. We used the publicly available Mind's Eye year one dataset (http://www.visint.org), which contains many challenges such as pose variation, inter- and intra-occlusion, and object appearance that is hard to distinguish from the background. We tested on a subset of 60 videos. The overall procedure is presented in Figure 6, and Table II shows the evaluation results. The evaluation criteria in the table are defined as follows:

Recall: correctly matched detections / total detections in the ground truth.
Precision: correctly matched detections / total detections in the tracking results.
GT: the number of ground-truth tracks in the dataset.
MT (Mostly Tracked): the percentage of GT tracks covered by the tracking results for more than 80% of their length.
PT (Partially Tracked): 1 − MT − ML.
ML (Mostly Lost): the percentage of GT tracks covered by the tracking results for less than 20% of their length.
IDS (ID switches): the number of times a tracked trajectory changes its matched ID.

Figure 6. Human tracking enhancement procedure.

              Huang et al. [10]   Our method   General GMM
  Recall           64.1%            68.4%        66.7%
  Precision        88.4%            82.5%        79.8%
  GT                 77               77           77
  MT               37.7%            41.6%        41.6%
  PT               42.9%            42.9%        40.2%
  ML               19.4%            15.5%        18.2%
  IDS                12               12           15

Table II. Performance evaluation on the Mind's Eye year one dataset.

Our method improves the overall detection recall; in particular, fragmented tracks are linked or extended, so ML and PT tracks become MT. However, false alarms near correct tracks are also continuously tracked, which decreases the precision. Compared to the general GMM, our method achieves higher precision and recall owing to the pairwise threshold.

VI. CONCLUSIONS

We raised an issue with the fixed threshold of general GMM classification and introduced the novel concept of the pairwise threshold. Unlike the adaptive (dynamic) thresholds suggested by other approaches, our pairwise thresholds apply a different threshold for classification depending on the combination of mixture component indices predicted by the positive and negative GMMs. We built OPMCs that exploit the pairwise-threshold GMM and applied them to human tracking enhancement.

ACKNOWLEDGEMENT

The first author is granted a full scholarship by the Republic of Korea Army.

REFERENCES

[1] M. D. Breitenstein, F. Reichlin, B. Leibe, E. Koller-Meier, and L. V. Gool, "Robust tracking-by-detection using a detector confidence particle filter," in IEEE 12th International Conference on Computer Vision (ICCV), Sept. 2009, pp. 1515-1522.
[2] Y. Li, C. Huang, and R. Nevatia, "Learning to associate: HybridBoosted multi-target tracker for crowded scene," in IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPR Workshops), June 2009, pp. 2953-2960.
[3] B. Yang, C. Huang, and R. Nevatia, "Learning affinities and dependencies for multi-target tracking using a CRF model," in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2011, pp. 1233-1240.
[4] H. Permuter, J. Francos, and I. Jermyn, "A study of Gaussian mixture models of color and texture features for image classification and segmentation," Pattern Recognition, vol. 39, no. 4, pp. 695-706, Apr. 2006.
[5] H. Zeng and Y.-M. Cheung, "A new feature selection method for Gaussian mixture clustering," Pattern Recognition, vol. 42, no. 2, pp. 243-250, Feb. 2009.
[6] J. A. Bilmes, "A gentle tutorial of the EM algorithm and its application to parameter estimation for Gaussian mixture and hidden Markov models," Tech. Rep., 1997.
[7] F. Pernkopf and D. Bouchaffra, "Genetic-based EM algorithm for learning Gaussian mixture models," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 27, no. 8, pp. 1344-1348, 2005.
[8] C. Stauffer and W. E. L. Grimson, "Adaptive background mixture models for real-time tracking," in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), vol. 2, June 1999, pp. 246-252.
[9] S.-X. Shi, Q.-L. Zheng, and H. Huang, "A fast algorithm for real-time video tracking," in Workshop on Intelligent Information Technology Application (IITA), Dec. 2007, pp. 120-124.
[10] C. Huang, B. Wu, and R. Nevatia, "Robust object tracking by hierarchical association of detection responses," in European Conference on Computer Vision (ECCV), 2008, pp. 788-801.