Least Soft-threshold Squares Tracking

Least Soft-threshold Squares Tracking

Dong Wang, Dalian University of Technology
Huchuan Lu, Dalian University of Technology
Ming-Hsuan Yang, University of California at Merced

Abstract

In this paper, we propose a generative tracking method based on a novel robust linear regression algorithm. In contrast to existing methods, the proposed Least Soft-threshold Squares (LSS) algorithm models the error term with the Gaussian-Laplacian distribution, which can be solved efficiently. Based on the maximum joint likelihood of the parameters, we derive an LSS distance to measure the difference between an observation sample and the dictionary. Compared with the distance derived from ordinary least squares methods, the proposed metric is more effective in dealing with outliers. In addition, we present an update scheme to capture the appearance change of the tracked target and ensure that the model is properly updated. Experimental results on several challenging image sequences demonstrate that the proposed tracker achieves more favorable performance than state-of-the-art methods.

1. Introduction

Visual tracking plays a critical role in computer vision and finds many practical applications (e.g., motion analysis, video surveillance, vehicle navigation, and human-computer interaction). Although significant progress has been made in the past decades, developing a robust tracking algorithm remains challenging due to numerous factors such as partial occlusion, illumination variation, pose change, complex motion, and background clutter. Tracking algorithms can be classified as either generative (e.g., [7, 1, 20, 19, 12, 15]) or discriminative (e.g., [2, 10, 4, 11, 27, 18, 30, 14]) methods. Generative methods focus on searching for the regions that are most similar to the tracked targets, while discriminative methods cast tracking as a classification problem that distinguishes the tracked targets from the surrounding background. In this work, we propose a robust generative tracker that handles partial occlusion and other challenging factors effectively.

Among the generative methods, trackers based on linear representation maintain holistic appearance information and therefore provide a compact notion of the thing being tracked, which may facilitate some advanced vision tasks [7, 20]. These methods often adopt a dictionary (e.g., a set of basis vectors from a subspace or a series of templates) to describe the tracked target. A given candidate sample is linearly represented by the dictionary, and the representation coefficient and reconstruction error are computed, from which the corresponding likelihood (of belonging to the object class) is determined. Ross et al. [20] propose an incremental visual tracking (IVT) method which represents the tracked target by a low-dimensional PCA subspace (a set of PCA basis vectors) and assumes that the error is Gaussian distributed with small variances (i.e., small dense noise). Therefore, the representation coefficient can be obtained by a simple projection operator, which is equivalent to the ordinary least squares solution under the assumption that the dictionary atoms are orthogonal. The reconstruction error is computed by the objective function of the ordinary least squares methods. While the IVT method is effective in handling appearance change caused by illumination and pose variation, it is not robust to some challenging factors (e.g., partial occlusion and background clutter) for the following two reasons.
First, ordinary least squares methods have been shown to be sensitive to outliers due to the formulation based on reconstruction error under a Gaussian noise assumption. Second, the IVT method uses new observations to update the observation model without detecting outliers and processing them accordingly. Other recent tracking algorithms [16, 12] based on the Gaussian noise assumption or ordinary least squares methods have similar problems as the IVT method. Motivated by the success of sparse representation-based face recognition [28], Mei et al. [19] develop a novel ℓ1 tracker that uses a series of target templates and trivial templates to model the tracked target, where the target templates describe the object class to be tracked and the trivial templates deal with outliers (e.g., partial occlusion) under sparsity constraints. For tracking, a candidate sample can be sparsely represented by both target and trivial templates, and its corresponding likelihood is determined by the reconstruction error with respect to the target templates. We note that this formulation is a linear regression problem with sparsity constraints on the representation coefficients.

Recently, several methods have been proposed to improve the ℓ1 tracker in terms of both speed and accuracy, by using an accelerated proximal gradient algorithm [5], replacing raw pixel templates with orthogonal basis vectors [24, 26, 25], or modeling the similarity between different candidates [31], to name a few. Although these algorithms account for outliers by using additional trivial templates, this formulation can be generalized with better understanding. In this work, we show that linear regression with the Gaussian-Laplacian noise assumption is more effective in dealing with outliers for object tracking. In addition, from the viewpoint of linear regression, it is not suitable to estimate the likelihood based on the reconstruction error with respect to the target templates alone. We present a novel distance function to compute the distance between a candidate and the object class.

In this paper, we present a generative tracking algorithm based on linear regression. The contributions of this work are as follows. First, we introduce a novel linear regression method, Least Soft-threshold Squares (LSS), which assumes that the error vector follows an i.i.d. Gaussian-Laplacian distribution. Second, we present an efficient iterative method to solve the LSS problem and propose an LSS distance to measure the dissimilarity between an observation vector and the dictionary. We note that the LSS method is related to robust regression with the Huber loss function and is effective in detecting outliers. Compared with the least squares distance, the LSS distance is more effective in measuring the distance between the observation vector and the dictionary when outliers occur. Third, we design a generative tracker based on the LSS method, where the dictionary consists of PCA basis vectors. The likelihood of each candidate is computed from the LSS distance. Furthermore, we update the tracker with an effective update scheme. Numerous experiments on challenging image sequences with comparisons to state-of-the-art tracking methods demonstrate the effectiveness of the proposed model and algorithm.

2. Least Soft-threshold Squares

2.1. Regression with Gaussian-Laplacian Noise

The objective of linear regression is to fit a linear model (i.e., estimate model parameters) to a series of noisy observations:

y = Ax + e,   (1)

where y ∈ R^d is a d-dimensional observation vector, x ∈ R^k denotes the k-dimensional parameter (or coefficient) vector to be estimated, and A = [r_1; r_2; ...; r_d] ∈ R^{d×k} represents the input data matrix (r_i is the i-th row of A). We denote e = y − Ax = [e_1; e_2; ...; e_d] (i.e., e_i = y_i − r_i x, i = 1, ..., d) as the error or residual term. The coefficient x can be obtained by maximizing the posterior probability p(x|y), which is equivalent to maximizing the joint likelihood p(x, y). Assuming a uniform prior, the coefficient is estimated by x̂ = arg max_x p(y|x) = arg max_x p(e), which is the maximum likelihood estimate (MLE). The errors e_1, e_2, ..., e_d are usually assumed to be independently and identically distributed (i.i.d.) according to some probability density function (PDF) f_θ(e_i), where θ is the parameter set that characterizes the distribution. Thus, the likelihood of the estimator (the joint probability of the error term e) is p(e) = Π_{i=1}^{d} f_θ(e_i). Maximizing the likelihood function is equivalent to minimizing the objective function L_θ(e_1, e_2, ..., e_d) = Σ_{i=1}^{d} ρ_θ(e_i), where ρ_θ(e_i) = −log f_θ(e_i).
When the error e = y − Ax follows the Gaussian distribution (e_i ~ N(0, σ_N^2), with PDF f_N(e_i) = 1/(sqrt(2π) σ_N) exp(−e_i^2/(2σ_N^2))), the MLE solution is equivalent to the ordinary least squares (OLS) solution

x̂ = arg min_x (1/2) ||y − Ax||_2^2,   (2)

and the closed-form solution is x̂ = (A^T A)^{-1} A^T y. Although the OLS method is easy to solve, it is sensitive to outliers due to the Gaussian noise assumption. If the error e follows the Laplacian distribution (e_i ~ L(0, σ_L), with PDF f_L(e_i) = 1/(2σ_L) exp(−|e_i|/σ_L)), the MLE solution is equivalent to the least absolute deviations (LAD) solution,

x̂ = arg min_x ||y − Ax||_1.   (3)

Compared with the OLS method, the LAD method is robust to outliers. However, it is difficult to solve, whether by simplex-based methods [6] or by iteratively reweighted least squares methods [22]. In this paper, we model the error vector e as an additive combination of two independent components, an i.i.d. Gaussian noise vector n (n_i ~ N(0, σ_N^2)) and an i.i.d. Laplacian noise vector s (s_i ~ L(0, σ_L)),

y = Ax + n + s,   (4)

where the Gaussian component models small dense noise and the Laplacian component handles outliers. A similar idea of decomposing the noise into two components (e.g., into sparse and non-sparse parts [3, 8] for robust motion estimation) has appeared in the computer vision community. While the goals are similar, the formulations and derivations are different. In this work, the proposed algorithm focuses not only on handling occlusion but also on deriving a novel distance metric to compare the target candidate and the target template under the outlier condition, which will be presented later.
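To make the noise model of Eq. (4) concrete, here is a minimal synthetic sketch (our own illustration, not from the paper; the dimensions, noise levels and outlier count are arbitrary assumptions). It generates observations corrupted by both small Gaussian noise and sparse Laplacian outliers, and shows how the ordinary least squares estimate of Eq. (2) degrades once the outliers are present:

import numpy as np

rng = np.random.default_rng(0)
d, k = 100, 5                          # observation and parameter dimensions (illustrative)
A = rng.standard_normal((d, k))        # input data matrix
x_true = rng.standard_normal(k)        # ground-truth coefficient vector

n = 0.01 * rng.standard_normal(d)      # small dense Gaussian noise
s = np.zeros(d)                        # sparse Laplacian outlier component
idx = rng.choice(d, size=5, replace=False)
s[idx] = rng.laplace(scale=5.0, size=5)

y = A @ x_true + n + s                 # Eq. (4): y = Ax + n + s

# OLS estimate of Eq. (2); for full-rank A this equals (A^T A)^{-1} A^T y.
x_ols = np.linalg.lstsq(A, y, rcond=None)[0]
x_ols_clean = np.linalg.lstsq(A, A @ x_true + n, rcond=None)[0]
print("OLS error without outliers:", np.linalg.norm(x_ols_clean - x_true))
print("OLS error with outliers:   ", np.linalg.norm(x_ols - x_true))

The printed errors depend on the random draw, but the fit with outliers is typically much worse, which is exactly the behavior the LSS formulation below is designed to avoid.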

The additive combination of i.i.d. Gaussian and i.i.d. Laplacian noise variables is also called the Gaussian-Laplacian distribution [23]. Its joint PDF is p(e) = Π_{i=1}^{d} f_NL(e_i), where the PDF f_NL(e_i) is given by the convolution

f_NL(e_i) = (f_N * f_L)(e_i) = ∫ f_L(s_i) f_N(e_i − s_i) ds_i = (1/(4σ_L)) exp(−e_i^2/(2σ_N^2)) [ erfcx(σ_N/(sqrt(2) σ_L) − e_i/(sqrt(2) σ_N)) + erfcx(σ_N/(sqrt(2) σ_L) + e_i/(sqrt(2) σ_N)) ],   (5)

where erfcx(u) = exp(u^2) erfc(u) and erfc(u) = (2/sqrt(π)) ∫_u^∞ e^{−t^2} dt. Compared with the Gaussian and Laplacian distributions, the PDF of the Gaussian-Laplacian distribution is complex. Thus, it is difficult to obtain a simple objective function (such as Eqs. (2) and (3)) directly. Because of this, we treat the Laplacian noise term s as missing values with the same Laplacian prior, and therefore the joint likelihood p(y, x) is converted to p(y, x, s),

p(y, x, s) = p(y | x, s) p(x, s) ∝ p(y − Ax − s) p(s) = K exp{ −(1/σ_N^2) ( (1/2) ||y − Ax − s||_2^2 + λ ||s||_1 ) },   (6)

where K = (1/(sqrt(2π) σ_N))^d (1/(2σ_L))^d and λ = σ_N^2/σ_L. Thus, maximizing the joint likelihood p(y, x, s) is equivalent to minimizing (1/2) ||y − Ax − s||_2^2 + λ ||s||_1 with respect to both x and s.

2.2. Least Soft-threshold Squares Regression

To maximize the joint likelihood of Eq. (6), we consider the objective function

L(x, s) = (1/2) ||y − Ax − s||_2^2 + λ ||s||_1,   (7)

and the optimal solution is [x̂, ŝ] = arg min_{x,s} L(x, s). The above objective function combines the standard least squares criterion with an ℓ1 regularization term on s, and is thus convex but not differentiable everywhere. To the best of our knowledge, there is no closed-form solution for this optimization problem, so we present an iterative algorithm.

Proposition 1: Given ŝ, the optimal x̂ can be computed by the ordinary least squares solution x̂ = (A^T A)^{-1} A^T (y − ŝ).

Proposition 2: Given x̂, the optimal ŝ can be obtained by a soft-thresholding (or shrinkage) operation ŝ = S_λ(y − Ax̂), where S_τ(u) = max(|u| − τ, 0) sgn(u) and sgn(·) is the sign function.

Let P denote (A^T A)^{-1} A^T, which can be precomputed before the iterative process. By Propositions 1 and 2, the optimization can be solved efficiently. The iteration is terminated when a stopping criterion is met (e.g., the difference of objective values between two iterations or the number of iterations). As our algorithm consists of two main components, ordinary least squares and a soft-thresholding operation, we denote it as the Least Soft-threshold Squares regression method.

Table 1. Least Soft-threshold Squares Regression
Input: An observation vector y, matrix A, precomputed matrix P = (A^T A)^{-1} A^T, and a small constant λ.
1: Initialize s^0 = 0 and i = 0
2: Iterate
3:   Obtain x^{i+1} via x^{i+1} = P (y − s^i)
4:   Obtain s^{i+1} via s^{i+1} = S_λ(y − A x^{i+1})
5:   i ← i + 1
6: Until convergence or termination
Output: x̂, ŝ

It is clear that L(x^{i+1}, s^{i+1}) ≤ L(x^{i+1}, s^i) ≤ L(x^i, s^i) (see Eq. (9) below). Hence, the proposed iterative algorithm converges to a local minimum. As the objective function is convex, it also attains the global minimum.

Remark 1: The proposed least soft-threshold squares regression is equivalent to robust regression with the Huber loss function:

x̂ = arg min_x Σ_{i=1}^{d} f(e_i),   e_i = y_i − r_i x,   (8)

where

f(e) = e^2/2 if |e| ≤ λ, and f(e) = λ|e| − λ^2/2 if |e| > λ,

is the Huber loss function and r_i denotes the i-th row of A. A detailed explanation can be found in the supplementary material. We note that computing robust regression with the Huber loss function is less efficient than the proposed method, as it is generally solved by the iteratively reweighted least squares scheme, which requires solving a weighted least squares problem (i.e., a pseudo-inverse) at each iteration. As mentioned before, the pseudo-inverse matrix in the proposed LSS method can be precomputed before the iterative process.
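To see why Remark 1 holds (a short verification of our own, not reproduced from the supplementary material), note that for a fixed residual e = y_i − r_i x, minimizing the corresponding term of Eq. (7) over the scalar s gives

min_s [ (1/2)(e − s)^2 + λ|s| ] = e^2/2 if |e| ≤ λ (the optimum is s = 0), and λ|e| − λ^2/2 if |e| > λ (the optimum is s = e − λ sgn(e)),

which is exactly the Huber loss f(e) of Eq. (8). Minimizing L(x, s) jointly over x and s is therefore the same as minimizing Σ_i f(e_i) over x alone.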
Remark 2: The non-zero components of ŝ can be used to identify outliers.

Note that

L(x^{i+1}, s^i) = min_x L(x, s^i) ≤ L(x^i, s^i),   and   L(x^{i+1}, s^{i+1}) = min_s L(x^{i+1}, s) ≤ L(x^{i+1}, s^i),   (9)

which establishes the monotone decrease of the objective values claimed above.
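As an illustration of Table 1, the following Python sketch implements the alternating iteration (this is our own minimal re-implementation for exposition, not the authors' MATLAB code; the variable names, the pseudo-inverse computation, and the stopping test are our choices):

import numpy as np

def soft_threshold(v, tau):
    # Element-wise soft-thresholding S_tau(v) = max(|v| - tau, 0) * sign(v).
    return np.maximum(np.abs(v) - tau, 0.0) * np.sign(v)

def lss_regression(y, A, lam, max_iter=100, tol=1e-8):
    # Least soft-threshold squares regression (Table 1): alternate the OLS step
    # of Proposition 1 with the shrinkage step of Proposition 2.
    P = np.linalg.pinv(A)            # equals (A^T A)^{-1} A^T when A has full column rank
    x = P @ y
    s = np.zeros_like(y)
    for _ in range(max_iter):
        x_new = P @ (y - s)                          # Proposition 1
        s_new = soft_threshold(y - A @ x_new, lam)   # Proposition 2
        done = (np.linalg.norm(x_new - x) < tol and np.linalg.norm(s_new - s) < tol)
        x, s = x_new, s_new
        if done:
            break
    return x, s

Applied to the synthetic data generated after Eq. (4) above, the returned x is typically much closer to x_true than the OLS estimate, and the non-zero entries of s coincide with the injected outlier positions.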

Figure 1. Robust line fitting using the least soft-threshold squares (LSS) regression: (a) the observation samples, (b) the iterative process of the LSS method (1st, 5th, 10th, and 20th iterations), and (c) the Laplacian noise term.

Figure 1 shows an example of fitting a straight line when small Gaussian noise and outliers occur simultaneously. The task is to estimate accurate parameters of a straight-line function y = az + b, where z is the input variable, y is the output variable, and a and b are the parameters to be determined. Figure 1(a) shows ten observation sample pairs {z_i, y_i}, i = 1, ..., 10, where z_i = i (in this case z_5 and z_9 are two outliers). Denoting y = [y_1; y_2; ...; y_10], z = [z_1; z_2; ...; z_10], A = [z, 1] and x = [a; b], we use the proposed least soft-threshold squares (LSS) method to estimate the parameter x. Figure 1(b) illustrates the iterative process of the LSS method, from which it is clear that the proposed LSS method is not sensitive to outliers. We note that the result after the first iteration is the same as the result obtained using the OLS method, which also shows that the proposed LSS method is more robust than the OLS method. In addition, as shown in Figure 1(c), the non-zero Laplacian noise components correspond to the outliers.

2.3. Least Soft-threshold Squares Distance

We write the matrix A = [a_1, a_2, ..., a_k], where a_i is the i-th column of A. The vector Ax can be viewed as a linear combination of the columns of A (Ax = x_1 a_1 + x_2 a_2 + ... + x_k a_k). The matrix A is known as the dictionary or basis matrix, and the vector a_i is called an atom or basis vector. Some vision applications (such as tracking) require not only estimating the coefficient x accurately but also defining a distance between a noisy observation and the dictionary (or the subspace). This generative perspective has commonly been exploited in subspace-based methods [7, 20]. The distance is usually defined to be inversely related to the maximum joint likelihood with respect to the coefficient x,

d(y; A) ∝ −log max_x p(y, x) = −log max_x p(y|x) p(x).   (10)

Taking the ordinary least squares method as an example (i.e., a uniform prior), we have

−log max_x p(y|x) p(x) ∝ −log max_x p(y|x) ∝ −log max_x exp(−(1/2) ||y − Ax||_2^2) = min_x (1/2) ||y − Ax||_2^2.   (11)

Recall that the OLS method assumes the observation vector is contaminated by i.i.d. Gaussian noise; the distance between y and A can therefore be defined as

d_OLS(y; A) = min_x (1/2) ||y − Ax||_2^2 = (1/2) ||y − A(A^T A)^{-1} A^T y||_2^2.   (12)

Similarly, under the i.i.d. Laplacian noise assumption, the distance between y and A can be defined as

d_LAD(y; A) = min_x ||y − Ax||_1,   (13)

which is difficult to calculate, as mentioned in Section 2.1. In this work, we adopt the Gaussian-Laplacian distribution to model the observation noise. Therefore, we define the distance between y and A under the i.i.d. Gaussian-Laplacian noise assumption as

d_LSS(y; A) = min_{x,s} (1/2) ||y − Ax − s||_2^2 + λ ||s||_1.   (14)

As it is related to the Least Soft-threshold Squares (LSS) regression, we denote it as the LSS distance.
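A compact, self-contained sketch of the LSS distance of Eq. (14), applied to a line-fitting toy problem in the spirit of Figure 1 (our own illustrative data and choice of λ; the inner loop simply repeats the alternating iteration of Table 1):

import numpy as np

def lss_distance(y, A, lam, n_iter=50):
    # LSS distance of Eq. (14): min over x, s of 0.5*||y - Ax - s||_2^2 + lam*||s||_1.
    P = np.linalg.pinv(A)
    s = np.zeros_like(y)
    for _ in range(n_iter):
        x = P @ (y - s)                                     # OLS step
        r = y - A @ x
        s = np.maximum(np.abs(r) - lam, 0.0) * np.sign(r)   # soft-thresholding step
    res = y - A @ x - s
    return 0.5 * res @ res + lam * np.abs(s).sum(), x, s

# Fit y = a*z + b from ten samples with two gross outliers (cf. Figure 1).
z = np.arange(1.0, 11.0)
y = 2.0 * z + 1.0 + 0.05 * np.random.default_rng(1).standard_normal(10)
y[[4, 8]] += 15.0                                           # outliers at z_5 and z_9
A = np.column_stack([z, np.ones_like(z)])
dist, x_hat, s_hat = lss_distance(y, A, lam=1.0)
print(x_hat)   # close to the true [2, 1]; s_hat is non-zero only at the outlier positions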

Table 2. The OLS and LSS distances between the template and the two candidates of Figure 2 (columns: d_OLS, d_LSS with λ = 0.05, and d_LSS with λ = 0.1; rows: good candidate and bad candidate).

Figure 2. A toy example of good and bad candidates for template matching: (a) template, (b) good candidate, (c) bad candidate.

Figure 2 illustrates a toy example of good and bad candidates for template matching with partial occlusion. For simplicity, we merely use a single template, shown in Figure 2(a). Figures 2(b) and (c) show the good and bad candidates (marked by red and blue boxes, respectively). In Table 2, we report the OLS distance and the LSS distance between the template and the different candidates. We denote the template as t, the good candidate as y_G, and the bad candidate as y_B. In this example, d_OLS(y_B; t) is smaller than d_OLS(y_G; t), which means the bad candidate is picked if the OLS distance is used. On the other hand, the good candidate is selected (d_LSS(y_G; t) < d_LSS(y_B; t)) when the proposed LSS distance is used. Thus, the proposed LSS distance is better than the OLS distance for handling outliers (e.g., partial occlusion).

3. Least Soft-threshold Squares Tracking

In this paper, visual tracking is treated as a dynamic Bayesian inference task with a hidden Markov model. Given a set of observed image vectors y_{1:t} = {y_1, y_2, ..., y_t} up to the t-th frame, the aim is to estimate the target state variable x_t by the maximum a posteriori estimate,

x̂_t = arg max_{x_t^i} p(x_t^i | y_{1:t}),   (15)

where x_t^i indicates the i-th sample of the state x_t. Based on the Bayes theorem, the posterior distribution p(x_t | y_{1:t}) can be estimated recursively by

p(x_t | y_{1:t}) ∝ p(y_t | x_t) ∫ p(x_t | x_{t−1}) p(x_{t−1} | y_{1:t−1}) dx_{t−1},   (16)

where p(x_t | x_{t−1}) is the motion model that describes the state transition between consecutive frames, and p(y_t | x_t) is the observation model that estimates the likelihood of an observed image patch belonging to the object class. The affine motion model is used in this work and the state transition is formulated as a random walk, i.e., p(x_t | x_{t−1}) = N(x_t; x_{t−1}, Σ), where Σ is a diagonal covariance matrix whose entries are the variances of the affine parameters.

Observation model: In this paper, we assume that the tracked target object is generated by a PCA subspace (spanned by U and centered at µ) with i.i.d. Gaussian-Laplacian noise,

y = µ + Uz + n + s,   (17)

where y denotes an observation vector, U represents a matrix of column basis vectors, z indicates the coefficients of the basis vectors, n is the Gaussian noise component and s is the Laplacian noise component. Based on the discussion in Section 2, under the i.i.d. Gaussian-Laplacian noise assumption, the distance between the vector y and the subspace (U, µ) is the least soft-threshold squares distance,

d(y; U, µ) = min_{z,s} (1/2) ||(y − µ) − Uz − s||_2^2 + λ ||s||_1.   (18)

Thus, for each observation y^i corresponding to a predicted state x^i, we first solve the optimization problem

[z^i, s^i] = arg min_{z,s} (1/2) ||(y^i − µ) − Uz − s||_2^2 + λ ||s||_1,   (19)

where x^i denotes the i-th sample of the state x (without loss of generality, we drop the frame index t). As the PCA basis vectors in U are orthogonal, the precomputed matrix P can simply be set to U^T. After the optimal z^i and s^i are obtained, the least soft-threshold squares distance is d(y^i; U, µ) = (1/2) ||(y^i − µ) − U z^i − s^i||_2^2 + λ ||s^i||_1. Then the observation likelihood can be measured by

p(y^i | x^i) = exp(−γ d(y^i; U, µ)),   (20)

where γ is a constant controlling the shape of the Gaussian kernel.
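A minimal sketch of the per-candidate likelihood computation of Eqs. (18)-(20), exploiting the orthonormality of U so that P reduces to U^T (again our own illustration; the default λ and γ values and the fixed iteration count are assumptions, not prescribed by the paper):

import numpy as np

def candidate_likelihood(y, U, mu, lam=0.1, gamma=1.0, n_iter=20):
    # Observation likelihood of Eq. (20) for one candidate patch y.
    # U: d x m matrix of orthonormal PCA basis vectors, mu: d-dim mean vector,
    # so the precomputed matrix P of Table 1 reduces to U^T.
    # Returns the likelihood and the Laplacian term s, whose non-zero entries
    # flag occluded/outlier pixels (used by the update scheme below).
    yb = y - mu                            # centered observation (y - mu in Eq. (18))
    s = np.zeros_like(yb)
    for _ in range(n_iter):
        z = U.T @ (yb - s)                                    # OLS step with P = U^T
        r = yb - U @ z
        s = np.maximum(np.abs(r) - lam, 0.0) * np.sign(r)     # soft-thresholding step
    res = yb - U @ z - s
    d_lss = 0.5 * res @ res + lam * np.abs(s).sum()           # LSS distance of Eq. (18)
    return np.exp(-gamma * d_lss), s                          # Eq. (20)

# In the tracker, this likelihood is evaluated for every particle drawn from the
# random-walk motion model, and the particle with the largest likelihood is taken
# as the tracking result for the current frame.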

Model update: We note that the non-zero components of the Laplacian noise term can be used to identify outliers. Thus, we present a simple yet effective update scheme. After obtaining the best candidate state of each frame, we extract its corresponding observation vector y_o = [y_o^1; y_o^2; ...; y_o^d] and infer the Laplacian noise term s_o = [s_o^1; s_o^2; ...; s_o^d]. Then we reconstruct the observation vector by replacing the outliers with the corresponding entries of the mean vector µ,

y_r^i = y_o^i if s_o^i = 0, and y_r^i = µ^i if s_o^i ≠ 0,   (21)

where y_r = [y_r^1; y_r^2; ...; y_r^d] denotes the reconstructed vector and µ = [µ^1; µ^2; ...; µ^d] is the mean vector. The reconstructed sample is accumulated and then used to update the tracker (i.e., the PCA basis vectors U and the mean vector µ) by an incremental principal component analysis (PCA) method [20].

4. Experiments

The proposed tracker is implemented in MATLAB and runs at 3.5 frames per second on a PC with an Intel CPU (3.4 GHz) and 32 GB memory. The regularization constant λ is set to 0.1 in all experiments. For each sequence, the location of the tracked target is manually labeled in the first frame. We resize each image observation to a fixed template size and use 6 eigenvectors for the PCA representation. As a trade-off between effectiveness and speed, 600 particles are adopted and our tracker is incrementally updated every 5 frames. The MATLAB source code, datasets and supplementary materials are available on our websites (ice.dlut.edu.cn/lu/publications.html, faculty.ucmerced.edu/mhyang/pubs.html).

In this work, we use fifteen challenging image sequences from prior work [20, 19, 4, 15, 29] and the CAVIAR data set (CAVIAR/CAVIARDATA/). The challenging factors of these sequences include partial occlusion, illumination variation, pose change, background clutter and motion blur. We evaluate the proposed tracker against eleven state-of-the-art algorithms: the FragT [1], IVT [20], MIL [4], VTD [15], TLD [14], APGL [5], MTT [31], LSAT [17], SCM [32], ASLA [13] and OSPT [26] trackers. For fair evaluation, we use the source codes provided by the authors and run them with adjusted parameters.

4.1. Quantitative Evaluation

We evaluate the above-mentioned algorithms using two criteria: the center location error and the overlap rate. Table 3 reports the average center location errors in pixels, where a smaller average error means a more accurate result. In addition, we use the segmentation criterion of the PASCAL VOC challenge [9] to evaluate the overlap rate. Given the tracking result (bounding box) of each frame R_T and the corresponding ground-truth bounding box R_G, the overlap score is defined as score = area(R_T ∩ R_G) / area(R_T ∪ R_G). Table 4 reports the average overlap rates, where larger average scores mean more accurate results.

4.2. Qualitative Evaluation

Severe occlusion: We test several sequences (Occlusion1, Occlusion2, Caviar1, Caviar2, Caviar3, DavidOutdoor) with heavy or long-time partial occlusion, scale change and rotation. Figures 3(a)-(c) demonstrate that the proposed method performs well in terms of position, rotation and scale when the target undergoes severe occlusion. This can be attributed to two reasons: (1) the proposed LSS distance takes outliers (e.g., occlusion) into account explicitly; and (2) the update scheme avoids degrading the observation model by removing the outliers from newly observed samples. In addition, the SCM and ASLA methods also achieve good performance in most cases, as both of them include part-based representations with overlapping patches. The IVT method is sensitive to partial occlusion (Occlusion2, Caviar1, Caviar3, DavidOutdoor) since the OLS distance is not effective in handling outliers. The MIL and TLD methods do not perform well when the target object is occluded by a similar object (Caviar1, Caviar2, Caviar3).
This can be explained by the fact that the rectangle features they adopt (generalized Haar-like features or binary patterns) are less effective when similar objects occlude each other.

Illumination change: Figure 3(d) shows the tracking results on the sequences (DavidIndoor, Car4, Singer1) with significant illumination variation and both scale and pose change. We can see that the MIL and TLD methods are less effective in these cases (e.g., DavidIndoor #0350 and Singer1 #0090). Due to the use of the incremental PCA algorithm, the proposed tracker achieves good performance in dealing with the appearance change caused by lighting change. For the same reason, the IVT and ASLA methods also perform well.

Background clutter: Figure 3(e) demonstrates the tracking results on the Car11, Deer and Football sequences with background clutter. These videos also pose other challenging factors, including illumination variation (Car11), fast motion (Deer) and partial occlusion (Football). As the proposed LSS distance encourages good matching results when outliers occur, our tracker performs better than other methods on these videos (e.g., Deer #0052 and Football #035).

Fast motion: Figure 3(f) illustrates the tracking results on the Jumping, Owl and Face sequences. It is difficult to predict the locations of the tracked objects when they undergo abrupt motion. Furthermore, the appearance change caused by motion blur poses great challenges for capturing the tracked targets accurately and updating the observation models properly. We can see that the TLD and proposed methods perform better than other algorithms (e.g., Owl #0248 and Face #0254). We note that the TLD method is equipped with a re-initialization mechanism which facilitates object tracking.

Table 3. Average center location error (in pixels) of the FragT, IVT, MIL, VTD, TLD, APGL, MTT, LSAT, SCM, ASLA, OSPT and LSST (ours) trackers on the fifteen test sequences, together with the overall average; the best three results are shown in red, blue, and green fonts.

Table 4. Average overlap rate of the same trackers on the fifteen test sequences, together with the overall average; the best three results are shown in red, blue, and green fonts.

5. Conclusion

In this paper, we propose a Least Soft-threshold Squares (LSS) regression method that assumes the noise is Gaussian-Laplacian distributed, and apply it to object tracking. We present an efficient iterative algorithm to solve the LSS problem, which attains the globally minimal solution. We derive an LSS distance to measure the difference between an observation sample and the dictionary. The LSS distance is effective in handling outliers and therefore provides an accurate match, which facilitates object tracking (e.g., in dealing with partial occlusion). In addition, we develop a robust generative tracker based on the proposed LSS method and a simple update scheme. Both quantitative and qualitative evaluations on challenging image sequences show that the proposed tracker performs favorably against several state-of-the-art algorithms. In the future, we will extend the LSS method to other vision problems (e.g., face recognition).

Acknowledgements: D. Wang and H. Lu are supported by the National Natural Science Foundation of China. M.-H. Yang is supported by the US National Science Foundation CAREER Grant #49783 and an IIS Grant.

References
[1] A. Adam, E. Rivlin, and I. Shimshoni. Robust fragments-based tracking using the integral histogram. In CVPR, 2006.
[2] S. Avidan. Ensemble tracking. TPAMI, 2007.
[3] A. Ayvaci, M. Raptis, and S. Soatto. Occlusion detection and motion estimation with convex optimization. In NIPS, 2010.
[4] B. Babenko, M.-H. Yang, and S. Belongie. Visual tracking with online multiple instance learning. In CVPR, 2009.
[5] C. Bao, Y. Wu, H. Ling, and H. Ji. Real time robust ℓ1 tracker using accelerated proximal gradient approach. In CVPR, 2012.
[6] I. Barrodale and F. D. K. Roberts. An improved algorithm for discrete ℓ1 linear approximation. SIAM Journal on Numerical Analysis, 1973.
[7] M. J. Black. EigenTracking: robust matching and tracking of articulated objects using a view-based representation. IJCV, 1998.
[8] Z. Chen, J. Wang, and Y. Wu. Decomposing and regularizing sparse/non-sparse components for motion field estimation. In CVPR, 2012.
[9] M. Everingham, L. Van Gool, C. K. I. Williams, J. Winn, and A. Zisserman. The PASCAL visual object classes (VOC) challenge. IJCV, 2010.
[10] H. Grabner and H. Bischof. On-line boosting and vision. In CVPR, 2006.
[11] S. Hare, A. Saffari, and P. H. S. Torr. Struck: structured output tracking with kernels. In ICCV, 2011.
[12] W. Hu, X. Li, X. Zhang, X. Shi, S. J. Maybank, and Z. Zhang. Incremental tensor subspace learning and its applications to foreground segmentation and tracking. IJCV, 2011.
[13] X. Jia, H. Lu, and M.-H. Yang. Visual tracking via adaptive structural local sparse appearance model. In CVPR, 2012.
[14] Z. Kalal, K. Mikolajczyk, and J. Matas. Tracking-learning-detection. TPAMI, 2012.

Figure 3. Sample tracking results on fifteen challenging image sequences: (a) Occlusion1 and Occlusion2 with heavy occlusion and in-plane rotation; (b) Caviar1 and Caviar2 with partial occlusion and scale change; (c) Caviar3 and DavidOutdoor with severe occlusion, scale change and pose variation; (d) DavidIndoor, Car4 and Singer1 with illumination variation; (e) Car11, Deer and Football with background clutter; (f) Jumping, Owl and Face with abrupt motion. The figure shows the results of the IVT [20], MIL [4], TLD [14], APGL [5], SCM [32], ASLA [13] and the proposed methods. More results can be found in the supplementary material.

[15] J. Kwon and K. M. Lee. Visual tracking decomposition. In CVPR, 2010.
[16] X. Li, W. Hu, Z. Zhang, X. Zhang, M. Zhu, and J. Cheng. Visual tracking via incremental log-Euclidean Riemannian subspace learning. In CVPR, 2008.
[17] B. Liu, J. Huang, L. Yang, and C. A. Kulikowski. Robust tracking using local sparse appearance model and k-selection. In CVPR, 2011.
[18] H. Lu, S. Lu, D. Wang, S. Wang, and H. Leung. Pixel-wise spatial pyramid-based hybrid tracking. TCSVT, 2012.
[19] X. Mei and H. Ling. Robust visual tracking using ℓ1 minimization. In ICCV, 2009.
[20] D. Ross, J. Lim, R.-S. Lin, and M.-H. Yang. Incremental learning for robust visual tracking. IJCV, 2008.
[21] J. Santner, C. Leistner, A. Saffari, T. Pock, and H. Bischof. PROST: parallel robust online simple tracking. In CVPR, 2010.
[22] E. J. Schlossmacher. An iterative technique for absolute deviations curve fitting. JASA, 1973.
[23] I. W. Selesnick. The estimation of Laplace random vectors in additive white Gaussian noise. TSP, 2008.
[24] D. Wang and H. Lu. Object tracking via 2DPCA and ℓ1-regularization. SPL, 2012.
[25] D. Wang and H. Lu. On-line learning parts-based representation via incremental orthogonal projective non-negative matrix factorization. SP, 2013.
[26] D. Wang, H. Lu, and M.-H. Yang. Online object tracking with sparse prototypes. TIP, 2013.
[27] S. Wang, H. Lu, F. Yang, and M.-H. Yang. Superpixel tracking. In ICCV, 2011.
[28] J. Wright, A. Y. Yang, A. Ganesh, S. S. Sastry, and Y. Ma. Robust face recognition via sparse representation. TPAMI, 2009.
[29] Y. Wu, H. Ling, J. Yu, F. Li, X. Mei, and E. Cheng. Blurred target tracking by blur-driven tracker. In ICCV, 2011.
[30] K. Zhang, L. Zhang, and M.-H. Yang. Real-time compressive tracking. In ECCV, 2012.
[31] T. Zhang, B. Ghanem, S. Liu, and N. Ahuja. Robust visual tracking via multi-task sparse learning. In CVPR, 2012.
[32] W. Zhong, H. Lu, and M.-H. Yang. Robust object tracking via sparsity-based collaborative model. In CVPR, 2012.


More information

Inverse Sparse Group Lasso Model for Robust Object Tracking

Inverse Sparse Group Lasso Model for Robust Object Tracking 1 Inverse Sparse Group Lasso Model for Robust Object Tracking Yun Zhou, Jianghong Han, Xiaohui Yuan, Senior Member, IEEE, Zhenchun Wei, and Richang Hong Abstract Sparse representation has been applied

More information

Object Tracking using Superpixel Confidence Map in Centroid Shifting Method

Object Tracking using Superpixel Confidence Map in Centroid Shifting Method Indian Journal of Science and Technology, Vol 9(35), DOI: 10.17485/ijst/2016/v9i35/101783, September 2016 ISSN (Print) : 0974-6846 ISSN (Online) : 0974-5645 Object Tracking using Superpixel Confidence

More information

FMA901F: Machine Learning Lecture 3: Linear Models for Regression. Cristian Sminchisescu

FMA901F: Machine Learning Lecture 3: Linear Models for Regression. Cristian Sminchisescu FMA901F: Machine Learning Lecture 3: Linear Models for Regression Cristian Sminchisescu Machine Learning: Frequentist vs. Bayesian In the frequentist setting, we seek a fixed parameter (vector), with value(s)

More information

Self Lane Assignment Using Smart Mobile Camera For Intelligent GPS Navigation and Traffic Interpretation

Self Lane Assignment Using Smart Mobile Camera For Intelligent GPS Navigation and Traffic Interpretation For Intelligent GPS Navigation and Traffic Interpretation Tianshi Gao Stanford University tianshig@stanford.edu 1. Introduction Imagine that you are driving on the highway at 70 mph and trying to figure

More information

Motion Tracking and Event Understanding in Video Sequences

Motion Tracking and Event Understanding in Video Sequences Motion Tracking and Event Understanding in Video Sequences Isaac Cohen Elaine Kang, Jinman Kang Institute for Robotics and Intelligent Systems University of Southern California Los Angeles, CA Objectives!

More information

Online Multi-Face Detection and Tracking using Detector Confidence and Structured SVMs

Online Multi-Face Detection and Tracking using Detector Confidence and Structured SVMs Online Multi-Face Detection and Tracking using Detector Confidence and Structured SVMs Francesco Comaschi 1, Sander Stuijk 1, Twan Basten 1,2 and Henk Corporaal 1 1 Eindhoven University of Technology,

More information

Robust and Fast Collaborative Tracking with Two Stage Sparse Optimization

Robust and Fast Collaborative Tracking with Two Stage Sparse Optimization Robust and Fast Collaborative Tracking with Two Stage Sparse Optimization Baiyang Liu 1,2, Lin Yang 2, Junzhou Huang 1, Peter Meer 3, Leiguang Gong 4, and Casimir Kulikowski 1 1 Department of Computer

More information

A novel template matching method for human detection

A novel template matching method for human detection University of Wollongong Research Online Faculty of Informatics - Papers (Archive) Faculty of Engineering and Information Sciences 2009 A novel template matching method for human detection Duc Thanh Nguyen

More information

Image Processing. Image Features

Image Processing. Image Features Image Processing Image Features Preliminaries 2 What are Image Features? Anything. What they are used for? Some statements about image fragments (patches) recognition Search for similar patches matching

More information

Gaussian Processes for Robotics. McGill COMP 765 Oct 24 th, 2017

Gaussian Processes for Robotics. McGill COMP 765 Oct 24 th, 2017 Gaussian Processes for Robotics McGill COMP 765 Oct 24 th, 2017 A robot must learn Modeling the environment is sometimes an end goal: Space exploration Disaster recovery Environmental monitoring Other

More information

Learning and Inferring Depth from Monocular Images. Jiyan Pan April 1, 2009

Learning and Inferring Depth from Monocular Images. Jiyan Pan April 1, 2009 Learning and Inferring Depth from Monocular Images Jiyan Pan April 1, 2009 Traditional ways of inferring depth Binocular disparity Structure from motion Defocus Given a single monocular image, how to infer

More information

Colored Point Cloud Registration Revisited Supplementary Material

Colored Point Cloud Registration Revisited Supplementary Material Colored Point Cloud Registration Revisited Supplementary Material Jaesik Park Qian-Yi Zhou Vladlen Koltun Intel Labs A. RGB-D Image Alignment Section introduced a joint photometric and geometric objective

More information

Binary Principal Component Analysis

Binary Principal Component Analysis 1 Binary Principal Component Analysis Feng Tang and Hai Tao University of California, Santa Cruz tang tao@soe.ucsc.edu Abstract Efficient and compact representation of images is a fundamental problem in

More information

Visual Tracking with Online Multiple Instance Learning

Visual Tracking with Online Multiple Instance Learning Visual Tracking with Online Multiple Instance Learning Boris Babenko University of California, San Diego bbabenko@cs.ucsd.edu Ming-Hsuan Yang University of California, Merced mhyang@ucmerced.edu Serge

More information

Action Recognition in Video by Sparse Representation on Covariance Manifolds of Silhouette Tunnels

Action Recognition in Video by Sparse Representation on Covariance Manifolds of Silhouette Tunnels Action Recognition in Video by Sparse Representation on Covariance Manifolds of Silhouette Tunnels Kai Guo, Prakash Ishwar, and Janusz Konrad Department of Electrical & Computer Engineering Motivation

More information

Robust Kernel Methods in Clustering and Dimensionality Reduction Problems

Robust Kernel Methods in Clustering and Dimensionality Reduction Problems Robust Kernel Methods in Clustering and Dimensionality Reduction Problems Jian Guo, Debadyuti Roy, Jing Wang University of Michigan, Department of Statistics Introduction In this report we propose robust

More information

Mixture Models and EM

Mixture Models and EM Mixture Models and EM Goal: Introduction to probabilistic mixture models and the expectationmaximization (EM) algorithm. Motivation: simultaneous fitting of multiple model instances unsupervised clustering

More information

Robotics Programming Laboratory

Robotics Programming Laboratory Chair of Software Engineering Robotics Programming Laboratory Bertrand Meyer Jiwon Shin Lecture 8: Robot Perception Perception http://pascallin.ecs.soton.ac.uk/challenges/voc/databases.html#caltech car

More information

Image Denoising based on Adaptive BM3D and Singular Value

Image Denoising based on Adaptive BM3D and Singular Value Image Denoising based on Adaptive BM3D and Singular Value Decomposition YouSai hang, ShuJin hu, YuanJiang Li Institute of Electronic and Information, Jiangsu University of Science and Technology, henjiang,

More information

Object tracking using a convolutional network and a structured output SVM

Object tracking using a convolutional network and a structured output SVM Computational Visual Media DOI 10.1007/s41095-017-0087-3 Vol. 3, No. 4, December 2017, 325 335 Research Article Object tracking using a convolutional network and a structured output SVM Junwei Li 1, Xiaolong

More information

Ensemble Tracking. Abstract. 1 Introduction. 2 Background

Ensemble Tracking. Abstract. 1 Introduction. 2 Background Ensemble Tracking Shai Avidan Mitsubishi Electric Research Labs 201 Broadway Cambridge, MA 02139 avidan@merl.com Abstract We consider tracking as a binary classification problem, where an ensemble of weak

More information