

IEEE TRANSACTIONS ON CYBERNETICS

Correlation Filter Learning Toward Peak Strength for Visual Tracking

Yao Sui, Guanghui Wang, and Li Zhang

Abstract: This paper presents a novel visual tracking approach to correlation filter learning toward peak strength of correlation response. Previous methods leverage all features of the target and the immediate background to learn a correlation filter. Some features, however, may be distractive to tracking, like those from occlusion and local deformation, resulting in unstable tracking performance. This paper aims at solving this issue and proposes a novel algorithm to learn the correlation filter. The proposed approach, by imposing an elastic net constraint on the filter, can adaptively eliminate those distractive features in the correlation filtering. A new peak strength metric is proposed to measure the discriminative capability of the learned correlation filter. It is demonstrated that the proposed approach effectively strengthens the peak of the correlation response, leading to more discriminative performance than previous methods. Extensive experiments on a challenging visual tracking benchmark demonstrate that the proposed tracker outperforms most state-of-the-art methods.

Index Terms: Correlation filtering, elastic net, kernel method, regression, visual tracking.

I. INTRODUCTION

Visual tracking plays an important role in signal processing and computer vision with various applications, such as video processing, motion analysis, and unmanned control systems. Visual tracking, in general, is classified into single-object tracking and multiple-object tracking, which are associated with different applications and different research methodologies. This paper addresses single-object tracking. Recent years have witnessed a rapid development in visual tracking [1], [2]. The performance of visual trackers has improved significantly in terms of accuracy, robustness, and running speed.
Manuscript received September 5, 2016; revised January 15, 2017 and March 30, 2017; accepted April 1. This work was supported in part by the National Aeronautics and Space Administration LEARN II Program under Grant NNX15AN94N, in part by the New Faculty General Research Fund of the University of Kansas under grant , in part by the National Natural Science Foundation of China (NSFC) under Grant and Grant , and in part by the Joint Fund of Civil Aviation Research by the NSFC and Civil Aviation Administration under Grant U . This paper was recommended by Associate Editor W. Hu. (Corresponding author: Yao Sui.) Y. Sui and G. Wang are with the Department of Electrical Engineering and Computer Science, University of Kansas, Lawrence, KS USA (suiyao@gmail.com; ghwang@ku.edu). L. Zhang is with the Department of Electronic Engineering, Tsinghua University, Beijing, China (chinazhangli@tsinghua.edu.cn). Color versions of one or more of the figures in this paper are available online. Digital Object Identifier /TCYB

Some challenges, however, such as heavy occlusions, nonrigid deformations, illumination changes, scale variations, background clutters, and in-plane/out-of-plane rotations, are still hindering the practical applications of visual tracking.

Recently, there has been significant interest in developing visual trackers based on correlation filtering [3]. Under this paradigm, a correlation filter is learned from the temporally obtained targets and the neighboring background. The target is identified as the region that has the strongest response against the learned filter when a correlation is imposed within a search area around the possible target location. Note that the target localization is actually a brute-force search within a local region using a sliding window method. Thus, it is computationally expensive for tracking. Fortunately, following Parseval's identity, the correlation can be implemented in the frequency domain using the Fourier transform.
As a result, the computational complexity is reduced to $O(n \log n)$ for a target of size $n \times n$ pixels. This implementation greatly speeds up visual tracking, leading to a high-speed visual tracker [3]. However, the properties of tracking, which can facilitate tracking in challenging situations, are neglected in [3].

A circulant structure of tracking is exploited in [4] to improve the tracking performance in various challenging cases. The samples used to learn the correlation filter are represented by cyclic shifts of a base sample, leading to an equivalence to the dense sampling method. Furthermore, the correlation filtering is interpreted from the perspective of ridge regression, resulting in a typical discriminative tracking model that focuses on distinguishing the target from its surrounding background. The work [4] significantly improves the tracking performance in various challenging situations by taking account of the circulant structure of tracking, while achieving a high running speed. However, it is unstable in the presence of scale variations because the size of the learned filter is fixed during tracking. Although several studies [5]–[7] design strategies to estimate the target scale, the tracking speed is reduced.

© 2017 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.

The correlation filtering approach localizes the target according to the position where the maximum (peak) of the filter response (correlation output) appears over a search region. Motivated by the fact that the target localization focuses only on the relative response values of the candidate regions, rather than the absolute values, we aim at learning a correlation filter that achieves a peak-strengthened (PS) filter response. With the strengthened peak, the target is represented more discriminatively against the background by the correlation filter. In this paper, we construct a robust correlation filter, which has strong peak strength

over the search region by introducing the intrinsic structure of visual tracking, while ensuring that the tracker runs fast.

Note that the correlation in all existing methods, such as [3]–[12], is applied over all the pixels within a search region. However, some of these pixels may be distractive, such as those from an occlusion or a significant nonrigid deformation. These distractions will influence the filter response over the search region, leading to a weak peak strength, and further resulting in an inaccurate target localization. Thus, a correlation approach that can adaptively ignore the distractive regions or pixels is required to enhance the peak strength of the filter response. To this end, a sparsity constraint [13] on the filter to be learned is desired to reduce the response from these distractive pixels by zeroing the corresponding entries of the correlation results. From a regression point of view,¹ such a sparsity-based correlation filter corresponds to the LASSO [14] in the case that the training samples are obtained by the cyclic shift of a base sample. Although LASSO performs well on feature (pixel) selection, it is unable to group the features in the regression. Note that, in visual tracking, the distractive pixels always appear densely in several regions. Thus, a region-level selection strategy is preferred, rather than a pixel-level selection. As a result, an elastic net regularization [15], which integrates the squared and the sparsity constraints, is enforced on the correlation filter.

In brief, from a filtering perspective, we design a robust correlation filter to augment the peak strength of the response, while from a tracking-by-detection perspective, we construct a feature-adaptive regressor via an elastic net regularization [15] to reliably separate the target from the surrounding background.
Moreover, we define a new metric to quantitatively demonstrate the strengthened peak of the filter response from a large number of empirical results, and reveal how the proposed correlation filter works. Furthermore, we leverage a multiresolution approach to incorporate scale variations of the correlation filter, which can accurately estimate the scale changes of the target appearance in each frame. Extensive experimental results demonstrate the effectiveness of the proposed correlation filter, which significantly improves the tracking performance in various challenging situations against most state-of-the-art trackers.

The remainder of this paper is organized as follows. In Section II, the work related to ours is reviewed. In Section III, the proposed approach is elaborated in detail. The experimental results are reported in Section IV. Finally, we conclude this paper in Section V.

II. RELATED WORK

A. General Tracking Algorithms

Visual tracking models are in general classified into two categories: 1) generative and 2) discriminative. The generative tracking model focuses on searching for a region that best matches a learned target model, while the discriminative tracking model regards tracking as a binary classification and aims to train a classifier that can separate the target from the background.

¹ The equivalence between correlation filtering and ridge regression is addressed in [4] when applying the cyclic shift to the training samples.

The generative tracking model has good generalization capability while requiring only a relatively small number of training samples. There is an extensive literature on the generative tracking model, including subspace learning [16]–[19], sparse representation [20]–[26], low-rank approximation [27]–[30], tensor subspace [31], [32], log-Euclidean Riemannian subspace [33], Gaussian process regression [34], histogram matching [35], [36], fragment strategy [37], graph model [38], and compact representation [39].
Subspace learning is a popular generative tracking model because of the inherent characteristic of the targets. Ross et al. [16] introduced the incremental principal component analysis (PCA) to visual tracking. Kwon and Lee [17] employed a sparse PCA method to decompose visual tracking into different submodules. Sui et al. [18] constructed a structured subspace that can deal with heavy distractions during tracking. Sui et al. [19] proposed a tracking approach that leveraged matrix completion to maintain a latent subspace for visual tracking.

Sparse representation is usually adopted with subspace learning to improve the robustness of the tracker. Mei and Ling [20] introduced sparse representation to visual tracking. Zhang et al. [21] used a joint sparse representation to speed up [20]. Wang et al. [24], [40] formulated the target by a subspace and modeled the occlusions by a sparse error. Zhang et al. [41] exploited the circulant structure among the sparse representation.

Tensor subspace is a straightforward extension of the subspace learning approach. It treats the target in the image domain. Li et al. [31] and Hu et al. [32] proposed an incremental tensor subspace learning for visual tracking. Wang and Lu [42] used a 2-D PCA method to construct the target subspace.

Graph model is also a popular approach to formulate visual tracking. Li et al. [38] proposed a tracking algorithm via random walks on a graph. Sui et al. [28] constructed a low-rank graph to conduct visual tracking.

The discriminative tracking model has achieved impressive performance in recent years. Various methods have been developed based on this tracking model, including support vector machine (SVM) [43]–[47], boosting [48]–[50], compressive sensing [51], superpixel [52], correlation filtering [3], [4], [10], [12], structural learning [53], [54], multiple instance learning [55], segmentation [56], hashing [57], and deep learning [58]–[62].
Avidan [43] used an SVM classifier to separate the target from its surrounding background. Hare et al. [63] proposed a structural output SVM for visual tracking. Babenko et al. [55] leveraged multiple instance learning to solve the sample label ambiguity problem in the discriminative tracking model. Avidan [48] employed a boosting approach to combine several weak classifiers into a strong classifier to solve the tracking problem. Grabner and Bischof [64] proposed an online learning algorithm by using a boosting classifier. Zhang et al. [51] represented the target and the background by compressive sensing and separated them by an online learned Bayesian classifier. Bolme et al. [3] introduced the correlation filtering method to visual tracking. Henriques et al. [4] exploited the circulant structure among the correlation filtering. Fan et al. [56] designed a segmentation-based tracking algorithm to distinguish the target from

its surrounding background. Recently, deep learning has been extensively used in visual tracking and achieves impressive performance. Zhang et al. [65] proposed a tracking algorithm via convolutional networks with training. Wang et al. [59] employed fully convolutional networks to conduct visual tracking. Ma et al. [58] analyzed the deep features from different layers of the convolutional networks.

B. Tracking Algorithms Based on Correlation Filtering

Recently, there has been significant interest in the design of correlation filtering-based tracking algorithms. In terms of both tracking accuracy and running speed, the correlation filtering-based visual trackers achieve state-of-the-art results. Under this paradigm, a correlation filter is learned from the previously obtained targets and their surrounding background. The learning problem can be exactly transferred to the frequency domain by the Fourier transform and Parseval's identity. Consequently, the filter learning is computationally efficient in the frequency domain. This paradigm is a discriminative model, and the target is located by means of tracking-by-detection. The target is detected by a correlation operation in the frequency domain over a region containing the possible target location. This ensures the high running speed of this paradigm. Finally, the target is localized according to the location where the peak of the filter response appears.

Bolme et al. [3] introduced this paradigm to visual tracking and achieved a high-speed visual tracker. Henriques et al. [4] exploited the circulant structure of the training samples to approximate the locally dense sampling.
They also explained the correlation filtering in visual tracking from a regression perspective, i.e., the correlation filtering is equivalent to a ridge regression over the target and its surrounding background, and the target will be assigned the largest regression value. In their substantial work [10], they theoretically proved the connection between the correlation filtering and the ridge regression and extended their circulant structure to a high-dimensional nonlinear feature space by kernel tricks. Sui et al. [12] raised the problem that the ridge regression via a squared loss function may lead to overfitting when training the correlation filter over the circulant structure. They proposed to leverage robust loss functions to compensate for the possible overfitting and explained their approach from both robust regression and anisotropic filter response perspectives.

Note that, in the above-mentioned approaches, maintaining a correlation filter of a fixed size during tracking to ensure a high running speed also weakens scale adaptivity, because the target is always detected at a fixed size identical to that of the filter. These approaches are thus unable to deal with the scale variations of the target. Some studies aimed at estimating the scale of the target within this paradigm by using a multiresolution approach [6] and a motion model [5], [7]. However, the scale estimations significantly increase the computational load. Thus, a balance between the scale estimation and the running speed should be carefully considered for a robust correlation filter.

Fig. 1. Illustration of the cyclic shift. (a) Base image patch. (b) Cyclic shift of the base image by ±15 pixels in both x and y directions.

Motivated by previous success, our approach is conducted within the framework that exploits the circulant structure of training samples with the kernel tricks [10].
Before presenting our approach in detail, it is helpful to clarify how it differs from similar methods. From the regression perspective, a regression formulation includes two parts: 1) a data fitting term guaranteed by a loss function and 2) a regularization term that generates some properties for the regressor. Sui et al. [12] improved the data fitting performance by employing different robust loss functions, leading to a robustness-promoted correlation filter. Different from their method, our approach imposes an elastic net constraint on the regularization term through a PS design for the filter response, to avoid acquiring distractive information, which easily causes tracking failure, during the correlation filter learning.

In addition, we also explain our approach from a feature selection point of view. Note that there are extensive studies on feature selection-based visual tracking algorithms. Sui et al. [18] proposed a sparsity-induced subspace learning method to exclude the distractive features during tracking. Zhang et al. [66] leveraged a feature selection strategy to improve the multiple instance learning framework, leading to an effective discriminative tracking model.

III. PROPOSED APPROACH

Given a target location, we generate the training samples around the target and its immediately surrounding background by a pixel-wise sliding method. This can be efficiently implemented by a cyclic shift of a base image patch [4], as illustrated in Fig. 1. The base image in this paper is determined as a spatially expanded region of the target. We stack the base image into a column vector, and denote it by $x$. Thus, the training samples, denoted by the matrix $X$, of which each row denotes a sample, are composed of the full cyclic shifts of $x$.
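The cyclic-shift construction can be illustrated with a minimal NumPy sketch. It is only an illustration under simplifying assumptions: the base sample is one-dimensional rather than an image patch, and the names `cyclic_shift_matrix`, `correlate_fft`, `x`, and `w` are hypothetical, not taken from the paper. It builds the sample matrix row by row and checks that applying a filter to every shifted sample gives the same values as a single correlation computed with the FFT.

```python
import numpy as np

def cyclic_shift_matrix(x):
    """Stack all cyclic shifts of the 1-D base sample x as rows (row i = x shifted by i)."""
    n = x.shape[0]
    return np.stack([np.roll(x, i) for i in range(n)])

def correlate_fft(x, w):
    """Responses of every cyclic shift of x against the filter w, computed with the FFT."""
    return np.real(np.fft.ifft(np.conj(np.fft.fft(x)) * np.fft.fft(w)))

rng = np.random.default_rng(0)
x = rng.standard_normal(8)  # hypothetical base sample (a flattened patch in practice)
w = rng.standard_normal(8)  # hypothetical linear filter

X = cyclic_shift_matrix(x)
dense = X @ w               # brute force: one inner product per shifted sample
fast = correlate_fft(x, w)  # same values from three FFTs
```

For an image patch, the same shifts can be produced with `np.roll(patch, (dy, dx), axis=(0, 1))`, which wraps pixels around the borders in the manner of Fig. 1.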
Note that the sample matrix $X$ has a good property [67], that is

$$X = D\,\mathrm{diag}(\hat{x})\,D^{H} \tag{1}$$

where $D$ denotes the discrete Fourier transform (DFT) matrix, the hat $\hat{\ }$ stands for the DFT here and hereafter, and $(\cdot)^{H}$ denotes the conjugate transpose. It indicates that the sample matrix $X$ can be efficiently represented in the frequency domain. The goal of our approach is to construct an efficient and effective discriminative tracking model over the samples $X$ to separate the target from its surrounding background.

A. Problem Statement

To distinguish the target from the background, a linear regressor can be employed, such that the target is localized in the region where the largest regression value appears. The regressor is simply trained by

$$\min_{w}\ \|y - Xw\|_2^2 \tag{2}$$

where $w$ denotes the linear coefficients, and $y$ contains the regression values of the samples $X$. This is the well-known least squares regression and has a closed-form solution. Note, however, that all features of the samples are used to train the linear regressor in (2). It indicates that some distractive features, such as the pixels from occlusions or nonrigid deformations, may significantly degrade the accuracy of the regressor. For this reason, the regressor is expected to adaptively ignore these distractive features. To this end, sparsity on $w$ is desired and promising to promote the robustness of the regressor, because the zeros located at the distractive features can eliminate their contributions to the regression values. Thus, the regressor is trained by

$$\min_{w}\ \|y - Xw\|_2^2 + \tau \|w\|_1 \tag{3}$$

where $\|\cdot\|_1$ denotes the $\ell_1$-norm that returns the sum of the absolute values of all elements, and $\tau > 0$ is a weight parameter. The regressor in (3) is known as the LASSO [14]. The features that facilitate the regression can be adaptively selected by the nonzeros of $w$, while the remaining features are ignored due to the zeros of $w$.

Note that LASSO fails to group the features for the selection. However, in tracking, the distractive features, like occlusions, always appear in several regions (i.e., groups of pixels). Thus, a grouping strategy is required to incorporate the sparsity for the regressor. As a result, the regressor is reformulated as

$$\min_{w}\ \|y - Xw\|_2^2 + \lambda \|w\|_2^2 + \tau \|w\|_1 \tag{4}$$

where $\lambda > 0$ denotes a weight parameter. In (4), the squared regularization $\|w\|_2^2$ is used to group the features. This is also known as the elastic net regression [15].

To further enhance the regression, a popular method, known as the kernel trick, is employed to transform the samples into a high-dimensional and nonlinear space, such that the regressor has a good nonlinearly separable capability. Let $\phi(\cdot)$ denote a nonlinear function, and let $\alpha$ denote the dual conjugate of $w$, such that $w = \sum_i \alpha_i \phi(x_i)$.
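As a short worked step, the expansion of $w$ can be substituted into the data fitting and squared terms of (4). The matrix $\Phi$, whose $i$th row is $\phi(x_i)^T$, is notation introduced here only for illustration; with it, $w = \Phi^{T}\alpha$ and the mapped samples replace $X$:

```latex
\begin{aligned}
\Phi w &= \Phi \Phi^{T} \alpha = K\alpha, \\
\|w\|_2^2 &= \alpha^{T} \Phi \Phi^{T} \alpha = \alpha^{T} K \alpha,
\qquad K = \Phi\Phi^{T},\quad k_{ij} = \phi(x_i)^{T}\phi(x_j).
\end{aligned}
```

Both terms therefore depend on the samples only through inner products $\phi(x_i)^T\phi(x_j)$, which is what allows a kernel function to stand in for the explicit mapping.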
The regression problem in (4) is described in its dual space as

$$\min_{\alpha}\ \|y - K\alpha\|_2^2 + \lambda \alpha^{T} K \alpha + \tau \|\alpha\|_1 \tag{5}$$

where $K$ denotes the kernel matrix, of which the element $k_{ij} = \phi(x_i)^{T} \phi(x_j)$. The above regression is conducted in the high-dimensional and nonlinear space defined by $\phi(\cdot)$ over the adaptively selected features. We will show in Section III-E that this regression is actually equivalent to a correlation filtering with a strengthened peak of the filter response.

According to [10], similar to $X$, the matrix $K$ from some kernels, such as Gaussian or polynomial, also has a circulant structure, and can be diagonalized as

$$K = D\,\mathrm{diag}(\hat{k}_1)\,D^{H} \tag{6}$$

where $k_1$ denotes the first row of $K$. In fact, the kernel matrix $K$ is obtained from the full cyclic shifts of its first row $k_1$.

B. Correlation Filter Learning

The problem in (5) involves joint minimization on both the squared form and the $\ell_1$-norm with respect to $\alpha$. For the convenience of computation, we relax it by introducing another variable $\beta$

$$\min_{\alpha,\beta}\ \|y - K\alpha\|_2^2 + \lambda \alpha^{T} K \alpha + \tau \|\beta\|_1 + \mu \|\alpha - \beta\|_2^2 \tag{7}$$

where $\mu > 0$ is a weight parameter. Note that (7) is convex with respect to $\alpha$ if $\beta$ is fixed, and vice versa. This allows us to develop an iterative algorithm, like block coordinate descent, to approximate the solution to (5). Thus, the two subproblems with respect to $\alpha$ and $\beta$, respectively, are presented as follows:

$$\min_{\alpha}\ \|y - K\alpha\|_2^2 + \lambda \alpha^{T} K \alpha + \mu \|\alpha - \beta\|_2^2 \tag{8}$$

$$\min_{\beta}\ \tau \|\beta\|_1 + \mu \|\alpha - \beta\|_2^2. \tag{9}$$

Note that (8) only contains the squared forms of $\alpha$. Thus, through least squares, it has a globally unique solution with the closed form

$$\alpha = \left(K^{H}K + \lambda K + \mu I\right)^{-1}\left(K^{H}y + \mu\beta\right) \tag{10}$$

where $I$ denotes the identity matrix. Note that (10) involves a matrix inverse, whose computational complexity is generally $O(n^3)$ for an $n \times n$ matrix. Such a complexity is unable to satisfy the speed requirement of tracking. Fortunately, by combining (6), the above problem can be transformed into the Fourier domain and solved efficiently with a complexity of $O(n \log n)$.
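A small numerical check may make the role of (6) in (10) concrete. The sketch below is an illustration under stated assumptions, not the authors' implementation: the signal is one-dimensional, the kernel is a Gaussian with a hypothetical bandwidth, and the values of `lam` and `mu` are chosen only to keep the example well conditioned. It confirms that the DFT of the first row of $K$ lists the eigenvalues of $K$, and that the dense solve in (10) agrees with element-wise arithmetic on Fourier coefficients.

```python
import numpy as np

def gaussian_kernel_matrix(x, sigma):
    """Gaussian kernel between all pairs of cyclic shifts of x (a symmetric circulant matrix)."""
    n = x.shape[0]
    shifts = np.stack([np.roll(x, i) for i in range(n)])
    sq_dist = np.sum((shifts[:, None, :] - shifts[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq_dist / (2.0 * sigma ** 2))

rng = np.random.default_rng(0)
n = 16
x = rng.standard_normal(n)     # hypothetical base sample
y = rng.standard_normal(n)     # hypothetical regression targets
beta = rng.standard_normal(n)  # hypothetical auxiliary variable
lam, mu = 1e-2, 1e-1           # illustrative weights, not the paper's settings

K = gaussian_kernel_matrix(x, sigma=4.0)
k1_hat = np.fft.fft(K[0])      # DFT of the first row of K

# Eq. (6): the DFT coefficients of k1 are the eigenvalues of the circulant K.
eig_fft = np.sort(np.real(k1_hat))
eig_dense = np.sort(np.linalg.eigvalsh(K))

# Eq. (10), solved densely in O(n^3).
alpha_dense = np.linalg.solve(K.T @ K + lam * K + mu * np.eye(n), K.T @ y + mu * beta)

# The same system solved element-wise on Fourier coefficients in O(n log n).
num = np.conj(k1_hat) * np.fft.fft(y) + mu * np.fft.fft(beta)
den = np.conj(k1_hat) * k1_hat + lam * k1_hat + mu
alpha_fft = np.real(np.fft.ifft(num / den))
```

Because every matrix in (10) shares the DFT basis, the inverse reduces to one division per frequency, which is where the $O(n \log n)$ cost comes from.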
As a result, $\alpha$ can be obtained from the following proposition.

Proposition 1: By Parseval's identity, $\alpha$ can be solved in the Fourier domain by

$$\hat{\alpha} = \frac{\hat{k}_1^{*} \odot \hat{y} + \mu \hat{\beta}}{\hat{k}_1^{*} \odot \hat{k}_1 + \lambda \hat{k}_1 + \mu} \tag{11}$$

where $\odot$ denotes the Hadamard product, i.e., $(a \odot b)_i = a_i b_i$, the division is performed element-wise, and the asterisk denotes the conjugate operation.

The proof of the above proposition is presented in the Appendix.

Equation (9) presents a standard $\ell_1$-regularized least squares problem. Through the shrinkage threshold algorithm [68], it can be solved by the following proposition.

Proposition 2: Equation (9) has a globally unique solution

$$\beta = \delta\!\left(\frac{\tau}{2\mu},\ \alpha\right) \tag{12}$$

where $\delta(\cdot,\cdot)$ denotes the shrinkage operator, defined as

$$\delta(\varepsilon, x) = \mathrm{sign}(x)\,\max(0,\ |x| - \varepsilon). \tag{13}$$

Therefore, (7) can be solved iteratively by alternately optimizing $\alpha$ and $\beta$ using (11) and (12), respectively. The iteration stops when the difference of the objective values between two consecutive iterations is very small, e.g., $10^{-8}$ in this paper. In each iteration, the fast Fourier transform dominates the complexity, yielding $O(n \log n)$. According to the empirical results in this paper, the iterative algorithm converges after around ten

iterations. The formal description of the iterative algorithm is depicted in Algorithm 1. Note that, due to the circulant structure of $K$, (11) in fact learns a correlation filter in the Fourier domain.

Algorithm 1: Correlation Filter Learning Toward Peak Strength
Input: Training samples $X$, and regression objective $y$.
Output: The peak-strengthened correlation filter $\alpha$.
  Calculate the kernel matrix $K$ for $k_{ij} = \phi(x_i)^{T} \phi(x_j)$.
  Initialize $\hat{\beta} = \hat{k}_1 \odot \hat{y}$.
  while not converged do
    Compute $\alpha$ from Eq. (11).
    Compute $\beta$ from Eq. (12).
  end

Fig. 2. Visualization of the PS correlation filter learning. A frame in the case of occlusion is shown on the left, where the target is marked in red and the base image is marked in blue. The features selected to train the filter are shown on the right, highlighted in green.

To exploit the temporal information, and to prevent the correlation filter from changing abruptly in successive frames, the base image $x_t$ and the correlation filter $\alpha_t$ in the $t$th frame are, respectively, updated in an incremental manner

$$\hat{x}_t = (1 - \pi)\,\hat{x}_{t-1} + \pi\,\hat{x} \tag{14}$$

$$\hat{\alpha}_t = (1 - \pi)\,\hat{\alpha}_{t-1} + \pi\,\hat{\alpha} \tag{15}$$

where $\hat{x}$ is the DFT of the spatially expanded region of the $(t-1)$th target, $\hat{\alpha}$ is obtained from (11), and $\pi \in (0, 1)$ controls the update rate.

C. Target Localization

In the $t$th frame, a large number of target candidates are pixel-wise sampled within the search area defined by a base image denoted by $x$. The base image is sampled by a spatial expansion of the region where the $(t-1)$th target is located. With the circulant structure, these target candidates are obtained from the full cyclic shifts of $x$. As a result, the regression values of these candidates are computed from

$$f(x) = \mathcal{F}^{-1}\!\left(\hat{k} \odot \hat{\alpha}_t\right) \tag{16}$$

where $\hat{k} = \phi^{T}(x)\,\phi(x_t)$ denotes the kernel correlation between the temporally obtained target and the candidate regions, and $\mathcal{F}^{-1}(\cdot)$ denotes the inverse fast Fourier transform.
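The alternating updates summarized in Algorithm 1 can be sketched in a few lines of NumPy. This is a minimal one-dimensional sketch under stated assumptions, not the authors' code: the function names, the Gaussian-shaped target `y`, the kernel bandwidth, and the values of `tau` and `mu` are all hypothetical, and the kernel row `k1` is assumed symmetric so that multiplying by $K$ equals a circular convolution with `k1`. The initialization follows the reading $\hat{\beta} = \hat{k}_1 \odot \hat{y}$ of Algorithm 1.

```python
import numpy as np

def soft_threshold(eps, x):
    """Shrinkage operator of Eq. (13)."""
    return np.sign(x) * np.maximum(0.0, np.abs(x) - eps)

def objective(k1_hat, y, alpha, beta, lam, tau, mu):
    """Value of the relaxed problem (7); K @ alpha is evaluated with the FFT."""
    K_alpha = np.real(np.fft.ifft(k1_hat * np.fft.fft(alpha)))
    return (np.sum((y - K_alpha) ** 2) + lam * alpha @ K_alpha
            + tau * np.sum(np.abs(beta)) + mu * np.sum((alpha - beta) ** 2))

def learn_filter(k1, y, lam, tau, mu, tol=1e-8, max_iter=100):
    """Alternate the alpha update (11) and the beta update (12) until the objective settles."""
    k1_hat, y_hat = np.fft.fft(k1), np.fft.fft(y)
    beta = np.real(np.fft.ifft(k1_hat * y_hat))  # initialization as read from Algorithm 1
    history = []
    for _ in range(max_iter):
        num = np.conj(k1_hat) * y_hat + mu * np.fft.fft(beta)
        den = np.conj(k1_hat) * k1_hat + lam * k1_hat + mu
        alpha = np.real(np.fft.ifft(num / den))       # Eq. (11)
        beta = soft_threshold(tau / (2.0 * mu), alpha)  # Eq. (12)
        history.append(objective(k1_hat, y, alpha, beta, lam, tau, mu))
        if len(history) > 1 and abs(history[-2] - history[-1]) < tol:
            break
    return alpha, beta, history

rng = np.random.default_rng(0)
n = 32
x = rng.standard_normal(n)  # hypothetical base sample
shifts = np.stack([np.roll(x, i) for i in range(n)])
k1 = np.exp(-np.sum((shifts - x) ** 2, axis=1) / (2.0 * 6.0 ** 2))  # Gaussian kernel row
d = np.minimum(np.arange(n), n - np.arange(n))  # circular distance to shift 0
y = np.exp(-d ** 2 / (2.0 * 2.0 ** 2))          # hypothetical Gaussian-shaped target

alpha, beta, history = learn_filter(k1, y, lam=1e-4, tau=1e-2, mu=1.0)
```

Each step minimizes (7) exactly over one block of variables, so the recorded objective values should not increase from one iteration to the next; the entries of `beta` that the shrinkage sets to zero mark the features being ignored.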
The target is localized as the region where the largest regression value (filter response) is located. Note that (16) is actually a spatial correlation over the search area in the dual space defined by $\phi(x_t)$. By transforming it into the Fourier domain, the correlation is implemented by the Hadamard product, which significantly improves the computational efficiency.

D. Scale Estimation

During tracking, the size of the filter $\alpha$ is fixed to maintain the fast speed. Thus, to incorporate the scale estimation, we employ a scale pool $S = \{s_1, s_2, \ldots, s_m\}$ containing $m$ scales. We sample $m$ groups of target candidates, i.e., $m$ base images $x_{1:m}$, and in the $i$th group, the target candidates have the scale $s_i$. Then, we compute the regression values of the $m$ groups of target candidates by using (16). As a result, the scale of the target is estimated by

$$s = \arg\max_{s \in S}\ f_s(x) \tag{17}$$

where $f_s(\cdot)$ denotes the regression values with respect to the scale $s$. Correspondingly, the criterion of the target localization is revised as the region where the largest regression value is located with respect to the estimated scale.

E. Discussion

The goal of the proposed approach is to learn a PS correlation filter that can augment the discriminative capability to separate the target from its surrounding background. An elastic net constraint is imposed on the correlation filter to achieve this goal. To make the proposed approach clearer, we discuss the learning method from the perspectives of feature selection and correlation filtering, respectively. Note that we use the raw pixels as the features in this example for clarity and simplicity. In the actual implementation of the proposed tracker, other more effective features are adopted, such as the histogram of oriented gradients (HOG) feature.

1) Feature Selection: As presented in (4) and (5), an elastic net constraint is enforced on the correlation filter learning.
The sparsity (i.e., through the $\ell_1$-norm) can adaptively ignore the distractive features (pixels), such as occlusions and cluttered background, by zeroing the corresponding entries of the correlation filter $w$ or $\alpha$. Meanwhile, these distractive features usually appear within a region (i.e., groups of pixels). To reflect the group property, a quadratic constraint is then applied through the $\ell_2$-norm.

We visualize the correlation filter learning in the case of occlusions, as shown in Fig. 2. A representative frame is shown on the left, where the target (i.e., the man's face) is marked in the red box and the base image is marked in the blue box. On the right, the features selected to learn the correlation filter are highlighted in green. It is evident that the pixels from the face are extensively selected, while the pixels from the occlusion (i.e., the book) are rarely considered. Moreover, the pixels from the background with large difference from the target are also adaptively selected. Such a selection promotes the robustness of the correlation filtering and augments the discriminative capability of the correlation response. It is also found that the features at the top right of the target region (from the hair) are extensively excluded. Note that

the correlation filter has a symmetric property over the base image region. Because the book (distractive object) needs to be excluded, its symmetric region is from the hair. As a result, the hair region is excluded as well. In addition, from another point of view, the hair within the target region is less differentiable from its immediately surrounding background. For this reason, the hair region from either target or background is extensively excluded. All the selection is adaptively made by the proposed approach. Through this example, it is evident that the proposed feature selection is effective in learning the correlation filter by excluding the distractive features.

2) Correlation Filtering: The correlation filtering method localizes the target in terms of the position of the peak of the correlation response. Note that the target localization only concerns the relative response values of the candidate regions, rather than the absolute response values. Intuitively, a good correlation filter produces a much stronger response at the target location and weak responses at other regions, even the regions very close to the target. From a tracking-by-detection point of view, the larger the difference of the response values between the target location and other regions, the more discriminative the correlation filter.

We investigate quantitatively to what extent the proposed approach improves the discriminative capability. Clearly, we are interested in how strong the response of the target is over several other competitive candidate regions. We also consider how accurately the response peak appears at the center of the target location.
To this end, we define a new metric, the peak strength

$$s = \frac{1}{n}\sum_{j=1}^{n}\left(p - r_j\right)^2 - \frac{1}{2}\left\| \begin{bmatrix} x_p \\ y_p \end{bmatrix} - \begin{bmatrix} x_{gt} \\ y_{gt} \end{bmatrix} \right\|_2 \tag{18}$$

to evaluate the discriminative performance of the learned correlation filter, where $p$ denotes the peak value of the response, $r_j$ denotes the $j$th response value, $n$ denotes the number of the neighboring response values around the peak, and $[x_p, y_p]^T$ and $[x_{gt}, y_{gt}]^T$ denote the positions of the response peak (correlation output) and the ground truth peak (center of the target location), respectively. The peak strength is expected to be high for a correlation filter with good discriminative performance. Note that, to balance their scales, the two terms in (18) are each normalized to $[0, 1]$ over a video sequence.

Fig. 3. Investigation on the peak strength over the OTB 2013 benchmark. (a) Average peak strength on each video sequence. (b) Distribution of the peak strength on all frames.

To demonstrate that the proposed approach improves the discriminative performance of the correlation filtering (i.e., PS), we construct another correlation filter learned only with a quadratic constraint. We evaluate the peak strength of the two filters on the OTB 2013 benchmark [69], a popular tracking benchmark containing 50 challenging video sequences. We use the eight immediate neighbors of the peak values to calculate the peak strength, i.e., we set $n = 8$ in (18). Fig. 3(a) shows the average peak strength on each of the 50 video sequences, from which it is evident that, on most video sequences, the proposed approach (PS) has higher peak strength than its competing counterpart (NPS). Furthermore, the distributions of the peak strength obtained by the two approaches on all frames are shown in Fig. 3(b). It is evident that the proposed approach has significantly higher peak strength over the tracking benchmark. It demonstrates that the elastic net constraint leads to a PS correlation filtering, as well as improved discriminative performance.
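A score of this kind can be sketched as follows. The sketch rests on assumptions that should be read as such: the score is taken to be a neighbor-contrast term minus half of a localization-error term, each min-max normalized over the frames of a sequence; the eight neighbors wrap around the borders of the response map; and the function names and the two toy response maps are hypothetical.

```python
import numpy as np

def peak_terms(response, gt_rc):
    """Return (contrast, loc_error) for one 2-D response map and a ground-truth (row, col)."""
    pr, pc = np.unravel_index(np.argmax(response), response.shape)
    p = response[pr, pc]
    h, w = response.shape
    # Eight immediate neighbors of the peak (n = 8), wrapping at the borders (an assumption).
    neighbors = np.array([response[(pr + dr) % h, (pc + dc) % w]
                          for dr in (-1, 0, 1) for dc in (-1, 0, 1)
                          if (dr, dc) != (0, 0)])
    contrast = np.mean((p - neighbors) ** 2)
    loc_error = np.hypot(pr - gt_rc[0], pc - gt_rc[1])
    return contrast, loc_error

def minmax(v):
    """Normalize values to [0, 1]; a constant sequence maps to zeros."""
    v = np.asarray(v, dtype=float)
    span = v.max() - v.min()
    return (v - v.min()) / span if span > 0 else np.zeros_like(v)

def peak_strength(responses, gt_positions):
    """Per-frame score: normalized contrast minus half the normalized localization error."""
    terms = np.array([peak_terms(r, g) for r, g in zip(responses, gt_positions)])
    return minmax(terms[:, 0]) - 0.5 * minmax(terms[:, 1])

# Two hypothetical frames: a sharp, well-placed peak and a weak, displaced one.
sharp = np.zeros((9, 9))
sharp[4, 4] = 1.0
weak = np.full((9, 9), 0.5)
weak[4, 6] = 1.0
scores = peak_strength([sharp, weak], [(4, 4), (4, 4)])
```

The sharp, correctly located peak receives the higher score, which matches the intent that a more discriminative filter yields a larger peak strength.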
For more thorough evaluations, please refer to the results shown in Fig. 13.

IV. EXPERIMENTS

A. Implementation Details

The proposed tracking algorithm is implemented in MATLAB on a PC with an Intel Xeon W3520 CPU at 2.67 GHz. The MATLAB scripts are written without any code optimization. The average running speed of the proposed tracker is 13.4 frames/s. The training samples X are the full cyclic shifts of the base image patch centered at the current target location, with a spatially expanded region 1.5 times the size of the current target. A cosine window is applied to the base image to alleviate the discontinuity caused by the cyclic shifts. HOG features are extracted from the training samples, and a Gaussian kernel is employed to embed the features into a high-dimensional nonlinear space. The target candidates, sampled from the image patch centered at the most recently obtained target location, are generated by the same procedure as the training samples. As recommended in [10], the parameter λ in (5) is set to $10^{-4}$. Following the suggestion in [15], the parameter τ in (5) is set to $1-\lambda$. We empirically set the parameter μ in (7) to $10^{-5}$, and build the scale pool from seven different scaling coefficients, i.e., S = {0.985, 0.99, 0.995, 1, 1.005, 1.01, 1.015}. The source code will be available on the authors' websites.

B. Evaluation Setting

The proposed tracker is evaluated on two popular visual tracking benchmarks, the OTB 2013 benchmark [69] and the OTB 2015 benchmark [70], which contain 50 and 100 video sequences, respectively, with various challenging situations, such as illumination change, occlusion, nonrigid deformation, in-plane/out-of-plane rotation, and scale variation. In each frame of the video sequences of the two benchmarks, the target is manually labeled by a rectangular bounding box that is used as the ground truth in the quantitative evaluations.
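The sample-generation steps described in the implementation details (cyclic shifts of a base sample, a cosine window, and the scale pool) can be illustrated with a small NumPy sketch. It is not the authors' code: the helper names are hypothetical, a Hann window is assumed as the cosine window, the cyclic shifts are shown for a 1-D sample only, and the way the scale coefficients multiply the target size is an assumption made for illustration.

```python
import numpy as np

# Scale pool quoted in the text.
SCALE_POOL = (0.985, 0.99, 0.995, 1.0, 1.005, 1.01, 1.015)


def cosine_window(height, width):
    """2-D Hann window (assumed form of the cosine window).

    It tapers a patch toward zero at its borders, which reduces the
    discontinuity introduced by cyclic shifts.
    """
    return np.outer(np.hanning(height), np.hanning(width))


def cyclic_shifts(base):
    """Stack all cyclic shifts of a 1-D base sample, one shift per row."""
    base = np.asarray(base)
    return np.stack([np.roll(base, shift) for shift in range(base.size)])


def candidate_sizes(target_w, target_h, scales=SCALE_POOL):
    """Candidate target sizes, one per scale coefficient (assumed usage)."""
    return [(s * target_w, s * target_h) for s in scales]
```

In practice the shifted samples are never materialized; the circulant structure lets correlation filter trackers work directly in the Fourier domain, which is why the explicit matrix here is only useful for intuition.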
Eighteen other state-of-the-art trackers are used as the competing methods in the evaluations, including the top five trackers on the OTB 2013 benchmark according to Wu et al.'s evaluation [69] (Struck [63], SCM [71], TLD [54], ASLA [72], and CXT [73]), the top five trackers on the OTB 2015 benchmark according to Wu et al.'s evaluation [70] (Struck [63], SCM [71], ASLA [72], CSK [4], and L1APG [22]), eight other correlation filtering-based trackers (RCF [12], SRDCF [7], KCF [10], SAMF [6], DSST [5], STC [8], CN [9], and CSK [4]), two deep learning-based trackers (CNT [65] and HCFT [58]), and two feature selection-based trackers (ODFS [66] and SSL [18]). Two criteria are used to evaluate the performance of the proposed tracker, which are defined as follows.

1) Precision: The percentage of frames where the tracking location errors (TLEs) are less than a predefined threshold. The TLE is defined as the Euclidean distance between the centers of the tracking and the ground truth bounding boxes.

2) Success Rate: The percentage of frames where the overlap rates (ORs) are greater than a predefined threshold. The OR is defined as $(A_t \cap A_g)/(A_t \cup A_g)$, where $A_t$ and $A_g$ denote the areas of the tracking and the ground truth bounding boxes, respectively.

TABLE I. TRACKING PERFORMANCE ON THE 50 VIDEO SEQUENCES OF THE OTB 2013 BENCHMARK. ρ AND φ DENOTE LOCATION ERROR THRESHOLD AND OVERLAP THRESHOLD, RESPECTIVELY. THE BEST RESULTS ARE MARKED IN BOLD-FACE FONTS. THE SECOND BEST RESULTS ARE MARKED BY UNDERLINES

Fig. 4. Tracking performance of the proposed tracker and the top five trackers in Wu et al.'s evaluation [69] on the 50 video sequences of the OTB 2013 benchmark.

Fig. 5. Tracking performance of the proposed tracker and eight other state-of-the-art trackers based on correlation filtering on the 50 video sequences of the OTB 2013 benchmark.

C. Evaluations on the OTB 2013 Benchmark

Fig. 4 shows the comparison of the tracking performance of the proposed tracker and the top five trackers in Wu et al.'s evaluation [69] on the 50 video sequences of the OTB 2013 benchmark.
It is evident that the proposed tracker significantly outperforms its five counterparts in this comparison. According to the quantitative evaluation results shown in Table I, the proposed tracker outperforms its five counterparts by 16.5% and 18.4% in terms of precision (ρ = 20) and success rate (φ = 0.5), respectively, on the OTB 2013 benchmark. Fig. 5 shows the comparison of the tracking performance of the proposed tracker and eight other state-of-the-art trackers based on correlation filtering on the 50 video sequences of the OTB 2013 benchmark. The proposed tracker obtains the best performance in terms of precision and the second best performance in terms of success rate. Specifically, according to Table I, although the success rate of the proposed tracker is slightly inferior to that of the SRDCF tracker, the proposed tracker runs 2.5 times faster than the SRDCF tracker, as shown in Table III. Fig. 6 shows the comparison of the tracking performance of the proposed tracker, the deep learning-based tracker CNT, and two feature selection-based trackers. It is evident that the proposed tracker outperforms its three counterparts in this comparison by 10.5% and 9.2% in terms of precision (ρ = 20) and success rate (φ = 0.5), respectively. Overall, the evaluation results on the OTB 2013 benchmark demonstrate that the proposed tracker achieves competitive performance against the state-of-the-art approaches on the 50 challenging video sequences.

D. Evaluations on the OTB 2015 Benchmark

Fig. 7 shows the comparison of the tracking performance of the proposed tracker and the top five trackers in Wu et al.'s evaluation [70] on the 100 video sequences of the OTB 2015 benchmark. It is evident that the proposed tracker significantly outperforms its five counterparts in this comparison.
According to the quantitative evaluation results shown in Table II, the proposed tracker outperforms its five counterparts by 14.1% and 14.2% in terms of precision (ρ = 20) and success rate (φ = 0.5), respectively, on the OTB 2015 benchmark.
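The two evaluation criteria defined in Section IV-B, precision and success rate, can be computed as in the following sketch. It assumes bounding boxes are given as (x, y, width, height) tuples, which is an assumption about the data layout rather than something stated in the text; the function names are hypothetical. The default thresholds ρ = 20 and φ = 0.5 are the ones quoted in the comparisons.

```python
import numpy as np


def center_error(box_a, box_b):
    """TLE: Euclidean distance between the centers of two (x, y, w, h) boxes."""
    ax, ay = box_a[0] + box_a[2] / 2.0, box_a[1] + box_a[3] / 2.0
    bx, by = box_b[0] + box_b[2] / 2.0, box_b[1] + box_b[3] / 2.0
    return float(np.hypot(ax - bx, ay - by))


def overlap_rate(box_a, box_b):
    """OR: area of intersection divided by area of union."""
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[0] + box_a[2], box_b[0] + box_b[2])
    y2 = min(box_a[1] + box_a[3], box_b[1] + box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    union = box_a[2] * box_a[3] + box_b[2] * box_b[3] - inter
    return inter / union if union > 0 else 0.0


def precision(tracked, truth, rho=20.0):
    """Fraction of frames whose TLE is less than the threshold rho."""
    errors = np.array([center_error(t, g) for t, g in zip(tracked, truth)])
    return float(np.mean(errors < rho))


def success_rate(tracked, truth, phi=0.5):
    """Fraction of frames whose OR is greater than the threshold phi."""
    overlaps = np.array([overlap_rate(t, g) for t, g in zip(tracked, truth)])
    return float(np.mean(overlaps > phi))
```

Sweeping `rho` or `phi` over a range of thresholds and plotting the resulting values gives precision and success rate curves of the kind summarized in the figures.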

TABLE II. TRACKING PERFORMANCE ON THE 100 VIDEO SEQUENCES OF THE OTB 2015 BENCHMARK. ρ AND φ DENOTE LOCATION ERROR THRESHOLD AND OVERLAP THRESHOLD, RESPECTIVELY. THE BEST RESULTS ARE MARKED IN BOLD-FACE FONTS. THE SECOND BEST RESULTS ARE MARKED BY UNDERLINES

TABLE III. RUNNING SPEEDS (IN FRAMES/S) OF THE PROPOSED AND THE EIGHT OTHER STATE-OF-THE-ART CORRELATION FILTERING BASED TRACKERS

Fig. 6. Tracking performance of the proposed tracker, the deep learning-based tracker [65], and two state-of-the-art feature selection-based trackers on the 50 video sequences of the OTB 2013 benchmark.

Fig. 7. Tracking performance of the proposed tracker, the deep tracker [58], and the top five trackers in Wu et al.'s evaluation [70] on the 100 video sequences of the OTB 2015 benchmark.

Fig. 8. Tracking performance of the proposed tracker and six other state-of-the-art trackers based on correlation filtering on the 100 video sequences of the OTB 2015 benchmark.

Fig. 8 shows the comparison of the tracking performance of the proposed tracker and six other state-of-the-art trackers based on correlation filtering on the 100 video sequences of the OTB 2015 benchmark. The proposed tracker obtains the best performance in terms of precision and the second best performance in terms of success rate. However, the proposed tracker runs 2.5 times faster than the SRDCF tracker, which achieves the best success rate. Overall, the evaluation results on the OTB 2015 benchmark demonstrate that the proposed tracker achieves competitive performance against the state-of-the-art approaches on the 100 challenging video sequences.

E. Evaluations on Running Speed

Considering the time-sensitive nature of practical tracking applications, running speed is also a critical factor in the performance of a visual tracker. Table III shows the running speeds of the proposed tracker and the eight state-of-the-art correlation filtering-based trackers.
Considering also the tracking performance in terms of precision and success rate shown in Tables I and II, the proposed tracker makes a good tradeoff between tracking accuracy and running speed. Even without any code optimization in MATLAB, it still achieves a competitive speed compared with its eight counterparts.

F. Evaluations on Various Challenging Situations

To thoroughly evaluate the proposed tracker, its performance in various challenging situations is investigated. The results are shown in Figs. 9-11. The OTB 2013 and the OTB 2015 benchmarks separate their 50 and 100 video sequences, respectively, into 11 challenging situations, including occlusion, deformation, out-of-plane/in-plane rotation, illumination variation, scale variation, background clutter, motion blur, fast motion, low resolution, and out of view. In this evaluation, five (SRDCF [7], RCF [12], KCF [10], SAMF [6], and DSST [5]) and two (SRDCF [7] and KCF [10]) other correlation filtering-based trackers are employed as the baselines on the OTB 2013 and the OTB 2015 benchmarks, respectively. The evaluation results in several challenging situations are reported below.

1) Occlusion: During tracking, the target is often occluded by other objects, leading to partial or complete abrupt changes in its appearance. Occlusion is usually the major factor resulting in tracking failure. Fig. 9(a) shows the tracking performance of the proposed and the competing trackers in the case of occlusion. It is evident that the proposed tracker outperforms its counterparts in this case. Some tracking results in the case of occlusion in representative frames are shown in Fig. 12(a), where a football player is running on the field and

sometimes occluded by other players with similar appearances. Only SRDCF and the proposed tracker successfully track the player, and the proposed tracker obtains more accurate center location and scale estimation, while the other competing trackers fail in this experiment.

Fig. 9. Tracking performance of the proposed and the baseline trackers in various challenging situations on the OTB 2013 and the OTB 2015 benchmarks. The number in the legend of each precision plot denotes the precision under the threshold ρ = 20, and the number in the legend of each success rate plot denotes the average success rate over all thresholds φ ∈ [0, 1]. (a) Occlusion on the OTB 2013 (left two) and the OTB 2015 (right two) benchmarks. (b) Deformation on the OTB 2013 (left two) and the OTB 2015 (right two) benchmarks. (c) Out-of-plane rotation on the OTB 2013 (left two) and the OTB 2015 (right two) benchmarks. (d) In-plane rotation on the OTB 2013 (left two) and the OTB 2015 (right two) benchmarks.

2) Nonrigid Deformation: In this situation, the appearance of the target deforms locally. This enlarges the representation error of the target, making the tracking unstable. Fig. 9(b) shows the tracking performance of the visual trackers in the case of nonrigid deformation. The proposed tracker achieves superior performance over its competing peers in this case. Fig. 12(b) shows the qualitative tracking results in some representative frames where nonrigid deformation occurs. The proposed tracker performs very well in tracking the sprinter, whose body appearance suffers from significant nonrigid deformations. In contrast, the SRDCF tracker drifts to another sprinter early in the tracking.

3) Out-of-Plane Rotation: Due to the motion of the target and the viewpoint change of the camera, the target appearance often suffers from out-of-plane rotation in successive frames. Fig.
9(c) shows the tracking performance of the visual trackers in the case of out-of-plane rotation. The proposed tracker outperforms its competing counterparts in this case. Fig. 12(c) illustrates the qualitative tracking results in several representative frames in this case, where a soccer player is celebrating a victory with his teammates. It can be seen that the appearance of this player changes drastically during tracking due to the out-of-plane rotation. The proposed tracker succeeds in tracking this player and obtains good results in these challenging frames.

4) In-Plane Rotation: During tracking, the motion of the target often causes in-plane rotation in its appearance. In this case, it is difficult to estimate the boundary between the target and the background. This is thus a major factor

influencing the accuracy of target localization. The tracking performance of the visual trackers is shown in Fig. 9(d). The proposed tracker yields the second and the third best performance in this case in terms of precision and success rate, respectively. Fig. 12(d) shows the qualitative tracking results in several representative frames where in-plane rotation happens. It can be seen that the proposed tracker performs well in these frames although the face of the person being tracked rotates in the camera plane during tracking.

Fig. 10. Tracking performance of the proposed and the baseline trackers in various challenging situations on the OTB 2013 and the OTB 2015 benchmarks. The number in the legend of each precision plot denotes the precision under the threshold ρ = 20, and the number in the legend of each success rate plot denotes the average success rate over all thresholds φ ∈ [0, 1]. (a) Illumination variation on the OTB 2013 (left two) and the OTB 2015 (right two) benchmarks. (b) Scale variation on the OTB 2013 (left two) and the OTB 2015 (right two) benchmarks. (c) Background clutter on the OTB 2013 (left two) and the OTB 2015 (right two) benchmarks. (d) Motion blur on the OTB 2013 (left two) and the OTB 2015 (right two) benchmarks.

5) Illumination Change: Due to changes in the lighting condition of the scene where tracking is conducted, the entire target appearance varies abruptly. Fig. 10(a) shows the tracking performance of the proposed and the competing trackers in the case of illumination change. The proposed tracker performs the third best in this case. Fig. 12(e) shows the representative results in this case, where a person is walking in a room in which the illumination changes drastically. The proposed tracker accurately tracks the person in this scene.

6) Scale Variation: During tracking, the scale of the target often varies in successive frames due to the motion of the target, the camera, or both.
In this case, a good tracker is required to estimate the scale as accurately as possible, while keeping the TLE low. Fig. 10(b) shows the tracking performance of the visual trackers in the case of scale variation. The proposed tracker performs the best and the second best in this case in terms of precision and success rate, respectively. Fig. 12(f) shows the qualitative results in this case in several representative frames, where a woman is walking toward the far end of a corridor. The proposed tracker accurately estimates the scale of the appearance of this woman in these frames.

It can also be seen from Fig. 10(d) that the proposed tracker underperforms the SRDCF tracker in the case of motion blur. Because the target is smoothed in the presence of motion blur, the features (pixels) are averaged within the target region. As a result, the sparsity-based feature selection strategy is unable to

accurately localize the distractive features. This is a limitation of the proposed approach.

Fig. 11. Tracking performance of the proposed and the baseline trackers in various challenging situations on the OTB 2013 and the OTB 2015 benchmarks. The number in the legend of each precision plot denotes the precision under the threshold ρ = 20, and the number in the legend of each success rate plot denotes the average success rate over all thresholds φ ∈ [0, 1]. (a) Fast motion on the OTB 2013 (left two) and the OTB 2015 (right two) benchmarks. (b) Low resolution on the OTB 2013 (left two) and the OTB 2015 (right two) benchmarks. (c) Out of view on the OTB 2013 (left two) and the OTB 2015 (right two) benchmarks.

G. Demonstration of the Peak Strength

To analyze to what extent the proposed approach to PS correlation filter learning improves the tracking performance, another comparison is conducted on the OTB 2013 benchmark between the PS and no PS (NPS) trackers, as shown in Fig. 13. It can be seen that the PS learning method leads to 6.5% and 6.3% performance improvements in terms of precision and success rate, respectively.

H. Demonstration of the Feature Grouping

To investigate the contribution of the feature grouping [i.e., the $\ell_2$-norm in (5)] to the tracking performance, a comparison is conducted on the OTB 2013 benchmark between the trackers implemented via the elastic net (combination of $\ell_1$- and $\ell_2$-norms) and the LASSO ($\ell_1$-norm only) constraints, respectively. For this purpose, we constructed another tracker via the LASSO constraint. The comparison results are shown in Fig. 14. It is evident that the grouping constraint leads to 7.1% and 6.0% performance improvements on the OTB 2013 benchmark in terms of precision and success rate, respectively. The $\ell_1$ constraint can ignore the distractive features during the filter learning, but it fails to select features from the same group.
However, the distractive features often appear in a local region, i.e., in a group form. For this reason, an $\ell_2$ constraint is combined with the $\ell_1$ constraint to group the features. As a result, the correlation filter can suppress, through its zero coefficients, the response at regions with significant appearance changes, such as occlusion, deformation, illumination variation, and in-plane/out-of-plane rotation. This also enhances the target region in the correlation filtering, leading to a strengthened peak of the filter response and improved discriminative performance.

I. Investigation on Parameters

In this paper, as recommended in [10], the parameter λ in (5) is set to $10^{-4}$. Following the suggestion in [15], the parameter τ in (5) is set to $1-\lambda$. The parameter μ in (7) is investigated comprehensively. According to our initial observations, the proposed tracker is more stable and robust when μ is set to a relatively small value. Thus, we investigate μ within the range $\{10^{-2}, 10^{-3}, 10^{-4}, 10^{-5}, 10^{-6}\}$ to observe its influence on the tracking performance on the benchmark. As shown in Fig. 15, the proposed tracker performs best when μ = $10^{-5}$. Note that the parameters λ, τ, and μ balance the weights between their corresponding terms in (7). Because λ, which controls the weight of the stability of the correlation filter in a squared form, is set to $10^{-4}$, μ is set

slightly smaller than λ, which may ensure the stability of the learned filter. Also, too small a μ may make its corresponding term trivial. As a result, we set μ = $10^{-5}$ in all the experiments in this paper.

Fig. 12. Tracking results in representative frames in various challenging situations. (a) Occlusion. (b) Nonrigid deformation. (c) Out-of-plane rotation. (d) In-plane rotation. (e) Illumination change. (f) Scale variation.

Fig. 13. Tracking performance of the PS and NPS trackers on the 50 video sequences of the OTB 2013 benchmark. The number in the legend of the precision plot denotes the precision under the threshold ρ = 20, and the number in the legend of the success rate plot denotes the average success rate over all thresholds φ ∈ [0, 1].

Fig. 14. Tracking performance of the feature grouped and no feature grouped trackers on the 50 video sequences of the OTB 2013 benchmark. The number in the legend of the precision plot denotes the precision under the threshold ρ = 20, and the number in the legend of the success rate plot denotes the average success rate over all thresholds φ ∈ [0, 1].

Fig. 15. Investigation on the parameter μ in (7) on the OTB 2013 benchmark.

V. CONCLUSION

We have proposed a novel visual tracking approach to correlation filter learning toward peak strength of the filter response. An elastic net constraint has been imposed on the correlation filter during the filter learning, in which the squared term groups the features, while the sparsity term adaptively ignores the distractive features in the correlation filtering. A new metric to evaluate the peak strength has been proposed to measure the discriminative capability of the learned correlation filter, by which the proposed approach has been demonstrated to effectively strengthen the peak of the correlation response, leading to more discriminative performance than previous methods.
Extensive experiments on a popular visual tracking benchmark have demonstrated that the proposed tracker outperforms most state-of-the-art methods.

APPENDIX

This section presents the proof of Proposition 1. The problem in (8) has a closed-form solution that

is shown in (10):

$$ \alpha = \left(K^{H}K + \lambda K + \mu I\right)^{-1}\left(K^{H}y + \mu\beta\right). $$

By incorporating (6), $K = D\,\mathrm{diag}(\hat{k}_1)\,D^{H}$, we have the following derivation, where $^{*}$ denotes the complex conjugate, $\odot$ the element-wise product, and the fractions are taken element-wise:

$$
\begin{aligned}
\alpha &= \left(K^{H}K + \lambda K + \mu I\right)^{-1}\left(K^{H}y + \mu\beta\right) \\
&= \left(D\,\mathrm{diag}(\hat{k}_1^{*} \odot \hat{k}_1)\,D^{H} + \lambda D\,\mathrm{diag}(\hat{k}_1)\,D^{H} + \mu I\right)^{-1}\left(D\,\mathrm{diag}(\hat{k}_1^{*})\,D^{H}y + \mu\beta\right) \\
&= D\,\mathrm{diag}\!\left(\frac{1}{\hat{k}_1^{*} \odot \hat{k}_1 + \lambda\hat{k}_1 + \mu}\right)\mathrm{diag}(\hat{k}_1^{*})\,D^{H}y + \mu D\,\mathrm{diag}\!\left(\frac{1}{\hat{k}_1^{*} \odot \hat{k}_1 + \lambda\hat{k}_1 + \mu}\right)D^{H}\beta.
\end{aligned}
$$

The conjugate of the DFT of α is then found by

$$ \hat{\alpha}^{*} = \mathrm{diag}\!\left(\frac{\hat{k}_1^{*}}{\hat{k}_1^{*} \odot \hat{k}_1 + \lambda\hat{k}_1 + \mu}\right)\hat{y}^{*} + \mu\,\mathrm{diag}\!\left(\frac{1}{\hat{k}_1^{*} \odot \hat{k}_1 + \lambda\hat{k}_1 + \mu}\right)\hat{\beta}^{*} = \frac{\hat{k}_1^{*} \odot \hat{y}^{*} + \mu\hat{\beta}^{*}}{\hat{k}_1^{*} \odot \hat{k}_1 + \lambda\hat{k}_1 + \mu}. $$

Equivalently, the DFT of α is obtained from

$$ \hat{\alpha} = \frac{\hat{k}_1 \odot \hat{y} + \mu\hat{\beta}}{\hat{k}_1 \odot \hat{k}_1^{*} + \lambda\hat{k}_1^{*} + \mu}. $$

REFERENCES

[1] A. Yilmaz, O. Javed, and M. Shah, Object tracking: A survey, ACM Comput. Surveys, vol. 38, no. 4, Dec. 2006, Art. no. 13. [2] A. W. M. Smeulders et al., Visual tracking: An experimental survey, IEEE Trans. Pattern Anal. Mach. Intell., vol. 36, no. 7, pp , Jul [3] D. S. Bolme, J. R. Beveridge, B. A. Draper, and Y. M. Lui, Visual object tracking using adaptive correlation filters, in Proc. IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit. (CVPR), 2010, pp [4] J. F. Henriques, R. Caseiro, P. Martins, and J. Batista, Exploiting the circulant structure of tracking-by-detection with kernels, in Proc. Eur. Conf. Comput. Vis. (ECCV), Florence, Italy, 2012, pp [5] M. Danelljan, G. Häger, F. S. Khan, and M. Felsberg, Accurate scale estimation for robust visual tracking, in Proc. Brit. Mach. Vis. Conf. (BMVC), Linköping, Sweden, 2014, pp [6] Y. Li and J. Zhu, A scale adaptive kernel correlation filter tracker with feature integration, in Proc. Eur. Conf. Comput. Vis. Workshop, Zürich, Switzerland, 2014, pp [7] M. Danelljan, G. Häger, F. S. Khan, and M. Felsberg, Learning spatially regularized correlation filters for visual tracking, in Proc. IEEE Int. Conf. Comput. Vis. (ICCV), Santiago, Chile, 2015, pp [8] K. Zhang, L. Zhang, Q. Liu, D. Zhang, and M.-H.
Yang, Fast visual tracking via dense spatio-temporal context learning, in Proc. Eur. Conf. Comput. Vis. (ECCV, Zürich, Switzerland, 2014, pp [9] M. Danelljan, F. S. Khan, M. Felsberg, and J. V. D. Weijer, Adaptive color attributes for real-time visual tracking, in Proc. IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit. (CVPR, Columbus, OH, USA, 2014, pp [10] J. F. Henriques, R. Caseiro, P. Martins, and J. Batista, High-speed tracking with kernelized correlation filters, IEEE Trans. Pattern Anal. Mach. Intell., vol. 37, no. 3, pp , Mar [11] S. Liu, T. Zhang, X. Cao, and C. Xu, Structural correlation filter for robust visual tracking, in Proc. IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit. (CVPR, 2016, pp [12] Y. Sui, Z. Zhang, G. Wang, Y. Tang, and L. Zhang, Real-time visual tracking: Promoting the robustness of correlation filter learning, in Proc. Eur. Conf. Comput. Vis. (ECCV, Amsterdam, The Netherlands, 2016, pp [13] J. Wright et al., Sparse representation for computer vision and pattern recognition, Proc. IEEE, vol. 98, no. 6, pp , Jun [14] R. Tibshirani, Regression shrinkage and selection via the lasso, J. Roy. Stat. Soc. B (Methodol., vol. 58, no. 1, pp , [15] H. Zou and T. Hastie, Regularization and variable selection via the elastic net, J. Roy. Stat. Soc. B (Stat. Methodol., vol. 67, no. 2, pp , [16] D. A. Ross, J. Lim, R.-S. Lin, and M.-H. Yang, Incremental learning for robust visual tracking, Int. J. Comput. Vis., vol. 77, nos. 1 3, pp , [17] J. Kwon and K. Lee, Visual tracking decomposition, in Proc. IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit. (CVPR, 2010, pp [18] Y. Sui, S. Zhang, and L. Zhang, Robust visual tracking via sparsityinduced subspace learning, IEEE Trans. Image Process., vol. 24, no. 12, pp , Dec [19] Y. Sui, G. Wang, Y. Tang, and L. Zhang, Tracking completion, in Proc. Eur. Conf. Comput. Vis. (ECCV, Amsterdam, The Netherlands, 2016, pp [20] X. Mei and H. Ling, Robust visual tracking using L 1 minimization, in Proc. 
IEEE Int. Conf. Comput. Vis. (ICCV, 2009, pp [21] T. Zhang, B. Ghanem, S. Liu, and N. Ahuja, Robust visual tracking via multi-task sparse learning, in Proc. IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit. (CVPR, 2012, pp [22] X. Mei and H. Ling, Robust visual tracking and vehicle classification via sparse representation, IEEE Trans. Pattern Anal. Mach. Intell., vol. 33, no. 11, pp , Nov [23] M. Barnard, W. Wang, J. Kittler, S. M. Naqvi, and J. A. Chambers, A dictionary learning approach to tracking, in Proc. Int. Conf. Acoust. Speech Signal Process. (ICASSP, Kyoto, Japan, 2012, pp [24] D. Wang, H. Lu, and M.-H. Yang, Least soft-threshold squares tracking, in Proc. IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit. (CVPR, Portland, OR, USA, 2013, pp [25] T. Zhang, B. Ghanem, S. Liu, and N. Ahuja, Robust visual tracking via structured multi-task sparse learning, Int. J. Comput. Vis., vol. 101, no. 2, pp , [26] T. Zhang et al., Structural sparse tracking, in Proc. IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit. (CVPR, 2015, pp [27] T. Zhang, B. Ghanem, S. Liu, and N. Ahuja, Low-rank sparse learning for robust visual tracking, in Proc. Eur. Conf. Comput. Vis. (ECCV, Florence, Italy, 2012, pp [28] Y. Sui et al., Self-expressive tracking, Pattern Recognit., vol. 48, no. 9, pp , [29] Y. Sui, Y. Tang, and L. Zhang, Discriminative low-rank tracking, in Proc. IEEE Int. Conf. Comput. Vis. (ICCV, Santiago, Chile, 2015, pp [30] Y. Sui and L. Zhang, Robust tracking via locally structured representation, Int. J. Comput. Vis., vol. 119, no. 2, pp , [31] X. Li, W. Hu, Z. Zhang, X. Zhang, and G. Luo, Robust visual tracking based on incremental tensor subspace learning, in Proc. IEEE Int. Conf. Comput. Vis. (ICCV, Rio de Janeiro, Brazil, 2007, pp [32] W. Hu et al., Incremental tensor subspace learning and its applications to foreground segmentation and tracking, Int. J. Comput. Vis., vol. 91, no. 3, pp , [33] W. 
Hu et al., Single and multiple object tracking using log-euclidean Riemannian subspace and block-division appearance model, IEEE Trans. Pattern Anal. Mach. Intell., vol. 34, no. 12, pp , Dec [34] Y. Sui and L. Zhang, Visual tracking via locally structured Gaussian process regression, IEEE Signal Process. Lett., vol. 22, no. 9, pp , Sep [35] B. Liu, J. Huang, L. Yang, and C. Kulikowsk, Robust tracking using local sparse appearance model and K-selection, in Proc. IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit. (CVPR, 2011, pp [36] B. Liu, J. Huang, C. Kulikowski, and L. Yang, Robust visual tracking using local sparse appearance model and K-selection, IEEE Trans. Pattern Anal. Mach. Intell., vol. 35, no. 12, pp , Dec

[37] A. Adam, E. Rivlin, and I. Shimshoni, Robust fragments-based tracking using the integral histogram, in Proc. IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit. (CVPR), vol. 1. New York, NY, USA, 2006, pp [38] X. Li, Z. Han, L. Wang, and H. Lu, Visual tracking via random walks on graph model, IEEE Trans. Cybern., vol. 46, no. 9, pp , Sep [39] X. Li, A. Dick, C. Shen, A. van den Hengel, and H. Wang, Incremental learning of 3D-DCT compact representations for robust visual tracking, IEEE Trans. Pattern Anal. Mach. Intell., vol. 35, no. 4, pp , Apr [40] D. Wang, H. Lu, and M.-H. Yang, Online object tracking with sparse prototypes, IEEE Trans. Image Process., vol. 22, no. 1, pp , Jan [41] T. Zhang, A. Bibi, and B. Ghanem, In defense of sparse tracking: Circulant sparse tracker, in Proc. IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit. (CVPR), 2016, pp [42] D. Wang and H. Lu, Object tracking via 2DPCA and L1-regularization, IEEE Signal Process. Lett., vol. 19, no. 11, pp , Nov [43] S. Avidan, Support vector tracking, IEEE Trans. Pattern Anal. Mach. Intell., vol. 26, no. 8, pp , Aug [44] S. Zhang, X. Yu, Y. Sui, S. Zhao, and L. Zhang, Object tracking with multi-view support vector machines, IEEE Trans. Multimedia, vol. 17, no. 3, pp , Mar [45] S. Zhang, Y. Sui, X. Yu, S. Zhao, and L. Zhang, Hybrid support vector machines for robust object tracking, Pattern Recognit., vol. 48, no. 8, pp , [46] S. Zhang, Y. Sui, S. Zhao, X. Yu, and L. Zhang, Multi-local-task learning with global regularization for object tracking, Pattern Recognit., vol. 48, no. 12, pp , [47] S. Zhang, S. Zhao, Y. Sui, and L. Zhang, Single object tracking with fuzzy least squares support vector machine, IEEE Trans. Image Process., vol. 24, no. 12, pp , Dec [48] S. Avidan, Ensemble tracking, IEEE Trans. Pattern Anal. Mach. Intell., vol. 29, no. 2, pp , Feb [49] H. Grabner, M. Grabner, and H. Bischof, Real-time tracking via online boosting, in Proc. Brit.
Mach. Vis. Conf. (BMVC).
[50] H. Grabner, C. Leistner, and H. Bischof, "Semi-supervised on-line boosting for robust tracking," in Proc. Eur. Conf. Comput. Vis. (ECCV), Marseilles, France, 2008.
[51] K. Zhang, L. Zhang, and M.-H. Yang, "Fast compressive tracking," IEEE Trans. Pattern Anal. Mach. Intell., vol. 36, no. 10, Oct.
[52] S. Wang, H. Lu, F. Yang, and M.-H. Yang, "Superpixel tracking," in Proc. IEEE Int. Conf. Comput. Vis. (ICCV), Barcelona, Spain, 2011.
[53] Z. Kalal, J. Matas, and K. Mikolajczyk, "P-N learning: Bootstrapping binary classifiers by structural constraints," in Proc. IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2010.
[54] Z. Kalal, K. Mikolajczyk, and J. Matas, "Tracking-learning-detection," IEEE Trans. Pattern Anal. Mach. Intell., vol. 34, no. 7, Jul.
[55] B. Babenko, M.-H. Yang, and S. Belongie, "Robust object tracking with online multiple instance learning," IEEE Trans. Pattern Anal. Mach. Intell., vol. 33, no. 8, Aug.
[56] J. Fan, X. Shen, and Y. Wu, "Scribble tracker: A matting-based approach for robust tracking," IEEE Trans. Pattern Anal. Mach. Intell., vol. 34, no. 8, Aug.
[57] D. Du, L. Zhang, H. Lu, X. Mei, and X. Li, "Discriminative hash tracking with group sparsity," IEEE Trans. Cybern., vol. 46, no. 8, Aug.
[58] C. Ma, J.-B. Huang, X. Yang, and M.-H. Yang, "Hierarchical convolutional features for visual tracking," in Proc. IEEE Int. Conf. Comput. Vis. (ICCV), Santiago, Chile, 2015.
[59] L. Wang, W. Ouyang, X. Wang, and H. Lu, "Visual tracking with fully convolutional networks," in Proc. IEEE Int. Conf. Comput. Vis. (ICCV), Santiago, Chile, 2015.
[60] Y. Qi et al., "Hedged deep tracking," in Proc. IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit. (CVPR), 2016.
[61] L. Wang, W. Ouyang, X. Wang, and H. Lu, "STCT: Sequentially training convolutional networks for visual tracking," in Proc. IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit. (CVPR), 2016.
[62] H. Nam and B. Han, "Learning multi-domain convolutional neural networks for visual tracking," in Proc. IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit. (CVPR), 2016.
[63] S. Hare, A. Saffari, and P. H. S. Torr, "Struck: Structured output tracking with kernels," in Proc. IEEE Int. Conf. Comput. Vis. (ICCV), Barcelona, Spain, 2011.
[64] H. Grabner and H. Bischof, "On-line boosting and vision," in Proc. IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit. (CVPR).
[65] K. Zhang, Q. Liu, Y. Wu, and M.-H. Yang, "Robust visual tracking via convolutional networks without training," IEEE Trans. Image Process., vol. 25, no. 4, Apr.
[66] K. Zhang, L. Zhang, and M.-H. Yang, "Real-time object tracking via online discriminative feature selection," IEEE Trans. Image Process., vol. 22, no. 12, Dec.
[67] R. M. Gray, Toeplitz and Circulant Matrices: A Review. Boston, MA, USA: Now.
[68] A. Beck and M. Teboulle, "A fast iterative shrinkage-thresholding algorithm for linear inverse problems," SIAM J. Imag. Sci., vol. 2, no. 1, Jan.
[69] Y. Wu, J. Lim, and M.-H. Yang, "Online object tracking: A benchmark," in Proc. IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit. (CVPR), Portland, OR, USA, 2013.
[70] Y. Wu, J. Lim, and M.-H. Yang, "Object tracking benchmark," IEEE Trans. Pattern Anal. Mach. Intell., vol. 37, no. 9, Sep.
[71] W. Zhong, H. Lu, and M.-H. Yang, "Robust object tracking via sparsity-based collaborative model," in Proc. IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit. (CVPR), 2012.
[72] X. Jia, H. Lu, and M.-H. Yang, "Visual tracking via adaptive structural local sparse appearance model," in Proc. IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit. (CVPR), 2012.
[73] T. B. Dinh, N. Vo, and G. Medioni, "Context tracker: Exploring supporters and distracters in unconstrained environments," in Proc. IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2011.

Yao Sui received the Ph.D. degree in electronic engineering from Tsinghua University, Beijing, China, under the supervision of Prof. L. Zhang. He is currently a Post-Doctoral Research Fellow with the Department of Electrical Engineering and Computer Science, University of Kansas, Lawrence, KS, USA, researching with Prof. G. Wang. His current research interests include machine learning, computer vision, image processing, and pattern recognition.

Guanghui Wang received the Ph.D. degree from the University of Waterloo, Waterloo, ON, Canada. He is currently an Assistant Professor with the University of Kansas, Lawrence, KS, USA. He is an Adjunct Professor with the Institute of Automation, Chinese Academy of Sciences, Beijing, China. He has published one book at Springer-Verlag, and over 80 papers in peer-reviewed journals and conference proceedings. His current research interests include computer vision, image processing, and robotics. Mr. Wang served as an Associate Editor and on the editorial board of two journals, as an Area Chair or a TPC Member of over 20 conferences, and as a Reviewer of over 20 journals.

Li Zhang received the B.S., M.S., and Ph.D. degrees in electronic engineering from Tsinghua University, Beijing, China. He is currently a Professor with the Department of Electronic Engineering, Tsinghua University. He is directing the UAV Vision Laboratory, Tsinghua University, and is also a member of the National Laboratory of Pattern Recognition, Beijing. His current research interests include image processing, computer vision, pattern recognition, and computer graphics.


More information

Variable Selection 6.783, Biomedical Decision Support

Variable Selection 6.783, Biomedical Decision Support 6.783, Biomedical Decision Support (lrosasco@mit.edu) Department of Brain and Cognitive Science- MIT November 2, 2009 About this class Why selecting variables Approaches to variable selection Sparsity-based

More information

Structural Local Sparse Tracking Method Based on Multi-feature Fusion and Fractional Differential

Structural Local Sparse Tracking Method Based on Multi-feature Fusion and Fractional Differential Journal of Information Hiding and Multimedia Signal Processing c 28 ISSN 273-422 Ubiquitous International Volume 9, Number, January 28 Structural Local Sparse Tracking Method Based on Multi-feature Fusion

More information

COSC160: Detection and Classification. Jeremy Bolton, PhD Assistant Teaching Professor

COSC160: Detection and Classification. Jeremy Bolton, PhD Assistant Teaching Professor COSC160: Detection and Classification Jeremy Bolton, PhD Assistant Teaching Professor Outline I. Problem I. Strategies II. Features for training III. Using spatial information? IV. Reducing dimensionality

More information

Human Motion Detection and Tracking for Video Surveillance

Human Motion Detection and Tracking for Video Surveillance Human Motion Detection and Tracking for Video Surveillance Prithviraj Banerjee and Somnath Sengupta Department of Electronics and Electrical Communication Engineering Indian Institute of Technology, Kharagpur,

More information

Support Vector Machines

Support Vector Machines Support Vector Machines SVM Discussion Overview. Importance of SVMs. Overview of Mathematical Techniques Employed 3. Margin Geometry 4. SVM Training Methodology 5. Overlapping Distributions 6. Dealing

More information

Training-Free, Generic Object Detection Using Locally Adaptive Regression Kernels

Training-Free, Generic Object Detection Using Locally Adaptive Regression Kernels Training-Free, Generic Object Detection Using Locally Adaptive Regression Kernels IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIENCE, VOL.32, NO.9, SEPTEMBER 2010 Hae Jong Seo, Student Member,

More information

4. Image Retrieval using Transformed Image Content

4. Image Retrieval using Transformed Image Content 4. Image Retrieval using Transformed Image Content The desire of better and faster retrieval techniques has always fuelled to the research in content based image retrieval (CBIR). A class of unitary matrices

More information

Divide and Conquer Kernel Ridge Regression

Divide and Conquer Kernel Ridge Regression Divide and Conquer Kernel Ridge Regression Yuchen Zhang John Duchi Martin Wainwright University of California, Berkeley COLT 2013 Yuchen Zhang (UC Berkeley) Divide and Conquer KRR COLT 2013 1 / 15 Problem

More information

Section 5 Convex Optimisation 1. W. Dai (IC) EE4.66 Data Proc. Convex Optimisation page 5-1

Section 5 Convex Optimisation 1. W. Dai (IC) EE4.66 Data Proc. Convex Optimisation page 5-1 Section 5 Convex Optimisation 1 W. Dai (IC) EE4.66 Data Proc. Convex Optimisation 1 2018 page 5-1 Convex Combination Denition 5.1 A convex combination is a linear combination of points where all coecients

More information