IEEE TRANSACTIONS ON IMAGE PROCESSING, VOL. 27, NO. 3, MARCH 2018

Exploiting Spatial-Temporal Locality of Tracking via Structured Dictionary Learning

Yao Sui, Guanghui Wang, Senior Member, IEEE, Li Zhang, and Ming-Hsuan Yang, Senior Member, IEEE

Abstract: In this paper, a novel spatial-temporal locality is proposed and unified via a discriminative dictionary learning framework for visual tracking. By exploring the strong local correlations between the temporally obtained targets and their spatially distributed nearby background neighbors, a spatial-temporal locality is obtained. The locality is formulated as a subspace model and exploited under a unified structure of discriminative dictionary learning with a subspace structure. Using the learned dictionary, the target and its background can be described and distinguished effectively through their sparse codes. As a result, the target is localized by integrating both the descriptive and the discriminative qualities. Extensive experiments on various challenging video sequences demonstrate the superior performance of the proposed algorithm over other state-of-the-art approaches.

Index Terms: Visual tracking, dictionary learning, sparse representation, low-rank learning.

I. INTRODUCTION

As a fundamental component of computer vision systems, visual tracking has been one of the most active research topics in computer vision, with various potential applications such as video surveillance, human-computer interaction, autonomous driving, and robotics. Recent years have witnessed great success in visual tracking [1], [2]. Nonetheless, many challenging problems have still not been solved effectively, such as drastic illumination change, background clutter, scale variation, and heavy occlusion. A typical tracking approach is to learn a robust representation for the target,

Manuscript received October 7, 2016; revised February 19, 2017, June 19, 2017, and September 8, 2017; accepted November 3, 2017.
Date of publication December 4, 2017; date of current version December 7, 2017. This work was supported in part by the National Natural Science Foundation of China (NSFC) under Grant , in part by the Kansas NASA EPSCoR Program under Grant KNEP-PDG KU, in part by the General Research Fund of the University of Kansas under Grant 8901, in part by the Joint Fund of Civil Aviation Research by NSFC and the Civil Aviation Administration under Grant U153313, in part by the NSF CAREER under Grant , and in part by gifts from Adobe and Nvidia. The associate editor coordinating the review of this manuscript and approving it for publication was Prof. Junsong Yuan. (Corresponding author: Yao Sui.) Y. Sui is with the Harvard Medical School, Harvard University, Boston, MA 02115 USA (suiyao@gmail.com). G. Wang is with the Department of Electrical Engineering and Computer Science, University of Kansas, Lawrence, KS USA (ghwang@ku.edu). L. Zhang is with the Department of Electronic Engineering, Tsinghua University, Beijing, China (chinazhangli@tsinghua.edu.cn). M.-H. Yang is with the Department of Electrical Engineering and Computer Science, University of California at Merced, Merced, CA USA (mhyang@ucmerced.edu). Color versions of one or more of the figures in this paper are available online. Digital Object Identifier /TIP

Fig. 1. Illustration of the spatial-temporal locality. (a) A video sequence being analyzed. (b) Temporal and spatial correlations. (c) Temporal correlations between pairwise targets. (d) Average spatial correlations between the target and the background patches in every frame. In (c) and (d), cooler colors indicate smaller values.

the background, or both, by which the intrinsic structure of the target and the background is exploited via subspace [3], sparsity [4], and compactness [5] models. In this work, we propose a unified structure for visual tracking, i.e., the spatial-temporal locality.
Such a structure leads to strong local correlations between the temporally obtained targets and their spatially nearby regions in the background. Fig. 1 shows an illustration of the spatial-temporal locality. The temporal and spatial correlations, as illustrated in Fig. 1(b), are analyzed on the video sequence shown in Fig. 1(a). The temporal correlations between pairwise targets are plotted in Fig. 1(c). It can be seen that large values are located near the diagonal blocks of the plot, which indicates that a target is strongly correlated with its temporal neighbors. Such a neighborhood results in salient locality in the temporal dimension. Meanwhile, the spatial correlations between the target and the background in every frame are investigated, and the average result over all frames is plotted in Fig. 1(d). It is evident that the closer the background is to the target, the stronger the correlation between the target and the background. This observation leads to a locality structure in the spatial dimension.
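As a toy illustration only (my own sketch, not the authors' code), a pairwise correlation matrix of the kind visualized in Fig. 1(c) could be computed from vectorized grayscale target patches, assuming Pearson correlation as the similarity measure and a synthetic random-walk appearance drift:

```python
import numpy as np

def correlation_matrix(patches):
    """Pairwise Pearson correlations between vectorized target patches,
    i.e., the kind of matrix visualized in Fig. 1(c)."""
    V = np.stack([p.ravel() for p in patches]).astype(float)
    V -= V.mean(axis=1, keepdims=True)            # center each patch
    V /= np.linalg.norm(V, axis=1, keepdims=True) + 1e-12
    return V @ V.T                                # cosine of centered patches

# Toy targets drifting over time (hypothetical data, not a benchmark video).
rng = np.random.default_rng(0)
cur, targets = rng.random((20, 20)), []
for _ in range(10):
    targets.append(cur.copy())
    cur = cur + 0.1 * rng.standard_normal((20, 20))

C = correlation_matrix(targets)
# Temporal neighbors correlate more strongly than distant frames,
# which is exactly the diagonal-block pattern described for Fig. 1(c).
assert C[0, 1] > C[0, 9]
```

With real data, `targets` would be the cropped, resized target regions of consecutive frames; the block structure near the diagonal then reflects the temporal locality discussed above.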

To further verify the above observation, we analyze the temporal and spatial correlations on the 51 video sequences of the OTB 2013 benchmark [6], which include a total of 29,486 frames. The statistical results are shown in Fig. 2. The average correlations over all these frames between the ground truth targets within a temporal window are plotted in Fig. 2(a). It is evident that, as the window size grows (i.e., the temporal locality is weakened), the correlation decreases sharply. Meanwhile, the average correlations over all these frames between the target and the background within a certain radius of the target are plotted in Fig. 2(b). It can be seen that, as the background moves farther away from the target (i.e., the spatial locality becomes less obvious), the correlation curve drops rapidly. As discussed above, the locality in either the temporal or the spatial dimension is evident. As a result, we are encouraged to exploit the joint locality in both the temporal and spatial dimensions, i.e., the spatial-temporal locality, to further improve tracking performance. The preceding analysis demonstrates that there are strong correlations between the recently obtained targets and the background regions near these targets, leading to a joint locality in both the temporal and spatial dimensions. The temporal locality provides an effective description of the target, and the spatial locality implicitly introduces discrimination into tracking. For this reason, we propose to construct a tracking model with both descriptive and discriminative capabilities. Furthermore, it has been shown that sparse representation can well reflect the locality structure between the dictionary atoms and the samples being represented [7].
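The window statistic of Fig. 2(a) can be sketched in the same toy setting (synthetic drifting targets rather than the benchmark data; the drift model and all shapes are assumptions):

```python
import numpy as np

def mean_correlation_within_window(targets, window):
    """Average Pearson correlation over all target pairs whose frame
    indices differ by at most `window` frames (cf. Fig. 2(a))."""
    V = np.stack([t.ravel() for t in targets]).astype(float)
    V -= V.mean(axis=1, keepdims=True)
    V /= np.linalg.norm(V, axis=1, keepdims=True) + 1e-12
    C = V @ V.T
    n = len(targets)
    vals = [C[i, j] for i in range(n)
            for j in range(i + 1, min(i + window + 1, n))]
    return float(np.mean(vals))

# Toy drifting targets: a random-walk appearance model.
rng = np.random.default_rng(0)
cur, targets = rng.random((20, 20)), []
for _ in range(30):
    targets.append(cur.copy())
    cur = cur + 0.1 * rng.standard_normal((20, 20))

# As the window grows, the temporal locality weakens and the
# average correlation decays, matching the trend described for Fig. 2(a).
assert (mean_correlation_within_window(targets, 2)
        > mean_correlation_within_window(targets, 25))
```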
To this end, discriminative dictionary learning for sparse representation has desirable properties for exploiting the spatial-temporal locality in this work:
- Sparse representation can provide a description from the perspective of the locality structure.
- Dictionary learning can promote the descriptive capability of the sparse representation.
- Discriminative learning can augment the separability between the target and the background.
- Additional structures can be embedded into the dictionary to well reflect the spatial-temporal locality.

To effectively describe and separate the target from the background, additional structures need to be embedded into the dictionary. As analyzed above, the spatial-temporal locality leads to strong correlations among the recently obtained targets and the background near these targets. Subspace learning is a classical method for dealing with correlations. In this work, these targets and background samples are considered to reside in a low-dimensional subspace. As a result, a subspace structure is embedded into the dictionary, such that the learned dictionary can well represent the spatial-temporal locality.

Motivated by the above observations, a novel tracking algorithm is proposed by exploiting the spatial-temporal locality of tracking via structured dictionary learning. Fig. 3 illustrates the flow diagram of our tracking algorithm. The dictionary, used to reflect the spatial-temporal locality, is learned from the recently obtained targets and the background near these targets. To capture the spatial-temporal locality structure, a subspace constraint is imposed on the dictionary atoms via low-rank learning.

Fig. 2. Statistics of the temporal and spatial correlations in the 29,486 frames of the 51 video sequences of the OTB 2013 benchmark [6]. (a) Average temporal correlations with respect to different temporal window sizes. (b) Average spatial correlations with respect to different spatial radii.
A discriminative constraint is also enforced in the learning process, such that the dictionary yields discriminative sparse codes, leading to linearly separable descriptions of the target and the background. During tracking, a number of target candidates are sampled from the current frame and then encoded by the learned dictionary. From the output sparse codes, the descriptive scores and the discriminative scores of these candidates are computed to measure the description and discrimination accuracy, respectively. Finally, the target is determined as the candidate with the largest likelihood, which is evaluated according to both the descriptive and the discriminative scores. Different from previous methods based on (discriminative) dictionary learning [8]-[10], which utilize only the temporally obtained targets, our approach exploits the temporally local correlation by analyzing the most recently obtained targets, as shown in Fig. 4(a). Meanwhile, the previous methods use background samples that are far away from the target locations, while our approach samples the background densely around the target locations to capture the spatially local correlation, as illustrated in Fig. 4(b). As a result, the most distinctive difference from the previous methods based on dictionary learning [8]-[10] is that a subspace structure is embedded into the dictionary to exploit the spatial-temporal locality of tracking in our approach. Unlike previous work that also considers temporal and spatial context information [11]-[13], our method constructs a representation through the spatial-temporal locality to describe both the target and the background. The work in [12] also utilizes the recently obtained targets and the nearby background to learn a subspace. However, the subspace transform in [12] is linear, while the proposed structured sparse coding is nonlinear, which is more powerful for representing the targets and the background.
Meanwhile, [12] incorporates the subspace reconstruction into the classifier, where the subspace reconstruction has the same feature dimension as the original samples, i.e., there is no dimension reduction or promotion, so the classifier is under more pressure to make the samples linearly separable. In contrast, in our approach the spatial-temporal locality is embedded into the dictionary and propagated to the sparse codes that are actually input to the classifier. Owing to the nonlinear coding scheme and the dimension promotion from the dictionary, the classifier performs more stably in linearly separating the targets and

the background, resulting in more discriminative capability.

Fig. 3. Flow diagram of our tracking approach.

Fig. 4. Differences between the previous methods and ours. (a) Temporal difference. (b) Spatial difference. The targets are marked in solid boxes and the background samples in dashed boxes. In each subfigure, the previous method is illustrated on the left and ours on the right.

In summary, the novelty and contributions of this work are that 1) embedding the spatial-temporal locality structure into dictionary learning is first introduced to visual tracking, and the joint learning of the sparsity and this structure is novel in visual tracking; 2) the formulation of the tracking model and its solution is new to the field of visual tracking; and 3) different from previous sparse trackers that aim at a structured representation to improve tracking performance, the proposed approach focuses on learning a structured dictionary that leads to an accurate and robust representation. Extensive experiments are conducted to evaluate the performance of the proposed technique under various challenging situations, such as abrupt appearance change, heavy occlusion, and cluttered background. Both qualitative and quantitative evaluations demonstrate that our tracking algorithm improves the performance of the baseline methods and outperforms most state-of-the-art approaches.

The remainder of this paper is organized as follows: the related work is reviewed in Section II; our approach is presented in Section III; the experimental results are reported in Section IV; the limitations of the proposed approach are briefly discussed in Section V; the conclusion of this paper is drawn in Section VI; and the evaluation algorithms for the dictionary learning and sparse coding are presented in the Appendix.

II. RELATED WORK

Constructing visual trackers via sparse learning has received much attention in recent years. It has been demonstrated that sparse learning is effective in handling partial occlusions [14]. Mei and Ling [4] introduced sparse representation to visual tracking. In their method, the current target is represented by a linear combination of a few target templates, and the residual error is used to model appearance changes such as occlusion and deformation. This paradigm is computationally expensive; subsequently, compressive sensing [15] and joint sparse representation [16] were employed to speed it up. Mei et al. [17] analyzed the minimum error bound for the sparse tracker with occlusion detection. In [18], Liu et al. proposed a dictionary learning method, named K-selection, for visual tracking and employed local sparse representation to describe the target. Jia et al. [19] developed a local sparsity based feature to improve tracking performance, and Zhong et al. [20] exploited the sparsity structure of tracking and proposed a collaborative appearance model. Sui and Zhang [21] leveraged sparse learning to construct a Gaussian process model and used regression to conduct tracking. A tracker was introduced by Zhang et al. [22] by exploiting global and local sparsity properties of appearance models. Wang et al. [23] proposed a tracking algorithm via joint optimization of representation and classification. Wang et al. [24] designed a locally weighted distance metric and employed an inverse sparse approach to formulate the tracking problem. Lan et al. [25] leveraged joint sparse representation and a feature-level fusion approach to address tracking. Dictionary learning can significantly improve the representation quality of the sparse codes [26], [27]. A typical method to learn an effective dictionary is embedding structures that are intrinsically consistent with the inherent

structures in the training data. Discrimination is one of the most widely used structures embedded in the dictionary, e.g., via least squares [28]-[30], label consistency [31], and the Fisher criterion [32]. In visual tracking, many algorithms have been proposed via various dictionary learning methods, but only a few of them embed structures into the dictionary. Bai et al. [33] learned a dictionary for sparse codes of the target and the background and adopted a support vector machine to separate them. Xing et al. [8] proposed an online multi-lifespan dictionary learning method for visual tracking. Wang et al. [9] injected the non-negativity property into the dictionary and developed an online learning method for tracking. Yang et al. [10] designed an online discriminative dictionary learning method for visual tracking. Yang et al. [34] gathered the appearance information from two collaborative feature sets by exploiting the geometric structures through dictionary learning. Xie et al. [35] proposed a discriminative object tracking approach via sparse coding based online dictionary learning. Fan and Xiang [36] adopted multi-task joint dictionary learning for visual tracking. Subspace learning is a simple but effective method in visual tracking. In this paradigm, the temporally obtained targets are assumed to reside in a low-dimensional subspace. Ross et al. [3] introduced incremental subspace learning to visual tracking. It has been demonstrated that subspace learning is robust to illumination variations and pose changes [37], [38]. However, subspace learning is very sensitive to occlusions. For this reason, many subspace learning methods [12], [39]-[44] leveraged sparse representation to enhance stability in the case of occlusions.
The basic idea of such methods is to employ subspace learning to describe the target and utilize the sparse representation to absorb the occlusions. Sui et al. [12], [13] proposed a discriminative subspace learning method to describe the target and the surrounding background, where a sparse error term is used to deal with occlusion. Wang et al. [4] proposed an approach that assumes the temporally obtained targets follow a Gaussian distribution (subspace prior) and the occlusions follow a Laplace distribution (sparsity prior). Extensive tracking algorithms beyond sparse learning and subspace learning have been proposed in recent years, achieving impressive tracking performance. Some representative approaches include structural discriminative learning [45], [46], correlation filtering [47]-[51], and deep learning and deep features [52]-[56]. Readers can refer to [1], [2] for a thorough review of state-of-the-art tracking methods.

III. PROPOSED APPROACH

During tracking, we use a rectangular box to mark the target. In each frame, a number of regions of interest (ROIs) are generated and used as the target candidates. We evaluate the likelihood of these candidates being the target, and the candidate with the highest likelihood is determined to be the target. Specifically, each candidate is associated with a motion state variable

$$s = \{x, y, \sigma\}, \tag{1}$$

where x and y denote the 2D position of a candidate and σ denotes the scale coefficient. The motion state variables are estimated by a Bayesian inference framework, as discussed in Sec. III-C. Given a candidate, we crop it out of the frame according to its motion state variable. All candidates are resized to the same size and stacked into column vectors, respectively. The likelihood of the candidates is evaluated over the corresponding column vectors.

A. Structured Dictionary Learning

Given a training sample set, denoted by X, the dictionary D used in standard sparse coding can be learned from

$$\min_{D, z_k} \sum_{x_k \in X} \ell(x_k - D z_k) \quad \text{s.t.} \quad \|z_k\|_0 \le z_0, \tag{2}$$

where ℓ(·) is an empirical loss function, z_k denotes the sparse codes of x_k, ‖·‖₀ counts the nonzeros of a vector, and z₀ is a predefined non-negative integer that controls the sparsity of the codes z_k. Because of the non-convexity of ‖·‖₀, and considering computational efficiency, the convex envelope ‖·‖₁ is usually used to approximate ‖·‖₀ in practice.

To obtain discriminative sparse codes, a classifier needs to be trained simultaneously. Assuming that each training sample x_k is assigned a binary class label y_k ∈ {+1, −1} (see Footnote 1), the discriminative dictionary is learned from

$$\min_{D, z_k, w} \sum_{x_k \in X} \ell(x_k - D z_k) + \lambda \sum_{x_k \in X} \ell_\delta(y_k - w^T z_k) \quad \text{s.t.} \quad \|z_k\|_0 \le z_0, \tag{3}$$

where ℓ_δ(·) is the classification loss function, w denotes the linear classifier (see Footnote 2), and λ is a weight parameter. As discussed above, the spatial-temporal locality is exploited by learning a dictionary with a subspace structure. We formulate the learning as

$$\min_{D, z_k, w} \sum_{x_k \in X} \ell(x_k - D z_k) + \lambda \sum_{x_k \in X} \ell_\delta(y_k - w^T z_k) \quad \text{s.t.} \quad \|z_k\|_0 \le z_0, \; \operatorname{rank}(D) \le r_0, \tag{4}$$

where rank(·) computes the rank of a matrix, and the non-negative integer r₀ controls the rank of D. Note that minimizing the rank(·) function is non-convex and has been demonstrated to be NP-hard. In practice, its convex envelope, the nuclear norm, is usually adopted to efficiently approximate the rank(·) function. In this work, to emphasize the spatial-temporal locality, the training samples consist of the recently obtained targets and the nearby background samples around the latest target location. Considering the abrupt appearance changes of the target during tracking, e.g., in the case of occlusion, we leverage the ℓ₁-loss for the empirical loss function ℓ(·), to allow for extremely large fitting errors. Meanwhile, we employ

Footnote 1: Because there are only two categories to consider, i.e., the target and the background, we discuss the problem only within binary classification.
Footnote 2: The bias unit of the linear classifier is truncated in this work.
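The paper's actual solver is the augmented Lagrange multiplier method given in the Appendix. Purely as an illustrative sketch (my own toy code, not the authors' algorithm), two standard building blocks behind formulations like (2)-(4) can be mocked up in NumPy: a greedy orthogonal-matching-pursuit step for the ℓ₀-constrained coding, and the singular value thresholding operator that arises once the rank constraint is relaxed to the nuclear norm:

```python
import numpy as np

def omp(D, x, z0):
    """Greedy (orthogonal matching pursuit) approximation of
    min_z ||x - D z||_2  s.t.  ||z||_0 <= z0, cf. Eq. (2).
    D: (d, m) dictionary with unit-norm atoms; x: (d,) sample."""
    residual, support = x.copy(), []
    z = np.zeros(D.shape[1])
    for _ in range(z0):
        k = int(np.argmax(np.abs(D.T @ residual)))   # most correlated atom
        if k not in support:
            support.append(k)
        # Refit the coefficients on the current support by least squares.
        coef, *_ = np.linalg.lstsq(D[:, support], x, rcond=None)
        z[:] = 0.0
        z[support] = coef
        residual = x - D @ z
    return z

def svt(M, tau):
    """Singular value thresholding: the proximal operator of the nuclear
    norm, which replaces the rank constraint once it is relaxed."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt

rng = np.random.default_rng(1)
D = rng.standard_normal((50, 100))
D /= np.linalg.norm(D, axis=0)                    # unit-norm atoms
x = 2.0 * D[:, 3] - 1.5 * D[:, 70]                # a truly 2-sparse sample
z = omp(D, x, z0=2)
assert np.count_nonzero(z) <= 2
assert np.linalg.norm(x - D @ z) < 1e-6           # exact recovery here

L = rng.standard_normal((40, 3)) @ rng.standard_normal((3, 30))
shrunk = svt(L + 0.01 * rng.standard_normal((40, 30)), tau=1.0)
assert np.linalg.matrix_rank(shrunk, tol=1e-6) <= 3   # noise directions removed
```

In an ALM-style solver, an operator like `svt` is applied to the dictionary variable at each iteration, which is how the low-rank (subspace) structure of Eq. (4) is enforced in practice.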

the ℓ₂-loss (squared loss) for the classification loss function ℓ_δ(·), to obtain stable classification performance. As a result, the dictionary in this work is learned from

$$\min_{D, Z, w} \|X - DZ\|_1 + \lambda \|y - w^T Z\|_F^2 \quad \text{s.t.} \quad \|Z\|_{0,\infty} \le z_0, \; \operatorname{rank}(D) \le r_0, \tag{5}$$

where each column of the matrix X = [x₁, x₂, ...] denotes a training sample, the matrix Z denotes the sparse codes of X, the row vector y = [y₁, y₂, ...] consists of the class labels of the training samples, ‖·‖₁ stands for the sum of the absolute values of all the elements of the input matrix, ‖·‖_F is the Frobenius norm, and ‖·‖₀,∞ returns the maximum number of nonzeros in any column of the input matrix. The above problem can be solved using the augmented Lagrange multiplier method; the computational details are provided in the Appendix.

Note that the low-rank structure imposed on the dictionary D is one of the key differences between our approach and previous methods. First, from the dictionary learning point of view, the dictionary is expected to reflect the intrinsic structure of the data samples. As analyzed above, the data samples have an apparent low-dimensional subspace structure. Thus, we impose a low-rank constraint on the dictionary to exploit this subspace structure, leading to a better capability to represent the data samples. Moreover, from the coding perspective, sparse coding benefits from the redundancy of the dictionary. In the proposed model, the dictionary is jointly learned with a linear classifier. Its redundancy leads to a sparser representation with a more nonlinear encoding scheme, resulting in better linear separability. Note that the joint learning framework can make a good trade-off between the dictionary redundancy (low-rank structure) and the linear separability (discrimination).

B. Target Localization

With the learned dictionary D, given a candidate, denoted by c, the sparse codes z are obtained from

$$\min_z \|c - Dz\|_1 \quad \text{s.t.} \quad \|z\|_0 \le z_0. \tag{6}$$
We can evaluate the descriptive performance on the candidate c via the fitting error ‖c − Dz‖₁: the smaller the error, the higher the descriptive performance. Under this method, a candidate more likely to be the target is expected to have a smaller error, i.e., to be better described by the learned dictionary, as illustrated in Fig. 5. As a result, we consider the fitting error as a criterion for target localization, called the descriptive score, which is defined as

$$\varepsilon(c) = \exp\left\{-\frac{1}{l_\varepsilon} \|c - Dz\|_1\right\}, \tag{7}$$

where l_ε denotes the scale of the exponential function. Meanwhile, with the learned linear classifier w, the candidate c can be classified as either the target or the background via its sparse codes z. In this work, instead of the class label, we evaluate the discriminative score of a candidate, which is defined as

$$\varphi(c) = \exp\left\{-\frac{1}{l_\varphi} \left|1 - w^T z\right|\right\}, \tag{8}$$

where l_φ denotes the scale of the exponential function. The above definition indicates that a candidate more likely to be the target is expected to have a classifier response much closer to the positive class label (the target label +1). Fig. 6 illustrates the discriminative scores of several candidates with different likelihoods of being the target. It is evident that the candidates more likely to be the target are assigned larger discriminative scores from the classifier responses.

Footnote 3: The solution of this problem is also presented in the Appendix.

Fig. 5. Illustration of descriptive performance on four different candidates. One candidate more likely to be the target (in a solid box) and three candidates less likely to be the target, with translations and scaling (in dashed boxes), are marked in the left image. Their fitting errors are shown in the right image.

Fig. 6. Illustration of discriminative scores of different candidates. From left to right, the classifier responses of the six candidates are 1.0, 0.6, 0.1, 0.1, 0.2, and 0.9, respectively.
The above discussion encourages us to consider both the descriptive score and the discriminative score when localizing the target. Consequently, we define a criterion

$$L(c) = \varepsilon(c)\, \varphi^\tau(c) \tag{9}$$

to evaluate the likelihood of a candidate c being the target, where τ is a weight parameter that balances the importance of the descriptive and the discriminative scores. Note that τ needs to be finely tuned with respect to different candidates in each frame. To circumvent tuning this parameter, we modify the above criterion into the following normalized one:

$$L(c) = \exp\left\{-\frac{1}{l} \left(\bar{\varepsilon}_c + \bar{\varphi}_c\right)\right\}, \tag{10}$$

where ε̄_c and φ̄_c are the normalized scores over all the candidates in one frame, defined as

$$\bar{\varepsilon}_c = \kappa\left\{\|c - Dz\|_1, \mathcal{C}\right\}, \tag{11}$$

$$\bar{\varphi}_c = \kappa\left\{\left|1 - w^T z\right|, \mathcal{C}\right\}, \tag{12}$$

where C denotes the set of all candidates in one frame, and the function

$$\kappa\left\{f(\alpha), \mathcal{A}\right\} = \frac{f(\alpha) - \min_{\alpha \in \mathcal{A}} f(\alpha)}{\max_{\alpha \in \mathcal{A}} f(\alpha) - \min_{\alpha \in \mathcal{A}} f(\alpha)} \tag{13}$$

normalizes the values of f(α) for all α into the range [0, 1]. In Eq. (10), the parameter l > 0 denotes the

scale of the exponential function. It is employed to exponentially enlarge the differences between the likelihoods of the candidates. In the step of target localization this scale is immaterial, because only the relative values of the likelihood are referred to. However, it matters for the resampling of the tracking framework, which relies on a Bayesian inference model, as presented in the next subsection. In all the experiments reported in this work, we set l = 10⁴ according to empirical results.

Fig. 7. Demonstration of the likelihood evaluation. (a) A frame where the ground truth target and the search area are marked in solid and dashed boxes, respectively. (b), (c), and (d) Likelihood of the candidates centered at every pixel within the search area, evaluated by using both the descriptive and the discriminative scores, the descriptive score only, and the discriminative score only, respectively. The largest likelihood is expected to appear at the center of each plot.

Below is an explanation of the likelihood function defined in Eq. (10). Fig. 7 presents a closer look at the target localization in one frame in the case of occlusions. We densely generate candidates around the ground truth target, with center locations at every pixel within the search region marked by the dashed box. We calculate and plot the likelihood of these candidates according to both the descriptive and the discriminative scores, the descriptive score only, and the discriminative score only, respectively. The largest likelihood is expected to appear at the center of each plot, i.e., to overlap with the ground truth target. Note that the ℓ₁-norm on the representation, as shown in Eqs. (5) and (6), can effectively handle the occlusions by compensating for a large error.
This indicates that the three criteria of the likelihood are robust against the occlusions in this example. However, it is evident that only the combination of both the descriptive and the discriminative scores obtains a reasonable result in this case, as shown in Fig. 7(b). Owing to the spatial-temporal locality, the dictionary and the classifier are trained on the recently obtained targets and the nearby background regions. For this reason, both the target and the nearby background can be described accurately by the dictionary. This leads to high descriptive scores for the target and the nearby background, as shown in Fig. 7(c). It indicates that the descriptive score cannot distinguish the target from its nearby regions. However, it can exclude the distant regions, because the dictionary describes the distant regions poorly. Meanwhile, although the classifier performs unstably in the distant regions due to the lack of training samples there, it can accurately distinguish the target from the nearby background regions according to the discriminative scores, as shown in Fig. 7(d). As a result, the likelihood defined in Eq. (10) penalizes the distant background regions and enhances the target region by integrating both the descriptive and the discriminative scores.

C. Tracking Framework

In the j-th frame, given the previously obtained targets t_{1:j−1}, the motion state of the j-th target, denoted by s_j, is estimated by maximizing the posterior distribution

$$p(s_j \mid t_{1:j-1}) = \int p(s_j \mid s_{j-1})\, p(s_{j-1} \mid t_{1:j-1})\, ds_{j-1}, \tag{14}$$

where p(s_j | s_{j−1}) denotes the motion model, which controls the candidate generation. Then, a candidate c is observed and the distribution is updated by

$$p(s_j \mid c, t_{1:j-1}) = \frac{p(c \mid s_j)\, p(s_j \mid t_{1:j-1})}{p(c \mid t_{1:j-1})}, \tag{15}$$

where p(c | s_j) denotes the observation model. The target in the j-th frame, denoted by t_j, is found by

$$t_j = \arg\max_{c \in \mathcal{C}} p(s_j \mid c, t_{1:j-1}). \tag{16}$$
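As an illustrative, self-contained sketch of one score-and-select step (Eqs. (10)-(13) and the diagonal-covariance Gaussian motion model are from the text; all shapes, the toy classifier, and the synthetic data are my own assumptions):

```python
import numpy as np

def minmax(v):
    """kappa{.} of Eq. (13): min-max normalization to [0, 1]."""
    v = np.asarray(v, dtype=float)
    return (v - v.min()) / (v.max() - v.min() + 1e-12)

def likelihoods(C, Z, D, w, l=1e4):
    """Eq. (10): combine normalized descriptive and discriminative errors.
    C: candidate vectors (columns); Z: their sparse codes (columns)."""
    fit_err = np.abs(C - D @ Z).sum(axis=0)   # ||c - Dz||_1, cf. Eq. (11)
    cls_err = np.abs(1.0 - w @ Z)             # |1 - w^T z|,  cf. Eq. (12)
    return np.exp(-(minmax(fit_err) + minmax(cls_err)) / l)

def sample_states(prev, n=400, sigma=(3.0, 3.0, 0.05), rng=None):
    """Draw candidate motion states s = (x, y, scale) from the Gaussian
    motion model N(s_{j-1}, Sigma) with diagonal Sigma."""
    rng = rng or np.random.default_rng()
    return prev + rng.standard_normal((n, 3)) * np.array(sigma)

# Toy check: the candidate that is both best described and best
# classified receives the largest likelihood, per Eqs. (10) and (16).
rng = np.random.default_rng(3)
D = rng.standard_normal((20, 10))
Z = rng.standard_normal((10, 5))
# Toy classifier built so that candidate 0 responds exactly +1.
w, *_ = np.linalg.lstsq(Z.T, np.array([1., -1., -1., -1., -1.]), rcond=None)
C = D @ Z + 0.2 * rng.standard_normal((20, 5))
C[:, 0] = D @ Z[:, 0]                         # candidate 0 also fits perfectly
assert int(np.argmax(likelihoods(C, Z, D, w, l=1.0))) == 0

states = sample_states(np.array([100.0, 50.0, 1.0]), rng=rng)
assert states.shape == (400, 3)
```

In the full tracker, each sampled state would be cropped from the frame, resized, vectorized, and encoded via Eq. (6) before scoring; the argmax over candidates implements Eq. (16).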
In this work, the motion model p(s_j | s_{j−1}) is defined by a Gaussian distribution N(s_{j−1}, Σ), where the covariance matrix Σ is diagonal and its diagonal elements denote the variances of the x- and y-translations and the scaling. The observation model p(c | s_j) is defined to be proportional to the likelihood defined in Eq. (10). The covariance matrix is set to Σ = diag{3, 3, 0.05} in this work. We choose relatively small values for these three parameters, considering the trade-off between tracking accuracy and running speed. On the one hand, the Bayesian inference framework is used as a sampling strategy that can speed up tracking by conducting a local search with different possibilities. Note that smaller values of these three parameters imply a smaller search area in the sampling, which means fewer particles (candidates) are needed to cover it. Since the speed of the tracking algorithm is linearly proportional to the number of particles, we prefer fewer particles, i.e., smaller values of the three parameters. On the other hand, high tracking accuracy relies on a large search area for the candidates, especially in the case of fast motion and significant scale variation. This requires the x- and y-translations and the scale coefficient to cover a wide range in the candidate search, leading to large values of these three parameters. For example, the x- and y-translation parameters are set to 3 pixels in this work, which indicates that the tracker covers a search area with a radius of 6 pixels.

This is because 95% of the samples are located within the range of 3 standard deviations. In summary, as a trade-off between tracking accuracy and running speed, we select {3, 3, 0.05} for the three parameters of the covariance matrix.

Fig. 8. Tracking performance of the proposed tracker and the top 5 trackers in [6] over the OTB 2013 benchmark.

Fig. 9. Tracking performance of the proposed tracker and 7 other low-rank and/or sparse learning based state-of-the-art trackers over the OTB 2013 benchmark.

IV. EXPERIMENTS

The proposed tracking algorithm is implemented in Matlab R2015a on a PC with an Intel Core 2.6 GHz processor. The pixels in each frame are converted to grayscale and normalized to [0, 1]. 400 candidates are generated in each frame. The ROIs of the candidates are resized to 20 × 20 pixels. The number of dictionary atoms is set to 20, i.e., the dictionary D ∈ R^{400×20}, since the samples are 400-dimensional (20 × 20 pixels). In Eqs. (7) and (8), we set l_ε = 1 and l_φ = 1. The dictionary is learned every 5 frames, considering the efficiency of the tracking algorithm. In the dictionary learning, we use the 50 most recently obtained targets as the positive samples, and 56 background regions as the negative samples, which are located d pixels away from the latest target location, where d denotes the x- and/or y-translations of ρ times the width and/or height of the latest target, and ρ ∈ {0.1, 0.15, 0.2, 0.25, 0.3, 0.35, 0.4}.

We evaluated our tracker on three popular benchmarks, OTB 2013 [6], OTB 2015 [57], and VOT 2014, which consist of 51, 100, and 25 video sequences, respectively, covering various challenges such as illumination change, occlusion, background clutter, in-plane/out-of-plane rotation, and scale variation. To thoroughly evaluate the proposed tracker, we compare it with a broad set of state-of-the-art trackers. Two criteria are employed in the evaluations: precision and success rate.
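A minimal sketch of these two criteria as commonly computed on the OTB benchmarks (standard center-error and intersection-over-union definitions; the [x, y, w, h] box format and the toy data are my own assumptions):

```python
import numpy as np

def center_error(box_a, box_b):
    """Center location error (pixels) between two [x, y, w, h] boxes."""
    ca = np.array([box_a[0] + box_a[2] / 2, box_a[1] + box_a[3] / 2])
    cb = np.array([box_b[0] + box_b[2] / 2, box_b[1] + box_b[3] / 2])
    return float(np.linalg.norm(ca - cb))

def overlap(box_a, box_b):
    """Intersection-over-union of two [x, y, w, h] boxes."""
    x1 = max(box_a[0], box_b[0]); y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[0] + box_a[2], box_b[0] + box_b[2])
    y2 = min(box_a[1] + box_a[3], box_b[1] + box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    union = box_a[2] * box_a[3] + box_b[2] * box_b[3] - inter
    return inter / union

def precision(results, gts, threshold=20.0):
    """Fraction of frames whose center error is below `threshold` pixels."""
    errs = [center_error(r, g) for r, g in zip(results, gts)]
    return float(np.mean(np.array(errs) < threshold))

def success_rate(results, gts, threshold=0.5):
    """Fraction of frames whose overlap with the ground truth exceeds `threshold`."""
    ious = [overlap(r, g) for r, g in zip(results, gts)]
    return float(np.mean(np.array(ious) > threshold))

gts = [[10, 10, 40, 40], [12, 10, 40, 40]]
results = [[10, 10, 40, 40], [52, 60, 40, 40]]   # second frame is a failure
assert precision(results, gts) == 0.5
assert success_rate(results, gts) == 0.5
```

Sweeping the thresholds (0-50 pixels for precision, 0-1 for overlap) traces out the precision and success plots reported in the figures below.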
The precision is defined as the percentage of frames in which the center location error between the tracking result and the ground truth is less than a threshold, which normally varies within the range [0, 50]. The success rate is defined as the percentage of frames in which the overlap rate between the tracking result and the ground truth is greater than a threshold varying from 0 to 1.

A. Evaluations on the Benchmarks

We extensively compared our tracker with most state-of-the-art trackers on the OTB 2013, OTB 2015, and VOT 2014 benchmarks. To thoroughly evaluate the proposed tracker, we group the comparisons into 5 categories, as shown in Figs. 8-12. In the first comparison, the baseline methods are the top 5 trackers over the OTB 2013 benchmark according to Wu et al.'s evaluation [6], including Struck [45], SCM [20], TLD [46], ASLA [19], and CXT [58].

Fig. 10. Tracking performance of the proposed tracker and 7 other state-of-the-art trackers over the OTB 2013 benchmark.

Fig. 11. Tracking performance of the proposed tracker and 8 other state-of-the-art trackers over the OTB 2015 benchmark.

Fig. 12. Tracking performance of the proposed tracker and 11 other state-of-the-art trackers over the VOT 2014 benchmark.

The results are shown in Fig. 8. It can be seen that our tracker outperforms the top 5 trackers with an increase of 6% in precision and 5% in success rate. Moreover, to demonstrate the performance improvement over the baseline methods, we compared our tracker with 7 other popular low-rank and/or sparse learning based trackers over the OTB 2013 benchmark, including LLR [43], L1APG [59], MTT [16], LRST [60], [61], CT [62], SSL [1], and PCOM [63], [64]. The results are shown in Fig. 9. It is evident that our tracker achieves the best result in terms of both success rate and precision, leading

SUI et al.: EXPLOITING SPATIAL-TEMPORAL LOCALITY OF TRACKING VIA STRUCTURED DICTIONARY LEARNING

Fig. 13. Tracking performance of our tracker and 3 other state-of-the-art trackers in different challenging cases on the OTB 2013 benchmark: (a) occlusion (31 sequences); (b) illumination change (25 sequences); (c) non-rigid deformation (19 sequences); (d) out-of-plane rotation (39 sequences); (e) background clutter (22 sequences); and (f) scale variation (28 sequences).

to significantly improved performance against the 7 baseline methods.

Furthermore, we compared our tracker with 7 additional state-of-the-art trackers over the OTB 2013 benchmark, including DSL [1], STC [11], CNT [65], KCF [48], MEEM [66], and FCNT [53]. These competing trackers leverage state-of-the-art models and algorithms, such as correlation filtering and deep learning, to address the tracking problem. The comparison results are shown in Fig. 10. It is evident that the proposed tracker obtains competitive performance compared to its counterparts; in particular, it achieves the second best performance in terms of success rate.

In addition, we compared our tracker with 8 other state-of-the-art trackers over the OTB 2015 benchmark, including PST [51], HCFT [52], KCF [48], Struck [45], SCM [20], ASLA [19], CSK [47], and L1APG [59]. These trackers are respectively related to state-of-the-art tracking models including deep learning, correlation filtering, structural learning, and sparse learning. From the comparison results shown in Fig. 11, it can be seen that the proposed tracker obtains comparable results against the 8 state-of-the-art trackers.

We also evaluated the proposed tracker on the VOT 2014 benchmark. 11 state-of-the-art trackers are employed as the baselines, including Struck [45], SCM [20], ASLA [19], IVT [3], MTT [16], LRST [60], 2DPCA [67], CT [62], PCOM [63], [64], ODFS [68], and LSST [42]. The evaluation results are shown in Fig.
12, from which it is evident that the proposed tracker outperforms the 11 baseline trackers on this benchmark in terms of both precision and success rate.

B. Evaluations in Challenging Situations

We thoroughly evaluated our tracker in various challenging situations, such as illumination change, occlusion, non-rigid deformation, and background clutter. The LLR tracker, the DSL tracker, and the STC tracker are used as the competing trackers. The results are shown in Fig. 13.

1) Occlusion: In this situation, the target is occluded by other similar or dissimilar objects, leading to abrupt changes in the target appearance. Our tracker obtains good tracking results in this scenario. This is attributed to two factors: (1) the target description is robust to occlusions due to the ℓ1-loss used in the dictionary learning, where occlusions are penalized by the sparse but large residual errors; and (2) the discrimination of the sparse codes is able to separate the target from the nearby background regions.

2) Illumination Change: In this case, the appearance of the target changes significantly due to variations of the lighting conditions in the scene. It can be seen from the results that, owing to the descriptive capability of the sparse codes, our tracker is insensitive to illumination changes. In fact, the sparse learning method can be regarded as a non-linear redundant

subspace approach, where the atoms (i.e., the column vectors of the dictionary) are considered as an overcomplete (i.e., non-orthogonal) set of basis vectors of the subspace, and the sparse codes are viewed as the coordinates in that subspace. It has been demonstrated that subspace methods are effective against illumination change [37], [38]. From the subspace perspective, this explains why our method performs better in this case.

3) Non-Rigid Deformation: Non-rigid deformation usually leads to local changes in the target appearance. The ℓ1-loss used in the dictionary learning is robust to these local changes; for this reason, our tracker performs very well in this situation. Moreover, from the subspace point of view, our method is insensitive to pose changes (one case of non-rigid deformation), because it has been shown that subspace methods can successfully deal with pose changes in visual tracking [37], [38].

4) Out-of-Plane Rotation: Motion of either the target or the camera can result in out-of-plane rotations. This challenge leads to significant changes in the target appearance. It is evident from the results that our tracker obtains much better results in this situation, because our method can effectively distinguish the target from the nearby background regions.

5) Background Clutter: In this scenario, the cluttered background distracts the tracker severely and leads to tracking drift. It can be seen from the results that our tracker performs the best in this case. This is also attributed to the good discriminative capability of our method, by which the target and the background can be accurately separated.

6) Scale Variation: In this situation, the scale of the target appearance varies across successive frames during tracking. It is evident from the results that our tracker obtains better results in the case of scale variation.
This is attributed to (1) the higher accuracy of the target description achieved through the dictionary learning, and (2) the fact that the motion model of our tracking framework accounts for the scale variation between successive frames through the scale coefficient σ in Eq. (1).

C. Effectiveness of the Spatial-Temporal Locality

To verify the effect of the spatial-temporal locality on the tracking performance, we conducted the following experiments on the OTB 2015 and the VOT 2014 benchmarks. We implemented two trackers: one, denoted STL, exploits the spatial-temporal locality; the other, denoted NSTL, does not. The STL tracker learns the dictionary over the recently obtained targets and the nearby background, while the NSTL tracker learns the dictionary over all the previously obtained targets, as well as both the distant and the nearby background, with the low-rank constraint removed. Except for the difference in the usage of the spatial-temporal locality, the two trackers are identical. We evaluated the two trackers on the OTB 2015 and the VOT 2014 benchmarks; the results are shown in Figs. 14 and 15. It is evident from the results that the STL tracker significantly outperforms the NSTL tracker on both benchmarks.

Fig. 14. Tracking performance of the trackers that exploit (STL) and do not exploit (NSTL) the spatial-temporal locality, respectively, on the OTB 2015 benchmark.

Fig. 15. Tracking performance of the trackers that exploit (STL) and do not exploit (NSTL) the spatial-temporal locality, respectively, on the VOT 2014 benchmark.

This indicates that exploiting the spatial-temporal locality of tracking can significantly improve the tracking performance.
Without the spatial-temporal locality, the NSTL tracker needs a large number of training samples to compensate for the diversity of the background samples and to obtain good discrimination with the linear classifier (indeed, a nonlinear classifier might be required). In contrast, the STL tracker successfully fits the inherent structure among the training samples by exploiting the spatial-temporal locality, leading to a classifier with stronger discriminative capability.

D. Effectiveness of the Structured Dictionary Learning

As shown above, exploiting the spatial-temporal locality can significantly improve the tracking performance. It is therefore reasonable to investigate how different implementations of the spatial-temporal locality influence the tracking performance. Our approach implements the spatial-temporal locality via discriminative dictionary learning with a subspace structure. We construct three other trackers, which implement the spatial-temporal locality via a random dictionary [43], template-based dictionary construction [59], and K-SVD dictionary learning [69], respectively. We use 20 atoms for our dictionary, the random dictionary, and the K-SVD dictionary. Following [59], we use 10 atoms for the template dictionary. We investigate the tracking performance on the OTB 2015 and the VOT 2014 benchmarks. The results are shown in Figs. 16 and 17. It is evident from the results that our approach yields more accurate target localization than its three peers. As a result, we recommend exploiting the spatial-temporal locality of tracking via discriminative dictionary learning with a subspace structure.

Furthermore, we investigate the influence of the number of dictionary atoms on the tracking performance. We select five dictionaries with 5, 10, 20, 30, and 50 atoms

and evaluate the corresponding trackers on the OTB 2015 and the VOT 2014 benchmarks. The results are shown in Figs. 18 and 19. It is evident that the tracker with 20 dictionary atoms obtains the best performance. A dictionary with more atoms has weaker representation capability, while a dictionary with fewer atoms lacks the redundancy required by the sparse coding. As a result, we empirically set the number of dictionary atoms to 20.

Fig. 16. Tracking performance of the four implementations of the spatial-temporal locality on the OTB 2015 benchmark.

Fig. 17. Tracking performance of the four implementations of the spatial-temporal locality on the VOT 2014 benchmark.

Fig. 18. Tracking performance with respect to different numbers d of the dictionary atoms on the OTB 2015 benchmark.

Fig. 19. Tracking performance with respect to different numbers d of the dictionary atoms on the VOT 2014 benchmark.

E. Representation vs. Discrimination

To investigate the contributions of the representation (through the dictionary learning based sparse representation) and the discrimination (via the linear classification) to the tracking performance, we construct two other trackers that leverage dictionary learning based sparse representation (DLSR) and linear classification (LC), respectively. Figs. 20 and 21 show the tracking performance of the two trackers on the OTB 2015 and the VOT 2014 benchmarks. It is evident that jointly exploiting the representation and the discrimination yields better performance than exploiting either of them independently on the two benchmarks.

Fig. 20. Tracking performance of the trackers that leverage dictionary learning based sparse representation (DLSR), linear classification (LC), and both (Ours), respectively, on the OTB 2015 benchmark.

Fig. 21. Tracking performance of the trackers that leverage dictionary learning based sparse representation (DLSR), linear classification (LC), and both (Ours), respectively, on the VOT 2014 benchmark.

TABLE I. RUNNING SPEED IN FPS (FRAMES PER SECOND) OF THE PROPOSED AND SOME SPARSE TRACKERS.

F. Complexity and Running Speed

The proposed approach relies on an iterative algorithm in which SVD and shrinkage operations are involved at each iteration. Note that the SVD has a cubic-order complexity, while the shrinkage has a linear-order computational cost; the SVD therefore dominates the computational complexity of the iterations. Specifically, the SVD is imposed on the dictionary, which is a matrix of size m × d, so the computational complexity of the proposed algorithm is O(md²).

We compared the running speed of our tracker and those of 6 other state-of-the-art sparse trackers. Table I shows the running speed in fps (frames per second) of the proposed and the other 6 sparse trackers. It is evident that although our tracker does not run in real time, it still runs much faster than the other 5 sparse trackers, and only slightly more slowly than the DSL tracker. Note that the proposed tracker can be sped up in a straightforward way. The proposed tracking model is driven by the

Bayesian inference framework. The running speed is linearly proportional to the number of candidates. Note that the candidates are evaluated independently for the target localization, which indicates that these evaluations can be conducted in parallel. In that case, the proposed tracker can be sped up significantly by employing concurrent computation methods, such as multi-thread/process implementations or a distributed computing system.

V. LIMITATION

Any model has its limitations for various data, applications, and systems. In this work, the Bayesian inference method is adopted to formulate the motion model of the proposed tracker, where the motion transition of the target between two consecutive frames is modeled by a Gaussian distribution with a zero mean and a diagonal covariance matrix. Two of the diagonal elements control the translations of the target in the x- and y-directions, respectively. Note that, for a Gaussian distribution, almost all samples (99.7%) are drawn from the range within three standard deviations. We set the variances of the translations (i.e., two of the diagonal elements) to 3. This indicates that we have a very high probability of covering the target in the next frame if the target does not move more than 6 pixels in that frame; otherwise, we might lose the target. We could set the variances of the translations to larger values, so that a wider range covers the target in the next frame. However, we would have to increase the number of candidates at the same time to ensure an adequate sampling density within this range. As a result, the tracker would be slowed down, since the running speed is proportional to the number of candidates. Thus, a trade-off between the tracking accuracy and the running speed has to be made to configure our tracker as presented above.

VI.
CONCLUSION

We have proposed a powerful structure for visual tracking, the spatial-temporal locality, constructed from the recently obtained targets and the nearby background regions, and we have exploited this locality via structured dictionary learning. A large number of experiments have been conducted, and the results have shown that (1) the spatial-temporal locality can significantly improve the tracking performance; and (2) discriminative dictionary learning with a subspace structure is the recommended way to exploit the spatial-temporal locality of tracking. Both the qualitative and quantitative evaluations on various challenging video sequences have demonstrated that our tracker outperforms most other state-of-the-art trackers.

APPENDIX A
ALGORITHM OF THE DICTIONARY LEARNING

The dictionary learning problem shown in Eq. (5) of the manuscript is

$$\min_{D, Z, w} \|X - DZ\|_1 + \lambda \|y - w^T Z\|_F^2, \quad \text{s.t. } \|z\|_0 \le z_0, \; \operatorname{rank}(D) \le r_0.$$

To solve it efficiently, we relax the constraint on $Z$ to $\|Z\|_1 \le n z_0$, where $n$ denotes the number of columns of $Z$. The minimization can then be written as

$$\min_{D, Z, E, w} \|Z\|_1 + \alpha \|E\|_1 + \beta \operatorname{rank}(D), \quad \text{s.t. } X = DZ + E, \; y = w^T Z, \tag{17}$$

where the relaxation variable $E$ is introduced to implement the $\ell_1$-loss, and $\alpha$ and $\beta$ are weight parameters. Note that $Z$ and $D$ are coupled by the multiplications in the constraints. To decouple them, two further relaxation variables $Z_2$ and $D_2$ are introduced, and $Z$ and $D$ are rewritten as $Z_1$ and $D_1$, respectively. In addition, considering the stability of the linear classifier and to avoid overfitting, a regularization on $w$ is enforced. As a result, the above problem is approximated by

$$\min_{D_1, D_2, Z_1, Z_2, E, w} \|Z_1\|_1 + \alpha \|E\|_1 + \beta \|D_1\|_* + \frac{\gamma}{2}\|w\|_2^2, \quad \text{s.t. } X = D_2 Z_2 + E, \; y = w^T Z_2, \; Z_1 = Z_2, \; D_1 = D_2, \tag{18}$$

where the $\ell_0$-norm and the $\operatorname{rank}(\cdot)$ function are approximated by their convex conjugates, the $\ell_1$-norm and the nuclear norm, respectively, and $\gamma$ is a weight parameter. Note that there are six variables in the above problem.
Although the problem is not convex with respect to all six variables jointly, it is convex with respect to each of them individually. For this reason, an iterative algorithm is developed: in each iteration, we alternately solve for one variable while fixing the other five. We use the augmented Lagrange multiplier (ALM) method to solve the above problem. The ALM formulation is

$$\begin{aligned} \min_{D_1, D_2, Z_1, Z_2, E, w} \; & \|Z_1\|_1 + \alpha\|E\|_1 + \beta\|D_1\|_* + \frac{\gamma}{2}\|w\|_2^2 \\ & + \langle Y_1, X - D_2 Z_2 - E\rangle + \langle Y_2, y - w^T Z_2\rangle + \langle Y_3, Z_2 - Z_1\rangle + \langle Y_4, D_2 - D_1\rangle \\ & + \frac{\mu}{2}\left( \|y - w^T Z_2\|_F^2 + \|X - D_2 Z_2 - E\|_F^2 + \|Z_2 - Z_1\|_F^2 + \|D_2 - D_1\|_F^2 \right), \end{aligned} \tag{19}$$

where $Y_1$, $Y_2$, $Y_3$, and $Y_4$ denote the Lagrange multipliers, $\mu$ is a penalty parameter, and $\langle \cdot, \cdot \rangle$ denotes the inner product. In this work, we empirically set $\alpha = \beta = \gamma = 1$ and $\mu = 10$.

Compute $D_1$. The subproblem with respect to $D_1$ reduces to

$$\min_{D_1} \beta\|D_1\|_* + \langle Y_4, D_2 - D_1\rangle + \frac{\mu}{2}\|D_2 - D_1\|_F^2. \tag{20}$$

By completing squares and using the singular value thresholding algorithm [70], $D_1$ is obtained from

$$D_1 = U S_{\beta/\mu}(S) V^T, \tag{21}$$

where $[U, S, V^T] = \operatorname{svd}(D_2 + Y_4/\mu)$, and $S_\theta(\cdot)$ denotes the shrinkage function, defined as

$$S_\theta(x) = \operatorname{sign}(x)\max(0, |x| - \theta), \quad \theta > 0. \tag{22}$$
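The two proximal steps above can be made concrete with a short sketch: the element-wise shrinkage $S_\theta$ of Eq. (22) and the singular value thresholding step of Eq. (21). This is an illustrative NumPy sketch under our own naming, not the authors' implementation.

```python
import numpy as np

def shrink(x, theta):
    """Element-wise shrinkage S_theta(x) = sign(x) * max(0, |x| - theta)."""
    return np.sign(x) * np.maximum(0.0, np.abs(x) - theta)

def svt(M, theta):
    """Singular value thresholding: apply S_theta to the singular values of M,
    i.e., return U S_theta(S) V^T (the proximal operator of the nuclear norm)."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return (U * shrink(s, theta)) @ Vt  # scale columns of U by thresholded values
```

Thresholding the singular values zeroes out the small ones, so this update drives the dictionary toward low rank, which is exactly the role of the nuclear-norm term in Eq. (18).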

Compute $D_2$. The subproblem with respect to $D_2$ reduces to

$$\min_{D_2} \langle Y_1, X - D_2 Z_2 - E\rangle + \langle Y_4, D_2 - D_1\rangle + \frac{\mu}{2}\left(\|X - D_2 Z_2 - E\|_F^2 + \|D_2 - D_1\|_F^2\right), \tag{23}$$

which is a least squares problem with the closed-form solution

$$D_2 = \left[(X - E + Y_1/\mu) Z_2^T + D_1 - Y_4/\mu\right]\left(Z_2 Z_2^T + I\right)^{-1}, \tag{24}$$

where $I$ denotes the identity matrix.

Compute $Z_1$. The subproblem with respect to $Z_1$ reduces to

$$\min_{Z_1} \|Z_1\|_1 + \langle Y_3, Z_2 - Z_1\rangle + \frac{\mu}{2}\|Z_2 - Z_1\|_F^2. \tag{25}$$

By completing squares and using the iterative shrinkage algorithm [71], $Z_1$ is obtained from

$$Z_1 = S_{1/\mu}(Z_2 + Y_3/\mu). \tag{26}$$

Compute $Z_2$. The subproblem with respect to $Z_2$ reduces to

$$\min_{Z_2} \langle Y_1, X - D_2 Z_2 - E\rangle + \langle Y_2, y - w^T Z_2\rangle + \langle Y_3, Z_2 - Z_1\rangle + \frac{\mu}{2}\left(\|y - w^T Z_2\|_F^2 + \|X - D_2 Z_2 - E\|_F^2 + \|Z_2 - Z_1\|_F^2\right), \tag{27}$$

which is a least squares problem with the closed-form solution

$$Z_2 = \left(D_2^T D_2 + w w^T + I\right)^{-1}\left[Z_1 - Y_3/\mu + D_2^T(X - E + Y_1/\mu) + w(y + Y_2/\mu)\right], \tag{28}$$

where $I$ denotes the identity matrix.

Compute $E$. The subproblem with respect to $E$ reduces to

$$\min_{E} \alpha\|E\|_1 + \langle Y_1, X - D_2 Z_2 - E\rangle + \frac{\mu}{2}\|X - D_2 Z_2 - E\|_F^2. \tag{29}$$

By completing squares and using the iterative shrinkage algorithm [71], $E$ is obtained from

$$E = S_{\alpha/\mu}(X - D_2 Z_2 + Y_1/\mu). \tag{30}$$

Compute $w$. The subproblem with respect to $w$ reduces to

$$\min_{w} \frac{\gamma}{2}\|w\|_2^2 + \langle Y_2, y - w^T Z_2\rangle + \frac{\mu}{2}\|y - w^T Z_2\|_F^2, \tag{31}$$

which is a least squares problem with the closed-form solution

$$w = \left(\frac{\gamma}{\mu} I + Z_2 Z_2^T\right)^{-1} Z_2 \left(y + Y_2/\mu\right)^T, \tag{32}$$

where $I$ denotes the identity matrix.

The solutions of the six variables are obtained when the iterative algorithm converges. Convergence is assessed by evaluating $\|X - D_2 Z_2 - E\| + \|y - w^T Z_2\| + \|D_2 - D_1\| + \|Z_2 - Z_1\|$; when its difference between two consecutive iterations is smaller than a threshold, e.g., $10^{-8}$, the iterative algorithm stops.

APPENDIX B
ALGORITHM OF THE SPARSE CODING

The sparse coding problem shown in Eq. (6) of the manuscript is

$$\min_{z} \|c - Dz\|_1, \quad \text{s.t. } \|z\|_0 \le z_0,$$

which can be written as

$$\min_{z, e} \|z\|_1 + \lambda\|e\|_1, \quad \text{s.t. } c = Dz + e, \tag{33}$$

where a relaxation variable $e$ is introduced, $\lambda$ is a weight parameter, and the $\ell_0$-norm of $z$ is approximated by its convex conjugate, the $\ell_1$-norm. Note that the above problem is not convex with respect to $z$ and $e$ jointly; however, it is convex with respect to either $z$ or $e$. Furthermore, because $z$ is coupled by the multiplication in the constraint, another relaxation variable $z_2$ is introduced, and $z$ is rewritten as $z_1$. As a result, the ALM formulation of the problem is

$$\min_{z_1, z_2, e} \|z_1\|_1 + \lambda\|e\|_1 + \langle Y_1, c - Dz_2 - e\rangle + \langle Y_2, z_2 - z_1\rangle + \frac{\mu}{2}\left(\|c - Dz_2 - e\|^2 + \|z_2 - z_1\|^2\right), \tag{34}$$

where $Y_1$ and $Y_2$ denote the Lagrange multipliers. An iterative algorithm is derived to solve for $z_1$, $z_2$, and $e$: in each iteration, we alternately solve for one variable while fixing the others.

Compute $z_1$. The subproblem with respect to $z_1$ reduces to

$$\min_{z_1} \|z_1\|_1 + \langle Y_2, z_2 - z_1\rangle + \frac{\mu}{2}\|z_2 - z_1\|^2. \tag{35}$$

By completing squares and using the iterative shrinkage algorithm [71], $z_1$ is obtained from

$$z_1 = S_{1/\mu}(z_2 + Y_2/\mu). \tag{36}$$

Compute $z_2$. The subproblem with respect to $z_2$ reduces to

$$\min_{z_2} \langle Y_1, c - Dz_2 - e\rangle + \langle Y_2, z_2 - z_1\rangle + \frac{\mu}{2}\left(\|c - Dz_2 - e\|^2 + \|z_2 - z_1\|^2\right), \tag{37}$$

which is a least squares problem with the closed-form solution

$$z_2 = \left(D^T D + I\right)^{-1}\left[z_1 - Y_2/\mu + D^T(c - e + Y_1/\mu)\right], \tag{38}$$

where $I$ denotes the identity matrix.

Compute $e$. The subproblem with respect to $e$ reduces to

$$\min_{e} \lambda\|e\|_1 + \langle Y_1, c - Dz_2 - e\rangle + \frac{\mu}{2}\|c - Dz_2 - e\|^2. \tag{39}$$

By completing squares and using the iterative shrinkage algorithm [71], $e$ is obtained from

$$e = S_{\lambda/\mu}(c - Dz_2 + Y_1/\mu). \tag{40}$$

Convergence is determined from the difference between two consecutive iterations of the value $\|c - Dz_2 - e\| + \|z_2 - z_1\|$.
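The sparse coding iteration of Eqs. (34)-(40) can be sketched compactly as below. This is an illustrative implementation under assumptions of our own (a fixed penalty μ = 10, a fixed iteration count instead of the 10⁻⁸ stopping rule, and our own function names); it follows the update order z₁ → z₂ → e, with gradient-ascent updates of the multipliers Y₁ and Y₂.

```python
import numpy as np

def shrink(x, theta):
    """S_theta(x) = sign(x) * max(0, |x| - theta)."""
    return np.sign(x) * np.maximum(0.0, np.abs(x) - theta)

def sparse_code(c, D, lam=1.0, mu=10.0, n_iter=300):
    """ALM iteration for: min ||z||_1 + lam * ||e||_1  s.t.  c = D z + e."""
    m, d = D.shape
    z1, z2 = np.zeros(d), np.zeros(d)
    e = np.zeros(m)
    Y1, Y2 = np.zeros(m), np.zeros(d)
    A = np.linalg.inv(D.T @ D + np.eye(d))  # constant system matrix of Eq. (38)
    for _ in range(n_iter):
        z1 = shrink(z2 + Y2 / mu, 1.0 / mu)                  # Eq. (36)
        z2 = A @ (z1 - Y2 / mu + D.T @ (c - e + Y1 / mu))    # Eq. (38)
        e = shrink(c - D @ z2 + Y1 / mu, lam / mu)           # Eq. (40)
        Y1 += mu * (c - D @ z2 - e)  # ascent on the multiplier of c = Dz + e
        Y2 += mu * (z2 - z1)         # ascent on the multiplier of z1 = z2
    return z1, e
```

On a toy problem where c is exactly one dictionary atom scaled and e is heavily penalized, the recovered code concentrates on that atom, illustrating the sparsity of the solution.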

REFERENCES

[1] A. Yilmaz, O. Javed, and M. Shah, Object tracking: A survey, ACM Comput. Surveys, vol. 38, no. 4, Dec. 2006.
[2] A. W. M. Smeulders, D. M. Chu, R. Cucchiara, S. Calderara, A. Dehghan, and M. Shah, Visual tracking: An experimental survey, IEEE Trans. Pattern Anal. Mach. Intell., vol. 36, no. 7, Jul. 2014.
[3] D. A. Ross, J. Lim, R.-S. Lin, and M.-H. Yang, Incremental learning for robust visual tracking, Int. J. Comput. Vis., vol. 77, nos. 1-3, Aug. 2008.
[4] X. Mei and H. Ling, Robust visual tracking using ℓ1 minimization, in Proc. IEEE Int. Conf. Comput. Vis. (ICCV), Sep. 2009.
[5] X. Li, A. Dick, C. Shen, A. van den Hengel, and H. Wang, Incremental learning of 3D-DCT compact representations for robust visual tracking, IEEE Trans. Pattern Anal. Mach. Intell., vol. 35, no. 4, Apr. 2013.
[6] Y. Wu, J. Lim, and M.-H. Yang, Online object tracking: A benchmark, in Proc. IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2013.
[7] J. Wright, Y. Ma, J. Mairal, G. Sapiro, T. S. Huang, and S. Yan, Sparse representation for computer vision and pattern recognition, Proc. IEEE, vol. 98, no. 6, Jun. 2010.
[8] J. Xing, J. Gao, B. Li, W. Hu, and S. Yan, Robust object tracking with online multi-lifespan dictionary learning, in Proc. IEEE Int. Conf. Comput. Vis. (ICCV), Dec. 2013.
[9] N. Wang, J. Wang, and D. Yeung, Online robust non-negative dictionary learning for visual tracking, in Proc. IEEE Int. Conf. Comput. Vis. (ICCV), Sep. 2013.
[10] F. Yang, Z. Jiang, and L. S. Davis, Online discriminative dictionary learning for visual tracking, in Proc. IEEE Winter Conf. Appl. Comput. Vis.
(WACV), Mar. 2014.
[11] K. Zhang, L. Zhang, Q. Liu, D. Zhang, and M.-H. Yang, Fast visual tracking via dense spatio-temporal context learning, in Proc. Eur. Conf. Comput. Vis. (ECCV), 2014.
[12] Y. Sui, Y. Tang, and L. Zhang, Discriminative low-rank tracking, in Proc. IEEE Int. Conf. Comput. Vis. (ICCV), Dec. 2015.
[13] Y. Sui, Y. Tang, L. Zhang, and G. Wang, Visual tracking via subspace learning: A discriminative approach, Int. J. Comput. Vis., 2017.
[14] J. Wright, A. Y. Yang, A. Ganesh, S. S. Sastry, and Y. Ma, Robust face recognition via sparse representation, IEEE Trans. Pattern Anal. Mach. Intell., vol. 31, no. 2, Feb. 2009.
[15] H. Li, C. Shen, and Q. Shi, Real-time visual tracking using compressive sensing, in Proc. IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2011.
[16] T. Zhang, B. Ghanem, and S. Liu, Robust visual tracking via multi-task sparse learning, in Proc. IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2012.
[17] X. Mei, H. Ling, and Y. Wu, Minimum error bounded efficient ℓ1 tracker with occlusion detection, in Proc. IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2011.
[18] B. Liu, J. Huang, L. Yang, and C. Kulikowski, Robust tracking using local sparse appearance model and K-selection, in Proc. IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2011.
[19] X. Jia, H. Lu, and M.-H. Yang, Visual tracking via adaptive structural local sparse appearance model, in Proc. IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2012.
[20] W. Zhong, H. Lu, and M.-H. Yang, Robust object tracking via sparsity-based collaborative model, in Proc. IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit.
(CVPR), Jun. 2012.
[21] Y. Sui and L. Zhang, Visual tracking via locally structured Gaussian process regression, IEEE Signal Process. Lett., vol. 22, no. 9, Sep. 2015.
[22] T. Zhang et al., Structural sparse tracking, in Proc. IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2015.
[23] Q. Wang, F. Chen, W. Xu, and M.-H. Yang, Object tracking with joint optimization of representation and classification, IEEE Trans. Circuits Syst. Video Technol., vol. 25, no. 4, Apr. 2015.
[24] D. Wang, H. Lu, Z. Xiao, and M.-H. Yang, Inverse sparse tracker with a locally weighted distance metric, IEEE Trans. Image Process., vol. 24, no. 9, Sep. 2015.
[25] X. Lan, A. J. Ma, P. C. Yuen, and R. Chellappa, Joint sparse representation and robust feature-level fusion for multi-cue visual tracking, IEEE Trans. Image Process., vol. 24, no. 12, Dec. 2015.
[26] Y. Cong, J. Liu, G. Sun, Q. You, Y. Li, and J. Luo, Adaptive greedy dictionary selection for Web media summarization, IEEE Trans. Image Process., vol. 26, no. 1, Jan. 2017.
[27] X. Liu, D. Zhai, J. Zhou, S. Wang, D. Zhao, and H. Guo, Sparsity-based image error concealment via adaptive dual dictionary learning and regularization, IEEE Trans. Image Process., vol. 26, no. 2, Feb. 2017.
[28] D.-S. Pham and S. Venkatesh, Joint learning and dictionary construction for pattern recognition, in Proc. IEEE Conf. CVPR, Anchorage, AK, USA, Jun. 2008.
[29] J. Mairal, F. Bach, J. Ponce, G. Sapiro, and A. Zisserman, Discriminative learned dictionaries for local image analysis, in Proc. IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2008.
[30] Q. Zhang and B. Li, Discriminative K-SVD for dictionary learning in face recognition, in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2010.
[31] Z. Jiang, Z. Lin, and L. S.
Davis, Learning a discriminative dictionary for sparse coding via label consistent K-SVD, in Proc. IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2011.
[32] M. Yang, L. Zhang, X. Feng, and D. Zhang, Fisher discrimination dictionary learning for sparse representation, in Proc. 13th IEEE Int. Conf. Comput. Vis., Nov. 2011.
[33] T. Bai, Y. F. Li, and X. Zhou, Robust and fast visual tracking using constrained sparse coding and dictionary learning, in Proc. IEEE/RSJ Int. Conf. Intell. Robots Syst., Oct. 2012.
[34] F. Yang, H. Lu, and M.-H. Yang, Learning structured visual dictionary for object tracking, Image Vis. Comput., vol. 31, no. 12, Dec. 2013.
[35] Y. Xie, W. Zhang, Y. Qu, and Y. Zhang, Discriminative subspace learning with sparse representation view-based model for robust visual tracking, Pattern Recognit., vol. 47, no. 3, Mar. 2014.
[36] H. Fan and J. Xiang, Robust visual tracking with multitask joint dictionary learning, IEEE Trans. Circuits Syst. Video Technol., vol. 27, no. 5, May 2017.
[37] G. D. Hager and P. N. Belhumeur, Real-time tracking of image regions with changes in geometry and illumination, in Proc. IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 1996.
[38] P. N. Belhumeur and D. J. Kriegman, What is the set of images of an object under all possible lighting conditions? in Proc. IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 1996.
[39] D. Wang, H. Lu, and M.-H. Yang, Online object tracking with sparse prototypes, IEEE Trans. Image Process., vol. 22, no. 1, Jan. 2013.

[40] Y. Sui, S. Zhang, and L. Zhang, Robust visual tracking via sparsity-induced subspace learning, IEEE Trans. Image Process., vol. 24, no. 12, Dec. 2015.
[41] Y. Sui, X. Zhao, S. Zhang, X. Yu, S. Zhao, and L. Zhang, Self-expressive tracking, Pattern Recognit., vol. 48, no. 9, 2015.
[42] D. Wang, H. Lu, and M.-H. Yang, Least soft-threshold squares tracking, in Proc. IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2013.
[43] Y. Sui and L. Zhang, Robust tracking via locally structured representation, Int. J. Comput. Vis., vol. 119, no. 2, 2016.
[44] Y. Sui, G. Wang, Y. Tang, and L. Zhang, Tracking completion, in Proc. Eur. Conf. Comput. Vis. (ECCV), 2016.
[45] S. Hare, A. Saffari, and P. H. S. Torr, Struck: Structured output tracking with kernels, in Proc. IEEE Int. Conf. Comput. Vis. (ICCV), Nov. 2011.
[46] Z. Kalal, K. Mikolajczyk, and J. Matas, Tracking-learning-detection, IEEE Trans. Pattern Anal. Mach. Intell., vol. 34, no. 7, Jul. 2012.
[47] J. F. Henriques, R. Caseiro, P. Martins, and J. Batista, Exploiting the circulant structure of tracking-by-detection with kernels, in Proc. Eur. Conf. Comput. Vis. (ECCV), 2012.
[48] J. F. Henriques, R. Caseiro, P. Martins, and J. Batista, High-speed tracking with kernelized correlation filters, IEEE Trans. Pattern Anal. Mach. Intell., vol. 37, no. 3, Mar. 2015.
[49] S. Liu, T. Zhang, X. Cao, and C. Xu, Structural correlation filter for robust visual tracking, in Proc. IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2016.
[50] Y. Sui, Z. Zhang, G. Wang, Y. Tang, and L. Zhang, Real-time visual tracking: Promoting the robustness of correlation filter learning, in Proc. Eur. Conf.
Comput. Vis. (ECCV), 2016.
[51] Y. Sui, G. Wang, and L. Zhang, Correlation filter learning toward peak strength for visual tracking, IEEE Trans. Cybern., to be published.
[52] C. Ma, J.-B. Huang, X. Yang, and M.-H. Yang, Hierarchical convolutional features for visual tracking, in Proc. IEEE Int. Conf. Comput. Vis. (ICCV), Dec. 2015.
[53] L. Wang, W. Ouyang, X. Wang, and H. Lu, Visual tracking with fully convolutional networks, in Proc. IEEE Int. Conf. Comput. Vis. (ICCV), Dec. 2015.
[54] Y. Qi et al., Hedged deep tracking, in Proc. IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2016.
[55] L. Wang, W. Ouyang, X. Wang, and H. Lu, STCT: Sequentially training convolutional networks for visual tracking, in Proc. IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2016.
[56] H. Nam and B. Han, Learning multi-domain convolutional neural networks for visual tracking, in Proc. IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2016.
[57] Y. Wu, J. Lim, and M.-H. Yang, Object tracking benchmark, IEEE Trans. Pattern Anal. Mach. Intell., vol. 37, no. 9, Sep. 2015.
[58] T. B. Dinh, N. Vo, and G. Medioni, Context tracker: Exploring supporters and distracters in unconstrained environments, in Proc. IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2011.
[59] X. Mei and H. Ling, Robust visual tracking and vehicle classification via sparse representation, IEEE Trans. Pattern Anal. Mach. Intell., vol. 33, no. 11, Nov. 2011.
[60] T. Zhang, B. Ghanem, S. Liu, and N. Ahuja, Low-rank sparse learning for robust visual tracking, in Proc. Eur. Conf. Comput. Vis. (ECCV), 2012.
[61] T. Zhang, S. Liu, N. Ahuja, M.-H. Yang, and B. Ghanem, Robust visual tracking via consistent low-rank sparse learning, Int. J. Comput. Vis., vol. 111, no. 2, Jun. 2015.
[62] K. Zhang, L. Zhang, and M.-H. Yang, "Real-time compressive tracking," in Proc. Eur. Conf. Comput. Vis. (ECCV), 2012.
[63] D. Wang and H. Lu, "Visual tracking via probability continuous outlier model," in Proc. IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2014.
[64] D. Wang, H. Lu, and C. Bo, "Fast and robust object tracking via probability continuous outlier model," IEEE Trans. Image Process., vol. 24, no. 12, Dec. 2015.
[65] K. Zhang, Q. Liu, Y. Wu, and M.-H. Yang, "Robust visual tracking via convolutional networks without training," IEEE Trans. Image Process., vol. 25, no. 4, Apr. 2016.
[66] J. Zhang, S. Ma, and S. Sclaroff, "MEEM: Robust tracking via multiple experts using entropy minimization," in Proc. Eur. Conf. Comput. Vis. (ECCV), 2014.
[67] D. Wang and H. Lu, "Object tracking via 2DPCA and ℓ1-regularization," IEEE Signal Process. Lett., vol. 19, no. 11, Nov. 2012.
[68] K. Zhang, L. Zhang, and M.-H. Yang, "Real-time object tracking via online discriminative feature selection," IEEE Trans. Image Process., vol. 22, no. 12, Dec. 2013.
[69] M. Aharon, M. Elad, and A. Bruckstein, "K-SVD: An algorithm for designing overcomplete dictionaries for sparse representation," IEEE Trans. Signal Process., vol. 54, no. 11, pp. 4311–4322, Nov. 2006.
[70] J.-F. Cai, E. J. Candès, and Z. Shen, "A singular value thresholding algorithm for matrix completion," SIAM J. Optim., vol. 20, no. 4, pp. 1956–1982, 2010.
[71] A. Beck and M. Teboulle, "A fast iterative shrinkage-thresholding algorithm for linear inverse problems," SIAM J. Imag. Sci., vol. 2, no. 1, pp. 183–202, 2009.

Yao Sui received the master's degree in software engineering from Peking University, Beijing, China, and the Ph.D. degree in electronic engineering from Tsinghua University, Beijing.
He was a Post-Doctoral Researcher with the Department of Electrical Engineering and Computer Science, University of Kansas, Lawrence, KS, USA. He is currently a Research Fellow with Harvard Medical School, Harvard University, Boston, MA, USA. His research interests include machine learning, computer vision, image processing, pattern recognition, and medical imaging.

Guanghui Wang (SM'17) received the Ph.D. degree in computer vision from the University of Waterloo, Waterloo, ON, Canada. From 2003 to 2005, he was a Research Fellow and a Visiting Scholar with the Department of Electronic Engineering, The Chinese University of Hong Kong. From 2005 to 2007, he was a Professor with the Department of Control Engineering, Changchun Aviation University, China. From 2006 to 2010, he was a Research Fellow with the Department of Electrical and Computer Engineering, University of Windsor, Canada. He is currently an Assistant Professor with the University of Kansas, Lawrence, KS, USA. He is also an Adjunct Professor with the Institute of Automation, Chinese Academy of Sciences, China. He has authored the book Guide to Three-Dimensional Structure and Motion Factorization (Springer-Verlag) and has authored or co-authored over 90 papers in peer-reviewed journals and conferences. His research interests include computer vision, structure from motion, object detection and tracking, artificial intelligence, and robot localization and navigation. He has served as an associate editor and on the editorial board of two journals, as an area chair or TPC member of over 20 conferences, and as a reviewer for over 20 journals.

Li Zhang received the B.S., M.S., and Ph.D. degrees from Tsinghua University, Beijing, China, all in electronic engineering. He was an Instructor with the Department of Physics, Tsinghua University, from 1987 to 1992. He joined the Department of Electronic Engineering, Tsinghua University, as a faculty member in 1992, where he is currently a Professor and the Director of the UVA Vision Laboratory. He is also a member of the National Laboratory of Pattern Recognition, Beijing. His interests include image processing, computer vision, pattern recognition, and computer graphics.

Ming-Hsuan Yang (SM'06) received the Ph.D. degree in computer science from the University of Illinois at Urbana-Champaign in 2000. In 2008, he joined the University of California at Merced (UC Merced), where he is currently a Professor of electrical engineering and computer science. He was a Senior Research Scientist with the Honda Research Institute, where he worked on vision problems related to humanoid robots. He served as an Associate Editor of the IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE from 2007 to 2011. He is currently an Associate Editor of the International Journal of Computer Vision, Image and Vision Computing, and the Journal of Artificial Intelligence Research. He received the NSF CAREER Award in 2012, the Senate Award for Distinguished Early Career Research from UC Merced in 2011, and the Google Faculty Award in 2009. He is a Senior Member of the ACM.


BSIK-SVD: A DICTIONARY-LEARNING ALGORITHM FOR BLOCK-SPARSE REPRESENTATIONS. Yongqin Zhang, Jiaying Liu, Mading Li, Zongming Guo 2014 IEEE International Conference on Acoustic, Speech and Signal Processing (ICASSP) BSIK-SVD: A DICTIONARY-LEARNING ALGORITHM FOR BLOCK-SPARSE REPRESENTATIONS Yongqin Zhang, Jiaying Liu, Mading Li, Zongming

More information

An Improved Approach For Mixed Noise Removal In Color Images

An Improved Approach For Mixed Noise Removal In Color Images An Improved Approach For Mixed Noise Removal In Color Images Ancy Mariam Thomas 1, Dr. Deepa J 2, Rijo Sam 3 1P.G. student, College of Engineering, Chengannur, Kerala, India. 2Associate Professor, Electronics

More information

Dynamic Shape Tracking via Region Matching

Dynamic Shape Tracking via Region Matching Dynamic Shape Tracking via Region Matching Ganesh Sundaramoorthi Asst. Professor of EE and AMCS KAUST (Joint work with Yanchao Yang) The Problem: Shape Tracking Given: exact object segmentation in frame1

More information

Picasso: A Sparse Learning Library for High Dimensional Data Analysis in R and Python

Picasso: A Sparse Learning Library for High Dimensional Data Analysis in R and Python Picasso: A Sparse Learning Library for High Dimensional Data Analysis in R and Python J. Ge, X. Li, H. Jiang, H. Liu, T. Zhang, M. Wang and T. Zhao Abstract We describe a new library named picasso, which

More information

Multiresponse Sparse Regression with Application to Multidimensional Scaling

Multiresponse Sparse Regression with Application to Multidimensional Scaling Multiresponse Sparse Regression with Application to Multidimensional Scaling Timo Similä and Jarkko Tikka Helsinki University of Technology, Laboratory of Computer and Information Science P.O. Box 54,

More information

Face Recognition Based on LDA and Improved Pairwise-Constrained Multiple Metric Learning Method

Face Recognition Based on LDA and Improved Pairwise-Constrained Multiple Metric Learning Method Journal of Information Hiding and Multimedia Signal Processing c 2016 ISSN 2073-4212 Ubiquitous International Volume 7, Number 5, September 2016 Face Recognition ased on LDA and Improved Pairwise-Constrained

More information

Tracking Using Online Feature Selection and a Local Generative Model

Tracking Using Online Feature Selection and a Local Generative Model Tracking Using Online Feature Selection and a Local Generative Model Thomas Woodley Bjorn Stenger Roberto Cipolla Dept. of Engineering University of Cambridge {tew32 cipolla}@eng.cam.ac.uk Computer Vision

More information

Structural Sparse Tracking

Structural Sparse Tracking Structural Sparse Tracking Tianzhu Zhang 1,2 Si Liu 3 Changsheng Xu 2,4 Shuicheng Yan 5 Bernard Ghanem 1,6 Narendra Ahuja 1,7 Ming-Hsuan Yang 8 1 Advanced Digital Sciences Center 2 Institute of Automation,

More information

Sparsity Based Regularization

Sparsity Based Regularization 9.520: Statistical Learning Theory and Applications March 8th, 200 Sparsity Based Regularization Lecturer: Lorenzo Rosasco Scribe: Ioannis Gkioulekas Introduction In previous lectures, we saw how regularization

More information

EE795: Computer Vision and Intelligent Systems

EE795: Computer Vision and Intelligent Systems EE795: Computer Vision and Intelligent Systems Spring 2012 TTh 17:30-18:45 FDH 204 Lecture 14 130307 http://www.ee.unlv.edu/~b1morris/ecg795/ 2 Outline Review Stereo Dense Motion Estimation Translational

More information

4422 IEEE TRANSACTIONS ON IMAGE PROCESSING, VOL. 27, NO. 9, SEPTEMBER 2018

4422 IEEE TRANSACTIONS ON IMAGE PROCESSING, VOL. 27, NO. 9, SEPTEMBER 2018 4422 IEEE TRANSACTIONS ON IMAGE PROCESSING, VOL. 27, NO. 9, SEPTEMBER 2018 Robust 3D Hand Pose Estimation From Single Depth Images Using Multi-View CNNs Liuhao Ge, Hui Liang, Member, IEEE, Junsong Yuan,

More information

CS 664 Slides #11 Image Segmentation. Prof. Dan Huttenlocher Fall 2003

CS 664 Slides #11 Image Segmentation. Prof. Dan Huttenlocher Fall 2003 CS 664 Slides #11 Image Segmentation Prof. Dan Huttenlocher Fall 2003 Image Segmentation Find regions of image that are coherent Dual of edge detection Regions vs. boundaries Related to clustering problems

More information

Guided Image Super-Resolution: A New Technique for Photogeometric Super-Resolution in Hybrid 3-D Range Imaging

Guided Image Super-Resolution: A New Technique for Photogeometric Super-Resolution in Hybrid 3-D Range Imaging Guided Image Super-Resolution: A New Technique for Photogeometric Super-Resolution in Hybrid 3-D Range Imaging Florin C. Ghesu 1, Thomas Köhler 1,2, Sven Haase 1, Joachim Hornegger 1,2 04.09.2014 1 Pattern

More information

The Curse of Dimensionality

The Curse of Dimensionality The Curse of Dimensionality ACAS 2002 p1/66 Curse of Dimensionality The basic idea of the curse of dimensionality is that high dimensional data is difficult to work with for several reasons: Adding more

More information

Learning to Track at 100 FPS with Deep Regression Networks - Supplementary Material

Learning to Track at 100 FPS with Deep Regression Networks - Supplementary Material Learning to Track at 1 FPS with Deep Regression Networks - Supplementary Material David Held, Sebastian Thrun, Silvio Savarese Department of Computer Science Stanford University {davheld,thrun,ssilvio}@cs.stanford.edu

More information

Nonrigid Surface Modelling. and Fast Recovery. Department of Computer Science and Engineering. Committee: Prof. Leo J. Jia and Prof. K. H.

Nonrigid Surface Modelling. and Fast Recovery. Department of Computer Science and Engineering. Committee: Prof. Leo J. Jia and Prof. K. H. Nonrigid Surface Modelling and Fast Recovery Zhu Jianke Supervisor: Prof. Michael R. Lyu Committee: Prof. Leo J. Jia and Prof. K. H. Wong Department of Computer Science and Engineering May 11, 2007 1 2

More information

A Laplacian Based Novel Approach to Efficient Text Localization in Grayscale Images

A Laplacian Based Novel Approach to Efficient Text Localization in Grayscale Images A Laplacian Based Novel Approach to Efficient Text Localization in Grayscale Images Karthik Ram K.V & Mahantesh K Department of Electronics and Communication Engineering, SJB Institute of Technology, Bangalore,

More information

Virtual Training Samples and CRC based Test Sample Reconstruction and Face Recognition Experiments Wei HUANG and Li-ming MIAO

Virtual Training Samples and CRC based Test Sample Reconstruction and Face Recognition Experiments Wei HUANG and Li-ming MIAO 7 nd International Conference on Computational Modeling, Simulation and Applied Mathematics (CMSAM 7) ISBN: 978--6595-499-8 Virtual raining Samples and CRC based est Sample Reconstruction and Face Recognition

More information

An Efficient Model Selection for Gaussian Mixture Model in a Bayesian Framework

An Efficient Model Selection for Gaussian Mixture Model in a Bayesian Framework IEEE SIGNAL PROCESSING LETTERS, VOL. XX, NO. XX, XXX 23 An Efficient Model Selection for Gaussian Mixture Model in a Bayesian Framework Ji Won Yoon arxiv:37.99v [cs.lg] 3 Jul 23 Abstract In order to cluster

More information

Robust PDF Table Locator

Robust PDF Table Locator Robust PDF Table Locator December 17, 2016 1 Introduction Data scientists rely on an abundance of tabular data stored in easy-to-machine-read formats like.csv files. Unfortunately, most government records

More information

Simplicial Global Optimization

Simplicial Global Optimization Simplicial Global Optimization Julius Žilinskas Vilnius University, Lithuania September, 7 http://web.vu.lt/mii/j.zilinskas Global optimization Find f = min x A f (x) and x A, f (x ) = f, where A R n.

More information