Histograms of Oriented Optical Flow and Binet-Cauchy Kernels on Nonlinear Dynamical Systems for the Recognition of Human Actions


Rizwan Chaudhry, Avinash Ravichandran, Gregory Hager and René Vidal
Center for Imaging Science, Johns Hopkins University
3400 N Charles St, Baltimore, MD 21218

Abstract

System theoretic approaches to action recognition model the dynamics of a scene with linear dynamical systems (LDSs) and perform classification using metrics on the space of LDSs, e.g. Binet-Cauchy kernels. However, such approaches are only applicable to time series data living in a Euclidean space, e.g. joint trajectories extracted from motion capture data or feature point trajectories extracted from video. Much of the success of recent object recognition techniques relies on the use of more complex feature descriptors, such as SIFT descriptors or HOG descriptors, which are essentially histograms. Since histograms live in a non-Euclidean space, we can no longer model their temporal evolution with LDSs, nor can we classify them using a metric for LDSs. In this paper, we propose to represent each frame of a video using a histogram of oriented optical flow (HOOF) and to recognize human actions by classifying HOOF time series. For this purpose, we propose a generalization of the Binet-Cauchy kernels to nonlinear dynamical systems (NLDS) whose output lives in a non-Euclidean space, e.g. the space of histograms. This can be achieved by using kernels defined on the original non-Euclidean space, leading to a well-defined metric for NLDSs. We use these kernels for the classification of actions in video sequences using HOOF as the output of the NLDS. We evaluate our approach to recognition of human actions in several scenarios and achieve encouraging results.

1. Introduction

Analysis of human activities has always remained a topic of great interest in computer vision. It is seen as a stepping stone for applications such as automatic environment surveillance, assisted living and human-computer interaction. The surveys by Gavrila [14], Aggarwal et al. [1] and by Moeslund et al. [18], [19] provide a broad overview of over three hundred papers and numerous approaches for analyzing human motion in videos, including human motion capture, tracking, segmentation and recognition.

Related work. Recent work on activity recognition can be broadly classified into three types of approaches: local, global and system-theoretic. Local approaches use local spatiotemporal features to represent human activity in a video, e.g. [17, 10, 30]. Niebles et al. [20] presented an unsupervised method similar to the bag-of-words approach for learning the probability distributions of space-time interest points in human action videos. İkizler et al. [16] presented a method whereby limb motion model units are learnt from labeled motion capture data and used to detect more complex unseen motions in a test video using search queries specific to the limb motion in the desired activity. [15] and [31] represent human activities by 3-D space-time shapes; classification is performed by comparing geometric properties of these shapes against training data. However, an important limitation of the aforementioned approaches is that they do not incorporate global characteristics of the activity as a whole.

Global approaches use global features such as optical flow to represent the state of motion in the whole frame at a time instant. With a static background, one can represent the type of motion of the foreground object by computing features from the optical flow.
In [13], optical flow histograms were used to match the motion of a player in a soccer match to that of a subject in a control video. Tran et al. [27] present an optical flow and shape based approach that uses separate histograms for the horizontal and vertical components of the optical flow, as well as the silhouette of the person, as a motion descriptor. All these approaches, however, do not model the characteristic temporal dynamics of these features. Moreover, comparison is done either on a frame-by-frame basis or using other ad-hoc methods. Clearly, the natural way to compare human activities is to compare their temporal evolution as a whole.

System-theoretic approaches to recognition of human actions model feature variations with dynamical systems and hence specifically consider the dynamics of the activity. The recognition pipeline is composed of 1) finding features in every frame, 2) modeling the temporal evolution of these

features with a dynamical system, 3) using a similarity criterion, e.g. distances or kernels between dynamical systems, to train classifiers, and 4) using them on novel video sequences.

Figure 1. Optical flows and HOOF feature trajectories.

Bissacco et al. used joint-angle trajectories in [3], as well as joint trajectories from motion-capture data and features extracted from silhouettes in [4], to represent the action profiles. Ali et al. in [2] used joint trajectories to extract invariant features to model the non-linear dynamics of human activities. However, these approaches are mostly limited to local feature representations and, to our knowledge, there has been no work on modeling the dynamics of global features, e.g. optical flow variations.

Paper contributions. In this paper, we propose the Histogram of Oriented Optical Flow (HOOF) features to represent human activities. These novel features are independent of the scale of the moving person as well as the direction of motion. Extraction of HOOF features does not require any prior human segmentation or background subtraction. However, HOOF features are non-Euclidean, and thus the evolution of HOOF features creates a trajectory on a nonlinear manifold. Traditionally, Linear Dynamical Systems (LDSs) have been used to model feature time series that are Euclidean, e.g. joint angles, joint trajectories, pixel intensities, etc. Non-Euclidean data like histogram time series need to be modeled with Non-Linear Dynamical Systems (NLDS). Hence, similarity criteria designed for LDSs cannot be used to compare two histogram time series. In this paper, we extend the Binet-Cauchy kernels [29] to NLDS. This is done by replacing an infinite sum of output feature inner products in the kernel expression by a Mercer kernel [23] on the output space. We model the proposed HOOF features as outputs of NLDS and use the Binet-Cauchy kernels for NLDS to perform human activity recognition on the Weizmann database [15] with encouraging results.

Paper outline. The rest of this paper is organized as follows. Section 2 briefly reviews the LDS recognition pipeline for Euclidean time-series data. Section 3 proposes the Histogram of Oriented Optical Flow (HOOF) features, which are used to model the activity profile in each frame of a video; every activity video is thus represented as a non-Euclidean time series of HOOF features. Section 4 introduces NLDS and describes how NLDS parameters can be learnt using kernels defined on the underlying non-Euclidean space. Section 5 presents the Binet-Cauchy kernels for NLDSs, which define a similarity metric between two non-Euclidean time series. Section 6 gives experimental results for human activity recognition using the proposed metric and features. Finally, Section 7 gives concluding remarks and future directions.

2. Recognition with Linear Dynamical Systems

A LDS is represented by the tuple M = (μ, x_0, A, C, B, R) and evolves in time according to the equations

x_{t+1} = A x_t + B v_t,
y_t = μ + C x_t + w_t.    (1)

Here x_t ∈ R^n is the state of the LDS at time t; y_t ∈ R^p is the observed output or feature at time t; x_0 is the initial state of the system; and μ ∈ R^p is the mean of {y_t}_{t=1}^N, e.g. the mean joint-angle configuration. A ∈ R^{n×n} describes the dynamics of the state evolution, B ∈ R^{n×n_v} models the way in which input noise affects the state evolution, and C ∈ R^{p×n} transforms the state to an output or observation of the overall system. v_t ∈ R^{n_v} and w_t ∈ R^p are the system noise and the observation noise at time t, respectively.
We assume that the noise processes are zero-mean i.i.d. Gaussian, such that v_t ~ G(v_t; 0, I_{n_v}) and w_t ~ G(w_t; 0, R) with R ∈ R^{p×p}, where

G(z; μ_z, Σ) = (2π)^{-n/2} |Σ|^{-1/2} exp(-(1/2) ||z - μ_z||²_Σ)

is a multivariate Gaussian distribution on z, with ||z||²_Σ = z^T Σ^{-1} z. By this definition, B v_t ~ G(B v_t; 0, Q), where Q = B B^T ∈ R^{n×n}. We also assume that v_t and w_t are independent processes.
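As a concrete illustration of the generative model in equation (1), the following sketch simulates a short output sequence from a small LDS; the dimensions, parameter values, and the way A is kept stable are illustrative choices, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, n_v, N = 4, 8, 4, 100               # state dim, output dim, noise dim, length

A = 0.9 * np.linalg.qr(rng.standard_normal((n, n)))[0]  # orthogonal matrix scaled to be stable
C = rng.standard_normal((p, n))           # observation map
B = 0.1 * np.eye(n, n_v)                  # input-noise coupling, Q = B B^T
R = 0.01 * np.eye(p)                      # output-noise covariance
mu = rng.standard_normal(p)               # output mean
x = rng.standard_normal(n)                # initial state x_0

Y = np.zeros((N, p))
for t in range(N):
    w = rng.multivariate_normal(np.zeros(p), R)
    Y[t] = mu + C @ x + w                         # y_t = mu + C x_t + w_t
    x = A @ x + B @ rng.standard_normal(n_v)      # x_{t+1} = A x_t + B v_t
```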

Given a set of T training videos, the first task is to learn the parameters M_i, i = 1, ..., T, from the feature trajectories of each video. There are several methods to learn these system parameters, e.g. [25], [28] and [11]. Once these parameters are identified for each of the videos, various metrics can be used to define the similarity between these LDSs. In particular, three major types of metrics are 1) geometric distances based on subspace angles between the observability subspaces of the LDSs [9], 2) algebraic metrics like the Binet-Cauchy kernels [29], and 3) information-theoretic metrics like the KL-divergence [6]. Given a metric, all pairwise similarities are evaluated on the training data and used for classification of novel sequences using methods such as k-Nearest Neighbors (k-NN) or Support Vector Machines (SVMs).

3. Histogram of Oriented Optical Flow (HOOF)

As we alluded to in the introduction, existing system-theoretic approaches to action recognition have been mostly applied to joint angles extracted from motion capture data. If one were to apply such approaches to video data, one would be faced with the challenging problem of accurately extracting and tracking the joints of a person in the presence of self-occlusions, changes of scale, pose, etc. Inspired by the recent success of histograms of features in the object recognition community, we posit that the natural feature to use in a motion sequence is optical flow. However, the raw optical flow data may be of no use, as the number of pixels in a person (hence the size of the descriptor) changes over time. Moreover, optical flow computations are very susceptible to background noise, scale changes, as well as directionality of movement.

To avoid these issues, one could use instead the distribution of optical flow. Indeed, when a person moves through a scene with a stationary background, it induces a very characteristic optical flow profile. Figure 1 shows some optical flow patterns for a sample walking sequence. However, notice that the observed optical flow profile could be different if the activity was performed at a larger scale, for example a zoomed-in walking person versus a far-away walking person: the magnitude of the optical flow vectors would be larger in the zoomed-in case. Similarly, if a person is running from the left to the right, the optical flow observed would be a reflection in the vertical axis of that observed if the person was running from the right to the left. We thus need a feature based on optical flow that represents the action profile at every time instant and that is invariant to the scale and directionality of motion.

To overcome these issues, in this paper we propose the Histogram of Oriented Optical Flow (HOOF), which is defined as follows. First, optical flow is computed at every frame of the window. Each flow vector is binned according to its primary angle from the horizontal axis and weighted according to its magnitude. Thus, all optical flow vectors v = [x, y]^T with direction θ = tan^{-1}(y/x) in the range

-π/2 + π (b-1)/B ≤ θ < -π/2 + π b/B    (2)

will contribute √(x² + y²) to the sum in bin b, 1 ≤ b ≤ B, out of a total of B bins. Finally, the histogram is normalized to sum up to 1.
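As a sketch of this construction, the following Python function bins a dense flow field into a HOOF histogram: each vector votes into one of B bins by its primary angle and contributes its magnitude, and the histogram is normalized to sum to 1. Computing the flow with cv2.calcOpticalFlowFarneback is one plausible choice; the paper only states that OpenCV was used.

```python
import numpy as np

def hoof(flow, B=30):
    """Histogram of Oriented Optical Flow for one frame.

    flow : (H, W, 2) array of optical flow (x, y) components, e.g. as
           returned by cv2.calcOpticalFlowFarneback on two gray frames.
    B    : number of bins (30 here is an illustrative choice).
    """
    vx, vy = flow[..., 0], flow[..., 1]
    mag = np.sqrt(vx ** 2 + vy ** 2)
    theta = np.arctan2(vy, vx)                    # angle in (-pi, pi]
    # Fold to the primary angle in [-pi/2, pi/2): theta and theta +/- pi
    # land in the same bin, as in equation (2).
    theta = np.where(theta >= np.pi / 2, theta - np.pi, theta)
    theta = np.where(theta < -np.pi / 2, theta + np.pi, theta)
    b = ((theta + np.pi / 2) // (np.pi / B)).astype(int)
    b = np.clip(b, 0, B - 1)                      # guard floating-point edge cases
    h = np.bincount(b.ravel(), weights=mag.ravel(), minlength=B)
    s = h.sum()
    return h / s if s > 0 else h                  # normalize to sum to 1
```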
Figure 2. Histogram formation with four bins, B = 4.

Figure 2 illustrates the procedure. Binning according to the primary angle, the smallest signed angle between the horizontal axis and the vector, allows the histogram representation to be independent of the (left or right) direction of motion. Normalization makes the histogram representation scale-invariant. We expect to observe the same histogram whether a person is moving from the left to the right or in the opposite direction, and whether a person is running far away in the scene or very near the camera. Since the contribution of each optical flow vector to its corresponding bin is proportional to its magnitude, small noisy optical flow measurements have little effect on the observed histogram. Assuming a stationary background, there is no optical flow in the background. Using the magnitude-based addition to each bin, we can simply compute the optical flow histogram on the whole frame rather than requiring a pre-computed segmentation of the moving person. The number of bins, B, is a parameter of choice. Generally we observe that with histogram time series of at least 30 bins per histogram, we are able to achieve good recognition results.

3.1. Kernels for comparing HOOF

HOOF features provide us with a normalized histogram h_t = [h_{t;1}, h_{t;2}, ..., h_{t;B}]^T at each time instant t. In order

to use such histograms for recognition purposes, we need to have a way of comparing two histograms. To that end, notice that histograms cannot be treated as simple vectors in a Euclidean space. Histograms are essentially probability mass functions, hence they must satisfy the constraints Σ_{i=1}^B h_{t;i} = 1 and h_{t;i} ≥ 0, i ∈ {1, ..., B}. At first sight, one may think that this space is still simple enough. However, the space of histograms H is actually a Riemannian manifold with a nontrivial structure.

The problem of equipping the space of probability density functions (PDFs) with a differential manifold structure and defining a Riemannian metric on it has been an active area of research in the field of information geometry. The work by Rao [21] was the first to introduce a Riemannian structure to this statistical manifold by introducing the Fisher-Rao metric. The Fisher-Rao metric, however, is extremely hard to work with due to the difficulty in computing geodesics on this space [26]. Even though the space H turns out to be difficult to work with, we know that it is not the only possible representation for PDFs. There are many different re-parameterizations of PDFs that are equivalent, including the cumulative distribution function, the log density function and the square-root density function. Each of these parameterizations leads to a different resulting manifold. Depending on the choice of representation, the resulting Riemannian structure can have varying degrees of complexity, and numerical techniques may be required to compute geodesics on the manifold.

For the sake of computational simplicity, in this paper we restrict our attention to similarity measures built by mapping the histogram h ∈ H to a high-dimensional (possibly infinite) Euclidean space F using the map Φ : H → F. Since F is a Euclidean space, all the natural notions of finding distances between two points can be employed for comparison. Most of the time, however, the map Φ cannot be found. Mercer kernels [23] have the special property of being positive definite kernels that induce an inner product in a higher-dimensional space under the map Φ. More specifically, for points lying on the non-linear manifold H, the Mercer kernel is given by k(h_1, h_2) = Φ(h_1)^T Φ(h_2), and hence a similarity measure in the high-dimensional space can be computed by simply evaluating the kernel function on the original representation, without knowing the mapping Φ. We now briefly describe some popular kernel measures used on the space of histograms.

The histogram h_t = [h_{t;1}, ..., h_{t;B}]^T can be reparameterized to the square-root representation for histograms, √h_t := [√h_{t;1}, ..., √h_{t;B}]^T, such that Σ_{i=1}^B (√h_{t;i})² = 1. This projects every histogram onto the unit hypersphere S^{B-1}. The Riemannian metric between two points R_1 and R_2 on the hypersphere is d(R_1, R_2) = cos^{-1}(R_1^T R_2). Thus a kernel between two histograms can be defined as an inner product on their square-root representations:

k_S(h_1, h_2) = Σ_{i=1}^B √(h_{1;i} h_{2;i}).    (3)

Note that this is precisely the kernel that can be achieved by using the RBF kernel, k(h_1, h_2) = exp(-d(h_1, h_2)), on the Bhattacharyya distance between the two histograms.

Minimum Difference of Pairwise Assignment (MDPA) [5] is similar to the Earth Mover's Distance (EMD) [22] and is a metric on the space of histograms that implicitly is a summation of distances between points on an underlying metric space from which the histograms were created.
For ordinal histograms, i.e., histograms created from linearly varying data (as opposed to modular data, e.g. the modular group Z_p = {0, 1, ..., p-1}), the MDPA distance is

d_MDPA(h_1, h_2) = Σ_{i=1}^B | Σ_{j=1}^i (h_{1;j} - h_{2;j}) |.    (4)

Another popular distance between two histograms is the χ² distance, which is defined as

d_χ²(h_1, h_2) = (1/2) Σ_{i=1}^B (h_{1;i} - h_{2;i})² / (h_{1;i} + h_{2;i}).    (5)

We can use the RBF kernel to create kernels as similarity measures from these distances. Finally, the Histogram Intersection kernel (HIST) [12] is another Mercer kernel [23] on the space of histograms and is defined for normalized histograms as

k_HIST(h_1, h_2) = Σ_{i=1}^B min(h_{1;i}, h_{2;i}).    (6)

The inner product of the square-root representations is by construction an inner product and hence is a Mercer kernel. Also, the χ² and HIST kernels are provably Mercer kernels [32], [12]. However, to the best of our knowledge, the positive definiteness of the MDPA kernel has not been established [32].
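To make the four measures concrete, here is a sketch implementing equations (3)-(6) for a pair of normalized histograms; the function names are ours, and the bandwidth used to turn the distances (4) and (5) into RBF kernels is an illustrative choice.

```python
import numpy as np

def k_geodesic(h1, h2):
    """Inner product of square-root representations, equation (3)."""
    return np.sum(np.sqrt(h1 * h2))

def d_mdpa(h1, h2):
    """Minimum Difference of Pairwise Assignment distance, equation (4)."""
    return np.sum(np.abs(np.cumsum(h1 - h2)))

def d_chi2(h1, h2):
    """Chi-square distance, equation (5); empty bins contribute nothing."""
    num, den = (h1 - h2) ** 2, h1 + h2
    return 0.5 * np.sum(np.divide(num, den, out=np.zeros_like(num), where=den > 0))

def k_hist(h1, h2):
    """Histogram intersection kernel, equation (6)."""
    return np.sum(np.minimum(h1, h2))

def rbf(dist, sigma=1.0):
    """Turn a distance into a kernel value, k = exp(-d); sigma is illustrative."""
    return np.exp(-dist / sigma)
```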

3.2. Kernels for comparing HOOF time series

Since HOOF features h_t = [h_{t;1}, h_{t;2}, ..., h_{t;B}]^T are defined at each frame of the video, our actual representation is a time series of these histograms, {h_t}_{t=1}^N. Our goal is to compare these time series in order to perform classification of actions. But rather than comparing these time series directly, we want to exploit the temporal evolution of these histograms in order to distinguish different actions. We posit that each action induces a time series of HOOF with specific dynamics, and that different actions induce different dynamics. Therefore, we propose to recognize actions by comparing the dynamics of HOOF time series.

There are two important technical challenges in developing a framework for the classification of HOOF time series. The first one is that, because each histogram h_t lives in a non-Euclidean space H, we cannot model its temporal evolution with LDSs. We address this issue in Section 4 by using the previously defined kernels on H to define a NLDS. The second challenge is how to compute a distance between HOOF time series. We address this issue in Section 5, where we extend Binet-Cauchy kernels to NLDSs.

4. Modeling HOOF Time Series with Non-Linear Dynamical Systems

Modeling Euclidean feature trajectories with LDSs has been very useful for dynamical system recognition. However, we need NLDSs to model non-Euclidean feature trajectories like histograms. Consider the Mercer kernel k(y_t, y'_t) = Φ(y_t)^T Φ(y'_t) on the non-Euclidean space, such that the implicit map Φ maps the original non-Euclidean space H to a high-dimensional (possibly infinite) Euclidean space F. Since k is a Mercer kernel, it represents an inner product in the higher-dimensional Euclidean space. Using the map Φ, we can therefore transform the non-Euclidean features {y_t}_{t=1}^N to the high-dimensional Euclidean space as {Φ(y_t)}_{t=1}^N and assume that the higher-dimensional trajectories follow a linear dynamical system

x_{t+1} = A x_t + B v_t,
Φ(y_t) = C x_t + w_t.    (7)

The main difference between (7) and (1) is that we do not necessarily know the embedding Φ, hence we cannot identify (A, C) as before. Moreover, even if we knew Φ, C : R^n → F is now a linear operator, rather than simply a matrix, because F is possibly infinite-dimensional. Therefore, the goal is to identify the parameters (A, B), the sequence x_t, and some representation for C by exploiting the fact that we only know the kernel k.

In [7], an approach based on Kernel PCA (KPCA) [24] that parallels the PCA approach for learning LDS parameters in [11] was proposed to learn the system parameters of equation (7). Briefly, given the output feature sequence {y_t}_{t=1}^N, the intra-sequence kernel matrix K = [k(y_i, y_j)]_{i,j=1}^N is computed, where k(y_i, y_j) = Φ(y_i)^T Φ(y_j). The centered kernel matrix, which represents the kernel between zero-mean data in the high-dimensional space, is then computed as K̃ = (I_N - (1/N) e e^T) K (I_N - (1/N) e e^T), where e = [1, ..., 1]^T ∈ R^N. After performing the eigenvalue decomposition K̃ = V D V^T, the j-th eigenvector v_j can be used to obtain the j-th kernel principal component as Σ_{i=1}^N α_{i,j} Φ(y_i), where α_{i,j} represents the i-th component of the j-th weight vector α_j = v_j / √λ_j, assuming that the eigenvectors are sorted in descending order of the eigenvalues {λ_j}_{j=1}^N. Given α and K̃, the sequence of hidden states X = [x_1, x_2, ..., x_N] and the state-transition matrix A can be estimated as

X = α^T K̃,    (8)
A = [x_2, x_3, ..., x_N] [x_1, x_2, ..., x_{N-1}]^†.    (9)

The state noise at time t is estimated as v̂_t = x_{t+1} - A x_t, and the noise covariance matrix as Q = (1/(N-1)) Σ_{t=1}^{N-1} v̂_t v̂_t^T. Using a Cholesky decomposition on Q, B is estimated such that B B^T = Q. For details on estimating R and other parameters, refer to [7]. Notice, however, that we have not shown how to estimate C or a representation of it, though it is clear that C is somehow implicitly represented by the kernel matrix K̃. Since our goal is to use the NLDS in (7) for recognition, rather than synthesis, we do not actually need to compute C. All we need is a way of comparing two linear operators C, which can be done by comparing the corresponding kernel matrices, as we show in the next section.
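The identification procedure above can be summarized in a short sketch, following [7, 11]; the function and variable names are ours, and the small jitter added before the Cholesky factorization is a numerical convenience, not part of the method as stated.

```python
import numpy as np

def learn_nlds(K, n):
    """Identify NLDS parameters from one HOOF time series.

    K : N x N intra-sequence kernel matrix, K[i, j] = k(y_i, y_j).
    n : number of hidden states (kernel principal components kept).
    """
    N = K.shape[0]
    J = np.eye(N) - np.ones((N, N)) / N
    Kc = J @ K @ J                                 # centered kernel matrix
    lam, V = np.linalg.eigh(Kc)                    # eigenvalues in ascending order
    lam, V = lam[::-1][:n], V[:, ::-1][:, :n]      # keep the top n components
    alpha = V / np.sqrt(lam)                       # weight vectors alpha_j = v_j / sqrt(lam_j)
    X = alpha.T @ Kc                               # hidden states x_1..x_N, equation (8)
    A = X[:, 1:] @ np.linalg.pinv(X[:, :-1])       # state-transition matrix, equation (9)
    Vhat = X[:, 1:] - A @ X[:, :-1]                # state-noise estimates
    Q = Vhat @ Vhat.T / (N - 1)                    # state-noise covariance
    B = np.linalg.cholesky(Q + 1e-9 * np.eye(n))   # B B^T = Q
    return X, A, Q, B, alpha
```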
5. Comparing HOOF Time Series with Binet-Cauchy Kernels for NLDS

Vishwanathan et al. [29] presented the family of Binet-Cauchy kernels for LDSs. In this section we briefly review the main concepts of the Binet-Cauchy kernels for LDSs and then develop extensions for the case of NLDSs. From the family of Binet-Cauchy kernels, the trace kernel K^T_LDS for comparing two infinitely long zero-mean Euclidean time series generated by the LDSs M = (x_0, A, C, B, R) and M' = (x'_0, A', C', B', R') is defined as

K^T_LDS({y_t}_{t=0}^∞, {y'_t}_{t=0}^∞) = K^T_LDS(M, M') := E_{v,w} [ Σ_{t=0}^∞ λ^t y_t^T y'_t ].    (10)

Here 0 < λ < 1, and E represents the expected value of the infinite sum of the inner products with respect to the joint probability distribution of v_t and w_t. It was shown in [29] that if the two LDSs have the same underlying and independent noise processes, with covariances Q and R for the state and output, respectively, then the Binet-Cauchy trace kernel can be computed in closed form as

K^T_LDS(M, M') = x_0^T P x'_0 + (λ / (1 - λ)) trace(Q P + R),    (11)

where

P = Σ_{t=0}^∞ λ^t (A^t)^T C^T C' A'^t.    (12)

If λ ||A|| ||A'|| < 1, where ||·|| is a matrix norm, then P can be computed by solving the Sylvester equation [29]

P = λ A^T P A' + C^T C'.    (13)

Notice that, as a result of the system parameter learning method [11], the second term on the right-hand side of equation (13), C^T C', is the matrix of all pairwise inner products of the principal components of the matrix Y = [y_1 - ȳ, y_2 - ȳ, ..., y_N - ȳ] and the matrix Y' = [y'_1 - ȳ', y'_2 - ȳ', ..., y'_{N'} - ȳ'], where ȳ is the mean of the sequence {y_t}_{t=1}^N and so on. Hence, the (i, j)-th entry of C^T C' is c_i^T c'_j, where c_i is the i-th principal component of Y and c'_j is the j-th principal component of Y'.

We now develop the Binet-Cauchy trace kernel for NLDSs. From equation (10), we see that the Binet-Cauchy trace kernel for LDSs is the expected value of an infinite series of weighted inner products between the outputs of two systems. We can similarly write the Binet-Cauchy trace kernel for NLDSs as the expected value of an infinite series of weighted inner products between the outputs after embedding them into the high-dimensional (possibly infinite) space using the map Φ. Specifically,

K^T_NLDS(M, M') := E_{v,w} [ Σ_{t=0}^∞ λ^t Φ(y_t)^T Φ(y'_t) ] = E_{v,w} [ Σ_{t=0}^∞ λ^t k(y_t, y'_t) ],    (14)

where k is the kernel defined on the non-Euclidean space of outputs H. If we look at equation (13), we see that in the case of NLDSs the equivalent form for the trace kernel is not immediately obtainable, because C and C' are unknown, and hence the term C^T C' cannot be evaluated directly. However, notice that C^T C' is now the product of the matrices formed from the kernel principal components from the NLDS identification process, as opposed to the principal components as in the case of LDSs. Thus, similar to the approach used in [7], the (i, j)-th entry of C^T C' can be computed as

[C^T C']_{i,j} = ( Σ_{k=1}^N α_{k,i} Φ(y_k) )^T ( Σ_{l=1}^{N'} α'_{l,j} Φ(y'_l) ) = α_i^T S α'_j,    (15)

where S is the matrix of all inner products of the form [S]_{k,l} = Φ(y_k)^T Φ(y'_l) = k(y_k, y'_l), with k ∈ {1, ..., N} and l ∈ {1, ..., N'}. Before computing the entries of C^T C' in equation (15), we need to center the kernel computation S. This is done by computing α̃_i = α_i - ((e^T α_i)/N) e and α̃'_j = α'_j - ((e^T α'_j)/N') e, and evaluating F = α̃^T S α̃'. Hence, the Binet-Cauchy kernel for NLDS requires the computation of the infinite sum

P̃ = Σ_{t=0}^∞ λ^t (A^t)^T F A'^t.    (16)

If λ ||A|| ||A'|| < 1, where ||·|| is a matrix norm, then P̃ can be computed by solving the corresponding Sylvester equation,

P̃ = λ A^T P̃ A' + F,    (17)

and the Binet-Cauchy trace kernel for NLDS is

K^T_NLDS(M, M') = x_0^T P̃ x'_0 + (λ / (1 - λ)) trace(Q P̃ + R).    (18)

Notice that equation (18) is a general kernel that takes into consideration the dynamics of the system encoded by (A, C), the noise processes Q and R, and the initial state x_0. The effect of the initial state in a periodic time series is simply to delay the time series. Since in activity recognition it does not matter, e.g., at which part of the walking cycle the video begins, we would like a kernel that is independent of the initial states and the noise processes. Hence we define the Binet-Cauchy maximum singular value kernel for NLDS as

K^σ_NLDS(M, M') = σ_max(P̃),    (19)

the maximum singular value of P̃, which is a kernel only on the dynamics of the NLDS. Furthermore, we can show that the Martin distance used by [7] (and all subspace-angle-based distances between dynamical systems) are special cases of the Binet-Cauchy kernels [8].
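Putting equations (15)-(19) together, the following sketch computes the maximum singular value kernel between two identified models. Since (17) is a two-sided (Stein-type) equation, a simple fixed-point iteration is used here, which converges exactly under the stated condition λ ||A|| ||A'|| < 1; the names and the choice λ = 0.9 are ours.

```python
import numpy as np

def cross_gram(alpha1, alpha2, S):
    """F from equation (15) after centering the weight vectors.

    S : N x N' matrix of kernel values k(y_k, y'_l) between the sequences.
    """
    a1 = alpha1 - alpha1.mean(axis=0, keepdims=True)   # center each weight vector
    a2 = alpha2 - alpha2.mean(axis=0, keepdims=True)
    return a1.T @ S @ a2

def k_max_sv(A1, A2, F, lam=0.9, iters=1000):
    """Iterate P <- lam * A1^T P A2 + F, equation (17), then return the
    maximum singular value of P, equation (19)."""
    P = F.copy()
    for _ in range(iters):
        P = lam * A1.T @ P @ A2 + F
    return np.linalg.svd(P, compute_uv=False)[0]       # largest singular value
```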
6. Experiments

To test the performance of our proposed HOOF features on activity recognition using the Binet-Cauchy kernel for NLDS, we perform a number of experiments on the Weizmann Human Action dataset [15]. This dataset contains 90 videos of 10 different actions, each performed by 9 different persons. The classes of actions include running, side walking, waving, jumping, etc. Optical flow was computed using OpenCV on each of the sequences, and HOOF histograms were extracted from each frame, leading to a HOOF time series for each video sequence.

6.1. Classification Results

We use each of the kernels described in Section 3.1 to perform Kernel PCA on the HOOF time series extracted

from each video. We then use the Binet-Cauchy maximum singular value kernel, K^σ_NLDS, to compute all pairwise similarity values. We normalize the similarity values so that K̃(M, M') = 1 if M = M', by computing

K̃(M, M') = K(M, M') / √(K(M, M) K(M', M')),

and find pairwise distances between systems by computing

d(M, M') = √(2 (1 - K̃(M, M'))).

Classification is then performed with leave-one-out 1-Nearest Neighbor (1-NN) classification using these distance values.

As a baseline, Figure 3 shows the performance of using the distance between the temporal means to perform classification. The temporal means were computed by averaging the histogram time series, h̄ = (1/N) Σ_{i=1}^N h_i. Notice that h̄ is also a histogram, so we can apply the distance metrics of Section 3.1 to compare them. Although using the mean histograms to represent the activity profile ignores any dynamics of the motion, we see that the HOOF features give low error rates.

Figure 3. Misclassification rates when using distances between HOOF temporal means (error rate vs. number of bins; Euclidean, geodesic, MDPA, chi-square and intersection distances).

Figure 4 shows the error rates for the inner product of the square-root representations, or the geodesic kernel, across all bin sizes. We can see that the Binet-Cauchy maximum singular value kernel gives the lowest error rate of 5.6%, achieving a recognition rate of 94.4%. The corresponding confusion matrix is shown in Figure 5. One sequence each is misclassified for the classes jump and run, and three sequences of class wave1 were misclassified as wave2.

Figure 4. Misclassification rates with the geodesic kernel on NLDS (error rate vs. number of bins B; Martin distance, maximum singular value kernel, and temporal means).

Table 1 compares the performance of three state-of-the-art activity recognition algorithms on the Weizmann database with similar experimental setups. We can see that the proposed method performs better than two of the methods with any choice of Mercer kernel on the space of histograms. Furthermore, the proposed method performs almost as well as the best method, [15]. The small decrease in performance is due to the fact that [15] requires the accurate extraction of the silhouette of the moving person at every time instant, which is a very strong assumption. Our method, on the other hand, is very general, does not require any preprocessing steps, and still gives better results than all state-of-the-art methods except [15].

Table 1. Comparison of recognition rates with state-of-the-art methods on the Weizmann database:
Proposed method - Geodesic kernel: 94.4
Proposed method - MDPA distance
Proposed method - χ² distance
Proposed method - HIST kernel
Gorelick et al. [15]
Ali et al. [2]: 92.6
Niebles et al. [20]: 90.0

Figure 5. Confusion matrix for recognition on the Weizmann database (classes: bend, jack, jump, pjump, run, side, skip, walk, wave1, wave2); average recognition rate is 94.4%.
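A sketch of the normalization and leave-one-out 1-NN step just described, assuming a precomputed matrix of pairwise kernel values between all sequences; the names are ours.

```python
import numpy as np

def classify_loo_1nn(K, labels):
    """Leave-one-out 1-NN from pairwise NLDS kernel values.

    K      : M x M matrix, K[i, j] = kernel between sequences i and j.
    labels : length-M array of action labels.
    """
    d = np.sqrt(np.diag(K))
    Kn = K / np.outer(d, d)                          # normalized so Kn[i, i] = 1
    D = np.sqrt(np.maximum(2.0 * (1.0 - Kn), 0.0))   # pairwise distances
    np.fill_diagonal(D, np.inf)                      # leave one out: ignore self-matches
    labels = np.asarray(labels)
    pred = labels[np.argmin(D, axis=1)]              # label of the nearest neighbor
    return pred, np.mean(pred == labels)             # predictions and accuracy
```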

7. Conclusions and Future Work

We have presented an activity recognition method that models the activity in a scene as a time series of non-Euclidean Histogram of Oriented Optical Flow features. We have shown that these features do not need any preprocessing, human detection, tracking or prior background subtraction, and that they represent the activity comprehensively. The HOOF features are scale-invariant as well as independent of the direction of motion. Since the space of histograms is non-Euclidean, we have modeled the temporal evolution of HOOF features using NLDSs and learnt the system parameters using kernels on the original histograms. More importantly, we have extended the Binet-Cauchy kernels to measure the similarities between two NLDSs and shown that the Binet-Cauchy kernels can be computed by evaluating pairwise Mercer kernels on the non-Euclidean space of features. We have applied our framework to our proposed HOOF features and have achieved state-of-the-art results on the Weizmann human action database. Currently we are working on extending our method to multiple disconnected motions in a scene by tracking and segmenting optical flow activity in the scene, as well as accounting for the motion of the camera.

8. Acknowledgements

This work was partially supported by startup funds from JHU and by grants ONR N , ONR N , NSF CAREER and ARL Robotics-CTA 84MC.

References

[1] J. K. Aggarwal and Q. Cai. Human motion analysis: A review. Computer Vision and Image Understanding, 73:90-102, 1999.
[2] S. Ali, A. Basharat, and M. Shah. Chaotic invariants for human action recognition. In IEEE International Conference on Computer Vision, 2007.
[3] A. Bissacco, A. Chiuso, Y. Ma, and S. Soatto. Recognition of human gaits. In IEEE Conference on Computer Vision and Pattern Recognition, volume 2, pages 52-58, 2001.
[4] A. Bissacco, A. Chiuso, and S. Soatto. Classification and recognition of dynamical models: The role of phase, independent components, kernels and optimal transport. IEEE Transactions on Pattern Analysis and Machine Intelligence, 29(11):1958-1972, 2007.
[5] S.-H. Cha and S. N. Srihari. On measuring the distance between histograms. Pattern Recognition, 35(6):1355-1370, June 2002.
[6] A. Chan and N. Vasconcelos. Probabilistic kernels for the classification of auto-regressive visual processes. In IEEE Conference on Computer Vision and Pattern Recognition, volume 1, 2005.
[7] A. Chan and N. Vasconcelos. Classifying video with kernel dynamic textures. In IEEE Conference on Computer Vision and Pattern Recognition, pages 1-6, 2007.
[8] R. Chaudhry and R. Vidal. Recognition of visual dynamical processes: Theory, kernels and experimental evaluation. Technical report, Johns Hopkins University, 2009.
[9] K. D. Cock and B. D. Moor. Subspace angles and distances between ARMA models. System and Control Letters, 46(4):265-270, 2002.
[10] P. Dollár, V. Rabaud, G. Cottrell, and S. Belongie. Behavior recognition via sparse spatio-temporal features. In VS-PETS, October 2005.
[11] G. Doretto, A. Chiuso, Y. Wu, and S. Soatto. Dynamic textures. Int. Journal of Computer Vision, 51(2):91-109, 2003.
[12] R. Duda, P. Hart, and D. Stork. Pattern Classification. Wiley-Interscience, October 2004.
[13] A. Efros, A. Berg, G. Mori, and J. Malik. Recognizing action at a distance. In IEEE International Conference on Computer Vision, pages 726-733, 2003.
[14] D. M. Gavrila. The visual analysis of human movement: A survey. Computer Vision and Image Understanding, 73:82-98, 1999.
[15] L. Gorelick, M. Blank, E. Shechtman, M. Irani, and R. Basri. Actions as space-time shapes. IEEE Trans. on Pattern Analysis and Machine Intelligence, 29(12):2247-2253, 2007.
[16] N. İkizler and D. A. Forsyth. Searching for complex human activities with no visual examples. International Journal of Computer Vision, 80(3):337-357, 2008.
[17] I. Laptev. On space-time interest points. Int. Journal of Computer Vision, 64(2-3):107-123, 2005.
[18] T. B. Moeslund and E. Granum. A survey of computer vision-based human motion capture. Computer Vision and Image Understanding, 81:231-268, 2001.
[19] T. B. Moeslund, A. Hilton, and V. Krüger. A survey of advances in vision-based human motion capture and analysis. Computer Vision and Image Understanding, 104:90-126, 2006.
[20] J. C. Niebles, H. Wang, and L. Fei-Fei. Unsupervised learning of human action categories using spatial-temporal words. International Journal of Computer Vision, 79:299-318, 2008.
[21] C. R. Rao. Information and accuracy attainable in the estimation of statistical parameters. Bull. Calcutta Math. Soc., 37:81-89, 1945.
[22] Y. Rubner, C. Tomasi, and L. J. Guibas. A metric for distributions with applications to image databases. In IEEE Int. Conf. on Computer Vision, 1998.
[23] B. Schölkopf and A. Smola. Learning with Kernels. MIT Press, Cambridge, MA, 2002.
[24] B. Schölkopf, A. Smola, and K.-R. Müller. Nonlinear component analysis as a kernel eigenvalue problem. Neural Computation, 10:1299-1319, 1998.
[25] R. Shumway and D. Stoffer. An approach to time series smoothing and forecasting using the EM algorithm. Journal of Time Series Analysis, 3(4):253-264, 1982.
[26] A. Srivastava, I. Jermyn, and S. Joshi. Riemannian analysis of probability density functions with applications in vision. In IEEE Conference on Computer Vision and Pattern Recognition, 2007.
[27] D. Tran and A. Sorokin. Human activity recognition with metric learning. In European Conference on Computer Vision, 2008.
[28] P. van Overschee and B. D. Moor. Subspace Identification for Linear Systems. Kluwer Academic Publishers, 1996.
[29] S. Vishwanathan, A. Smola, and R. Vidal. Binet-Cauchy kernels on dynamical systems and its application to the analysis of dynamic scenes. International Journal of Computer Vision, 73(1):95-119, 2007.
[30] G. Willems, T. Tuytelaars, and L. J. V. Gool. An efficient dense and scale-invariant spatio-temporal interest point detector. In European Conference on Computer Vision, 2008.
[31] A. Yilmaz and M. Shah. Actions sketch: A novel action representation. In IEEE Conference on Computer Vision and Pattern Recognition, 2005.
[32] J. Zhang, M. Marszalek, S. Lazebnik, and C. Schmid. Local features and kernels for classification of texture and object categories: An in-depth study. Technical report, INRIA, 2005.


More information

Action Recognition in Low Quality Videos by Jointly Using Shape, Motion and Texture Features

Action Recognition in Low Quality Videos by Jointly Using Shape, Motion and Texture Features Action Recognition in Low Quality Videos by Jointly Using Shape, Motion and Texture Features Saimunur Rahman, John See, Chiung Ching Ho Centre of Visual Computing, Faculty of Computing and Informatics

More information

Dynamic Texture with Fourier Descriptors

Dynamic Texture with Fourier Descriptors B. Abraham, O. Camps and M. Sznaier: Dynamic texture with Fourier descriptors. In Texture 2005: Proceedings of the 4th International Workshop on Texture Analysis and Synthesis, pp. 53 58, 2005. Dynamic

More information

Alternative Statistical Methods for Bone Atlas Modelling

Alternative Statistical Methods for Bone Atlas Modelling Alternative Statistical Methods for Bone Atlas Modelling Sharmishtaa Seshamani, Gouthami Chintalapani, Russell Taylor Department of Computer Science, Johns Hopkins University, Baltimore, MD Traditional

More information

Shape Descriptor using Polar Plot for Shape Recognition.

Shape Descriptor using Polar Plot for Shape Recognition. Shape Descriptor using Polar Plot for Shape Recognition. Brijesh Pillai ECE Graduate Student, Clemson University bpillai@clemson.edu Abstract : This paper presents my work on computing shape models that

More information

Available online at ScienceDirect. Procedia Computer Science 60 (2015 )

Available online at  ScienceDirect. Procedia Computer Science 60 (2015 ) Available online at www.sciencedirect.com ScienceDirect Procedia Computer Science (15 ) 43 437 19th International Conference on Knowledge Based and Intelligent Information and Engineering Systems Human

More information

REJECTION-BASED CLASSIFICATION FOR ACTION RECOGNITION USING A SPATIO-TEMPORAL DICTIONARY. Stefen Chan Wai Tim, Michele Rombaut, Denis Pellerin

REJECTION-BASED CLASSIFICATION FOR ACTION RECOGNITION USING A SPATIO-TEMPORAL DICTIONARY. Stefen Chan Wai Tim, Michele Rombaut, Denis Pellerin REJECTION-BASED CLASSIFICATION FOR ACTION RECOGNITION USING A SPATIO-TEMPORAL DICTIONARY Stefen Chan Wai Tim, Michele Rombaut, Denis Pellerin Univ. Grenoble Alpes, GIPSA-Lab, F-38000 Grenoble, France ABSTRACT

More information

Learning Human Actions with an Adaptive Codebook

Learning Human Actions with an Adaptive Codebook Learning Human Actions with an Adaptive Codebook Yu Kong, Xiaoqin Zhang, Weiming Hu and Yunde Jia Beijing Laboratory of Intelligent Information Technology School of Computer Science, Beijing Institute

More information

Image Processing. Image Features

Image Processing. Image Features Image Processing Image Features Preliminaries 2 What are Image Features? Anything. What they are used for? Some statements about image fragments (patches) recognition Search for similar patches matching

More information

Ranking on Data Manifolds

Ranking on Data Manifolds Ranking on Data Manifolds Dengyong Zhou, Jason Weston, Arthur Gretton, Olivier Bousquet, and Bernhard Schölkopf Max Planck Institute for Biological Cybernetics, 72076 Tuebingen, Germany {firstname.secondname

More information

3D Object Recognition using Multiclass SVM-KNN

3D Object Recognition using Multiclass SVM-KNN 3D Object Recognition using Multiclass SVM-KNN R. Muralidharan, C. Chandradekar April 29, 2014 Presented by: Tasadduk Chowdhury Problem We address the problem of recognizing 3D objects based on various

More information

HW2 due on Thursday. Face Recognition: Dimensionality Reduction. Biometrics CSE 190 Lecture 11. Perceptron Revisited: Linear Separators

HW2 due on Thursday. Face Recognition: Dimensionality Reduction. Biometrics CSE 190 Lecture 11. Perceptron Revisited: Linear Separators HW due on Thursday Face Recognition: Dimensionality Reduction Biometrics CSE 190 Lecture 11 CSE190, Winter 010 CSE190, Winter 010 Perceptron Revisited: Linear Separators Binary classification can be viewed

More information

SUPERVISED NEIGHBOURHOOD TOPOLOGY LEARNING (SNTL) FOR HUMAN ACTION RECOGNITION

SUPERVISED NEIGHBOURHOOD TOPOLOGY LEARNING (SNTL) FOR HUMAN ACTION RECOGNITION SUPERVISED NEIGHBOURHOOD TOPOLOGY LEARNING (SNTL) FOR HUMAN ACTION RECOGNITION 1 J.H. Ma, 1 P.C. Yuen, 1 W.W. Zou, 2 J.H. Lai 1 Hong Kong Baptist University 2 Sun Yat-sen University ICCV workshop on Machine

More information

Fully Automatic Methodology for Human Action Recognition Incorporating Dynamic Information

Fully Automatic Methodology for Human Action Recognition Incorporating Dynamic Information Fully Automatic Methodology for Human Action Recognition Incorporating Dynamic Information Ana González, Marcos Ortega Hortas, and Manuel G. Penedo University of A Coruña, VARPA group, A Coruña 15071,

More information

Fast Motion Consistency through Matrix Quantization

Fast Motion Consistency through Matrix Quantization Fast Motion Consistency through Matrix Quantization Pyry Matikainen 1 ; Rahul Sukthankar,1 ; Martial Hebert 1 ; Yan Ke 1 The Robotics Institute, Carnegie Mellon University Intel Research Pittsburgh Microsoft

More information

Motion Estimation and Optical Flow Tracking

Motion Estimation and Optical Flow Tracking Image Matching Image Retrieval Object Recognition Motion Estimation and Optical Flow Tracking Example: Mosiacing (Panorama) M. Brown and D. G. Lowe. Recognising Panoramas. ICCV 2003 Example 3D Reconstruction

More information

Shape Context Matching For Efficient OCR

Shape Context Matching For Efficient OCR Matching For Efficient OCR May 14, 2012 Matching For Efficient OCR Table of contents 1 Motivation Background 2 What is a? Matching s Simliarity Measure 3 Matching s via Pyramid Matching Matching For Efficient

More information

Chapter 3 Image Registration. Chapter 3 Image Registration

Chapter 3 Image Registration. Chapter 3 Image Registration Chapter 3 Image Registration Distributed Algorithms for Introduction (1) Definition: Image Registration Input: 2 images of the same scene but taken from different perspectives Goal: Identify transformation

More information

Equation to LaTeX. Abhinav Rastogi, Sevy Harris. I. Introduction. Segmentation.

Equation to LaTeX. Abhinav Rastogi, Sevy Harris. I. Introduction. Segmentation. Equation to LaTeX Abhinav Rastogi, Sevy Harris {arastogi,sharris5}@stanford.edu I. Introduction Copying equations from a pdf file to a LaTeX document can be time consuming because there is no easy way

More information

A Taxonomy of Semi-Supervised Learning Algorithms

A Taxonomy of Semi-Supervised Learning Algorithms A Taxonomy of Semi-Supervised Learning Algorithms Olivier Chapelle Max Planck Institute for Biological Cybernetics December 2005 Outline 1 Introduction 2 Generative models 3 Low density separation 4 Graph

More information

Preliminary Local Feature Selection by Support Vector Machine for Bag of Features

Preliminary Local Feature Selection by Support Vector Machine for Bag of Features Preliminary Local Feature Selection by Support Vector Machine for Bag of Features Tetsu Matsukawa Koji Suzuki Takio Kurita :University of Tsukuba :National Institute of Advanced Industrial Science and

More information

Silhouette Based Human Motion Detection and Recognising their Actions from the Captured Video Streams Deepak. N. A

Silhouette Based Human Motion Detection and Recognising their Actions from the Captured Video Streams Deepak. N. A Int. J. Advanced Networking and Applications 817 Silhouette Based Human Motion Detection and Recognising their Actions from the Captured Video Streams Deepak. N. A Flosolver Division, National Aerospace

More information

CIS 520, Machine Learning, Fall 2015: Assignment 7 Due: Mon, Nov 16, :59pm, PDF to Canvas [100 points]

CIS 520, Machine Learning, Fall 2015: Assignment 7 Due: Mon, Nov 16, :59pm, PDF to Canvas [100 points] CIS 520, Machine Learning, Fall 2015: Assignment 7 Due: Mon, Nov 16, 2015. 11:59pm, PDF to Canvas [100 points] Instructions. Please write up your responses to the following problems clearly and concisely.

More information

This article appeared in a journal published by Elsevier. The attached copy is furnished to the author for internal non-commercial research and

This article appeared in a journal published by Elsevier. The attached copy is furnished to the author for internal non-commercial research and This article appeared in a journal published by Elsevier. The attached copy is furnished to the author for internal non-commercial research and education use, including for instruction at the authors institution

More information

Temporal Feature Weighting for Prototype-Based Action Recognition

Temporal Feature Weighting for Prototype-Based Action Recognition Temporal Feature Weighting for Prototype-Based Action Recognition Thomas Mauthner, Peter M. Roth, and Horst Bischof Institute for Computer Graphics and Vision Graz University of Technology {mauthner,pmroth,bischof}@icg.tugraz.at

More information

Content-based image and video analysis. Machine learning

Content-based image and video analysis. Machine learning Content-based image and video analysis Machine learning for multimedia retrieval 04.05.2009 What is machine learning? Some problems are very hard to solve by writing a computer program by hand Almost all

More information

Video Alignment. Final Report. Spring 2005 Prof. Brian Evans Multidimensional Digital Signal Processing Project The University of Texas at Austin

Video Alignment. Final Report. Spring 2005 Prof. Brian Evans Multidimensional Digital Signal Processing Project The University of Texas at Austin Final Report Spring 2005 Prof. Brian Evans Multidimensional Digital Signal Processing Project The University of Texas at Austin Omer Shakil Abstract This report describes a method to align two videos.

More information

Motion Tracking and Event Understanding in Video Sequences

Motion Tracking and Event Understanding in Video Sequences Motion Tracking and Event Understanding in Video Sequences Isaac Cohen Elaine Kang, Jinman Kang Institute for Robotics and Intelligent Systems University of Southern California Los Angeles, CA Objectives!

More information

Dimension Reduction CS534

Dimension Reduction CS534 Dimension Reduction CS534 Why dimension reduction? High dimensionality large number of features E.g., documents represented by thousands of words, millions of bigrams Images represented by thousands of

More information

CS 231A Computer Vision (Fall 2012) Problem Set 3

CS 231A Computer Vision (Fall 2012) Problem Set 3 CS 231A Computer Vision (Fall 2012) Problem Set 3 Due: Nov. 13 th, 2012 (2:15pm) 1 Probabilistic Recursion for Tracking (20 points) In this problem you will derive a method for tracking a point of interest

More information

Learning to Recognize Faces in Realistic Conditions

Learning to Recognize Faces in Realistic Conditions 000 001 002 003 004 005 006 007 008 009 010 011 012 013 014 015 016 017 018 019 020 021 022 023 024 025 026 027 028 029 030 031 032 033 034 035 036 037 038 039 040 041 042 043 044 045 046 047 048 049 050

More information

Artistic ideation based on computer vision methods

Artistic ideation based on computer vision methods Journal of Theoretical and Applied Computer Science Vol. 6, No. 2, 2012, pp. 72 78 ISSN 2299-2634 http://www.jtacs.org Artistic ideation based on computer vision methods Ferran Reverter, Pilar Rosado,

More information

Video Google faces. Josef Sivic, Mark Everingham, Andrew Zisserman. Visual Geometry Group University of Oxford

Video Google faces. Josef Sivic, Mark Everingham, Andrew Zisserman. Visual Geometry Group University of Oxford Video Google faces Josef Sivic, Mark Everingham, Andrew Zisserman Visual Geometry Group University of Oxford The objective Retrieve all shots in a video, e.g. a feature length film, containing a particular

More information

Classification of objects from Video Data (Group 30)

Classification of objects from Video Data (Group 30) Classification of objects from Video Data (Group 30) Sheallika Singh 12665 Vibhuti Mahajan 12792 Aahitagni Mukherjee 12001 M Arvind 12385 1 Motivation Video surveillance has been employed for a long time

More information