Supervised Spatio-Temporal Neighborhood Topology Learning for Action Recognition

Andy J. Ma

(This paper has been submitted to the IEEE Transactions on Circuits and Systems for Video Technology.)

Abstract

Supervised manifold learning has been successfully applied to human action recognition, where class label information can improve recognition performance. However, the learned manifold may not preserve both the local structure and the temporal pose correspondence of sequences from the same action. To overcome this problem, this paper proposes a new supervised manifold learning algorithm, namely supervised spatio-temporal neighborhood topology learning (SSTNTL), for action recognition. By analyzing the topological characteristics in the context of action recognition, we propose to construct the neighborhood topology using both supervised spatial and temporal pose correspondence information. Employing the locality preserving property of LPP, SSTNTL solves a generalized eigenvalue problem to obtain the projections that not only separate data points from different classes, but also preserve the local structure and temporal pose correspondence of sequences from the same class. Experimental results demonstrate that SSTNTL outperforms the manifold embedding methods based on other topologies as well as those based on local discriminant information. Moreover, compared with state-of-the-art action recognition algorithms, SSTNTL gives the best performance for both human action and gesture recognition.

1 Introduction

Action recognition is an active research topic in computer vision and pattern recognition, owing to a wide range of potential applications such as intelligent video surveillance, perceptual interfaces and content-based video retrieval. While many algorithms and systems have been developed in the last decade [32] [25] [1], recognizing actions in videos remains challenging.

A key issue in action recognition is to extract useful action information from raw video data. Various approaches have been proposed to extract features from video sequences, such as key frame extraction [35] [2], space-time interest point detection and description [22] [13], and key point trajectory based approaches [21] [26]. However, recognizing actions using key frames lacks motion information, and the interest point and trajectory based methods extract local features but do not take advantage of global space-time shape information.

Action sequences, characterized by poses deforming continuously over time, can be regarded as data points on dynamic action manifolds. In [33] [34], Locality Preserving Projection (LPP) [8] was employed to learn and match dynamic shape manifolds for human action recognition. Besides LPP, there are many manifold learning methods, such as Isometric feature Mapping (Isomap) [31], Locally Linear Embedding (LLE) [27] and Laplacian Eigenmap (LE) [3], which aim to discover the intrinsic geometrical structure of a data manifold. However, none of these manifold learning algorithms makes use of the temporal information in action sequences, which is thought to be important. In [9], Spatio-Temporal Isomap (ST-Isomap) was proposed to construct local neighborhoods that emphasize the similarity between temporally related blocks. Besides ST-Isomap, a temporal extension of Laplacian Eigenmap (TLE) was proposed in [14] and achieves better performance. Both ST-Isomap and TLE are unsupervised methods, i.e. class label information is not taken into account. Consequently, they may not be efficiently applied to supervised dimensionality reduction problems.

Recently, supervised manifold embedding methods have been proposed, such as Locality Sensitive Discriminant Analysis (LSDA) [4], Local Fisher Discriminant Analysis (LFDA) [30], Marginal Fisher Analysis (MFA) [37], and Supervised Locality Preserving Projection (SLPP) [39]. Although these methods show that label information is important, they do not take temporal cues into account. In this context, Jia and Yeung [10] proposed a local spatio-temporal discriminant embedding (LSTDE) method to discover the local spatial and local temporal discriminant structures. LSTDE is derived by maximizing the inter-class variance and minimizing the intra-class variance based on the local spatio-temporal information.

In this case, recognition based on a whole sequence rather than on local segments may not benefit much from LSTDE. Moreover, the solution to LSTDE is based on an iterative scheme, but there is no justification of whether the iterative solution converges and is optimal. In addition, unlike Isomap, LLE or LE, LSTDE is not derived from manifold properties, so the manifold interpretation of LSTDE is unclear.

To overcome the limitations mentioned above, this paper proposes a new method to learn the manifold structure using supervised spatial and temporal pose correspondence information from the topological aspect of a manifold. Topology is an important concept in the definition of a manifold [5]. By analyzing the topological base of existing methods for action recognition, we define a new topology from spatial and class label information, which is used to construct the supervised spatial neighborhood (SSN). However, the SSN does not take full advantage of the temporal information. Thus, we further analyze the temporal pose correspondence between sequences of the same action and develop the temporal pose correspondence neighborhood (TPCN). The supervised spatial and temporal pose correspondence information is fused by taking the union of the SSN and the TPCN. Finally, the optimal linear projection functions preserving the neighborhood relationship are obtained by solving a generalized eigenvalue problem. Since the proposed method learns the neighborhood topology from both the supervised spatial distribution and the temporal pose correspondence information, we call it supervised spatio-temporal neighborhood topology learning (SSTNTL).

The rest of this paper is organized as follows. In Section 2, we present the proposed method for action recognition. Experimental results and conclusions are reported in Sections 3 and 4, respectively.

2 Supervised Spatio-Temporal Neighborhood Topology Learning

The major difference between LPP and SLPP lies in the adjacency graph. In this section, we show that the adjacency graph is intimately related to the topological base from a mathematical perspective. Since topology is an important concept in the definition of a manifold, we first give a topological analysis in the context of action recognition. Then, we propose to construct the neighborhood topology from the supervised spatial as well as temporal pose correspondence information. Finally, employing the locality preserving property of LPP, we derive SSTNTL for action recognition.

Figure 1. Similar poses in two different actions.

2.1 Topological Analysis for Action Recognition

In action recognition, actions are regarded as data points on a manifold. Suppose there are $c$ classes of actions and the data points from all the actions lie on a smooth and compact manifold. Different actions may be close to each other on the manifold, because different actions share similar poses. This is illustrated in Fig. 1, which shows two actions, run and walk. The poses (frames) indicated with red rectangles in the run and walk actions are similar, so certain data points from these two actions will be close together, and the sequences representing these two actions may be close to each other. In this case, the two actions in Fig. 1 can be represented by the data points (diamonds and circles represent different actions) shown in Fig. 2(a). Denote the two actions by $A_i$ and $A_j$, and let $A = A_i \cup A_j$.

Usually, the topology defined on $A$ is induced by the Euclidean topology; denote this topology by $\tau_e$. Since a topology is too abstract to work with directly, we investigate the topological base [11] instead. In mathematics, if $\tau = \langle \mathcal{B} \rangle$, where $\langle \mathcal{B} \rangle$ denotes the family of sets generated by the family of sets $\mathcal{B}$, i.e.

$$\langle \mathcal{B} \rangle = \{ U \subseteq A \mid \forall x \in U, \exists B \in \mathcal{B}, \text{ s.t. } x \in B \subseteq U \},$$

then $\mathcal{B}$ is called a topological base of the topological space $(A, \tau)$. With this representation, $\tau_e$ can be written as

$$\tau_e = \langle \mathcal{B}_e \rangle, \quad \mathcal{B}_e = \{ A \cap B(x_n, \varepsilon) \mid x_n \in A, \varepsilon > 0 \} \qquad (1)$$

where $B(x_n, \varepsilon)$ is the ball centered at $x_n$ with radius $\varepsilon$. The topology in LPP is constructed with a fixed $\varepsilon$ in $\tau_e$, which can be written as

$$\tau_{LPP} = \langle \mathcal{B}_{LPP} \rangle, \quad \mathcal{B}_{LPP} = \{ A \cap B(x_n, \varepsilon_{LPP}) \mid x_n \in A \} \qquad (2)$$

where $\varepsilon_{LPP}$ is the parameter of the $\varepsilon$-neighborhood. For a suitable parameter $\varepsilon_{LPP}$, the topological base of LPP can be represented by the ovals in Fig. 2(b). Based on the topological base, the adjacency graph in LPP is constructed by putting a link between any two points in the same oval, as also shown in Fig. 2(b). From Fig. 2(b), we can see that close points from two different actions are connected, so the projections learned by LPP will map these points into the low-dimensional space as close together as possible. However, this is not what we want for classification.
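For illustration, the $\varepsilon$-neighborhood graph underlying (2) can be sketched in a few lines of NumPy. This is not code from the paper; it assumes Euclidean distance and the simple-minded 0/1 weighting mentioned later in Section 3.1, and it makes the problem above visible: with overlapping classes, the $\varepsilon$-ball graph links points across the two actions.

```python
import numpy as np

def lpp_adjacency(X, eps):
    """Epsilon-neighborhood adjacency graph as used by LPP.

    X   : (N, d) array of data points (one row per frame/pose).
    eps : radius of the ball B(x_n, eps); every pair of points whose
          Euclidean distance is at most eps is linked, regardless of class.
    Returns an (N, N) 0/1 weight matrix ("simple-minded" weighting).
    """
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(-1))
    W = (dist <= eps).astype(float)
    np.fill_diagonal(W, 0.0)          # no self-loops
    return W

# Two overlapping 2-D "actions", as in Fig. 2(a): the epsilon-ball graph
# links diamonds to circles, which is what harms classification.
rng = np.random.default_rng(0)
A_i = rng.normal(loc=[0.0, 0.0], scale=0.3, size=(20, 2))
A_j = rng.normal(loc=[0.5, 0.0], scale=0.3, size=(20, 2))
X = np.vstack([A_i, A_j])
labels = np.array([0] * 20 + [1] * 20)

W = lpp_adjacency(X, eps=0.4)
rows, cols = np.nonzero(W)
cross_links = np.sum(labels[rows] != labels[cols]) // 2
print("edges between the two actions:", cross_links)
```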

Figure 2. (a) Visualization of two close actions; visualization of the topological base and adjacency graph in (b) LPP, (c) SLPP, and (d) the proposed supervised spatial method.

Topologies defined on $A$ can have many possible variations. Besides the Euclidean topology, the topology in SLPP is given by

$$\tau_{SLPP} = \langle \mathcal{B}_{SLPP} \rangle, \quad \mathcal{B}_{SLPP} = \{ A_i, A_j \} \qquad (3)$$

The topological base and adjacency graph of SLPP are shown as ovals and edges in Fig. 2(c). Since SLPP only considers the class information, every two points from the same class are connected by an edge in the graph in Fig. 2(c). Therefore, the topology in SLPP hardly contains any local information about the data.

2.2 Supervised Spatial Neighborhood Topology Construction

According to the analysis in the previous section, the topological base plays an important role in both LPP and SLPP. With the spatial distribution and class label information of the data, the topological bases $\mathcal{B}_{LPP}$ in LPP and $\mathcal{B}_{SLPP}$ in SLPP can be constructed. However, $\mathcal{B}_{LPP}$ cannot separate data points from different classes, and $\mathcal{B}_{SLPP}$ hardly contains any local information. In order to ensure that the topological base not only preserves local structure but is also discriminative, we take the intersection of $\mathcal{B}_{LPP}$ and $\mathcal{B}_{SLPP}$ and define the supervised spatial topological base as

$$\mathcal{B}_{SS} = \{ B_{LPP} \cap B_{SLPP} \mid B_{LPP} \in \mathcal{B}_{LPP}, B_{SLPP} \in \mathcal{B}_{SLPP} \} \qquad (4)$$

Since $B_{LPP} = A \cap B(x_n, \varepsilon_{LPP})$ and $B_{SLPP} = A_i$ or $A_j$, $\mathcal{B}_{SS}$ in (4) can be rewritten and the novel topology defined as

$$\tau_{SS} = \langle \mathcal{B}_{SS} \rangle, \quad \mathcal{B}_{SS} = \{ A_k \cap B(x_n, \varepsilon_{SS}) \mid x_n \in A_k, \; k = i \text{ or } j \} \qquad (5)$$

The topological base $\mathcal{B}_{SS}$ of (5) and the adjacency graph constructed from $\mathcal{B}_{SS}$ are shown in Fig. 2(d). With this new topology and adjacency graph, $A$ can be separated into two connected components, $A_i$ and $A_j$. In addition, close points from the same action are connected by an edge in the graph. Therefore, the proposed supervised spatial topology $\tau_{SS}$ not only preserves the local structure of data points from the same class but also separates data points from different classes.

Figure 3. Visualization of the temporal continuity in an action sequence.

Besides spatial and label information, the supervised spatial topology $\tau_{SS}$ contains temporal adjacency information as well. As shown in Fig. 3, action sequences are characterized by poses deforming continuously over time. Mathematically, temporal continuity means that for a suitable $\varepsilon_{SS}$ there exists a positive integer $\delta$ such that $\| x_{n+t} - x_n \| \le \varepsilon_{SS}$ when $|t| \le \delta$. By the definition of the supervised spatial topological base, we have

$$x_{n+t} \in B_{SS} \in \mathcal{B}_{SS}, \quad \text{when } |t| \le \delta \qquad (6)$$

This means that the temporally adjacent neighbors, indicated by the red ovals in Fig. 3, are contained in the supervised spatial neighbors. Moreover, the construction of $\mathcal{B}_{SS}$ avoids the non-trivial problem of selecting the temporal adjacency parameter, which is another advantage of the proposed method.

In practice, it is difficult to choose an optimal $\varepsilon_{SS}$ when using the $\varepsilon$-neighborhood to construct the adjacency graph, so we construct the graph using k-NN instead. However, the number of data points from different classes may not be the same, so we choose a percentage parameter $a_{SS}$ instead of $k$ to construct the neighborhood of each data point. The algorithmic procedure to construct the supervised spatial neighborhood $N_{SS}(x_n)$ for each $x_n$ is stated in Algorithm 1 (Fig. 4).

Algorithm 1. Constructing $N_{SS}$
Input: training samples $x_1, \dots, x_N$; corresponding labels $y_1, \dots, y_N$; percentage parameter $a_{SS}$.
Output: supervised spatial neighborhoods $N_{SS}$.
for $i = 1, \dots, c$
    Compute the number of samples of class $i$, $N_i$;
    Calculate the k-NN parameter $k^{SS}_i = a_{SS} N_i$;
for $n = 1, \dots, N$
    Select the $k^{SS}_{y_n}$ nearest samples to $x_n$ from class $y_n$ as $N_{SS}(x_n)$;
return $N_{SS}(x_1), \dots, N_{SS}(x_N)$.

Figure 4. Algorithm 1. Construction of the supervised spatial neighborhood.
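As a companion to Algorithm 1, the following is a minimal NumPy sketch of the class-restricted k-NN construction. It is an illustration rather than the authors' implementation; in particular, the rounding of $a_{SS} N_i$ to an integer is an assumption of this sketch, since the paper does not state the rounding rule.

```python
import numpy as np

def supervised_spatial_neighborhood(X, y, a_ss):
    """Sketch of Algorithm 1: class-restricted k-NN with a percentage parameter.

    X    : (N, d) array of training samples (one row per frame/pose).
    y    : (N,) array of class labels.
    a_ss : percentage parameter; class i uses roughly a_ss * N_i neighbors
           (rounding and the lower/upper bounds are assumptions of this sketch).
    Returns a dict mapping sample index n -> indices of N_SS(x_n).
    """
    neighborhoods = {}
    for cls in np.unique(y):
        idx = np.where(y == cls)[0]                      # samples of this class
        k = min(max(1, int(round(a_ss * len(idx)))), len(idx) - 1)
        Xc = X[idx]
        # pairwise Euclidean distances within the class only
        d = np.linalg.norm(Xc[:, None, :] - Xc[None, :, :], axis=-1)
        np.fill_diagonal(d, np.inf)                      # exclude the point itself
        for local_n, n in enumerate(idx):
            nearest = np.argsort(d[local_n])[:k]
            neighborhoods[n] = idx[nearest]
    return neighborhoods
```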

2.3 Temporal Pose Correspondence Neighborhood Topology Construction

Although the supervised spatial neighborhood $N_{SS}$ contains temporally adjacent neighbors, as discussed above, its construction still does not take full advantage of the temporal information. Considering different sequences of the same action bend, as shown in Fig. 5, it can be observed that different sequences share similar poses deforming similarly over time. However, if the neighborhood is constructed from spatial information only, the corresponding poses in different sequences of the same action may not be close to each other, due to background and appearance changes. Thus, the temporal pose correspondence between sequences of the same action may not be discovered by the supervised spatial neighbors. In this context, we propose to construct a neighborhood from the temporal pose correspondence directly.

Figure 5. Topological visualization of the temporal pose correspondence neighborhood.

Denote two complete sequences of the same action (for actions like run and walk, the moving direction is assumed to be the same), with segmented regions of interest, by $A_{i_1} = \{ x^{i_1}_1, \dots, x^{i_1}_{N_{i_1}} \}$ and $A_{i_2} = \{ x^{i_2}_1, \dots, x^{i_2}_{N_{i_2}} \}$. For non-periodic actions, since we assume the action sequences are complete, the beginning and ending poses of $A_{i_1}$ and $A_{i_2}$ are the same. However, different sequences of the same action are performed by different subjects, so $A_{i_1}$ and $A_{i_2}$ may contain different numbers of frames, and the pose correspondence between $A_{i_1}$ and $A_{i_2}$ needs to be found. Dynamic time warping (DTW) [28] is an algorithm that measures the similarity between two sequences which may vary in speed. With DTW, the temporal pose correspondence is represented by the warping function

$$F = [(f_{i_1}(1), f_{i_2}(1)), \dots, (f_{i_1}(t), f_{i_2}(t)), \dots, (f_{i_1}(T), f_{i_2}(T))] \qquad (7)$$

where $1 \le f_{i_1}(t) \le N_{i_1}$, $1 \le f_{i_2}(t) \le N_{i_2}$, and $T$ is an integer such that $T \ge \max\{N_{i_1}, N_{i_2}\}$. With the five constraints proposed in [28], i.e. the monotonicity, continuity, boundary, adjustment-window, and slope constraints, DTW finds the warping function $F$ by solving the following optimization problem

$$\min_F \sum_{t=1}^{T} \omega(t) \, \| x^{i_1}_{f_{i_1}(t)} - x^{i_2}_{f_{i_2}(t)} \| \qquad (8)$$

where $\omega(t)$ is a non-negative weighting coefficient. For the symmetric form, $\omega(t) = (f_{i_1}(t) - f_{i_1}(t-1)) + (f_{i_2}(t) - f_{i_2}(t-1))$, while $\omega(t) = f_{i_1}(t) - f_{i_1}(t-1)$ or $\omega(t) = f_{i_2}(t) - f_{i_2}(t-1)$ for the asymmetric forms. In [28], the DTW distance matrix is constructed by the dynamic programming (DP) based time-normalization algorithm, and the optimal $F$ is then obtained by searching for the path in the matrix with minimal cost (more details about DTW can be found in [28]).

For periodic actions, the beginning and ending poses of $A_{i_1}$ and $A_{i_2}$ may not be the same, so DTW cannot be applied directly. Suppose $N_{i_1} \le N_{i_2}$, and define the permutation function $\sigma_{t_0}$ by $\sigma_{t_0}(t) = t - t_0$ if $t_0 < t \le N_{i_1}$ and $\sigma_{t_0}(t) = t + N_{i_1} - t_0$ if $0 \le t \le t_0$, where $t_0$ is a shift ranging over the integers from $0$ to $N_{i_1} - 1$. In this case, the optimal $F$ is obtained by

$$\min_{t_0} \min_F \sum_{t=1}^{T} \omega(t) \, \| x^{i_1}_{\sigma_{t_0}(f_{i_1}(t))} - x^{i_2}_{f_{i_2}(t)} \| \qquad (9)$$

To solve optimization problem (9), we fix $t_0$ and compute the minimum value with respect to $F$ by DTW. Then, by choosing the lowest cost over all shifts $t_0$, the optimal solution is achieved.
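To make the alignment step concrete, the sketch below implements a plain dynamic-programming DTW for (8) and the shift search of (9). It is a simplification of the time-normalized algorithm in [28]: unit weights $\omega(t) = 1$ and no adjustment-window or slope constraints; the function names and the direction of the circular shift are choices of this sketch, not of the paper.

```python
import numpy as np

def dtw_cost_and_path(A, B):
    """Plain DTW between sequences A (n x d) and B (m x d) with Euclidean
    frame distance and the basic monotonic/continuity/boundary constraints.
    Returns (total cost, warping path as a list of (i, j) index pairs)."""
    n, m = len(A), len(B)
    dist = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=-1)
    acc = np.full((n, m), np.inf)
    acc[0, 0] = dist[0, 0]
    for i in range(n):
        for j in range(m):
            if i == 0 and j == 0:
                continue
            best_prev = min(
                acc[i - 1, j] if i > 0 else np.inf,
                acc[i, j - 1] if j > 0 else np.inf,
                acc[i - 1, j - 1] if i > 0 and j > 0 else np.inf,
            )
            acc[i, j] = dist[i, j] + best_prev
    # backtrack to recover the warping function F = [(f1(t), f2(t))]
    path, i, j = [(n - 1, m - 1)], n - 1, m - 1
    while (i, j) != (0, 0):
        candidates = []
        if i > 0 and j > 0:
            candidates.append((acc[i - 1, j - 1], (i - 1, j - 1)))
        if i > 0:
            candidates.append((acc[i - 1, j], (i - 1, j)))
        if j > 0:
            candidates.append((acc[i, j - 1], (i, j - 1)))
        _, (i, j) = min(candidates, key=lambda c: c[0])
        path.append((i, j))
    return acc[-1, -1], path[::-1]

def periodic_dtw(A, B):
    """Eq. (9): try every circular shift t0 of the shorter sequence A and keep
    the alignment with the lowest DTW cost (shift direction is an assumption)."""
    best = (np.inf, None, None)
    for t0 in range(len(A)):
        shifted = np.roll(A, -t0, axis=0)   # circularly shift the frames
        cost, path = dtw_cost_and_path(shifted, B)
        if cost < best[0]:
            best = (cost, t0, path)         # path indices refer to the shifted A
    return best
```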
The optimal warping function $F$ defined by (7) gives the correspondence between similar poses in different sequences of the same action. However, the difference between corresponding frames may still be very large due to background and appearance changes. Consequently, if corresponding frames of the same pose in different sequences were placed in the same neighborhood regardless of their distance, the manifold structure could be destroyed. Thus, only neighboring corresponding frames are selected as the base of the temporal pose correspondence neighborhood topology, i.e.

$$\tau_{TPC} = \langle \mathcal{B}_{TPC} \rangle, \quad \mathcal{B}_{TPC} = \{ C(x_n) \cap B(x_n, \varepsilon_{TPC}) \mid x_n \in X \} \qquad (10)$$

where $X$ is the universal set and $C(x_n)$ denotes the set of poses in different sequences corresponding to $x_n$.

The topological base constructed from the temporal pose correspondence is indicated by the red ovals in Fig. 5. As with the supervised spatial neighborhood, the temporal pose correspondence neighborhood is constructed by k-NN, and a percentage parameter $a_{TPC}$ is used instead of $k$. Denote by $M_i$ the number of sequences of action $i$. The algorithmic procedure to construct the temporal pose correspondence neighborhood $N_{TPC}(x_n)$ for each $x_n$ is stated in Algorithm 2 (Fig. 6).

Algorithm 2. Constructing $N_{TPC}$
Input: training sequences $A_1, \dots, A_M$; corresponding labels $y_1, \dots, y_M$; percentage parameter $a_{TPC}$.
Output: temporal pose correspondence neighborhoods $N_{TPC}$.
for $i = 1, \dots, c$
    Find the action sequences with label $i$;
    for $m = 1, \dots, M_i$
        for $n = m + 1, \dots, M_i$
            Obtain the warping function $F_{A_{i_m}, A_{i_n}}$ by solving (8) or (9);
for $n = 1, \dots, N$
    Construct the corresponding set $C(x_n)$ of $x_n$ from the warping functions $F$;
    Count the number of elements in $C(x_n)$, $\#C(x_n)$;
    Set $k^{TPC}_n = a_{TPC} \, \#C(x_n)$;
    Select the $k^{TPC}_n$ nearest samples in $C(x_n)$ as $N_{TPC}(x_n)$;
return $N_{TPC}(x_1), \dots, N_{TPC}(x_N)$.

Figure 6. Algorithm 2. Construction of the temporal pose correspondence neighborhood.

2.4 Supervised Spatial and Temporal Pose Correspondence Neighborhood Topology Learning

As discussed in Sections 2.2 and 2.3, the topological base $\mathcal{B}_{SS}$ contains supervised spatial information, while $\mathcal{B}_{TPC}$ captures the temporal pose correspondence. In order to ensure that the topological base embodies both kinds of information, we take the union of $\mathcal{B}_{SS}$ and $\mathcal{B}_{TPC}$, and the fused topology, namely the supervised spatio-temporal neighborhood topology, is defined as

$$\tau_{ST} = \langle \mathcal{B}_{ST} \rangle, \quad \mathcal{B}_{ST} = \{ B_{ST} \mid B_{ST} \in \mathcal{B}_{SS} \text{ or } B_{ST} \in \mathcal{B}_{TPC} \} \qquad (11)$$

Based on (11), the fused neighborhood of each $x_n$ is given by

$$N_{ST}(x_n) = N_{SS}(x_n) \cup N_{TPC}(x_n) \qquad (12)$$

Suppose the manifold $M_{ST}$ is endowed with the neighborhood topology $\tau_{ST}$. In [3], it is shown that the optimal mapping $f$ preserving locality on the manifold $M_{ST}$ is given by solving the following optimization problem

$$\arg\min_{f, \; \|f\|_{L^2(M_{ST})} = 1} \int_{M_{ST}} \| \nabla f \|^2 \qquad (13)$$

To avoid the out-of-sample problem, the mapping $f$ is restricted to be linear. In [8], it is shown that the optimal linear projections preserving locality can be obtained by solving the following optimization problem

$$\arg\min_{e, \; e^T X D X^T e = 1} e^T X L X^T e \qquad (14)$$

where $L$ and $D$ are the Laplacian and diagonal matrices. Since the Laplacian $L$ and the diagonal matrix $D$ are sparse and have a special block structure, $X L X^T$ and $X D X^T$ can be computed as

$$X L X^T = (X_1 \cdots X_c) \, \mathrm{diag}(L_1, \dots, L_c) \, (X_1 \cdots X_c)^T = \sum_{i=1}^{c} X_i L_i X_i^T \qquad (15)$$

$$X D X^T = (X_1 \cdots X_c) \, \mathrm{diag}(D_1, \dots, D_c) \, (X_1 \cdots X_c)^T = \sum_{i=1}^{c} X_i D_i X_i^T \qquad (16)$$

Finally, the algorithmic procedure of the proposed SSTNTL method is presented in Algorithm 3 (Fig. 7).

Algorithm 3. SSTNTL
Input: training sequences $A_1, \dots, A_M$; corresponding labels $y_1, \dots, y_M$; parameters $a_{SS}$, $a_{TPC}$, $l$.
Output: projection matrix $P = (e_1, \dots, e_l)$.
Construct $N_{SS}$ by Algorithm 1;
Construct $N_{TPC}$ by Algorithm 2;
Construct $N_{ST}$ by (12);
Set matrices $M_L = 0$ and $M_D = 0$;
for $i = 1, \dots, c$
    Compute $M_L = M_L + X_i L_i X_i^T$ and $M_D = M_D + X_i D_i X_i^T$;
Solve the generalized eigenvalue problem $M_L e = \lambda M_D e$;
Sort the eigenvectors $e_1, \dots, e_l$ by their eigenvalues $\lambda_1 \le \dots \le \lambda_l$;
return $P = (e_1, \dots, e_l)$.

Figure 7. Algorithm 3. The algorithmic procedure of SSTNTL.
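The projection-learning step of Algorithm 3 amounts to building a graph Laplacian on the fused neighborhoods and solving one generalized eigenvalue problem. The sketch below is a hypothetical NumPy/SciPy rendering under the simple-minded 0/1 weighting; it computes $X L X^T$ and $X D X^T$ globally rather than class by class (equivalent here because the graph decomposes by class), assumes $X D X^T$ is positive definite (which the PCA preprocessing of Section 3.1 helps ensure), and keeps the eigenvectors with the smallest eigenvalues, consistent with the minimization in (14).

```python
import numpy as np
from scipy.linalg import eigh

def sstntl_projections(X, neighborhoods, l):
    """Sketch of the projection-learning step of SSTNTL (Algorithm 3).

    X             : (N, d) data matrix, one row per frame (after PCA).
    neighborhoods : dict n -> iterable of neighbor indices, i.e. the fused
                    N_ST(x_n) = N_SS(x_n) union N_TPC(x_n) built beforehand.
    l             : number of projection directions to keep.
    Returns P of shape (d, l).
    """
    N = X.shape[0]
    # simple-minded 0/1 weights on the fused adjacency graph, symmetrized
    W = np.zeros((N, N))
    for n, nbrs in neighborhoods.items():
        W[n, list(nbrs)] = 1.0
    W = np.maximum(W, W.T)

    D = np.diag(W.sum(axis=1))        # degree (diagonal) matrix
    L = D - W                         # graph Laplacian

    Xt = X.T                          # columns are samples, as in X L X^T
    M_L = Xt @ L @ Xt.T
    M_D = Xt @ D @ Xt.T               # assumed positive definite after PCA

    # generalized eigenproblem M_L e = lambda M_D e; locality preservation
    # keeps the eigenvectors with the smallest eigenvalues
    eigvals, eigvecs = eigh(M_L, M_D)
    return eigvecs[:, :l]
```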

3 Experiments

In this section, we evaluate the proposed SSTNTL on three publicly available action databases: the Weizmann [7] and KTH [29] human action databases and the Cambridge-Gesture database [12]. First, we give a brief introduction to these three databases and the experimental settings in Section 3.1. Then, the results on the three databases are reported in Sections 3.2, 3.3 and 3.4.

3.1 Databases and Settings

The Weizmann database [7] contains 93 videos of nine persons, each performing ten actions: bend, jumping jack, jump in place on two legs, jump forward on two legs, gallop sideways, skip, run, walk, wave one hand, and wave two hands. The motion detection method introduced in [18] is used to extract the region of interest from the raw videos. Since all actions in this database except bend are periodic, each video of a periodic action contains multiple periods of the same action. We segment the action sequences into sets of periodic motion cycles using the information saliency curve [18], following the procedure in [16] [17]. One period of each sequence is used in our experiments, and all frames are normalized to the same size in pixels. We follow the common nine-fold cross-validation protocol [15] [38] to evaluate the proposed method, and the average recognition rate is recorded.

In the KTH database [29], 25 subjects perform six actions under four scenarios. The six actions are boxing, hand clapping, hand waving, jogging, running and walking. The four scenarios are outdoors (S1), outdoors with scale variations (S2), outdoors with different clothes (S3) and indoors (S4). A similar procedure is performed on this database to extract action periods with the information saliency method. In order to make the features less sensitive to background and appearance changes, we use the difference of every two successive frames instead of the original grey-scale images on this database. The leave-one-out cross-validation protocol in [26] [23] [36] is used to evaluate the proposed method, and the average recognition rate over 25 runs for each scenario is recorded.

The Cambridge-Gesture database [12] consists of 900 image sequences of nine hand gesture classes under five kinds of illumination. The nine hand gesture actions are Flat-Leftward (FL), Flat-Rightward (FR), Flat-Contract (FC), Spread-Leftward (SL), Spread-Rightward (SR), Spread-Contract (SC), V-Shape-Leftward (VL), V-Shape-Rightward (VR), and V-Shape-Contract (VC). Each class and illumination set contains 20 sequences. The holistic representation of the spatial envelope [24] is used to compute a 960-dimensional feature vector for each image. Following the experimental protocol in [12] [19], the 20 sequences per class under plain illumination are randomly partitioned into 10 sequences for training and 10 for validation, while testing is performed on the data sets with the other four illuminations.

For these three databases, Principal Component Analysis (PCA) [6] is used for dimensionality reduction to avoid the singular matrix problem and improve the computational efficiency. The PCA dimension is determined by keeping around 90% of the energy. After performing PCA, the manifold embedding methods are used to extract features for action recognition.
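The 90%-energy criterion for the PCA stage can be realized with a standard SVD; the sketch below is one possible rendering (centering and the economy-size SVD are standard choices of this sketch, not details given in the paper).

```python
import numpy as np

def pca_keep_energy(X, energy=0.90):
    """PCA keeping the smallest number of components whose cumulative
    explained variance reaches `energy` (about 90% in the paper).

    X : (N, d) data matrix, one row per frame.
    Returns (projector, mean); a frame x maps to projector.T @ (x - mean).
    """
    mean = X.mean(axis=0)
    Xc = X - mean
    # economy-size SVD; squared singular values give per-component variance
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    var = s ** 2
    ratio = np.cumsum(var) / var.sum()
    k = int(np.searchsorted(ratio, energy) + 1)
    return Vt[:k].T, mean              # (d, k) projector and the data mean
```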
In our experiments, we compare the proposed SSTNTL with SNTL [20], LPP [8], SLPP [39], LSDA [4] and LSTDE [10]. The weight matrix is calculated by the simple-minded (0/1) method, to avoid the parameter selection required by heat-kernel weighting. As shown in Algorithm 3 (Fig. 7), the parameters $a_{SS}$, $a_{TPC}$ and $l$ need to be determined for SSTNTL. Since it is difficult and time-consuming to search for the best combination of the three parameters simultaneously, we determine them in a two-stage scheme. First, with $l = 60$ fixed, $a_{SS}$ and $a_{TPC}$ are selected from $\{0.02, 0.04, \dots, 0.2\}$ and $\{0.1, 0.2, \dots, 0.9\}$, respectively. Second, the best dimension $l$ is selected from two up to the determined PCA dimension with a step size of two, for the fixed $a_{SS}$ and $a_{TPC}$. The parameter selection procedure for the other methods is similar to that of SSTNTL. The neighborhood parameter in SNTL, LPP and LSDA is selected from the same set as $a_{SS}$ in SSTNTL. Since LSTDE is very slow when the neighborhood is large, its neighborhood parameter is selected from $\{0.01, 0.02, 0.03\}$ instead.

After feature extraction, we use a nearest neighbor framework in the classification stage. Let $A_{test}$ be a testing action sequence. The class label of $A_{test}$ is given by

$$y_{test} = y_{\arg\min_m d(P^T A_{test}, \, P^T A_m)} \qquad (17)$$

where $d$ measures the distance between two sequences $u$ and $v$, e.g. $u = P^T A_{test}$ and $v = P^T A_m$. Since the median Hausdorff distance gives more robust results, as mentioned in [33], we define $d$ as in (18) in our experiments:

$$d(u, v) = \mathrm{median}_j \min_i \| u_i - v_j \| + \mathrm{median}_i \min_j \| u_i - v_j \| \qquad (18)$$
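Equations (17) and (18) together define a simple nearest-neighbor classifier over projected sequences. The sketch below is an illustrative implementation, not the authors' code; it represents each sequence with frames as rows, so projection is written `A @ P` rather than $P^T A$, and the helper names are hypothetical.

```python
import numpy as np

def median_hausdorff(u, v):
    """Eq. (18): symmetric median Hausdorff distance between two projected
    sequences u (T1 x l) and v (T2 x l)."""
    dist = np.linalg.norm(u[:, None, :] - v[None, :, :], axis=-1)  # (T1, T2)
    return np.median(dist.min(axis=0)) + np.median(dist.min(axis=1))

def classify_sequence(A_test, train_seqs, train_labels, P):
    """Eq. (17): nearest-neighbor rule over projected training sequences.

    A_test       : (T, d) testing sequence (frames as rows, after PCA).
    train_seqs   : list of (T_m, d) training sequences A_m.
    train_labels : list of class labels y_m.
    P            : (d, l) projection matrix learned by SSTNTL.
    """
    u = A_test @ P
    costs = [median_hausdorff(u, A_m @ P) for A_m in train_seqs]
    return train_labels[int(np.argmin(costs))]
```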

Figure 8. Recognition accuracy (%) of SSTNTL with different values of $a_{SS}$ and $a_{TPC}$, and fixed $l = 60$, on the Weizmann database.

Figure 9. Recognition accuracy with the best neighborhood parameter and different embedding dimensions on the Weizmann database.

3.2 Results on the Weizmann Human Action Database

The recognition accuracies of SSTNTL with different values of $a_{SS}$ and $a_{TPC}$, and fixed $l = 60$, on this database are shown in Fig. 8. A darker element of the matrix in Fig. 8 represents a higher recognition rate. The columns give the recognition rates for varying $a_{SS}$ at a specific $a_{TPC}$, while the rows give the recognition rates for varying $a_{TPC}$ at a specific $a_{SS}$. From Fig. 8, we can see that the highest accuracy of 100% is achieved on this database when $a_{SS} = 0.06$ and $a_{TPC} = 0.9$. From the last row of Fig. 8, we observe that the recognition rate does not change with $a_{TPC}$ when $a_{SS} = 0.2$. This may be because the supervised spatial neighborhood $N_{SS}$ already contains the temporal pose correspondence neighborhood $N_{TPC}$ when $a_{SS} = 0.2$. Thus, SSTNTL may reduce to SNTL when $a_{SS}$ is large enough.

Fig. 9 shows the recognition accuracies of the different methods at different dimensions. From Fig. 9, we can see that the recognition rate of SSTNTL is higher than those of all the other methods at every dimension. Moreover, when the dimension is less than 20, SSTNTL, SNTL, LPP and SLPP outperform LSDA and LSTDE by a large margin. This result suggests that the neighborhood topology learning methods with a clear manifold interpretation are much better than those without one when the dimension is small.

The highest accuracy and the corresponding optimal dimension of each method are recorded in the left-hand part of Table 1. From Table 1, we can see that SSTNTL not only achieves the best recognition accuracy of 100%, but also has the lowest optimal dimension. This means SSTNTL can recognize actions correctly while requiring only low storage. On the other hand, compared with state-of-the-art algorithms on this database, SSTNTL outperforms the Expandable Data-Driven Graphical Model (EDDGM) [15] and matches the best performance of the Boosted Exemplar Learning (BEL) method [38].

3.3 Results on the KTH Human Action Database

On this database, the highest recognition accuracies of the different manifold embedding methods under the four scenarios are presented in the top six rows of Table 2. Both SSTNTL and SNTL achieve the best performance under scenarios 1 and 3. SSTNTL, SNTL and LSTDE perform better than the other methods under scenario 4, while SSTNTL outperforms the other manifold embedding methods under scenario 2. As observed in Table 2, the recognition rates under scenarios 1, 3 and 4 are higher than those under scenario 2, which means that the motion features for scenarios 1, 3 and 4 are more discriminative than those for scenario 2. In this situation, it is possible that the temporally corresponding poses in different sequences of the same action are nearly the same and already lie in the same supervised spatial neighborhood. Consequently, SSTNTL and SNTL give the same recognition rates under scenarios 1, 3 and 4. For scenario 2, the features are less discriminative; however, SSTNTL achieves a remarkable improvement compared to the other manifold embedding methods, since SSTNTL takes advantage of the temporal pose correspondence to construct the topology.

This also confirms that the temporal pose correspondence information can help to recognize actions.

Table 1. Recognition accuracy (%) and corresponding optimal dimension on the Weizmann database for SSTNTL, SNTL [20], LPP [8], SLPP [39], LSDA [4], LSTDE [10], EDDGM [15] and BEL [38].

Table 2. Recognition accuracy (%) of different methods on the KTH database under scenarios S1–S4 and the mean, for SSTNTL, SNTL [20], LPP [8], SLPP [39], LSDA [4], LSTDE [10], Tracklet [26], HSTM [23] and AFMKL [36].

We also compare the manifold embedding methods with state-of-the-art human action recognition algorithms. The recognition rates of the Tracklet method [26], the Hierarchical Space-Time Model (HSTM) [23] and Multiple Kernel Learning with Augmented Features (AFMKL) [36] are presented in the last three rows of Table 2. Compared with all the methods in Table 2, SSTNTL gives the best performance together with SNTL under scenarios 1 and 3, and together with SNTL and LSTDE under scenario 4, while Tracklet outperforms the others under scenario 2. The reason why SSTNTL is not better than Tracklet under scenario 2 can be explained as follows. The features used in SSTNTL are obtained by motion detection. Since motion detection under scenario 2, with its scale variations, is more challenging than under the other three scenarios, the features obtained in scenario 2 are less robust than those in the other scenarios. Thus, the performance of SSTNTL drops together with that of the other manifold embedding methods. Since the Tracklet and AFMKL methods rely on local features without motion detection, they outperform the embedding methods under scenario 2. In addition, we compare the mean accuracies of all the methods in the last column of Table 2. The mean recognition rate of SSTNTL is higher than those of the other manifold embedding methods as well as the state-of-the-art human action recognition algorithms, which confirms that SSTNTL is preferable for human action recognition.

Figure 10. Confusion matrices on the KTH database: (a) SSTNTL, (b) SNTL [20], (c) Tracklet [26], (d) AFMKL [36].

While SSTNTL achieves the highest average recognition rate of 95.7%, we are interested in investigating which actions were misclassified. The confusion matrices of the best four algorithms are shown in Fig. 10. We can see that the actions jogging and running are confused with each other by all the methods. This is because jogging and running are nearly the same except for the speed at which the two actions are performed. Another observation is that the recognition rates for each action with SSTNTL, SNTL and AFMKL are larger than or equal to 90%, while only SSTNTL achieves 100% accuracy on some actions among these three methods. This means that SSTNTL not only gives high performance on all the actions, but also recognizes some actions perfectly, which is another advantage of the proposed method.

3.4 Results on the Hand Gesture Action Database

Recognition accuracies for the different illumination sets of this database are presented in Table 3. Compared with the other manifold embedding methods, SSTNTL and SNTL obtain the highest recognition rate for illumination set 1, while SSTNTL outperforms the others for sets 2, 3 and 4 and in mean accuracy. This confirms that SSTNTL is better than the topology learning methods with other topologies, as well as the local discriminative algorithms even with temporal information.

Table 3. Recognition accuracy (%) of different methods on the Cambridge-Gesture database under illumination sets 1–4 and the mean, for SSTNTL, SNTL [20], LPP [8], SLPP [39], LSDA [4], LSTDE [10], TCCA [12] and PM [19].

We also compare the proposed method with state-of-the-art gesture action recognition algorithms. Tensor Canonical Correlation Analysis (TCCA) [12] and the Product Manifold (PM) method [19] are used for comparison with the manifold embedding methods. From Table 3, we can see that SSTNTL, SNTL and PM all obtain the highest accuracy of 89% for illumination set 1. The product manifold method outperforms the others for set 3, while SSTNTL gives the best performance for sets 2 and 4. Comparing mean recognition rates, SSTNTL achieves the highest mean accuracy of 89% on this database.

Figure 11. Confusion matrices on the Cambridge-Gesture database: (a) SSTNTL, (b) LPP [8], (c) TCCA [12], (d) PM [19].

Finally, the confusion matrices of the best four algorithms are shown in Fig. 11. Comparing the diagonal elements of the confusion matrices, SSTNTL outperforms SNTL on eight actions, and TCCA and the product manifold method on five actions each. This means SSTNTL outperforms the others on more than half of the gesture actions in this database. Overall, SSTNTL also performs well for hand gesture action recognition.

4 Conclusion

In this paper, we have proposed a novel manifold learning method, namely supervised spatio-temporal neighborhood topology learning (SSTNTL), for action classification. Starting from an analysis of the topological characteristics in the context of action recognition, we proposed to construct the neighborhood topology from two aspects. First, the spatial distribution containing local structure, as well as the label information separating data points from different classes, is used to construct the supervised spatial topology. Second, the temporal pose correspondence neighborhood is constructed by discovering the global similarity between action sequences of the same class. These two neighborhood topologies are fused by taking their union. Based on the fused topology, SSTNTL employs the locality preserving property of LPP and solves a generalized eigenvalue problem to obtain the projections that not only separate data points from different classes, but also preserve the local structure and temporal pose correspondence of sequences from the same class.

The proposed algorithm is evaluated on three publicly available action video databases. Experimental results show that SSTNTL outperforms the neighborhood topology learning methods with other topologies, as well as the local discriminant algorithms even with temporal information. On the other hand, by analyzing the 2D visualizations of the training and testing data, we show that the generalization ability of the topology learning methods is better than that of the local-discriminability-based methods when the reduced dimension is low, and SSTNTL shows the best generalization ability. In addition, compared with state-of-the-art action recognition algorithms, SSTNTL gives the best performance for both human and gesture action recognition.

References

[1] J. K. Aggarwal and M. S. Ryoo. Human activity analysis: A review. ACM Computing Surveys, 43(3):16:1–16:43.
[2] S. Baysal, M. C. Kurt, and P. Duygulu. Recognizing human actions using key poses. Proceedings of the IEEE International Conference on Pattern Recognition.
[3] M. Belkin and P. Niyogi. Laplacian eigenmaps and spectral techniques for embedding and clustering. Advances in Neural Information Processing Systems.
[4] D. Cai, X. He, K. Zhou, J. Han, and H. Bao. Locality sensitive discriminant analysis. Proceedings of the 20th International Joint Conference on Artificial Intelligence.
[5] M. P. do Carmo. Riemannian Geometry. Birkhäuser, 1993.

[6] R. O. Duda, P. E. Hart, and D. G. Stork. Pattern Classification. Wiley.
[7] L. Gorelick, M. Blank, E. Shechtman, M. Irani, and R. Basri. Actions as space-time shapes. IEEE Transactions on Pattern Analysis and Machine Intelligence, 29(12).
[8] X. He and P. Niyogi. Locality preserving projections. Advances in Neural Information Processing Systems.
[9] O. C. Jenkins and M. J. Matarić. A spatio-temporal extension to Isomap nonlinear dimension reduction. Proceedings of the International Conference on Machine Learning.
[10] K. Jia and D.-Y. Yeung. Human action recognition using local spatio-temporal discriminant embedding. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 1–8.
[11] J. L. Kelley. General Topology. Birkhäuser.
[12] T.-K. Kim and R. Cipolla. Canonical correlation analysis of video volume tensors for action categorization and detection. IEEE Transactions on Pattern Analysis and Machine Intelligence, 31(8).
[13] I. Laptev, M. Marszalek, C. Schmid, and B. Rozenfeld. Learning realistic human actions from movies. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 1–8.
[14] M. Lewandowski, J. M. del Rincon, D. Makris, and J.-C. Nebel. Temporal extension of Laplacian eigenmaps for unsupervised dimensionality reduction of time series. Proceedings of the IEEE International Conference on Pattern Recognition.
[15] W. Li, Z. Zhang, and Z. Liu. Expandable data-driven graphical modeling of human actions based on salient postures. IEEE Transactions on Circuits and Systems for Video Technology, 18(11).
[16] C. Liu and P. C. Yuen. Human action recognition using boosted eigenactions. Image and Vision Computing, 28(5).
[17] C. Liu and P. C. Yuen. A boosted co-training algorithm for human action recognition. IEEE Transactions on Circuits and Systems for Video Technology, 21(9).
[18] C. Liu, P. C. Yuen, and G. Qiu. Object motion detection using information theoretic spatio-temporal saliency. Pattern Recognition, 42(11).
[19] Y. M. Lui, J. R. Beveridge, and M. Kirby. Action classification on product manifolds. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.
[20] A. J. Ma, P. C. Yuen, W. Zou, and J.-H. Lai. Supervised neighborhood topology learning for human action recognition. IEEE International Conference on Computer Vision Workshops.
[21] R. Messing, C. Pal, and H. Kautz. Activity recognition using the velocity histories of tracked keypoints. Proceedings of the IEEE International Conference on Computer Vision.
[22] J. C. Niebles, H. Wang, and L. Fei-Fei. Unsupervised learning of human action categories using spatial-temporal words. International Journal of Computer Vision, 79(3).
[23] H. Ning, T. X. Han, D. B. Walther, M. Liu, and T. S. Huang. Hierarchical space-time model enabling efficient search for human actions. IEEE Transactions on Circuits and Systems for Video Technology, 19(6).
[24] A. Oliva and A. Torralba. Modeling the shape of the scene: A holistic representation of the spatial envelope. International Journal of Computer Vision, 42(3).
[25] R. Poppe. A survey on vision-based human action recognition. Image and Vision Computing, 28(6).
[26] M. Raptis and S. Soatto. Tracklet descriptors for action modeling and video analysis. Proceedings of the European Conference on Computer Vision, 6311/2010.
[27] S. T. Roweis and L. K. Saul. Nonlinear dimensionality reduction by locally linear embedding. Science, 290.
[28] H. Sakoe and S. Chiba. Dynamic programming algorithm optimization for spoken word recognition. IEEE Transactions on Acoustics, Speech and Signal Processing, 2(1):43–49.
[29] C. Schuldt, I. Laptev, and B. Caputo. Recognizing human actions: A local SVM approach. Proceedings of the IEEE International Conference on Pattern Recognition.
[30] M. Sugiyama. Local Fisher discriminant analysis for supervised dimensionality reduction. Proceedings of the 23rd International Conference on Machine Learning.
[31] J. B. Tenenbaum, V. de Silva, and J. C. Langford. A global geometric framework for nonlinear dimensionality reduction. Science, 290.
[32] P. Turaga, R. Chellappa, V. S. Subrahmanian, and O. Udrea. Machine recognition of human activities: A survey. IEEE Transactions on Circuits and Systems for Video Technology, 18(11).
[33] L. Wang and D. Suter. Learning and matching of dynamic shape manifolds for human action recognition. IEEE Transactions on Image Processing, 16(6).
[34] L. Wang and D. Suter. Visual learning and recognition of sequential data manifolds with applications to human movement analysis. Computer Vision and Image Understanding, 110(2).
[35] D. Weinland and E. Boyer. Action recognition using exemplar-based embedding. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 1–7.
[36] X. Wu, D. Xu, L. Duan, and J. Luo. Action recognition using context and appearance distribution features. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.
[37] S. Yan, D. Xu, B. Zhang, H.-J. Zhang, Q. Yang, and S. Lin. Graph embedding and extensions: A general framework for dimensionality reduction. IEEE Transactions on Pattern Analysis and Machine Intelligence, 29(1):40–51.
[38] T. Zhang, J. Liu, C. X. Si Liu, and H. Lu. Boosted exemplar learning for action recognition and annotation. IEEE Transactions on Circuits and Systems for Video Technology, 21(7).
[39] Z. Zheng, F. Yang, W. Tan, J. Jia, and J. Yang. Gabor feature-based face recognition using supervised locality preserving projection. Signal Processing, 87, 2007.


More information

FACE RECOGNITION USING SUPPORT VECTOR MACHINES

FACE RECOGNITION USING SUPPORT VECTOR MACHINES FACE RECOGNITION USING SUPPORT VECTOR MACHINES Ashwin Swaminathan ashwins@umd.edu ENEE633: Statistical and Neural Pattern Recognition Instructor : Prof. Rama Chellappa Project 2, Part (b) 1. INTRODUCTION

More information

Laplacian MinMax Discriminant Projection and its Applications

Laplacian MinMax Discriminant Projection and its Applications Laplacian MinMax Discriminant Projection and its Applications Zhonglong Zheng and Xueping Chang Department of Computer Science, Zhejiang Normal University, Jinhua, China Email: zhonglong@sjtu.org Jie Yang

More information

Manifold Clustering. Abstract. 1. Introduction

Manifold Clustering. Abstract. 1. Introduction Manifold Clustering Richard Souvenir and Robert Pless Washington University in St. Louis Department of Computer Science and Engineering Campus Box 1045, One Brookings Drive, St. Louis, MO 63130 {rms2,

More information

Sparse Manifold Clustering and Embedding

Sparse Manifold Clustering and Embedding Sparse Manifold Clustering and Embedding Ehsan Elhamifar Center for Imaging Science Johns Hopkins University ehsan@cis.jhu.edu René Vidal Center for Imaging Science Johns Hopkins University rvidal@cis.jhu.edu

More information

Dependency Modeling for Information Fusion with Applications in Visual Recognition

Dependency Modeling for Information Fusion with Applications in Visual Recognition Dependency Modeling for Information Fusion with Applications in Visual Recognition MA Jinhua A thesis submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy Principal

More information

Jointly Learning Data-Dependent Label and Locality-Preserving Projections

Jointly Learning Data-Dependent Label and Locality-Preserving Projections Proceedings of the Twenty-Second International Joint Conference on Artificial Intelligence Jointly Learning Data-Dependent Label and Locality-Preserving Projections Chang Wang IBM T. J. Watson Research

More information

Semi-supervised Learning by Sparse Representation

Semi-supervised Learning by Sparse Representation Semi-supervised Learning by Sparse Representation Shuicheng Yan Huan Wang Abstract In this paper, we present a novel semi-supervised learning framework based on l 1 graph. The l 1 graph is motivated by

More information

ECG782: Multidimensional Digital Signal Processing

ECG782: Multidimensional Digital Signal Processing ECG782: Multidimensional Digital Signal Processing Object Recognition http://www.ee.unlv.edu/~b1morris/ecg782/ 2 Outline Knowledge Representation Statistical Pattern Recognition Neural Networks Boosting

More information

Bag of Optical Flow Volumes for Image Sequence Recognition 1

Bag of Optical Flow Volumes for Image Sequence Recognition 1 RIEMENSCHNEIDER, DONOSER, BISCHOF: BAG OF OPTICAL FLOW VOLUMES 1 Bag of Optical Flow Volumes for Image Sequence Recognition 1 Hayko Riemenschneider http://www.icg.tugraz.at/members/hayko Michael Donoser

More information

Aggregating Descriptors with Local Gaussian Metrics

Aggregating Descriptors with Local Gaussian Metrics Aggregating Descriptors with Local Gaussian Metrics Hideki Nakayama Grad. School of Information Science and Technology The University of Tokyo Tokyo, JAPAN nakayama@ci.i.u-tokyo.ac.jp Abstract Recently,

More information

Homeomorphic Manifold Analysis (HMA): Generalized Separation of Style and Content on Manifolds

Homeomorphic Manifold Analysis (HMA): Generalized Separation of Style and Content on Manifolds Homeomorphic Manifold Analysis (HMA): Generalized Separation of Style and Content on Manifolds Ahmed Elgammal a,,, Chan-Su Lee b a Department of Computer Science, Rutgers University, Frelinghuysen Rd,

More information

Video-based Human Activity Analysis: An Operator-based Approach

Video-based Human Activity Analysis: An Operator-based Approach Video-based Human Activity Analysis: An Operator-based Approach Xiao Bian North Carolina State University Department of Electrical and Computer Engineering 27606, Raleigh, NC, USA xbian@ncsu.edu Hamid

More information

Discriminative Embedding via Image-to-Class Distances

Discriminative Embedding via Image-to-Class Distances X. ZHEN, L. SHAO, F. ZHENG: 1 Discriminative Embedding via Image-to-Class Distances Xiantong Zhen zhenxt@gmail.com Ling Shao ling.shao@ieee.org Feng Zheng cip12fz@sheffield.ac.uk Department of Medical

More information

Recognizing Handwritten Digits Using the LLE Algorithm with Back Propagation

Recognizing Handwritten Digits Using the LLE Algorithm with Back Propagation Recognizing Handwritten Digits Using the LLE Algorithm with Back Propagation Lori Cillo, Attebury Honors Program Dr. Rajan Alex, Mentor West Texas A&M University Canyon, Texas 1 ABSTRACT. This work is

More information

Extracting Spatio-temporal Local Features Considering Consecutiveness of Motions

Extracting Spatio-temporal Local Features Considering Consecutiveness of Motions Extracting Spatio-temporal Local Features Considering Consecutiveness of Motions Akitsugu Noguchi and Keiji Yanai Department of Computer Science, The University of Electro-Communications, 1-5-1 Chofugaoka,

More information

Evaluation of local descriptors for action recognition in videos

Evaluation of local descriptors for action recognition in videos Evaluation of local descriptors for action recognition in videos Piotr Bilinski and Francois Bremond INRIA Sophia Antipolis - PULSAR group 2004 route des Lucioles - BP 93 06902 Sophia Antipolis Cedex,

More information

Fuzzy Bidirectional Weighted Sum for Face Recognition

Fuzzy Bidirectional Weighted Sum for Face Recognition Send Orders for Reprints to reprints@benthamscience.ae The Open Automation and Control Systems Journal, 2014, 6, 447-452 447 Fuzzy Bidirectional Weighted Sum for Face Recognition Open Access Pengli Lu

More information

3D Posture Representation Using Meshless Parameterization with Cylindrical Virtual Boundary

3D Posture Representation Using Meshless Parameterization with Cylindrical Virtual Boundary 3D Posture Representation Using Meshless Parameterization with Cylindrical Virtual Boundary Yunli Lee and Keechul Jung School of Media, College of Information Technology, Soongsil University, Seoul, South

More information

ACTIVE CLASSIFICATION FOR HUMAN ACTION RECOGNITION. Alexandros Iosifidis, Anastasios Tefas and Ioannis Pitas

ACTIVE CLASSIFICATION FOR HUMAN ACTION RECOGNITION. Alexandros Iosifidis, Anastasios Tefas and Ioannis Pitas ACTIVE CLASSIFICATION FOR HUMAN ACTION RECOGNITION Alexandros Iosifidis, Anastasios Tefas and Ioannis Pitas Depeartment of Informatics, Aristotle University of Thessaloniki, Greece {aiosif,tefas,pitas}@aiia.csd.auth.gr

More information

Computation Strategies for Volume Local Binary Patterns applied to Action Recognition

Computation Strategies for Volume Local Binary Patterns applied to Action Recognition Computation Strategies for Volume Local Binary Patterns applied to Action Recognition F. Baumann, A. Ehlers, B. Rosenhahn Institut für Informationsverarbeitung (TNT) Leibniz Universität Hannover, Germany

More information

Dimension Reduction of Image Manifolds

Dimension Reduction of Image Manifolds Dimension Reduction of Image Manifolds Arian Maleki Department of Electrical Engineering Stanford University Stanford, CA, 9435, USA E-mail: arianm@stanford.edu I. INTRODUCTION Dimension reduction of datasets

More information

Latest development in image feature representation and extraction

Latest development in image feature representation and extraction International Journal of Advanced Research and Development ISSN: 2455-4030, Impact Factor: RJIF 5.24 www.advancedjournal.com Volume 2; Issue 1; January 2017; Page No. 05-09 Latest development in image

More information

Incremental Action Recognition Using Feature-Tree

Incremental Action Recognition Using Feature-Tree Incremental Action Recognition Using Feature-Tree Kishore K Reddy Computer Vision Lab University of Central Florida kreddy@cs.ucf.edu Jingen Liu Computer Vision Lab University of Central Florida liujg@cs.ucf.edu

More information

Learning a Spatially Smooth Subspace for Face Recognition

Learning a Spatially Smooth Subspace for Face Recognition Learning a Spatially Smooth Subspace for Face Recognition Deng Cai Xiaofei He Yuxiao Hu Jiawei Han Thomas Huang University of Illinois at Urbana-Champaign Yahoo! Research Labs {dengcai2, hanj}@cs.uiuc.edu,

More information

Image-Based Face Recognition using Global Features

Image-Based Face Recognition using Global Features Image-Based Face Recognition using Global Features Xiaoyin xu Research Centre for Integrated Microsystems Electrical and Computer Engineering University of Windsor Supervisors: Dr. Ahmadi May 13, 2005

More information

Facial Expression Recognition Using Expression- Specific Local Binary Patterns and Layer Denoising Mechanism

Facial Expression Recognition Using Expression- Specific Local Binary Patterns and Layer Denoising Mechanism Facial Expression Recognition Using Expression- Specific Local Binary Patterns and Layer Denoising Mechanism 1 2 Wei-Lun Chao, Jun-Zuo Liu, 3 Jian-Jiun Ding, 4 Po-Hung Wu 1, 2, 3, 4 Graduate Institute

More information

Stratified Structure of Laplacian Eigenmaps Embedding

Stratified Structure of Laplacian Eigenmaps Embedding Stratified Structure of Laplacian Eigenmaps Embedding Abstract We construct a locality preserving weight matrix for Laplacian eigenmaps algorithm used in dimension reduction. Our point cloud data is sampled

More information

Adaptive Action Detection

Adaptive Action Detection Adaptive Action Detection Illinois Vision Workshop Dec. 1, 2009 Liangliang Cao Dept. ECE, UIUC Zicheng Liu Microsoft Research Thomas Huang Dept. ECE, UIUC Motivation Action recognition is important in

More information

EigenJoints-based Action Recognition Using Naïve-Bayes-Nearest-Neighbor

EigenJoints-based Action Recognition Using Naïve-Bayes-Nearest-Neighbor EigenJoints-based Action Recognition Using Naïve-Bayes-Nearest-Neighbor Xiaodong Yang and YingLi Tian Department of Electrical Engineering The City College of New York, CUNY {xyang02, ytian}@ccny.cuny.edu

More information

FACE RECOGNITION FROM A SINGLE SAMPLE USING RLOG FILTER AND MANIFOLD ANALYSIS

FACE RECOGNITION FROM A SINGLE SAMPLE USING RLOG FILTER AND MANIFOLD ANALYSIS FACE RECOGNITION FROM A SINGLE SAMPLE USING RLOG FILTER AND MANIFOLD ANALYSIS Jaya Susan Edith. S 1 and A.Usha Ruby 2 1 Department of Computer Science and Engineering,CSI College of Engineering, 2 Research

More information

LEARNING A SPARSE DICTIONARY OF VIDEO STRUCTURE FOR ACTIVITY MODELING. Nandita M. Nayak, Amit K. Roy-Chowdhury. University of California, Riverside

LEARNING A SPARSE DICTIONARY OF VIDEO STRUCTURE FOR ACTIVITY MODELING. Nandita M. Nayak, Amit K. Roy-Chowdhury. University of California, Riverside LEARNING A SPARSE DICTIONARY OF VIDEO STRUCTURE FOR ACTIVITY MODELING Nandita M. Nayak, Amit K. Roy-Chowdhury University of California, Riverside ABSTRACT We present an approach which incorporates spatiotemporal

More information

Robust Face Recognition via Sparse Representation Authors: John Wright, Allen Y. Yang, Arvind Ganesh, S. Shankar Sastry, and Yi Ma

Robust Face Recognition via Sparse Representation Authors: John Wright, Allen Y. Yang, Arvind Ganesh, S. Shankar Sastry, and Yi Ma Robust Face Recognition via Sparse Representation Authors: John Wright, Allen Y. Yang, Arvind Ganesh, S. Shankar Sastry, and Yi Ma Presented by Hu Han Jan. 30 2014 For CSE 902 by Prof. Anil K. Jain: Selected

More information

arxiv: v3 [cs.cv] 3 Oct 2012

arxiv: v3 [cs.cv] 3 Oct 2012 Combined Descriptors in Spatial Pyramid Domain for Image Classification Junlin Hu and Ping Guo arxiv:1210.0386v3 [cs.cv] 3 Oct 2012 Image Processing and Pattern Recognition Laboratory Beijing Normal University,

More information

EVENT DETECTION AND HUMAN BEHAVIOR RECOGNITION. Ing. Lorenzo Seidenari

EVENT DETECTION AND HUMAN BEHAVIOR RECOGNITION. Ing. Lorenzo Seidenari EVENT DETECTION AND HUMAN BEHAVIOR RECOGNITION Ing. Lorenzo Seidenari e-mail: seidenari@dsi.unifi.it What is an Event? Dictionary.com definition: something that occurs in a certain place during a particular

More information

Learning Human Actions with an Adaptive Codebook

Learning Human Actions with an Adaptive Codebook Learning Human Actions with an Adaptive Codebook Yu Kong, Xiaoqin Zhang, Weiming Hu and Yunde Jia Beijing Laboratory of Intelligent Information Technology School of Computer Science, Beijing Institute

More information

An efficient face recognition algorithm based on multi-kernel regularization learning

An efficient face recognition algorithm based on multi-kernel regularization learning Acta Technica 61, No. 4A/2016, 75 84 c 2017 Institute of Thermomechanics CAS, v.v.i. An efficient face recognition algorithm based on multi-kernel regularization learning Bi Rongrong 1 Abstract. A novel

More information

SINGLE-SAMPLE-PER-PERSON-BASED FACE RECOGNITION USING FAST DISCRIMINATIVE MULTI-MANIFOLD ANALYSIS

SINGLE-SAMPLE-PER-PERSON-BASED FACE RECOGNITION USING FAST DISCRIMINATIVE MULTI-MANIFOLD ANALYSIS SINGLE-SAMPLE-PER-PERSON-BASED FACE RECOGNITION USING FAST DISCRIMINATIVE MULTI-MANIFOLD ANALYSIS Hsin-Hung Liu 1, Shih-Chung Hsu 1, and Chung-Lin Huang 1,2 1. Department of Electrical Engineering, National

More information

Fully Automatic Methodology for Human Action Recognition Incorporating Dynamic Information

Fully Automatic Methodology for Human Action Recognition Incorporating Dynamic Information Fully Automatic Methodology for Human Action Recognition Incorporating Dynamic Information Ana González, Marcos Ortega Hortas, and Manuel G. Penedo University of A Coruña, VARPA group, A Coruña 15071,

More information

Statistics of Pairwise Co-occurring Local Spatio-Temporal Features for Human Action Recognition

Statistics of Pairwise Co-occurring Local Spatio-Temporal Features for Human Action Recognition Statistics of Pairwise Co-occurring Local Spatio-Temporal Features for Human Action Recognition Piotr Bilinski, Francois Bremond To cite this version: Piotr Bilinski, Francois Bremond. Statistics of Pairwise

More information

Texture Representations Using Subspace Embeddings

Texture Representations Using Subspace Embeddings Texture Representations Using Subspace Embeddings Xiaodong Yang and YingLi Tian Department of Electrical Engineering The City College of New York, CUNY {xyang02, ytian}@ccny.cuny.edu Abstract In this paper,

More information

SUSPICIOUS MOTION DETECTION IN SURVEILLANCE VIDEO

SUSPICIOUS MOTION DETECTION IN SURVEILLANCE VIDEO SUSPICIOUS MOTION DETECTION IN SURVEILLANCE VIDEO V.Padmavathi 1 Dr.M.Kalaiselvi Geetha 2 ABSTRACT Video Surveillance systems are used for any type of monitoring. Action recognition has drawn much attention

More information

Learning Semantic Features for Action Recognition via Diffusion Maps

Learning Semantic Features for Action Recognition via Diffusion Maps Learning Semantic Features for Action Recognition via Diffusion Maps Jingen Liu a, Yang Yang, Imran Saleemi and Mubarak Shah b a Department of EECS, University of Michigan, Ann Arbor, MI, USA b Department

More information

Multidirectional 2DPCA Based Face Recognition System

Multidirectional 2DPCA Based Face Recognition System Multidirectional 2DPCA Based Face Recognition System Shilpi Soni 1, Raj Kumar Sahu 2 1 M.E. Scholar, Department of E&Tc Engg, CSIT, Durg 2 Associate Professor, Department of E&Tc Engg, CSIT, Durg Email:

More information

Facial Expression Recognition with Emotion-Based Feature Fusion

Facial Expression Recognition with Emotion-Based Feature Fusion Facial Expression Recognition with Emotion-Based Feature Fusion Cigdem Turan 1, Kin-Man Lam 1, Xiangjian He 2 1 The Hong Kong Polytechnic University, Hong Kong, SAR, 2 University of Technology Sydney,

More information

Face Recognition Using Vector Quantization Histogram and Support Vector Machine Classifier Rong-sheng LI, Fei-fei LEE *, Yan YAN and Qiu CHEN

Face Recognition Using Vector Quantization Histogram and Support Vector Machine Classifier Rong-sheng LI, Fei-fei LEE *, Yan YAN and Qiu CHEN 2016 International Conference on Artificial Intelligence: Techniques and Applications (AITA 2016) ISBN: 978-1-60595-389-2 Face Recognition Using Vector Quantization Histogram and Support Vector Machine

More information

Gesture Recognition Under Small Sample Size

Gesture Recognition Under Small Sample Size Gesture Recognition Under Small Sample Size Tae-Kyun Kim 1 and Roberto Cipolla 2 1 Sidney Sussex College, University of Cambridge, Cambridge, CB2 3HU, UK 2 Department of Engineering, University of Cambridge,

More information