Action Recognition using Multi-layer Depth Motion Maps and Sparse Dictionary Learning

Chengwu Liang #1, Enqing Chen #2, Lin Qi #3, Ling Guan #1
# School of Information Engineering, Zhengzhou University, Zhengzhou, China
1 liangchengwu0615@126.com, 2 ieeqchen@zzu.edu.cn, 3 ielqi@zzu.edu.cn
Department of Electrical and Computer Engineering, Ryerson University, Toronto, Canada
1 lguan@ee.ryerson.ca

Abstract: In this paper, we propose a new spatio-temporal feature based method for human action recognition using depth image sequences. First, Layered Depth Motion maps (LDM) are utilized to capture temporal motion features. Next, multi-scale HOG descriptors are computed on the LDM to characterize the structural information of actions. Then sparse coding is applied for feature representation. A Sparse Fisher Discriminative Dictionary Learning (SDDL) model and its corresponding classification scheme are also introduced. In the SDDL model, the sub-dictionary is updated class by class, leading to class-specific, compact, discriminative dictionaries. The proposed method is evaluated on the public MSR Action3D dataset and demonstrates strong performance, especially in the cross subject test.

I. INTRODUCTION

As one of the active research topics in computer vision, human action recognition has been widely studied. It is a key issue in natural human computer interaction, virtual reality, video surveillance, video retrieval, gaming and smart assistive living [1][2][3][4][5]. It is a challenging problem due to difficulties such as occlusions, camera view changes, cluttered backgrounds, and large variations in human motion and appearance. Successful action recognition is characterized by three aspects: effective feature extraction, appropriate feature representation or description, and a suitable classifier. Traditional action recognition methods are based on RGB videos. With the release of RGB-depth sensors (e.g. Microsoft Kinect), depth images and RGB videos can be captured simultaneously.
Depth images provide 3D structural information, which offers a new view of human action recognition. Moreover, the positions of human body skeleton joints can be extracted from a single depth image [6]. Based on the data source, existing approaches are divided into two categories: traditional colour RGB video-based and the more recent depth image-based. Many RGB video-based methods have been studied, such as the motion history image (MHI) method [7], space-time interest points (STIP) [8], bag of words (BOW) [9], trajectory-based methods [10] and hidden conditional random fields (HCRF) [11]. However, a key limitation of BOW is that it cannot capture adequate spatial and temporal information due to the local nature of the model. Trajectory-based methods [10] are sufficient to distinguish actions by tracking human joint positions, but are computationally expensive. Although significant progress has been made by RGB video-based methods, human action recognition still faces challenges such as occlusions, large intra-class variations and illumination changes. Depth images, which provide 3D information about the scene, may mitigate these challenges, and action recognition approaches based on depth image features have grown rapidly [3]. In this paper, we focus on human action recognition based on the original depth data. Based on depth images (Fig. 1), a new spatio-temporal image motion feature, i.e. the Sparse coding-based multi-layer Depth Motion maps feature (ScLDM) with multi-scale HOG descriptors, is proposed to characterize the local spatial structure (shape) of an action and the local temporal change of human motion.

Fig. 1. Examples of the depth images for the actions High wave (top) and Draw X (bottom).
By preserving the temporal local motion and spatial structure of a human action occurring in a video sequence, and by exploiting the complementary nature of the two types of information, the proposed multi-layered depth motion feature and multi-scale structure feature effectively capture the primary characteristics of a depth action. In addition, human action recognition has to deal with the sparsity of high-dimensional data distributions and the overlap of feature subspaces, which may degrade recognition performance, especially for similar actions. Dictionary learning-based classification is one way to address this. In order to cope with the intersection of different feature sets, and in contrast to conventional Orthogonal Matching Pursuit (OMP) methods and the K-SVD algorithm [12], SDDL is coupled with ScLDM for action recognition. By adding certain constraints and a discriminative term on the sparse coefficients in the Sparse Dictionary Learning (SDL) model, the sparse coefficients have small intra-class variance and large inter-class variance, leading to a more discriminative representation. With the introduction of the SDDL model to action recognition, the proposed feature representation method is effective for depth image-based human action recognition. The experimental results demonstrate that human action recognition performance is improved by the proposed method.

II. RELATED WORKS

According to the features extracted for action recognition, current depth sensor-based methods fall roughly into three categories: original depth feature-based, skeleton feature-based, and fusion of different features. For feature extraction based on original depth data, the first work on action recognition is [13]. The authors employed a bag of 3D points sampled from the original 3D depth maps and Gaussian mixture models to describe salient postures, and proposed to model the dynamics of actions with an action graph. In this method, each depth map is projected onto three orthogonal Cartesian planes to select the representative 3D points. Ni et al. [14] proposed a Three-Dimensional Motion History Images (3D-MHIs) approach and a Depth-Layered Multi-Channel STIPs (DLMC-STIPs) framework; however, this framework uses depth images only as auxiliary information for extracting STIPs in the RGB channel [3]. To address the noisy and missing values in depth videos, Xia et al. [15] used noise suppression functions to extract Depth STIPs (DSTIP) and proposed a self-similarity depth cuboid feature (DCSF) to boost the performance.
Based on depth maps, a 3D local occupancy feature is used in the 3D spatio-temporal volume [16][17], in which data points can be projected into the 4D (x, y, z, t) space for activity recognition. Yang et al. [18] employed the Depth Motion Maps (DMM) feature and the Histogram of Oriented Gradients (HOG) descriptor to characterize body shape and motion information; this method performs well and is computationally simple. Chen et al. [19] utilized DMM and a collaborative representation classifier to achieve real-time action recognition. The histogram of oriented 4D normals (HON4D) feature [20] was used for activity recognition from depth sequences. A 3D motion trail model with pyramid HOG (3DMTM-PHOG) [21] was proposed to represent actions in depth maps. For skeleton feature-based methods, some researchers use the skeleton tracker [6] to construct various skeleton joint features. The EigenJoints feature descriptor [22], based on differences of skeleton joints, was used with a Naive Bayes Nearest Neighbour (NBNN) classifier for action recognition. Xia et al. [23] proposed histograms of 3D joint locations (HOJ3D) as a representation of static postures and applied discrete hidden Markov models (HMMs) for action recognition. However, a limitation of these methods is that the 3D joint positions extracted by [6] are not optimal due to challenges caused by occlusions or cluttering. By fusing spatio-temporal features from colour images with 3D skeleton joint features from depth maps, [24] achieved good recognition results.

Fig. 2. Feature extraction from a depth action video: each depth frame is projected onto the front (x-y), side (y-z) and top (x-z) planes; multi-layered LDM (LDM_front, LDM_side, LDM_top) and multi-scale HOG descriptors (s = 1, 2, 3) are computed for the actions (a) Golf swing and (b) High throw.
In [25], three STIP-based features and six local descriptors are evaluated for depth-based action recognition. The authors proposed two schemes to refine STIP features and a fusion approach to evaluate the performance of combining STIP with skeleton features. However, these methods generate a considerable amount of data and have high computational complexity.

III. SPARSE CODING-BASED LOCAL DEPTH MOTION FEATURES (SCLDM)

In [18][19][26], Depth Motion Maps (DMM) computed from depth images are used to characterize 3D local motion and shape information. Although computationally simple and demonstrably effective, a DMM [18][19] calculated over the entire sequence may not capture transitional motion cues: previous motion history may get overwritten when a more recent motion occurs at the same location. This observation motivates us to divide a depth action sequence into several temporal layers and to calculate individual depth motion maps within each layer, so as to better capture detailed temporal local motion cues. By introducing ScLDM in both the temporal multi-layer and spatial multi-scale dimensions, the procedure of obtaining the depth motion feature and its representation becomes more effective. It has three components: multi-layer local temporal Depth Motion map feature extraction (LDM), multi-scale HOG descriptors on the LDM, and sparse coding-based feature representation.

A. Multi-layer Depth Motion Maps Feature

A depth action sequence contains 3D depth information. We first project each 3D depth frame onto three orthogonal 2D planes [13], as shown in Fig. 2. Each plane is a view corresponding to a projected map, denoted DM_v, where v ∈ {front, side, top}. LDM features are extracted from the three views to characterize the depth motion of an action. For each projected map, its layered motion energy map LDM^L_v is

obtained by computing the difference between two projected maps separated by a temporal interval L; each fixed temporal interval defines a layer. The binary map of motion energy is then obtained, which indicates motion regions, i.e. where movement occurs within each temporal interval, and provides a clue to the action category being performed. At each layer, we stack the motion energy through the entire video sequence to generate the local depth motion map feature for each projection view, then concatenate the three views to form the LDM^L feature:

    LDM^L_v = sum_{i=a}^{b} ( |DM^{i+L}_v - DM^{i}_v| > ε ),    (1)
    LDM^L = [LDM^L_front, LDM^L_side, LDM^L_top]^T

where i is the frame index, a is the starting frame index, b is the ending frame index (with b + L bounded by the number of video frames), L is the temporal interval (in frames), and each fixed value of L defines a layer. When L = 1, LDM^L is equal to the DMM described in [18]. ε is the noise threshold, and N is the number of layers. As noted in [19], at the beginning and end of each depth video sequence the subjects are mostly at a standstill with only small body movements, which do not contribute to the motion characteristics; the first a frames and the last b frames are therefore removed. The smaller L is, the more inter-frame motion detail is captured. Different depth action video samples have different durations: for a short video clip, a smaller L obtains more motion information in each projection view, while intuitively, for a long clip, the motion information is well preserved by a large L. We then concatenate the N layers to form the LDM feature:

    LDM = [LDM^{L=L_1}, LDM^{L=L_2}, ..., LDM^{L=L_N}]^T    (2)

In our experiments, we set a = b = 5, the number of layers N = 3 with L_1 = 1, L_2 = 3, L_3 = 5, and ε = 30 or 50.

B. Multi-scale Structure Feature (HOG Pyramid)

Although multi-layered LDM features encode accurate motion information in the temporal dimension, they lack structural information about the action. Moreover, multi-layered LDM features are pixel-based, so the feature dimension can be fairly high. To better fit the sparse representation classification framework, we build a compact representation based on the HOG descriptor [27]. The edge and gradient information captured by HOG effectively describe the action appearance and motion orientations. The structural information of LDM features is composed of different scales. To encode it, HOG descriptors at three scales of a spatial pyramid are computed on the LDM to characterize multi-scale action shapes and motion orientations. Principal component analysis (PCA) or Random Projection (RP) is then employed to reduce the high dimensionality of the feature vectors.

C. Sparse coding-based Feature Representation

Having extracted both the temporal motion feature (the LDM feature) and the spatial shape feature (the multi-scale HOG feature), the next step is feature representation. Sparse coding plays an important role in human perception and has been used in pattern recognition and computer vision tasks. In [28], a predefined dictionary containing the training samples of all classes is used directly to code the query image. In this paper, sparse coding is applied to the extracted features to learn representative features; the whole feature representation process is referred to as Sparse coding of the LDM feature and multi-scale HOG feature (ScLDM). Assume there are K action classes. Let A = [A_1, A_2, ..., A_i, ..., A_K] be the extracted feature sets of all training samples, where A_i is the feature set of training samples from class i.
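Before turning to the dictionary, the feature extraction of Secs. III-A and III-B can be sketched in NumPy. This is an illustrative reconstruction of Eqs. (1)-(2), not the authors' implementation: the side/top occupancy projections, the z_bins and z_max quantization parameters, and the per-view thresholds (binary views use 0, since any occupancy change counts) are our assumptions.

```python
import numpy as np

def side_top_views(frame, z_bins=32, z_max=4096):
    """Occupancy projections of one depth frame onto the y-z (side) and
    x-z (top) planes; z is the quantized depth value. z_bins and z_max
    are illustrative choices, not taken from the paper."""
    H, W = frame.shape
    z = np.clip(frame.astype(np.int64) * z_bins // z_max, 0, z_bins - 1)
    valid = frame > 0
    side = np.zeros((H, z_bins), np.int32)
    top = np.zeros((W, z_bins), np.int32)
    ys, xs = np.nonzero(valid)
    side[ys, z[valid]] = 1
    top[xs, z[valid]] = 1
    return side, top

def ldm_feature(depth_seq, intervals=(1, 3, 5), eps=30, trim=5):
    """Multi-layer depth motion maps (Eqs. 1-2): for each view and each
    interval L, accumulate thresholded frame differences."""
    seq = depth_seq[trim:len(depth_seq) - trim]        # drop standstill frames
    fronts = [f.astype(np.int32) for f in seq]         # x-y view: depth map itself
    sides, tops = zip(*(side_top_views(f) for f in seq))
    layers = []
    for L in intervals:
        per_view = []
        # depth-valued front view uses the noise threshold eps (Eq. 1);
        # binary occupancy views change by at most 1, so any change counts
        for maps, th in ((fronts, eps), (sides, 0), (tops, 0)):
            stack = np.stack(maps)
            energy = (np.abs(stack[L:] - stack[:-L]) > th).sum(axis=0)
            per_view.append(energy.ravel())
        layers.append(np.concatenate(per_view))        # [front; side; top]
    return np.concatenate(layers)                      # stack the N layers (Eq. 2)
```

The multi-scale HOG step would then be applied to each reshaped LDM map at three pyramid scales before dimensionality reduction.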
A structured dictionary D = [D_1, D_2, ..., D_i, ..., D_K] is learned, with D_i a class-specific sub-dictionary associated with class i. Let y be the feature of a query sample. Sparse representation codes y over the dictionary D, so that y ≈ DX, where X = [X_1; X_2; ...; X_i; ...; X_K] ∈ R^n is a vector of sparse coefficients and X_i contains the sparse sub-coefficients associated with class i. The calculated sparse codes for one feature correspond to the responses of that feature to all the atoms in the dictionary. This is formulated as

    X̂ = argmin_X { ||y - DX||_2^2 + λ ||X||_1 }    (3)

where λ is a scalar parameter.

IV. ACTION RECOGNITION USING SDDL

A predefined dictionary [28] has high coding complexity and cannot fully exploit the discriminative information hidden in the training samples. Although there are more general dictionary learning methods, such as K-SVD [12], they are not well suited to classification tasks: they only guarantee that the learned dictionary faithfully represents the training samples, not that it is discriminative, which is important for action recognition. In this paper, using the combined sparse-coded LDM feature and multi-scale HOG feature (ScLDM) extracted from depth images, we construct an SDDL model, which was described in [29] for image classification, to classify different actions. To the best of our knowledge, this is the first time this model has been applied to depth data-based action recognition. One advantage of SDDL is that the dictionary can be learned class by class offline and tested online: when a new class of training samples is added, the dictionary updates itself incrementally, without repeating the entire training process. Ideally, the dictionary D in Eq. 3 not only faithfully represents the query samples, but also has strong discriminative power for action recognition. The SDDL model for action recognition is given below:

    J(D, X) = argmin_{(D,X)} { r(A, D, X) + λ_1 ||X||_1 + λ_2 f(X) }    (4)
    s.t. ||d_n||_2 = 1, ∀n

where r(A, D, X) is the data fidelity term, ||X||_1 is the sparsity constraint, f(X) is a discrimination term imposed on the coefficient matrix X, and λ_1, λ_2 are scalar parameters. Each atom d_n of D is constrained to have unit l_2-norm.

A. Data Fidelity Term and Discriminative Coefficient Term

Set X_i = [X_i^1; ...; X_i^j; ...; X_i^K], where X_i^j is the representation coefficient of A_i over D_j. First, the whole dictionary D should represent A_i as faithfully as possible, namely A_i ≈ DX_i = D_1 X_i^1 + ... + D_i X_i^i + ... + D_K X_i^K. Second, since D_i is the class-specific sub-dictionary associated with class i, it is expected that X_i^i contains the most significant coefficients in X_i while the other X_i^j (j ≠ i) are as small as possible, i.e. X_i is block sparse. The data fidelity term r(A, D, X) is formulated as

    r(A_i, D, X_i) = ||A_i - DX_i||_F^2 + ||A_i - D_i X_i^i||_F^2 + sum_{j=1, j≠i}^{K} ||D_j X_i^j||_F^2    (5)

The discriminative constraint f(X) makes each class-specific sub-dictionary represent its own action class efficiently but other classes less efficiently, leading to smaller and more compact dictionaries. It is formulated as

    f(X) = tr(S_W(X)) - tr(S_B(X)) + η ||X||_F^2    (6)

where S_W(X) and S_B(X) are the intra-class and inter-class scatter of X, respectively. ||X||_F^2 is an elastic term that makes f(X) convex and stable for optimization. In Eq. 6, we set η = 1.

B. The Classification Strategy

The classification strategy is inspired by the recent success of [29] in face recognition. When the information in the extracted action feature is not sufficient to represent the sample feature space, the learned sub-dictionary D_i may not faithfully represent the query samples of its class. In that case we use collaborative sparse representation (CRC) over the whole dictionary D, called the Global Classifier (SDDL-GC), similar to [29].
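The data fidelity term of Eq. 5 and the Fisher discrimination term of Eq. 6 can be sketched directly in NumPy. The following is our illustration, not the authors' code; it assumes features are column vectors, the per-class sub-dictionaries and sub-coefficients are given as lists, and labels index coefficient columns.

```python
import numpy as np

def fidelity_term(A_i, D_blocks, X_i_blocks, i):
    """r(A_i, D, X_i), Eq. (5). D_blocks / X_i_blocks are lists of
    per-class sub-dictionaries and the matching sub-coefficients of A_i."""
    D = np.hstack(D_blocks)
    X_i = np.vstack(X_i_blocks)
    r = np.linalg.norm(A_i - D @ X_i, 'fro') ** 2                       # whole-dictionary fit
    r += np.linalg.norm(A_i - D_blocks[i] @ X_i_blocks[i], 'fro') ** 2  # own-class fit
    for j, (Dj, Xij) in enumerate(zip(D_blocks, X_i_blocks)):
        if j != i:                                                      # suppress cross-class energy
            r += np.linalg.norm(Dj @ Xij, 'fro') ** 2
    return r

def fisher_term(X, labels, eta=1.0):
    """f(X) = tr(S_W) - tr(S_B) + eta * ||X||_F^2, Eq. (6).
    X holds one coefficient column per training sample."""
    mean_all = X.mean(axis=1, keepdims=True)
    tr_sw = tr_sb = 0.0
    for c in np.unique(labels):
        Xc = X[:, labels == c]
        mc = Xc.mean(axis=1, keepdims=True)
        tr_sw += ((Xc - mc) ** 2).sum()                      # intra-class scatter trace
        tr_sb += Xc.shape[1] * ((mc - mean_all) ** 2).sum()  # inter-class scatter trace
    return tr_sw - tr_sb + eta * np.linalg.norm(X, 'fro') ** 2
```

Minimizing the sum of these two terms plus the l_1 penalty, alternating between the coefficients X and the sub-dictionaries D_i, yields the class-by-class update described above.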
On the other hand, in the test stage the l1-norm regularization on the representation coefficients may be relaxed to l2-norm regularization for faster speed. When the information in the extracted action feature is sufficient to represent the sample feature space, the sub-dictionary D_i spans the subspace of class i well. In this case, we can represent y locally over each sub-dictionary D_i instead of the whole dictionary D, which is called the Local Classifier (SDDL-LC) [29].

V. EXPERIMENT SETTINGS AND RESULTS

A. Datasets and Experiment Settings

MSR Action3D [13] is a public dataset of depth map sequences captured by an RGB-Depth sensor (Kinect); its three action subsets are listed in Table I. It includes 20 action categories performed by 10 subjects. For each subject, each action was performed 2 or 3 times, giving 567 depth action sequences in total at a resolution of 320x240. The 20 actions are divided into three subsets, AS1, AS2 and AS3, each containing 8 actions with some overlap. AS1 and AS2 group actions with similar movements, while AS3 groups complex actions. Note that AS1 and AS2 have small inter-class variations, while AS3 has large intra-class variation.

TABLE I
THREE ACTION SUBSETS OF THE MSR ACTION3D DATASET

Action Set 1 (AS1)    | Action Set 2 (AS2)   | Action Set 3 (AS3)
Horizontal wave (How) | High wave (Hiw)      | High throw (Ht)
Hammer (Hamm)         | Hand catch (Hcat)    | Forward kick (Fk)
Forward punch (Fp)    | Draw x (Dx)          | Side kick (Sk)
High throw (Ht)       | Draw tick (Dt)       | Jogging (Jog)
Hand clap (Hcla)      | Draw circle (Dc)     | Tennis swing (Tsw)
Bend (Bend)           | Two hand wave (Thw)  | Tennis serve (Tse)
Tennis serve (Tse)    | Forward kick (Fk)    | Golf swing (Gs)
Pickup throw (Pt)     | Side boxing (Sb)     | Pickup throw (Pt)

TABLE II
PERFORMANCE EVALUATION OF THE PROPOSED METHODS (M1-M4) AND SOME EXISTING METHODS ([13], [23], [18], [22]) WITH THREE TESTS (TEST ONE, TEST TWO, CROSS SUBJECT TEST) ON THREE SUBSETS (AS1, AS2, AS3)
We use this dataset to evaluate the proposed method with the same experimental settings as described in [13]. For each subset, there are three tests. In Test One, 1/3 of the subset is used for training and the rest for testing; in Test Two, 2/3 of the subset is used for training and the rest for testing; and in the Cross Subject Test, half of the subjects, namely subjects 1, 3, 5, 7, 9 (where present), are used for training. As described in [13], the samples used for training are fixed: in Test One or Test Two, for each action and each subject, the first one or first two action videos are chosen as training samples. In each projected map, the foreground region is normalized to a fixed size; this normalization reduces intra-class variations caused by subject heights and motion extents. In SDDL, the number of dictionary atoms is set to the number of training samples. The parameters of SDDL-GC are λ1 = 05, λ2 = 5, γ = 05, ω = 5; the parameters of SDDL-LC are λ1 = 0.1, λ2 = 01, µ1 = 0.1, µ2 = 05.

B. Action Recognition Results on Three Subsets and Analysis

As shown in Table II, the recognition rates of Test One and Test Two are higher than those of the Cross Subject Test, and those of Test Two are, in general, the highest. In the Cross Subject Test, because the test samples and training samples come from different subjects,

and subjects chose to perform actions freely in their own styles, it has large intra-class variations and is more challenging. Using the same experimental setup, four different configurations of the proposed method (M1, M2, M3, M4) are compared with the state of the art on the three subsets of MSR Action3D. The results are summarized in Table II. First, we use M1 (DMM with SDDL-GC) and M2 (DMM with SDDL-LC) to evaluate how well the DMM feature described in [18] fits the SDDL model. Then, we evaluate the proposed ScLDM feature with the two classification strategies, M3 (ScLDM + SDDL-GC) and M4 (ScLDM + SDDL-LC). Table II clearly shows that, with the same settings, the average recognition rates of the four configurations of the proposed method outperform [13], [18], [22] and [23]. We also observe that, with the combined sparse-coded LDM feature and multi-scale HOG feature (ScLDM), both SDDL-GC and SDDL-LC perform better than with the single-layer LDM feature (i.e. DMM). Among the four configurations, M4 achieves the best performance.

C. Cross Subject Test and Analysis

We further studied the Cross Subject Test; the results are shown in Table III. They indicate that the four proposed methods are, in general, better than the other methods. The highest accuracy is 93.7% (M4) and the second highest is 93.2% (M3); both outperform the state-of-the-art results.

TABLE III
EVALUATION OF METHODS ON THE CROSS SUBJECT TEST

Method                    | Year | Data Channel | Cross Test (%)
Bag-of-3D-points [13]     | 2010 | DEP          | 74.7
Actionlet ensemble [16]   | 2012 | DEP          | 88.2
HOJ3D & DHMM [23]         | 2012 | SK           | 79.0
STOP [17]                 | 2012 | DEP          | 84.8
DMM-HOG & SVM [18]        | 2012 | DEP          | 91.6
HON4D [20]                | 2013 | DEP          | 88.9
DSTIP+DCSF & SVM [15]     | 2013 | DEP          | 85.8
EigenJoints & NBNN [22]   | 2014 | SK           |
3DMTM-PHOG & SVM [21]     | 2014 | DEP          | 90.7
DMM & SDDL-GC+PCA (ours)  |      | DEP          | 90.5
DMM & SDDL-LC+RP (ours)   |      | DEP          | 91.1
ScLDM & SDDL-GC (ours)    |      | DEP          | 93.2
ScLDM & SDDL-LC (ours)    |      | DEP          | 93.7

DEP and SK denote depth images and skeleton joints, respectively.

Fig. 4. Average recognition rates of two features (single-layered LDM (L = 1) and ScLDM) with three classification methods (SVM, SDDL-GC, and SDDL-LC) on the Cross Subject Test: (a) classifier comparison, (b) feature comparison.

The confusion matrices of the Cross Subject Test based on ScLDM (L = 1) and SDDL-GC are shown in Fig. 3 (confusion matrices for subsets AS1 (left), AS2 (middle) and AS3 (right)). In AS1 and AS3, most actions are recognized correctly, despite the fact that different people may perform the same action differently, leading to large intra-class variation within each class. In AS2, most state-of-the-art methods achieve very low accuracies on some of the similar actions, such as Hand catch, Draw x, Draw tick and Draw circle. But for these four actions,

75%, 79%, 100% and 53% are achieved by the proposed method, indicating the effectiveness and distinctiveness of combining multi-layered LDM features in time with multi-scale HOG features in space.

D. Performance Evaluation of the SDDL and SVM Classifiers on the Cross Subject Test

As illustrated in Fig. 4, using only the single-layered LDM feature, without the multi-scale structure feature (HOG pyramid), the cross-subject test accuracies are 90.5% (SDDL-GC classifier) and 91.1% (SDDL-LC classifier), which outperform the 82.3% of DMM-HOG with SVM [18] and the 90.7% of 3DMTM with SVM [21], respectively. This demonstrates the effectiveness of the proposed sparse representation with incremental discriminative dictionary learning. By imposing the Fisher discriminative criterion in dictionary learning, the performance of human action recognition is greatly enhanced.

VI. CONCLUSIONS

In this paper, based on depth images, we proposed an effective method to extract features of human actions. Using multi-layered LDM and multi-scale HOG, we captured the temporal local image motion feature and the action structure and shape feature, respectively, and combined this complementary information into a holistic descriptor that forms an effective representation of human actions. We also introduced the Sparse Discriminative incremental Dictionary Learning (SDDL) model to action recognition, which learns a sparse dictionary class by class and makes the sparse coding coefficients sufficiently discriminative for classifying similar actions. The experimental results on the MSR Action3D dataset demonstrate the effectiveness of the ScLDM features and the classification performance of SDDL.

ACKNOWLEDGMENT

This work is supported by the National Natural Science Foundation of China (NSFC, No. ), the Key International Collaboration Program of NSFC (No. ) and the Canada Research Chair Program.

REFERENCES

[1] J. Aggarwal and M. S. Ryoo, "Human activity analysis: A review," ACM Computing Surveys (CSUR), vol. 43, no. 3, p. 16.
[2] C. Chen, R. Jafari, and N. Kehtarnavaz, "Improving human action recognition using fusion of depth camera and inertial sensors," IEEE Transactions on Human-Machine Systems, vol. 45, no. 1.
[3] J. Aggarwal and L. Xia, "Human activity recognition from 3D data: A review," Pattern Recognition Letters, vol. 48.
[4] R. D. Green and L. Guan, "Quantifying and recognizing human movement patterns from monocular video images, part II: applications to biometrics," IEEE Transactions on Circuits and Systems for Video Technology, vol. 14, no. 2.
[5] M. J. Kyan, G. Sun, H. Li, L. Zhong, P. Muneesawang, N. Dong, B. Elder, and L. Guan, "An approach to ballet dance training through MS Kinect and visualization in a CAVE virtual reality environment," ACM Transactions on Intelligent Systems and Technology, vol. 6, no. 2, p. 23.
[6] J. Shotton, A. Fitzgibbon, M. Cook, T. Sharp, M. Finocchio, R. Moore, A. Kipman, and A. Blake, "Real-time human pose recognition in parts from single depth images," in CVPR, June 2011.
[7] A. Bobick and J. Davis, "The recognition of human movement using temporal templates," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 23, no. 3.
[8] I. Laptev and T. Lindeberg, "On space-time interest points," International Journal of Computer Vision, vol. 64, no. 2.
[9] J. C. Niebles, H. Wang, and L. Fei-Fei, "Unsupervised learning of human action categories using spatial-temporal words," International Journal of Computer Vision, vol. 79, no. 3.
[10] H. Wang and C. Schmid, "Action recognition with improved trajectories," in ICCV, Dec. 2013.
[11] Y. Wang and G. Mori, "Hidden part models for human action recognition: Probabilistic versus max margin," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 33, no. 7.
[12] M. Aharon, M. Elad, and A. Bruckstein, "K-SVD: An algorithm for designing overcomplete dictionaries for sparse representation," IEEE Transactions on Signal Processing, vol. 54, no. 11.
[13] W. Li, Z. Zhang, and Z. Liu, "Action recognition based on a bag of 3D points," in CVPR Workshops (CVPRW), June 2010.
[14] B. Ni, G. Wang, and P. Moulin, "RGBD-HuDaAct: A color-depth video database for human daily activity recognition," in ICCV Workshops, Nov. 2011.
[15] L. Xia and J. Aggarwal, "Spatio-temporal depth cuboid similarity feature for activity recognition using depth camera," in CVPR, 2013.
[16] J. Wang, Z. Liu, Y. Wu, and J. Yuan, "Mining actionlet ensemble for action recognition with depth cameras," in CVPR, 2012.
[17] A. W. Vieira, E. R. Nascimento, G. L. Oliveira, Z. Liu, and M. F. M. Campos, "STOP: Space-time occupancy patterns for 3D action recognition from depth map sequences," in Progress in Pattern Recognition, Image Analysis, Computer Vision, and Applications (CIARP), Springer, 2012.
[18] X. Yang, C. Zhang, and Y. Tian, "Recognizing actions using depth motion maps-based histograms of oriented gradients," in Proceedings of the 20th ACM International Conference on Multimedia, 2012.
[19] C. Chen, K. Liu, and N. Kehtarnavaz, "Real-time human action recognition based on depth motion maps," Journal of Real-Time Image Processing, pp. 1-9.
[20] O. Oreifej and Z. Liu, "HON4D: Histogram of oriented 4D normals for activity recognition from depth sequences," in CVPR, 2013.
[21] B. Liang and L. Zheng, "3D motion trail model based pyramid histograms of oriented gradient for action recognition," in ICPR, 2014.
[22] X. Yang and Y. Tian, "Effective 3D action recognition using EigenJoints," Journal of Visual Communication and Image Representation, vol. 25, pp. 2-11, Jan. 2014.
[23] L. Xia, C.-C. Chen, and J. Aggarwal, "View invariant human action recognition using histograms of 3D joints," in CVPRW, 2012.
[24] J. Luo, W. Wang, and H. Qi, "Spatio-temporal feature extraction and representation for RGB-D human action recognition," Pattern Recognition Letters, vol. 50.
[25] Y. Zhu, W. Chen, and G. Guo, "Evaluating spatiotemporal interest point features for depth-based action recognition," Image and Vision Computing, vol. 32, no. 8.
[26] C. Chen, R. Jafari, and N. Kehtarnavaz, "Action recognition from depth sequences using depth motion maps-based local binary patterns," in Proceedings of the IEEE Winter Conference on Applications of Computer Vision (WACV), Jan. 2015.
[27] N. Dalal and B. Triggs, "Histograms of oriented gradients for human detection," in CVPR, vol. 1, 2005.
[28] J. Wright, A. Y. Yang, A. Ganesh, S. S. Sastry, and Y. Ma, "Robust face recognition via sparse representation," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 31, no. 2.
[29] M. Yang, L. Zhang, X. Feng, and D. Zhang, "Sparse representation based Fisher discrimination dictionary learning for image classification," International Journal of Computer Vision, pp. 1-24, 2014.


More information

Improving Surface Normals Based Action Recognition in Depth Images

Improving Surface Normals Based Action Recognition in Depth Images Improving Surface Normals Based Action Recognition in Depth Images Xuan Nguyen, Thanh Nguyen, François Charpillet To cite this version: Xuan Nguyen, Thanh Nguyen, François Charpillet. Improving Surface

More information

Histogram of 3D Facets: A Characteristic Descriptor for Hand Gesture Recognition

Histogram of 3D Facets: A Characteristic Descriptor for Hand Gesture Recognition Histogram of 3D Facets: A Characteristic Descriptor for Hand Gesture Recognition Chenyang Zhang, Xiaodong Yang, and YingLi Tian Department of Electrical Engineering The City College of New York, CUNY {czhang10,

More information

Edge Enhanced Depth Motion Map for Dynamic Hand Gesture Recognition

Edge Enhanced Depth Motion Map for Dynamic Hand Gesture Recognition 2013 IEEE Conference on Computer Vision and Pattern Recognition Workshops Edge Enhanced Depth Motion Map for Dynamic Hand Gesture Recognition Chenyang Zhang and Yingli Tian Department of Electrical Engineering

More information

A Survey of Human Action Recognition Approaches that use an RGB-D Sensor

A Survey of Human Action Recognition Approaches that use an RGB-D Sensor IEIE Transactions on Smart Processing and Computing, vol. 4, no. 4, August 2015 http://dx.doi.org/10.5573/ieiespc.2015.4.4.281 281 EIE Transactions on Smart Processing and Computing A Survey of Human Action

More information

Action Recognition from Depth Sequences Using Depth Motion Maps-based Local Binary Patterns

Action Recognition from Depth Sequences Using Depth Motion Maps-based Local Binary Patterns Action Recognition from Depth Sequences Using Depth Motion Maps-based Local Binary Patterns Chen Chen, Roozbeh Jafari, Nasser Kehtarnavaz Department of Electrical Engineering The University of Texas at

More information

Human Daily Action Analysis with Multi-View and Color-Depth Data

Human Daily Action Analysis with Multi-View and Color-Depth Data Human Daily Action Analysis with Multi-View and Color-Depth Data Zhongwei Cheng 1, Lei Qin 2, Yituo Ye 1, Qingming Huang 1,2, and Qi Tian 3 1 Graduate University of Chinese Academy of Sciences, Beijing

More information

Action Recognition in Video by Sparse Representation on Covariance Manifolds of Silhouette Tunnels

Action Recognition in Video by Sparse Representation on Covariance Manifolds of Silhouette Tunnels Action Recognition in Video by Sparse Representation on Covariance Manifolds of Silhouette Tunnels Kai Guo, Prakash Ishwar, and Janusz Konrad Department of Electrical & Computer Engineering Motivation

More information

Human Action Recognition via Fused Kinematic Structure and Surface Representation

Human Action Recognition via Fused Kinematic Structure and Surface Representation University of Denver Digital Commons @ DU Electronic Theses and Dissertations Graduate Studies 8-1-2013 Human Action Recognition via Fused Kinematic Structure and Surface Representation Salah R. Althloothi

More information

Research Article Improved Collaborative Representation Classifier Based on

Research Article Improved Collaborative Representation Classifier Based on Hindawi Electrical and Computer Engineering Volume 2017, Article ID 8191537, 6 pages https://doi.org/10.1155/2017/8191537 Research Article Improved Collaborative Representation Classifier Based on l 2

More information

Action recognition in videos

Action recognition in videos Action recognition in videos Cordelia Schmid INRIA Grenoble Joint work with V. Ferrari, A. Gaidon, Z. Harchaoui, A. Klaeser, A. Prest, H. Wang Action recognition - goal Short actions, i.e. drinking, sit

More information

Combined Shape Analysis of Human Poses and Motion Units for Action Segmentation and Recognition

Combined Shape Analysis of Human Poses and Motion Units for Action Segmentation and Recognition Combined Shape Analysis of Human Poses and Motion Units for Action Segmentation and Recognition Maxime Devanne 1,2, Hazem Wannous 1, Stefano Berretti 2, Pietro Pala 2, Mohamed Daoudi 1, and Alberto Del

More information

Multiple Kernel Learning for Emotion Recognition in the Wild

Multiple Kernel Learning for Emotion Recognition in the Wild Multiple Kernel Learning for Emotion Recognition in the Wild Karan Sikka, Karmen Dykstra, Suchitra Sathyanarayana, Gwen Littlewort and Marian S. Bartlett Machine Perception Laboratory UCSD EmotiW Challenge,

More information

Short Survey on Static Hand Gesture Recognition

Short Survey on Static Hand Gesture Recognition Short Survey on Static Hand Gesture Recognition Huu-Hung Huynh University of Science and Technology The University of Danang, Vietnam Duc-Hoang Vo University of Science and Technology The University of

More information

Evaluation of Local Space-time Descriptors based on Cuboid Detector in Human Action Recognition

Evaluation of Local Space-time Descriptors based on Cuboid Detector in Human Action Recognition International Journal of Innovation and Applied Studies ISSN 2028-9324 Vol. 9 No. 4 Dec. 2014, pp. 1708-1717 2014 Innovative Space of Scientific Research Journals http://www.ijias.issr-journals.org/ Evaluation

More information

Range-Sample Depth Feature for Action Recognition

Range-Sample Depth Feature for Action Recognition Range-Sample Depth Feature for Action Recognition Cewu Lu Jiaya Jia Chi-Keung Tang The Hong Kong University of Science and Technology The Chinese University of Hong Kong Abstract We propose binary range-sample

More information

Extracting Spatio-temporal Local Features Considering Consecutiveness of Motions

Extracting Spatio-temporal Local Features Considering Consecutiveness of Motions Extracting Spatio-temporal Local Features Considering Consecutiveness of Motions Akitsugu Noguchi and Keiji Yanai Department of Computer Science, The University of Electro-Communications, 1-5-1 Chofugaoka,

More information

Tri-modal Human Body Segmentation

Tri-modal Human Body Segmentation Tri-modal Human Body Segmentation Master of Science Thesis Cristina Palmero Cantariño Advisor: Sergio Escalera Guerrero February 6, 2014 Outline 1 Introduction 2 Tri-modal dataset 3 Proposed baseline 4

More information

Sign Language Recognition using Dynamic Time Warping and Hand Shape Distance Based on Histogram of Oriented Gradient Features

Sign Language Recognition using Dynamic Time Warping and Hand Shape Distance Based on Histogram of Oriented Gradient Features Sign Language Recognition using Dynamic Time Warping and Hand Shape Distance Based on Histogram of Oriented Gradient Features Pat Jangyodsuk Department of Computer Science and Engineering The University

More information

A Novel Hand Posture Recognition System Based on Sparse Representation Using Color and Depth Images

A Novel Hand Posture Recognition System Based on Sparse Representation Using Color and Depth Images 2013 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) November 3-7, 2013. Tokyo, Japan A Novel Hand Posture Recognition System Based on Sparse Representation Using Color and Depth

More information

Lecture 18: Human Motion Recognition

Lecture 18: Human Motion Recognition Lecture 18: Human Motion Recognition Professor Fei Fei Li Stanford Vision Lab 1 What we will learn today? Introduction Motion classification using template matching Motion classification i using spatio

More information

Aggregating Descriptors with Local Gaussian Metrics

Aggregating Descriptors with Local Gaussian Metrics Aggregating Descriptors with Local Gaussian Metrics Hideki Nakayama Grad. School of Information Science and Technology The University of Tokyo Tokyo, JAPAN nakayama@ci.i.u-tokyo.ac.jp Abstract Recently,

More information

Action Recognition with HOG-OF Features

Action Recognition with HOG-OF Features Action Recognition with HOG-OF Features Florian Baumann Institut für Informationsverarbeitung, Leibniz Universität Hannover, {last name}@tnt.uni-hannover.de Abstract. In this paper a simple and efficient

More information

Multi-Temporal Depth Motion Maps-Based Local Binary Patterns for 3-D Human Action Recognition

Multi-Temporal Depth Motion Maps-Based Local Binary Patterns for 3-D Human Action Recognition SPECIAL SECTION ON ADVANCED DATA ANALYTICS FOR LARGE-SCALE COMPLEX DATA ENVIRONMENTS Received September 5, 2017, accepted September 28, 2017, date of publication October 2, 2017, date of current version

More information

Scale Invariant Human Action Detection from Depth Cameras using Class Templates

Scale Invariant Human Action Detection from Depth Cameras using Class Templates Scale Invariant Human Action Detection from Depth Cameras using Class Templates Kartik Gupta and Arnav Bhavsar School of Computing and Electrical Engineering Indian Institute of Technology Mandi, India

More information

3D Activity Recognition using Motion History and Binary Shape Templates

3D Activity Recognition using Motion History and Binary Shape Templates 3D Activity Recognition using Motion History and Binary Shape Templates Saumya Jetley, Fabio Cuzzolin Oxford Brookes University (UK) Abstract. This paper presents our work on activity recognition in 3D

More information

DMM-Pyramid Based Deep Architectures for Action Recognition with Depth Cameras

DMM-Pyramid Based Deep Architectures for Action Recognition with Depth Cameras DMM-Pyramid Based Deep Architectures for Action Recognition with Depth Cameras Rui Yang 1,2 and Ruoyu Yang 1,2 1 State Key Laboratory for Novel Software Technology, Nanjing University, China 2 Department

More information

An Efficient Part-Based Approach to Action Recognition from RGB-D Video with BoW-Pyramid Representation

An Efficient Part-Based Approach to Action Recognition from RGB-D Video with BoW-Pyramid Representation 2013 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) November 3-7, 2013. Tokyo, Japan An Efficient Part-Based Approach to Action Recognition from RGB-D Video with BoW-Pyramid

More information

Robust 3D Action Recognition with Random Occupancy Patterns

Robust 3D Action Recognition with Random Occupancy Patterns Robust 3D Action Recognition with Random Occupancy Patterns Jiang Wang 1, Zicheng Liu 2, Jan Chorowski 3, Zhuoyuan Chen 1, and Ying Wu 1 1 Northwestern University 2 Microsoft Research 3 University of Louisville

More information

COSC160: Detection and Classification. Jeremy Bolton, PhD Assistant Teaching Professor

COSC160: Detection and Classification. Jeremy Bolton, PhD Assistant Teaching Professor COSC160: Detection and Classification Jeremy Bolton, PhD Assistant Teaching Professor Outline I. Problem I. Strategies II. Features for training III. Using spatial information? IV. Reducing dimensionality

More information

Action Recognition & Categories via Spatial-Temporal Features

Action Recognition & Categories via Spatial-Temporal Features Action Recognition & Categories via Spatial-Temporal Features 华俊豪, 11331007 huajh7@gmail.com 2014/4/9 Talk at Image & Video Analysis taught by Huimin Yu. Outline Introduction Frameworks Feature extraction

More information

Human-Object Interaction Recognition by Learning the distances between the Object and the Skeleton Joints

Human-Object Interaction Recognition by Learning the distances between the Object and the Skeleton Joints Human-Object Interaction Recognition by Learning the distances between the Object and the Skeleton Joints Meng Meng, Hassen Drira, Mohamed Daoudi, Jacques Boonaert To cite this version: Meng Meng, Hassen

More information

Skeletal Quads: Human Action Recognition Using Joint Quadruples

Skeletal Quads: Human Action Recognition Using Joint Quadruples Skeletal Quads: Human Action Recognition Using Joint Quadruples Georgios Evangelidis, Gurkirt Singh, Radu Horaud To cite this version: Georgios Evangelidis, Gurkirt Singh, Radu Horaud. Skeletal Quads:

More information

REJECTION-BASED CLASSIFICATION FOR ACTION RECOGNITION USING A SPATIO-TEMPORAL DICTIONARY. Stefen Chan Wai Tim, Michele Rombaut, Denis Pellerin

REJECTION-BASED CLASSIFICATION FOR ACTION RECOGNITION USING A SPATIO-TEMPORAL DICTIONARY. Stefen Chan Wai Tim, Michele Rombaut, Denis Pellerin REJECTION-BASED CLASSIFICATION FOR ACTION RECOGNITION USING A SPATIO-TEMPORAL DICTIONARY Stefen Chan Wai Tim, Michele Rombaut, Denis Pellerin Univ. Grenoble Alpes, GIPSA-Lab, F-38000 Grenoble, France ABSTRACT

More information

Spatio-Temporal Depth Cuboid Similarity Feature for Activity Recognition Using Depth Camera

Spatio-Temporal Depth Cuboid Similarity Feature for Activity Recognition Using Depth Camera Spatio-Temporal Depth Cuboid Similarity Feature for Activity Recognition Using Depth Camera Lu Xia and J.K. Aggarwal Computer & Vision Research Center/Department of ECE The University of Texas at Austin

More information

Development in Object Detection. Junyuan Lin May 4th

Development in Object Detection. Junyuan Lin May 4th Development in Object Detection Junyuan Lin May 4th Line of Research [1] N. Dalal and B. Triggs. Histograms of oriented gradients for human detection, CVPR 2005. HOG Feature template [2] P. Felzenszwalb,

More information

Minimizing hallucination in Histogram of Oriented Gradients

Minimizing hallucination in Histogram of Oriented Gradients Minimizing hallucination in Histogram of Oriented Gradients Javier Ortiz Sławomir Bąk Michał Koperski François Brémond INRIA Sophia Antipolis, STARS group 2004, route des Lucioles, BP93 06902 Sophia Antipolis

More information

Part-based and local feature models for generic object recognition

Part-based and local feature models for generic object recognition Part-based and local feature models for generic object recognition May 28 th, 2015 Yong Jae Lee UC Davis Announcements PS2 grades up on SmartSite PS2 stats: Mean: 80.15 Standard Dev: 22.77 Vote on piazza

More information

PROCEEDINGS OF SPIE. Weighted fusion of depth and inertial data to improve view invariance for real-time human action recognition

PROCEEDINGS OF SPIE. Weighted fusion of depth and inertial data to improve view invariance for real-time human action recognition PROCDINGS OF SPI SPIDigitalLibrary.org/conference-proceedings-of-spie Weighted fusion of depth and inertial data to improve view invariance for real-time human action recognition Chen Chen, Huiyan Hao,

More information

Background subtraction in people detection framework for RGB-D cameras

Background subtraction in people detection framework for RGB-D cameras Background subtraction in people detection framework for RGB-D cameras Anh-Tuan Nghiem, Francois Bremond INRIA-Sophia Antipolis 2004 Route des Lucioles, 06902 Valbonne, France nghiemtuan@gmail.com, Francois.Bremond@inria.fr

More information

An efficient face recognition algorithm based on multi-kernel regularization learning

An efficient face recognition algorithm based on multi-kernel regularization learning Acta Technica 61, No. 4A/2016, 75 84 c 2017 Institute of Thermomechanics CAS, v.v.i. An efficient face recognition algorithm based on multi-kernel regularization learning Bi Rongrong 1 Abstract. A novel

More information

HON4D: Histogram of Oriented 4D Normals for Activity Recognition from Depth Sequences

HON4D: Histogram of Oriented 4D Normals for Activity Recognition from Depth Sequences HON4D: Histogram of Oriented 4D Normals for Activity Recognition from Depth Sequences Omar Oreifej University of Central Florida Orlando, FL oreifej@eecs.ucf.edu Zicheng Liu Microsoft Research Redmond,

More information

Arm-hand Action Recognition Based on 3D Skeleton Joints Ling RUI 1, Shi-wei MA 1,a, *, Jia-rui WEN 1 and Li-na LIU 1,2

Arm-hand Action Recognition Based on 3D Skeleton Joints Ling RUI 1, Shi-wei MA 1,a, *, Jia-rui WEN 1 and Li-na LIU 1,2 1 International Conference on Control and Automation (ICCA 1) ISBN: 97-1-9-39- Arm-hand Action Recognition Based on 3D Skeleton Joints Ling RUI 1, Shi-wei MA 1,a, *, Jia-rui WEN 1 and Li-na LIU 1, 1 School

More information

Robust Face Recognition via Sparse Representation Authors: John Wright, Allen Y. Yang, Arvind Ganesh, S. Shankar Sastry, and Yi Ma

Robust Face Recognition via Sparse Representation Authors: John Wright, Allen Y. Yang, Arvind Ganesh, S. Shankar Sastry, and Yi Ma Robust Face Recognition via Sparse Representation Authors: John Wright, Allen Y. Yang, Arvind Ganesh, S. Shankar Sastry, and Yi Ma Presented by Hu Han Jan. 30 2014 For CSE 902 by Prof. Anil K. Jain: Selected

More information

Human Motion Detection and Tracking for Video Surveillance

Human Motion Detection and Tracking for Video Surveillance Human Motion Detection and Tracking for Video Surveillance Prithviraj Banerjee and Somnath Sengupta Department of Electronics and Electrical Communication Engineering Indian Institute of Technology, Kharagpur,

More information

Beyond Bags of Features

Beyond Bags of Features : for Recognizing Natural Scene Categories Matching and Modeling Seminar Instructed by Prof. Haim J. Wolfson School of Computer Science Tel Aviv University December 9 th, 2015

More information

Person Identity Recognition on Motion Capture Data Using Label Propagation

Person Identity Recognition on Motion Capture Data Using Label Propagation Person Identity Recognition on Motion Capture Data Using Label Propagation Nikos Nikolaidis Charalambos Symeonidis AIIA Lab, Department of Informatics Aristotle University of Thessaloniki Greece email:

More information

Human Action Recognition Using Silhouette Histogram

Human Action Recognition Using Silhouette Histogram Human Action Recognition Using Silhouette Histogram Chaur-Heh Hsieh, *Ping S. Huang, and Ming-Da Tang Department of Computer and Communication Engineering Ming Chuan University Taoyuan 333, Taiwan, ROC

More information

Class 9 Action Recognition

Class 9 Action Recognition Class 9 Action Recognition Liangliang Cao, April 4, 2013 EECS 6890 Topics in Information Processing Spring 2013, Columbia University http://rogerioferis.com/visualrecognitionandsearch Visual Recognition

More information

Real Time Person Detection and Tracking by Mobile Robots using RGB-D Images

Real Time Person Detection and Tracking by Mobile Robots using RGB-D Images Real Time Person Detection and Tracking by Mobile Robots using RGB-D Images Duc My Vo, Lixing Jiang and Andreas Zell Abstract Detecting and tracking humans are key problems for human-robot interaction.

More information

IMPROVING SPATIO-TEMPORAL FEATURE EXTRACTION TECHNIQUES AND THEIR APPLICATIONS IN ACTION CLASSIFICATION. Maral Mesmakhosroshahi, Joohee Kim

IMPROVING SPATIO-TEMPORAL FEATURE EXTRACTION TECHNIQUES AND THEIR APPLICATIONS IN ACTION CLASSIFICATION. Maral Mesmakhosroshahi, Joohee Kim IMPROVING SPATIO-TEMPORAL FEATURE EXTRACTION TECHNIQUES AND THEIR APPLICATIONS IN ACTION CLASSIFICATION Maral Mesmakhosroshahi, Joohee Kim Department of Electrical and Computer Engineering Illinois Institute

More information

A novel template matching method for human detection

A novel template matching method for human detection University of Wollongong Research Online Faculty of Informatics - Papers (Archive) Faculty of Engineering and Information Sciences 2009 A novel template matching method for human detection Duc Thanh Nguyen

More information

Previously. Part-based and local feature models for generic object recognition. Bag-of-words model 4/20/2011

Previously. Part-based and local feature models for generic object recognition. Bag-of-words model 4/20/2011 Previously Part-based and local feature models for generic object recognition Wed, April 20 UT-Austin Discriminative classifiers Boosting Nearest neighbors Support vector machines Useful for object recognition

More information

Human Activities Recognition Based on Skeleton Information via Sparse Representation

Human Activities Recognition Based on Skeleton Information via Sparse Representation Regular Paper Journal of Computing Science and Engineering, Vol. 12, No. 1, March 2018, pp. 1-11 Human Activities Recognition Based on Skeleton Information via Sparse Representation Suolan Liu School of

More information

Video Inter-frame Forgery Identification Based on Optical Flow Consistency

Video Inter-frame Forgery Identification Based on Optical Flow Consistency Sensors & Transducers 24 by IFSA Publishing, S. L. http://www.sensorsportal.com Video Inter-frame Forgery Identification Based on Optical Flow Consistency Qi Wang, Zhaohong Li, Zhenzhen Zhang, Qinglong

More information

CS229: Action Recognition in Tennis

CS229: Action Recognition in Tennis CS229: Action Recognition in Tennis Aman Sikka Stanford University Stanford, CA 94305 Rajbir Kataria Stanford University Stanford, CA 94305 asikka@stanford.edu rkataria@stanford.edu 1. Motivation As active

More information

Human Detection and Tracking for Video Surveillance: A Cognitive Science Approach

Human Detection and Tracking for Video Surveillance: A Cognitive Science Approach Human Detection and Tracking for Video Surveillance: A Cognitive Science Approach Vandit Gajjar gajjar.vandit.381@ldce.ac.in Ayesha Gurnani gurnani.ayesha.52@ldce.ac.in Yash Khandhediya khandhediya.yash.364@ldce.ac.in

More information

Real-Time Continuous Action Detection and Recognition Using Depth Images and Inertial Signals

Real-Time Continuous Action Detection and Recognition Using Depth Images and Inertial Signals Real-Time Continuous Action Detection and Recognition Using Depth Images and Inertial Signals Neha Dawar 1, Chen Chen 2, Roozbeh Jafari 3, Nasser Kehtarnavaz 1 1 Department of Electrical and Computer Engineering,

More information

CovP3DJ: Skeleton-parts-based-covariance Descriptor for Human Action Recognition

CovP3DJ: Skeleton-parts-based-covariance Descriptor for Human Action Recognition CovP3DJ: Skeleton-parts-based-covariance Descriptor for Human Action Recognition Hany A. El-Ghaish 1, Amin Shoukry 1,3 and Mohamed E. Hussein 2,3 1 CSE Department, Egypt-Japan University of Science and

More information

STOP: Space-Time Occupancy Patterns for 3D Action Recognition from Depth Map Sequences

STOP: Space-Time Occupancy Patterns for 3D Action Recognition from Depth Map Sequences STOP: Space-Time Occupancy Patterns for 3D Action Recognition from Depth Map Sequences Antonio W. Vieira 1,2,EricksonR.Nascimento 1, Gabriel L. Oliveira 1, Zicheng Liu 3,andMarioF.M.Campos 1, 1 DCC - Universidade

More information

Robust 3D Action Recognition through Sampling Local Appearances and Global Distributions

Robust 3D Action Recognition through Sampling Local Appearances and Global Distributions TO APPEAR IN IEEE TRANSACTIONS ON MULTIMEDIA 1 Robust 3D Action Recognition through Sampling Local Appearances and Global Distributions Mengyuan Liu, Member, IEEE, Hong Liu, Member, IEEE, Chen Chen, Member,

More information

Human Action Recognition Using Independent Component Analysis

Human Action Recognition Using Independent Component Analysis Human Action Recognition Using Independent Component Analysis Masaki Yamazaki, Yen-Wei Chen and Gang Xu Department of Media echnology Ritsumeikan University 1-1-1 Nojihigashi, Kusatsu, Shiga, 525-8577,

More information

Discriminative sparse model and dictionary learning for object category recognition

Discriminative sparse model and dictionary learning for object category recognition Discriative sparse model and dictionary learning for object category recognition Xiao Deng and Donghui Wang Institute of Artificial Intelligence, Zhejiang University Hangzhou, China, 31007 {yellowxiao,dhwang}@zju.edu.cn

More information

Object and Action Detection from a Single Example

Object and Action Detection from a Single Example Object and Action Detection from a Single Example Peyman Milanfar* EE Department University of California, Santa Cruz *Joint work with Hae Jong Seo AFOSR Program Review, June 4-5, 29 Take a look at this:

More information

3D Human Action Recognition by Shape Analysis of Motion Trajectories on Riemannian Manifold

3D Human Action Recognition by Shape Analysis of Motion Trajectories on Riemannian Manifold 3D Human Action Recognition by Shape Analysis of Motion Trajectories on Riemannian Manifold Maxime Devanne, Hazem Wannous, Stefano Berretti, Pietro Pala, Mohamed Daoudi, Alberto Del Bimbo To cite this

More information

Skeleton-based Action Recognition Based on Deep Learning and Grassmannian Pyramids

Skeleton-based Action Recognition Based on Deep Learning and Grassmannian Pyramids Skeleton-based Action Recognition Based on Deep Learning and Grassmannian Pyramids Dimitrios Konstantinidis, Kosmas Dimitropoulos and Petros Daras ITI-CERTH, 6th km Harilaou-Thermi, 57001, Thessaloniki,

More information

Object Tracking using HOG and SVM

Object Tracking using HOG and SVM Object Tracking using HOG and SVM Siji Joseph #1, Arun Pradeep #2 Electronics and Communication Engineering Axis College of Engineering and Technology, Ambanoly, Thrissur, India Abstract Object detection

More information

Adaptive Action Detection

Adaptive Action Detection Adaptive Action Detection Illinois Vision Workshop Dec. 1, 2009 Liangliang Cao Dept. ECE, UIUC Zicheng Liu Microsoft Research Thomas Huang Dept. ECE, UIUC Motivation Action recognition is important in

More information

Energy-based Global Ternary Image for Action Recognition Using Sole Depth Sequences

Energy-based Global Ternary Image for Action Recognition Using Sole Depth Sequences 2016 Fourth International Conference on 3D Vision Energy-based Global Ternary Image for Action Recognition Using Sole Depth Sequences Mengyuan Liu Key Laboratory of Machine Perception Shenzhen Graduate

More information

FACE RECOGNITION USING SUPPORT VECTOR MACHINES

FACE RECOGNITION USING SUPPORT VECTOR MACHINES FACE RECOGNITION USING SUPPORT VECTOR MACHINES Ashwin Swaminathan ashwins@umd.edu ENEE633: Statistical and Neural Pattern Recognition Instructor : Prof. Rama Chellappa Project 2, Part (b) 1. INTRODUCTION

More information

Chapter 2 Learning Actionlet Ensemble for 3D Human Action Recognition

Chapter 2 Learning Actionlet Ensemble for 3D Human Action Recognition Chapter 2 Learning Actionlet Ensemble for 3D Human Action Recognition Abstract Human action recognition is an important yet challenging task. Human actions usually involve human-object interactions, highly

More information

Human Detection. A state-of-the-art survey. Mohammad Dorgham. University of Hamburg

Human Detection. A state-of-the-art survey. Mohammad Dorgham. University of Hamburg Human Detection A state-of-the-art survey Mohammad Dorgham University of Hamburg Presentation outline Motivation Applications Overview of approaches (categorized) Approaches details References Motivation

More information

Tracking Using Online Feature Selection and a Local Generative Model

Tracking Using Online Feature Selection and a Local Generative Model Tracking Using Online Feature Selection and a Local Generative Model Thomas Woodley Bjorn Stenger Roberto Cipolla Dept. of Engineering University of Cambridge {tew32 cipolla}@eng.cam.ac.uk Computer Vision

More information

Motion Sensors for Activity Recognition in an Ambient-Intelligence Scenario

Motion Sensors for Activity Recognition in an Ambient-Intelligence Scenario 5th International Workshop on Smart Environments and Ambient Intelligence 2013, San Diego (22 March 2013) Motion Sensors for Activity Recognition in an Ambient-Intelligence Scenario Pietro Cottone, Giuseppe

More information

ImageCLEF 2011

ImageCLEF 2011 SZTAKI @ ImageCLEF 2011 Bálint Daróczy joint work with András Benczúr, Róbert Pethes Data Mining and Web Search Group Computer and Automation Research Institute Hungarian Academy of Sciences Training/test

More information

Dictionary of gray-level 3D patches for action recognition
