Action Recognition By Learnt Class-Specific Overcomplete Dictionaries


Tanaya Guha
Electrical and Computer Engineering, University of British Columbia, Vancouver, Canada

Rabab K. Ward
Electrical and Computer Engineering, University of British Columbia, Vancouver, Canada

Abstract

This paper presents a sparse signal representation based approach to the problem of human action recognition in videos. For each action, a redundant basis (dictionary) is learnt by solving a sparse optimization problem. A dictionary is learnt from the image patches of its corresponding action, such that every patch vector is represented by a linear combination of a small number of basis vectors. By learning one dictionary per action, each dictionary is expected to represent one particular action efficiently. We show that such class-specific dictionaries, each representative of one action, provide a powerful means of action classification. Given a query sequence, the classifier seeks the dictionary that best approximates the query. We have evaluated the proposed approach on standard datasets. Experimental results demonstrate high accuracy and robustness against occlusion and viewpoint changes.

I. INTRODUCTION

Understanding human actions is a key component of computer vision research because of its applications in fields as varied as human-computer interfaces, video surveillance, sports event analysis and video indexing [1]. Various methods have been developed to analyze simple actions and complex activities. One existing line of research is optical flow based data matching [2]. The computation of optical flow vectors is difficult due to smooth surfaces and motion discontinuities [1]. Another approach is to analyze human motion dynamically using state-space modeling [3]. These methods are suited to modeling complex activities but can be computationally intensive.
Some researchers have also developed algorithms that treat action sequences as 3D space-time volumes or shapes [1], [4]. Feature tracking is another popular action recognition method; it depends largely on the accuracy of the tracking system [5]. Another effective approach is Bag-Of-Words (BOW) modeling, which represents an action in terms of the codewords of a predefined codebook [6]. Despite significant research effort, the applicability of most action recognition systems is limited by real-life conditions such as occlusion and changes in appearance, and by computational complexity.

Sparse representation of signals has grown into a major field of research in recent years. It is now well established that signals like natural images admit a sparse decomposition when they are represented over a redundant basis, called a dictionary. A crucial step is the design or selection of such a dictionary. One option is to choose from a set of predefined dictionaries, e.g. curvelets, bandlets or variants of wavelets [7]. Another interesting option is to use the training samples directly as the dictionary columns, as proposed by Wright et al. [8]. Recent research shows that it is also possible to learn a dictionary from a set of training examples so that the dictionary is better adapted to the given data. Practical dictionary learning algorithms like MOD (Method of Optimal Directions) [9] and K-SVD [10] have been proposed. Dictionaries learnt using these algorithms are non-parametric and can yield a more compact representation of the signal, as they are better adapted to capture the inherent structure of the data. Dictionaries learnt by the K-SVD method have been shown to achieve better results than off-the-shelf dictionaries in image denoising [11]. Though primarily developed for signal reconstruction, the idea of sparse decomposition has been used to solve several classification problems.
Inspired by the success of sparse analysis based solutions to texture segmentation [12], [13], face recognition [8] and audio classification [14], we propose a sparse signal representation based approach for learning and recognizing human actions in videos.

This paper aims at recognizing human actions using learnt overcomplete dictionaries. Our method is silhouette-based; it represents each video sequence by a single image, computed by averaging the silhouettes extracted from all frames. We refer to such an averaged silhouette image as an averaged Motion Energy Image (MEI) [4]. The training phase of the proposed method involves learning one overcomplete dictionary for every action. A dictionary for a particular action is learnt by adapting a set of basis vectors to a large number of random patches extracted from the averaged MEIs of that action. This is done in such a way that each input patch has a sparse representation over the basis vectors. A dictionary tailored to a particular class is expected to give an efficient (i.e. sparser) representation for signals of that class and a less efficient (less sparse) representation for signals of a different class. Thus class-specific dictionaries become representative of their corresponding classes. Given a query action video, the classification problem is equivalent to finding, among the learnt dictionaries, the one that best approximates the query.

The proposed method has the following advantages:

- The dictionary learning process of one class is independent of the other classes, so training can be easily parallelized. Also, no change to the existing data is necessary when a new class is added to the system. This is a major advantage, as in practice most databases have to be continuously updated with new classes.
- We do not assume the availability of a large number of training samples per class. The dictionaries can be learnt even when a single training sample is available per class.
- The use of Random Projections for dimensionality reduction makes the approach fast and computationally efficient.

Difference from BOW modeling: The dictionary learning approach is significantly different from the traditional BOW approach. We represent each input vector as a linear combination of a small number of codewords, called dictionary columns, while in BOW modeling an input vector is approximated by only one codeword, i.e. a single column vector of a codebook/dictionary. We also advocate learning class-specific dictionaries, as opposed to constructing a single dictionary for all classes as in the BOW approach. Note that our method also differs notably from the sparse face recognition approach presented in [8]: we learn the dictionary atoms from the training data by solving an optimization problem, whereas in [8] the training samples are used directly as dictionary atoms.

To evaluate the proposed method, we have designed various experiments on the Weizmann human action and Robustness datasets [1] and on subsets of the KTH human motion [15] and UCF sports [16] datasets. The experimental results show high accuracy and robustness of the approach against partial occlusion, viewpoint and scale changes.

II. THE PROPOSED METHOD

Assume that there are $C$ classes of actions and $c$ training samples per class; the training samples are represented by $A_{ij}$, $i \in [1, 2, \ldots, C]$ and $j \in [1, 2, \ldots, c]$. Our aim is to learn $C$ separate dictionaries and to use them to classify any new video sequence.

A. Video Representation

For each action sequence $A_{ij}$, $i \in [1, 2, \ldots, C]$, $j \in [1, 2, \ldots, c]$, silhouettes are extracted at each frame.
A simple normalized cross-correlation based registration technique is used to align the human figures. The aligned silhouettes are then transformed to an averaged MEI, denoted $I_{ij} \in \mathbb{R}^{M \times M}$, by simply taking their mean. This MEI representation implicitly captures both the shape and the temporal information of human actions and is very fast to compute. Figure 1 shows examples of the averaged MEIs for the 10 actions in the Weizmann dataset.

Fig. 1. Averaged MEIs for actions (clockwise) bend, jack, jump, pjump, run, wave2, wave1, walk, skip, side.

B. Multiscale Patch Domain Analysis

Each MEI is subjected to local image analysis at two scales: random overlapping patches are extracted from the original MEI and from its downsampled (and smoothed) version. At each scale, $\rho$ patches of size $\eta \times \eta$ are extracted. Ideally, one patch centered at every image pixel would be extracted, but in practice extracting any large number of patches is sufficient. Every patch can be represented as a vector of length $\eta^2$. The process of patch extraction can be described as the action of a linear operator $\Phi$:

$$\Phi : \mathbb{R}^{M \times M} \rightarrow \mathbb{R}^{\eta^2 \times 2\rho} \qquad (1)$$

Patches with little or no information are removed by hard thresholding. The set of all informative patches collected over the $c$ training samples of a class constitutes the training set for dictionary learning. Let such a collection of patches be denoted by $P_i \in \mathbb{R}^{\eta^2 \times 2c\rho}$, $i \in [1, 2, \ldots, C]$, where $2c\rho$ is the number of patches extracted per class.

C. Dimensionality Reduction

A patch size as small as $16 \times 16$ produces a vector of dimension $[256 \times 1]$. With redundancy 2, this calls for more than 500 basis vectors to be learnt for each dictionary in order to secure a sparse representation of the input data. This high dimensionality seriously limits the speed and efficiency of our algorithm. A natural solution is to reduce the dimensionality.
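The video representation and two-scale patch extraction of Sections II-A and II-B can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names, the 2x2 box filter used for smoothing and downsampling, the use of the sum of squares as patch energy, and the synthetic silhouettes are all assumptions made for illustration.

```python
import numpy as np

def averaged_mei(silhouettes):
    """Average aligned binary silhouettes of shape (T, M, M) into one M x M image."""
    return np.asarray(silhouettes, dtype=float).mean(axis=0)

def downsample(image):
    """Assumed scheme: 2x2 box smoothing followed by subsampling by 2."""
    m = (image.shape[0] // 2) * 2
    n = (image.shape[1] // 2) * 2
    img = image[:m, :n]
    return 0.25 * (img[0::2, 0::2] + img[1::2, 0::2]
                   + img[0::2, 1::2] + img[1::2, 1::2])

def random_patches(image, rho, eta, rng):
    """Return rho random overlapping eta x eta patches as columns of an (eta^2, rho) matrix."""
    rows = rng.integers(0, image.shape[0] - eta + 1, size=rho)
    cols = rng.integers(0, image.shape[1] - eta + 1, size=rho)
    return np.stack([image[r:r + eta, c:c + eta].ravel()
                     for r, c in zip(rows, cols)], axis=1)

def informative_patches(mei, rho, eta, threshold, rng):
    """Extract rho patches at each of two scales, then hard-threshold on patch energy."""
    patches = np.hstack([random_patches(mei, rho, eta, rng),
                         random_patches(downsample(mei), rho, eta, rng)])
    energy = np.sum(patches ** 2, axis=0)  # assumed energy measure
    return patches[:, energy > threshold]

# Synthetic stand-in for aligned silhouettes: a rectangle drifting to the right.
rng = np.random.default_rng(0)
T, M = 20, 64
silhouettes = np.zeros((T, M, M))
for t in range(T):
    silhouettes[t, 16:48, 10 + t:30 + t] = 1.0

mei = averaged_mei(silhouettes)
P = informative_patches(mei, rho=1000, eta=16, threshold=1.0, rng=rng)
```

Each column of `P` is one vectorized patch of length $\eta^2 = 256$; stacking the outputs for all $c$ training MEIs of a class would give the matrix $P_i$ of the text.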
To obtain a lower-dimensional representation, standard methods such as Principal Component Analysis (PCA) and Linear Discriminant Analysis (LDA) are widely applied. Recently, Random Projection (RP) has emerged as a powerful tool [17]. Theoretical results show that random projections of high-dimensional data onto a lower-dimensional subspace can preserve the distances between vectors quite well. RP is very simple to use and has low computational complexity. The original $d$ $(= \eta^2)$-dimensional data is projected onto an $n$-dimensional subspace ($n \ll d$) by premultiplying the data matrix $P_i$ by a random matrix $R_{proj} \in \mathbb{R}^{n \times d}$. In practice, any normally distributed $R_{proj}$ with zero mean and unit variance serves the purpose. The dimensionality reduction is performed as

$$Y_i = R_{proj} P_i \qquad (2)$$

where $i \in [1, 2, \ldots, C]$. The dimensionality-reduced data matrix $Y_i \in \mathbb{R}^{n \times 2c\rho}$ contains the projections (not true projections, because they are not orthogonal) of $P_i$ onto some random $n$-dimensional subspace.

D. Dictionary Learning

The next step is to learn $C$ redundant dictionaries $D_i$, $i \in [1, 2, \ldots, C]$, i.e. one dictionary per class. The idea of fitting a dictionary to a set of training examples was originally proposed by Olshausen and Field in 1996 [18]. Recently, an algorithm known as K-SVD has been proposed for adapting dictionaries to the data [10]. We have applied K-SVD to learn the class-specific dictionaries because it exhibits faster convergence than its competitors.

Consider a set of dimensionality-reduced patches as our training signals $Y = \{y_1, y_2, \ldots, y_N\}$, where $y_i \in \mathbb{R}^n$ and $N = 2c\rho$. We intend to find a dictionary matrix $D \in \mathbb{R}^{n \times K}$ having $K$ ($K > n$) atoms $\{d_1, d_2, \ldots, d_K\}$, over which $Y$ has a sparse representation $X = \{x_1, x_2, \ldots, x_N\}$, where $x_i \in \mathbb{R}^K$. In other words, we seek a decomposition such that every patch in $Y$ is represented by a linear combination of no more than $\tau_1$ ($\tau_1 \ll K$) dictionary atoms. This optimization problem can be formally written as

$$\min_{D, X} \left\{ \|Y - DX\|_2^2 \right\} \quad \text{subject to} \quad \|x_i\|_0 \le \tau_1 \qquad (3)$$

K-SVD solves (3) by performing two steps at each iteration: (i) sparse coding and (ii) dictionary update. In the sparse coding stage, $D$ is kept fixed and the coefficient matrix $X$ is computed by a pursuit algorithm. Next, the dictionary $D$ is updated sequentially, one atom at a time, allowing the relevant coefficients in $X$ to change as well. This simultaneous update of the dictionary atoms and the coefficients is unique to K-SVD and results in faster convergence. The dictionary atoms are always normalized. For the details of this algorithm, please refer to the works of Aharon et al. [10], [11]. Fig. 2 displays some sample atoms of the dictionary learnt for the action bend.

Fig. 2. Sample dictionary atoms from the dictionary learnt by the K-SVD method for the action bend.

TABLE I
RELATION BETWEEN CLASS-SPECIFIC DICTIONARY SIZE AND CLASSIFICATION ACCURACY ($\tau_1 \approx 10\%$ OF $K$ AND $\tau_2 = 2$)
Dictionary size    Accuracy (%)

Note that (3) contains the term $\|\cdot\|_0$, i.e. the $\ell_0$ seminorm, which counts the number of non-zero elements in a vector. The $\ell_0$ term renders the problem non-convex. Hence, solving (3) exactly is NP-hard.
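The random projection of (2) and the two-step K-SVD iteration described above can be sketched in NumPy as follows. This is a simplified illustration rather than the reference K-SVD implementation of [10]: the helper names `omp` and `ksvd`, the toy problem sizes, and the random data standing in for real patches are assumptions. The sparse coding step uses a plain greedy orthogonal pursuit, and the dictionary is initialized with random columns of the reduced data.

```python
import numpy as np

def omp(D, y, tau):
    """Greedy orthogonal pursuit: approximate y with at most tau atoms of D."""
    residual, support = y.copy(), []
    coef = np.zeros(0)
    x = np.zeros(D.shape[1])
    for _ in range(tau):
        k = int(np.argmax(np.abs(D.T @ residual)))  # atom most correlated with residual
        if k in support:
            break
        support.append(k)
        # Orthogonalization step: least-squares fit on all selected atoms.
        coef, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        residual = y - D[:, support] @ coef
    x[support] = coef
    return x

def ksvd(Y, K, tau1, n_iter, rng):
    """Simplified K-SVD: alternate sparse coding and atom-by-atom SVD updates."""
    n, N = Y.shape
    D = Y[:, rng.choice(N, size=K, replace=False)].astype(float)
    D /= np.linalg.norm(D, axis=0, keepdims=True) + 1e-12
    for _ in range(n_iter):
        # (i) Sparse coding with D fixed.
        X = np.column_stack([omp(D, Y[:, j], tau1) for j in range(N)])
        # (ii) Dictionary update, one atom at a time.
        for k in range(K):
            users = np.flatnonzero(X[k])        # signals that use atom k
            if users.size == 0:
                continue
            X[k, users] = 0.0
            E = Y[:, users] - D @ X[:, users]   # error without atom k
            U, s, Vt = np.linalg.svd(E, full_matrices=False)
            D[:, k] = U[:, 0]                   # new unit-norm atom
            X[k, users] = s[0] * Vt[0]          # updated coefficients
    return D, X

# Toy sizes chosen for speed; the paper's experiments use d = 256, n = 64, K = 128.
rng = np.random.default_rng(0)
d, n, N, K, tau1 = 64, 16, 300, 32, 3
P = rng.standard_normal((d, N))        # stand-in for vectorized patches
R_proj = rng.standard_normal((n, d))   # zero-mean, unit-variance random matrix
Y = R_proj @ P                         # equation (2)
D, X = ksvd(Y, K=K, tau1=tau1, n_iter=5, rng=rng)
```

Updating the coefficients of atom $k$ together with the atom itself, via the rank-one SVD approximation of the restricted error matrix, is what keeps the support of $X$ unchanged during the dictionary update.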
Although (3) is NP-hard to solve exactly, it is well known that if $\tau_1$ is small compared to $K$, approximate solutions of such problems can be found using pursuit algorithms. The simplest pursuit algorithms are Matching Pursuit (MP) and Orthogonal MP (OMP). Both are greedy methods that select the dictionary elements sequentially by computing inner products between the input data and the dictionary atoms. OMP differs from MP in the orthogonalization step it performs after each atom is selected. Another approach is to make the problem convex by replacing the $\ell_0$ norm with the $\ell_1$ norm. This is known as the Basis Pursuit (BP) algorithm. The FOCal Under-determined System Solver (FOCUSS) uses the more general $\ell_p$ norm, where $p \le 1$. K-SVD is flexible and can work with any of these pursuit algorithms.

TABLE II
COMPARISON WITH SIMPLE RECONSTRUCTIVE APPROACH
Approach                 Accuracy (%)    Time/classification
Simple reconstruction                    sec
Proposed method                          sec

E. Classification Strategy

Consider a query sequence $V_Q$. After transforming it to an averaged MEI, a large number of random patches are extracted and projected onto the same lower-dimensional subspace, defined by $R_{proj}$. The set of dimensionality-reduced patches taken from $V_Q$ is denoted $Q = \{q_1, q_2, \ldots, q_N\}$, where $q_i \in \mathbb{R}^n$. Given the dictionaries of all classes, the simplest classification approach is to approximate the test patches by each of the $C$ dictionaries with some constant sparsity $\tau_2$. This generates $C$ different reconstruction errors. The dictionary that produces the minimum error determines the class of $V_Q$. This will be referred to as the simple reconstructive approach. The reconstruction error has already been shown to be discriminative for texture segmentation [12].

We propose an alternative classification strategy that uses sparsity as the discriminating criterion. This method concatenates the class-specific dictionaries to create a bigger dictionary, $D_S$:

$$D_S = [D_1 \; D_2 \; \ldots \; D_C] \qquad (4)$$

where $D_S \in \mathbb{R}^{n \times KC}$.
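The two strategies can be compared on synthetic data with the sketch below. It implements the simple reconstructive rule (minimum error over the $C$ dictionaries) and the concatenated dictionary $D_S$ of (4), for which the class is decided by counting the non-zero coefficients falling in each dictionary's block, the rule the text formalizes next in (7). The function names, the random stand-in dictionaries (not learnt by K-SVD) and the synthetic query patches are assumptions made for illustration; all dictionaries are assumed to have the same number of atoms $K$.

```python
import numpy as np

def omp(D, y, tau):
    """Greedy orthogonal pursuit: approximate y with at most tau atoms of D."""
    residual, support = y.copy(), []
    coef = np.zeros(0)
    x = np.zeros(D.shape[1])
    for _ in range(tau):
        k = int(np.argmax(np.abs(D.T @ residual)))
        if k in support:
            break
        support.append(k)
        coef, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        residual = y - D[:, support] @ coef
    x[support] = coef
    return x

def classify_by_reconstruction(dicts, Q, tau2):
    """Simple reconstructive approach: pick the dictionary with the smallest error."""
    errors = []
    for D in dicts:
        X = np.column_stack([omp(D, q, tau2) for q in Q.T])
        errors.append(np.linalg.norm(Q - D @ X) ** 2)
    return int(np.argmin(errors))

def classify_by_sparsity(dicts, Q, tau2):
    """Code Q over the concatenated dictionary and count non-zeros per class block."""
    D_S = np.hstack(dicts)                       # equation (4)
    K = dicts[0].shape[1]                        # assumes equal-size dictionaries
    X_Q = np.column_stack([omp(D_S, q, tau2) for q in Q.T])
    counts = [np.count_nonzero(X_Q[i * K:(i + 1) * K]) for i in range(len(dicts))]
    return int(np.argmax(counts))

# Synthetic check: random unit-norm dictionaries, queries built from class 1 atoms.
rng = np.random.default_rng(0)
n, K, C, N, tau2 = 64, 128, 3, 100, 2
dicts = []
for _ in range(C):
    D = rng.standard_normal((n, K))
    dicts.append(D / np.linalg.norm(D, axis=0, keepdims=True))

true_class = 1
Q = np.zeros((n, N))
for j in range(N):
    atoms = rng.choice(K, size=2, replace=False)
    Q[:, j] = dicts[true_class][:, atoms] @ rng.uniform(1.0, 2.0, size=2)
Q += 0.01 * rng.standard_normal(Q.shape)         # small additive noise

pred_reconstruction = classify_by_reconstruction(dicts, Q, tau2)
pred_sparsity = classify_by_sparsity(dicts, Q, tau2)
```

Note that the reconstructive rule runs one pursuit per dictionary for every patch, whereas the concatenated rule runs a single pursuit over a larger dictionary.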
With the help of a pursuit algorithm we find $X_Q$, the sparse representation of $Q$ over $D_S$, as follows:

$$\min_{X_Q} \|Q - D_S X_Q\|_2^2 \quad \text{subject to} \quad \|x_Q\|_0 \le \tau_2 \qquad (5)$$

The resulting coefficient matrix $X_Q$ can also be written as

$$X_Q = [X_{D_1} \; X_{D_2} \; \ldots \; X_{D_C}]^T \qquad (6)$$

where $X_{D_i}$ is the coefficient matrix corresponding to the $i$th dictionary. Assuming $V_Q$ belongs to class $q$, ideally $Q$ should use only the atoms of $D_q$ for its linear decomposition, which means that all the non-zero coefficients should be concentrated in $X_{D_q}$. Although this condition is difficult to achieve in practice because of the correlation among the atoms of different dictionaries and the input vectors, we can still expect that a large number of the non-zero coefficients come from $X_{D_q}$. The estimated class $\hat{i}$ of $Q$ is given by

$$\hat{i} = \arg\max_{i \in [1, 2, \ldots, C]} \|X_{D_i}\|_0 \qquad (7)$$

This means that the query $V_Q$ is assigned to the class whose dictionary makes the largest contribution (number of non-zeros) to the sparse representation of the patches of $Q$.

Clearly $X_Q$ is block sparse, because its non-zero coefficients occur in clusters. This encourages us to exploit block sparsity as an additional structure. However, each block of $D_S$ is a redundant dictionary, which makes it difficult to use block-sparsity promoting algorithms like Block OMP (BOMP) [19]. We have used BOMP and observed that the results are neither consistent nor very accurate.

Why do we choose to solve a concatenated system? Solving a single system with a huge dictionary of size $n \times KC$ seems identical to solving each of the $C$ systems where the dictionary size of each system is $n \times K$. But this is not the case. Assuming that the least squares problem can be solved with a cost of $\lambda$ per iteration, the cost of solving the bigger system is approximately $\tau_2(nKL + \lambda)$ for $\tau_2$ iterations. If the systems are solved separately $L$ times, the cost equals $L\tau_2(nK + \lambda)$. The two costs differ by $(L - 1)\tau_2\lambda$: the concatenated system pays the least squares cost once per iteration instead of $L$ times, so it is the cheaper of the two.

III. PERFORMANCE EVALUATION

A. Training on the Weizmann action dataset

To evaluate the proposed approach, we first train the system on the Weizmann human action dataset [1]. It consists of 90 low-resolution ( , deinterlaced 50 fps) videos of 9 different subjects, each performing 10 natural actions: bend, jumping jack (jack), jump forward (jump), jump in place (pjump), run, gallop sideways (side), skip, walk, wave one hand (wave1) and wave both hands (wave2). This dataset uses a fixed camera setting and a simple background.

The training process uses 8 averaged MEIs per class. From each averaged MEI, a large number ($2\rho = 2{,}000$) of patches of size $16 \times 16$ are extracted, a thousand at each scale. To learn a class-specific dictionary, we thus have a training set of 16,000 patches. These patches have energy values above an empirically set threshold. Thresholding is important because many homogeneous patches that do not contribute to the training process can be extracted from silhouettes. Each patch is converted to a vector of dimension $\eta^2 = 256$ to form $P_i \in \mathbb{R}^{256 \times 16{,}000}$, where $i \in [1, 2, \ldots, C]$. A normally distributed random matrix $R_{proj} \in \mathbb{R}^{64 \times 256}$ with zero mean and unit variance is constructed. The dimension of $P_i$ is reduced by premultiplying it with $R_{proj}$ as in (2). The resulting training set has dimension $64 \times 16{,}000$. At every run a new $R_{proj}$ is constructed.

For every class of action, a dictionary of size $64 \times 128$ ($K = 128$) is learnt. The dictionary columns are initialized with random vectors taken from the reduced $Y$. Ten dictionaries, one per action, are learnt using the K-SVD algorithm with $\tau_1 = 12$ (approximately 10% of the value of $K$) and 20 K-SVD iterations. Selecting the right dictionary size is critical. We set the redundancy to 2 and find the optimal dictionary size empirically (refer to Table I).

B. Results on the Weizmann action dataset

In the classification stage, all 10 class-specific dictionaries are concatenated to create $D_S \in \mathbb{R}^{64 \times 1280}$. Given a new test video sequence, a set of 2,000 patches is extracted and projected onto the same lower-dimensional subspace defined by $R_{proj}$. The projected data is denoted by $Q \in \mathbb{R}^{64 \times 2{,}000}$, which is sparse-coded over all the elements of $D_S$. We have used OMP to solve the sparse representation problem, since OMP provides straightforward control over the number of non-zeros in terms of iterations. Also, from a run-time point of view, OMP is more efficient than other, more sophisticated pursuit algorithms. At each run we train the system with the video sequences of 8 subjects and test on the remaining subject. The recognition rates reported below are averaged over 9 runs.

Figure 4 presents the recognition results achieved by the proposed approach. Our method classifies all the actions perfectly except for the action wave1, which is confused with the very similar action wave2. It misclassifies only 2 out of 90 sequences, an error rate of 2.22%.

Fig. 4. Confusion matrix for the Weizmann action dataset. Overall recognition rate 97.7%

We also observed that the recognition accuracy improves as the sparsity increases, i.e. as $\tau_2$ in (5) gets smaller (refer to Fig. 3). This trend roughly indicates that the number of highly discriminative basis vectors present in a class-specific dictionary is small, and yet those dictionary atoms are sufficient for decision-making. The highest accuracy achieved in our experiments is 97.7%, for $\tau_2 = 2$ and individual dictionary size $[64 \times 128]$. Note that for $\tau_2 = 1$, OMP performs only one iteration. In this iteration it selects, for every input vector, the best dictionary atom (i.e. the atom that forms the largest inner product with the input vector). This corresponds to finding the nearest neighbor of each vector.

Fig. 3. Recognition accuracy vs. sparsity for dictionary size

We have compared our method with the simple reconstructive approach, which with the best combination of parameters (dictionary size: , $\tau_2 = 6$) misclassifies 8 out of the 90 sequences. The results shown in Table II confirm that the proposed method is superior in terms of both accuracy and speed. Finally, we compare our approach with a number of recent works in Table III. The proposed method is comparable to the state of the art.

TABLE III
COMPARISON WITH RECENT WORKS ON THE WEIZMANN DATASET
Approach                    Avg. Accuracy (%)
Gorelick et al. 07 [1]      97.8
Niebles et al. 08 [20]      90.0
Lin et al. 09 [21]          100
Yao et al. 10 [22]          95.6
Proposed                    97.7

C. Results on the Robustness dataset

This dataset is designed to test the robustness of an algorithm against changes in viewpoint, partial occlusion and unusual action scenarios. It consists of 20 videos of subjects walking in a non-uniform background, creating various difficult scenarios. Sample MEIs of this dataset can be found in Figs. 5 and 6. We have used the videos of all 9 subjects available in the action dataset to train the dictionaries and tested on the sequences of the Robustness dataset. The proposed method performs well under changes in viewpoint (Table IV) except under extreme conditions, since the system is trained only with subjects walking at 0°. The performance of the proposed method is encouraging (90%) under occlusion and other difficult scenarios (Table V).

Fig. 5. Sample MEIs from the Robustness dataset: subjects walking at different angles

Fig. 6. Sample MEIs from the Robustness dataset: walking with a dog, knees up, occluded legs and swinging a bag

TABLE IV
PERFORMANCE UNDER VIEWPOINT CHANGES (SYSTEM TRAINED WITH SUBJECTS WALKING IN 0°)
Test sequence      Classified as
walking in 0°      walk
walking in 9°      walk
walking in 18°     walk
walking in 27°     walk
walking in 36°     walk
walking in 45°     walk
walking in 54°     walk
walking in 63°     jump
walking in 72°     jump
walking in 81°     jump

TABLE V
PERFORMANCE UNDER OCCLUSION AND OTHER DIFFICULT SCENARIOS
Test sequences: normal walking, walking in a skirt, carrying briefcase, limping man, occluded legs, knees up, walking with a dog, sleepwalking, swinging a bag, occluded by a pole
Classified as: walk, except for one sequence classified as jump

D. Results on the KTH subset

The system is also tested on a subset of the KTH dataset, consisting of 48 videos of three actions (the actions common to Weizmann and KTH): running, walking and waving. These test sequences use a fixed camera setting and a homogeneous background but contain viewpoint changes and illumination and scale variations. Some sample MEIs are presented in Fig. 7 and the results are shown in Table VI.

Fig. 7. Sample MEIs from the KTH subset - run, walk, wave

TABLE VI
RESULTS ON THE KTH SUBSET, TRAINED ON THE WEIZMANN DATASET
Action    Accuracy (%)
Run       75.0
Walk      75.0
Wave      87.5

E. Results on the UCF sports subset

Finally, we test the system on 10 running sequences taken from the UCF sports dataset. These are real-world videos with unrestricted camera motion, occlusion, background clutter etc. Since the background is not known, the silhouettes are extracted by simple thresholding and morphological operations, and are therefore noisy and crude (Fig. 8). The proposed method recognizes only 4 out of the 10 sequences, i.e. it produces a recognition rate of 40%.

Fig. 8. Sample MEIs of running sequences taken from the UCF sports dataset

IV. CONCLUSION

We propose a novel approach for modeling and recognizing human actions. The main contributions of our work are:

- Sparse representation of human actions by learning non-parametric, overcomplete dictionaries instead of the commonly used predefined dictionaries. This, to the best of our knowledge, has not been explored yet.
- A classification strategy involving class-specific learnt dictionaries instead of a single shared dictionary.

The proposed method achieves high recognition accuracy and is robust to partial occlusion, viewpoint and scale changes. Though the method works reasonably well on the KTH dataset even when trained on the Weizmann dataset, its performance is not up to the mark for the realistic sports videos. This is because the segmentation of silhouettes is difficult for such sequences. The proposed approach relies completely on local image analysis. Although local patches help the recognition process under occlusion and other distortions, when there are many classes with similar local appearance (e.g. wave1 and wave2) the method needs to be improved by incorporating global information and using more sophisticated video representations.

Our work is an attempt to show that sparse representations and class-specific dictionaries can be the key to developing robust action recognition systems. There is considerable scope for future work. Some significant improvements will be to enhance the system to work for real-world videos, complex multi-subject activities etc.

REFERENCES

[1] L. Gorelick, M. Blank, E. Shechtman, M. Irani, and R. Basri, Actions as space-time shapes, IEEE Trans. PAMI, vol. 29, no. 12, pp , December
[2] A. A. Efros, A. C. Berg, G. Mori, and J. Malik, Recognizing action at a distance, in Proc. ICCV,
[3] J. Yamato, J. Ohya, and K. Ishii, Recognizing human action in time-sequential images using hidden Markov model, in Proc. CVPR, 1992, pp
[4] A. F. Bobick and J. W. Davis, The recognition of human movement using temporal templates, IEEE Trans. PAMI, vol. 23, pp ,
[5] C. Bregler, Learning and recognizing human dynamics in video sequences, CVPR,
[6] Z. Zhang, Y. Hu, S. Chan, and L.-T. Chia, Motion context: A new representation for human action recognition, in Proc. ECCV, 2008, vol. 5305, pp
[7] S. Mallat, A wavelet tour of signal processing: The sparse way, 3rd Ed, Academic Press, NY,
[8] J. Wright, A. Y. Yang, A. Ganesh, S. S. Sastry, and Y. Ma, Robust face recognition via sparse representation, IEEE Trans. PAMI, vol. 31, pp ,
[9] K. Engan, S. O. Aase, and J. H. Husoy, Frame based signal compression using method of optimal directions (MOD), in Proc. ISCAS,
[10] M. Aharon, M. Elad, and A. Bruckstein, K-SVD: An algorithm for designing overcomplete dictionaries for sparse representation, IEEE Trans. Signal Processing, vol. 54, pp , Nov
[11] M. Elad and M. Aharon, Image denoising via sparse and redundant representations over learned dictionaries, IEEE Trans. Image Processing, vol. 15, no. 12, pp , Dec
[12] G. Peyré, Sparse modeling of textures, J. Math. Imaging Vis., vol. 34, no. 1, pp ,
[13] J. Mairal, F. Bach, J. Ponce, G. Sapiro, and A. Zisserman, Discriminative learned dictionaries for local image analysis, in Proc. CVPR,
[14] R. Grosse, R. Raina, H. Kwong, and A. Y. Ng, Shift-invariant sparse coding for audio classification, in Proc. UAI,
[15] C. Schuldt, I. Laptev, and B. Caputo, Recognizing human actions: a local SVM approach, in Proc. ICPR, 2004, vol. 3, pp
[16] M. D. Rodriguez, J. Ahmed, and M. Shah, Action MACH: a spatio-temporal maximum average correlation height filter for action recognition, in Proc. CVPR, 2008, pp
[17] E. Bingham and H. Mannila, Random projection in dimensionality reduction: applications to image and text data, in Proc. ACM Int. Conf. Knowledge Discovery and Data Mining, New York, NY, USA, 2001, pp
[18] B. A. Olshausen and D. J. Field, Natural image statistics and efficient coding, Network: Computation in Neural Systems, vol. 7, no. 2, pp ,
[19] Y. C. Eldar and H. Bolcskei, Block-sparsity: Coherence and efficient recovery, in Proc. ICASSP, 2009, pp
[20] J. Niebles, H. Wang, and L. Fei-Fei, Unsupervised learning of human action categories using spatial-temporal words, Int. J. Computer Vision, vol. 79, pp ,
[21] Z. Lin, Z. Jiang, and L. S. Davis, Recognizing actions by shape-motion prototype trees, in Proc. ICCV,
[22] A. Yao, J. Gall, and L. Van Gool, A Hough transform-based voting framework for action recognition, in Proc. CVPR, 2010, pp


More information

Sparse Models in Image Understanding And Computer Vision

Sparse Models in Image Understanding And Computer Vision Sparse Models in Image Understanding And Computer Vision Jayaraman J. Thiagarajan Arizona State University Collaborators Prof. Andreas Spanias Karthikeyan Natesan Ramamurthy Sparsity Sparsity of a vector

More information

Face Recognition Using Vector Quantization Histogram and Support Vector Machine Classifier Rong-sheng LI, Fei-fei LEE *, Yan YAN and Qiu CHEN

Face Recognition Using Vector Quantization Histogram and Support Vector Machine Classifier Rong-sheng LI, Fei-fei LEE *, Yan YAN and Qiu CHEN 2016 International Conference on Artificial Intelligence: Techniques and Applications (AITA 2016) ISBN: 978-1-60595-389-2 Face Recognition Using Vector Quantization Histogram and Support Vector Machine

More information

Object and Action Detection from a Single Example

Object and Action Detection from a Single Example Object and Action Detection from a Single Example Peyman Milanfar* EE Department University of California, Santa Cruz *Joint work with Hae Jong Seo AFOSR Program Review, June 4-5, 29 Take a look at this:

More information

Human Action Recognition in Videos Using Hybrid Motion Features

Human Action Recognition in Videos Using Hybrid Motion Features Human Action Recognition in Videos Using Hybrid Motion Features Si Liu 1,2, Jing Liu 1,TianzhuZhang 1, and Hanqing Lu 1 1 National Laboratory of Pattern Recognition Institute of Automation, Chinese Academy

More information

IMPROVING SPATIO-TEMPORAL FEATURE EXTRACTION TECHNIQUES AND THEIR APPLICATIONS IN ACTION CLASSIFICATION. Maral Mesmakhosroshahi, Joohee Kim

IMPROVING SPATIO-TEMPORAL FEATURE EXTRACTION TECHNIQUES AND THEIR APPLICATIONS IN ACTION CLASSIFICATION. Maral Mesmakhosroshahi, Joohee Kim IMPROVING SPATIO-TEMPORAL FEATURE EXTRACTION TECHNIQUES AND THEIR APPLICATIONS IN ACTION CLASSIFICATION Maral Mesmakhosroshahi, Joohee Kim Department of Electrical and Computer Engineering Illinois Institute

More information

Action recognition in videos

Action recognition in videos Action recognition in videos Cordelia Schmid INRIA Grenoble Joint work with V. Ferrari, A. Gaidon, Z. Harchaoui, A. Klaeser, A. Prest, H. Wang Action recognition - goal Short actions, i.e. drinking, sit

More information

Automatic Gait Recognition. - Karthik Sridharan

Automatic Gait Recognition. - Karthik Sridharan Automatic Gait Recognition - Karthik Sridharan Gait as a Biometric Gait A person s manner of walking Webster Definition It is a non-contact, unobtrusive, perceivable at a distance and hard to disguise

More information

Department of Electronics and Communication KMP College of Engineering, Perumbavoor, Kerala, India 1 2

Department of Electronics and Communication KMP College of Engineering, Perumbavoor, Kerala, India 1 2 Vol.3, Issue 3, 2015, Page.1115-1021 Effect of Anti-Forensics and Dic.TV Method for Reducing Artifact in JPEG Decompression 1 Deepthy Mohan, 2 Sreejith.H 1 PG Scholar, 2 Assistant Professor Department

More information

An Optimized Pixel-Wise Weighting Approach For Patch-Based Image Denoising

An Optimized Pixel-Wise Weighting Approach For Patch-Based Image Denoising An Optimized Pixel-Wise Weighting Approach For Patch-Based Image Denoising Dr. B. R.VIKRAM M.E.,Ph.D.,MIEEE.,LMISTE, Principal of Vijay Rural Engineering College, NIZAMABAD ( Dt.) G. Chaitanya M.Tech,

More information

Patch-Based Color Image Denoising using efficient Pixel-Wise Weighting Techniques

Patch-Based Color Image Denoising using efficient Pixel-Wise Weighting Techniques Patch-Based Color Image Denoising using efficient Pixel-Wise Weighting Techniques Syed Gilani Pasha Assistant Professor, Dept. of ECE, School of Engineering, Central University of Karnataka, Gulbarga,

More information

Space-Time Shapelets for Action Recognition

Space-Time Shapelets for Action Recognition Space-Time Shapelets for Action Recognition Dhruv Batra 1 Tsuhan Chen 1 Rahul Sukthankar 2,1 batradhruv@cmu.edu tsuhan@cmu.edu rahuls@cs.cmu.edu 1 Carnegie Mellon University 2 Intel Research Pittsburgh

More information

Human Action Recognition Using Independent Component Analysis

Human Action Recognition Using Independent Component Analysis Human Action Recognition Using Independent Component Analysis Masaki Yamazaki, Yen-Wei Chen and Gang Xu Department of Media echnology Ritsumeikan University 1-1-1 Nojihigashi, Kusatsu, Shiga, 525-8577,

More information

Bag of Optical Flow Volumes for Image Sequence Recognition 1

Bag of Optical Flow Volumes for Image Sequence Recognition 1 RIEMENSCHNEIDER, DONOSER, BISCHOF: BAG OF OPTICAL FLOW VOLUMES 1 Bag of Optical Flow Volumes for Image Sequence Recognition 1 Hayko Riemenschneider http://www.icg.tugraz.at/members/hayko Michael Donoser

More information

Fully Automatic Methodology for Human Action Recognition Incorporating Dynamic Information

Fully Automatic Methodology for Human Action Recognition Incorporating Dynamic Information Fully Automatic Methodology for Human Action Recognition Incorporating Dynamic Information Ana González, Marcos Ortega Hortas, and Manuel G. Penedo University of A Coruña, VARPA group, A Coruña 15071,

More information

IMAGE DENOISING USING NL-MEANS VIA SMOOTH PATCH ORDERING

IMAGE DENOISING USING NL-MEANS VIA SMOOTH PATCH ORDERING IMAGE DENOISING USING NL-MEANS VIA SMOOTH PATCH ORDERING Idan Ram, Michael Elad and Israel Cohen Department of Electrical Engineering Department of Computer Science Technion - Israel Institute of Technology

More information

Temporal Feature Weighting for Prototype-Based Action Recognition

Temporal Feature Weighting for Prototype-Based Action Recognition Temporal Feature Weighting for Prototype-Based Action Recognition Thomas Mauthner, Peter M. Roth, and Horst Bischof Institute for Computer Graphics and Vision Graz University of Technology {mauthner,pmroth,bischof}@icg.tugraz.at

More information

Face Recognition via Sparse Representation

Face Recognition via Sparse Representation Face Recognition via Sparse Representation John Wright, Allen Y. Yang, Arvind, S. Shankar Sastry and Yi Ma IEEE Trans. PAMI, March 2008 Research About Face Face Detection Face Alignment Face Recognition

More information

An Approach for Reduction of Rain Streaks from a Single Image

An Approach for Reduction of Rain Streaks from a Single Image An Approach for Reduction of Rain Streaks from a Single Image Vijayakumar Majjagi 1, Netravati U M 2 1 4 th Semester, M. Tech, Digital Electronics, Department of Electronics and Communication G M Institute

More information

FACE RECOGNITION USING SUPPORT VECTOR MACHINES

FACE RECOGNITION USING SUPPORT VECTOR MACHINES FACE RECOGNITION USING SUPPORT VECTOR MACHINES Ashwin Swaminathan ashwins@umd.edu ENEE633: Statistical and Neural Pattern Recognition Instructor : Prof. Rama Chellappa Project 2, Part (b) 1. INTRODUCTION

More information

Facial Expression Recognition Using Non-negative Matrix Factorization

Facial Expression Recognition Using Non-negative Matrix Factorization Facial Expression Recognition Using Non-negative Matrix Factorization Symeon Nikitidis, Anastasios Tefas and Ioannis Pitas Artificial Intelligence & Information Analysis Lab Department of Informatics Aristotle,

More information

Improving Latent Fingerprint Matching Performance by Orientation Field Estimation using Localized Dictionaries

Improving Latent Fingerprint Matching Performance by Orientation Field Estimation using Localized Dictionaries Available Online at www.ijcsmc.com International Journal of Computer Science and Mobile Computing A Monthly Journal of Computer Science and Information Technology IJCSMC, Vol. 3, Issue. 11, November 2014,

More information

IMAGE SUPER-RESOLUTION BASED ON DICTIONARY LEARNING AND ANCHORED NEIGHBORHOOD REGRESSION WITH MUTUAL INCOHERENCE

IMAGE SUPER-RESOLUTION BASED ON DICTIONARY LEARNING AND ANCHORED NEIGHBORHOOD REGRESSION WITH MUTUAL INCOHERENCE IMAGE SUPER-RESOLUTION BASED ON DICTIONARY LEARNING AND ANCHORED NEIGHBORHOOD REGRESSION WITH MUTUAL INCOHERENCE Yulun Zhang 1, Kaiyu Gu 2, Yongbing Zhang 1, Jian Zhang 3, and Qionghai Dai 1,4 1 Shenzhen

More information

Dynamic Human Shape Description and Characterization

Dynamic Human Shape Description and Characterization Dynamic Human Shape Description and Characterization Z. Cheng*, S. Mosher, Jeanne Smith H. Cheng, and K. Robinette Infoscitex Corporation, Dayton, Ohio, USA 711 th Human Performance Wing, Air Force Research

More information

NTHU Rain Removal Project

NTHU Rain Removal Project People NTHU Rain Removal Project Networked Video Lab, National Tsing Hua University, Hsinchu, Taiwan Li-Wei Kang, Institute of Information Science, Academia Sinica, Taipei, Taiwan Chia-Wen Lin *, Department

More information

SUPERVISED NEIGHBOURHOOD TOPOLOGY LEARNING (SNTL) FOR HUMAN ACTION RECOGNITION

SUPERVISED NEIGHBOURHOOD TOPOLOGY LEARNING (SNTL) FOR HUMAN ACTION RECOGNITION SUPERVISED NEIGHBOURHOOD TOPOLOGY LEARNING (SNTL) FOR HUMAN ACTION RECOGNITION 1 J.H. Ma, 1 P.C. Yuen, 1 W.W. Zou, 2 J.H. Lai 1 Hong Kong Baptist University 2 Sun Yat-sen University ICCV workshop on Machine

More information

INCOHERENT DICTIONARY LEARNING FOR SPARSE REPRESENTATION BASED IMAGE DENOISING

INCOHERENT DICTIONARY LEARNING FOR SPARSE REPRESENTATION BASED IMAGE DENOISING INCOHERENT DICTIONARY LEARNING FOR SPARSE REPRESENTATION BASED IMAGE DENOISING Jin Wang 1, Jian-Feng Cai 2, Yunhui Shi 1 and Baocai Yin 1 1 Beijing Key Laboratory of Multimedia and Intelligent Software

More information

Learning View-invariant Sparse Representations for Cross-view Action Recognition

Learning View-invariant Sparse Representations for Cross-view Action Recognition 2013 IEEE International Conference on Computer Vision Learning View-invariant Sparse Representations for Cross-view Action Recognition Jingjing Zheng, Zhuolin Jiang University of Maryland, College Park,

More information

Multi-Class Image Classification: Sparsity Does It Better

Multi-Class Image Classification: Sparsity Does It Better Multi-Class Image Classification: Sparsity Does It Better Sean Ryan Fanello 1,2, Nicoletta Noceti 2, Giorgio Metta 1 and Francesca Odone 2 1 Department of Robotics, Brain and Cognitive Sciences, Istituto

More information

3D EAR IDENTIFICATION USING LC-KSVD AND LOCAL HISTOGRAMS OF SURFACE TYPES. Lida Li, Lin Zhang, Hongyu Li

3D EAR IDENTIFICATION USING LC-KSVD AND LOCAL HISTOGRAMS OF SURFACE TYPES. Lida Li, Lin Zhang, Hongyu Li 3D EAR IDENTIFICATION USING LC-KSVD AND LOCAL HISTOGRAMS OF SURFACE TYPES Lida Li, Lin Zhang, Hongyu Li School of Software Engineering Tongji University Shanghai 201804, China ABSTRACT In this paper, we

More information

Image Deblurring Using Adaptive Sparse Domain Selection and Adaptive Regularization

Image Deblurring Using Adaptive Sparse Domain Selection and Adaptive Regularization Volume 3, No. 3, May-June 2012 International Journal of Advanced Research in Computer Science RESEARCH PAPER Available Online at www.ijarcs.info ISSN No. 0976-5697 Image Deblurring Using Adaptive Sparse

More information

Learning Dictionaries of Discriminative Image Patches

Learning Dictionaries of Discriminative Image Patches Downloaded from orbit.dtu.dk on: Nov 22, 2018 Learning Dictionaries of Discriminative Image Patches Dahl, Anders Bjorholm; Larsen, Rasmus Published in: Proceedings of the British Machine Vision Conference

More information

Efficient Implementation of the K-SVD Algorithm and the Batch-OMP Method

Efficient Implementation of the K-SVD Algorithm and the Batch-OMP Method Efficient Implementation of the K-SVD Algorithm and the Batch-OMP Method Ron Rubinstein, Michael Zibulevsky and Michael Elad Abstract The K-SVD algorithm is a highly effective method of training overcomplete

More information

Spatio-temporal Shape and Flow Correlation for Action Recognition

Spatio-temporal Shape and Flow Correlation for Action Recognition Spatio-temporal Shape and Flow Correlation for Action Recognition Yan Ke 1, Rahul Sukthankar 2,1, Martial Hebert 1 1 School of Computer Science, Carnegie Mellon; 2 Intel Research Pittsburgh {yke,rahuls,hebert}@cs.cmu.edu

More information

Detecting Burnscar from Hyperspectral Imagery via Sparse Representation with Low-Rank Interference

Detecting Burnscar from Hyperspectral Imagery via Sparse Representation with Low-Rank Interference Detecting Burnscar from Hyperspectral Imagery via Sparse Representation with Low-Rank Interference Minh Dao 1, Xiang Xiang 1, Bulent Ayhan 2, Chiman Kwan 2, Trac D. Tran 1 Johns Hopkins Univeristy, 3400

More information

Learning Human Actions with an Adaptive Codebook

Learning Human Actions with an Adaptive Codebook Learning Human Actions with an Adaptive Codebook Yu Kong, Xiaoqin Zhang, Weiming Hu and Yunde Jia Beijing Laboratory of Intelligent Information Technology School of Computer Science, Beijing Institute

More information

Hyperspectral Data Classification via Sparse Representation in Homotopy

Hyperspectral Data Classification via Sparse Representation in Homotopy Hyperspectral Data Classification via Sparse Representation in Homotopy Qazi Sami ul Haq,Lixin Shi,Linmi Tao,Shiqiang Yang Key Laboratory of Pervasive Computing, Ministry of Education Department of Computer

More information

Patch-Based Image Classification Using Image Epitomes

Patch-Based Image Classification Using Image Epitomes Patch-Based Image Classification Using Image Epitomes David Andrzejewski CS 766 - Final Project December 19, 2005 Abstract Automatic image classification has many practical applications, including photo

More information

SHIP WAKE DETECTION FOR SAR IMAGES WITH COMPLEX BACKGROUNDS BASED ON MORPHOLOGICAL DICTIONARY LEARNING

SHIP WAKE DETECTION FOR SAR IMAGES WITH COMPLEX BACKGROUNDS BASED ON MORPHOLOGICAL DICTIONARY LEARNING SHIP WAKE DETECTION FOR SAR IMAGES WITH COMPLEX BACKGROUNDS BASED ON MORPHOLOGICAL DICTIONARY LEARNING Guozheng Yang 1, 2, Jing Yu 3, Chuangbai Xiao 3, Weidong Sun 1 1 State Key Laboratory of Intelligent

More information

Local Features and Bag of Words Models

Local Features and Bag of Words Models 10/14/11 Local Features and Bag of Words Models Computer Vision CS 143, Brown James Hays Slides from Svetlana Lazebnik, Derek Hoiem, Antonio Torralba, David Lowe, Fei Fei Li and others Computer Engineering

More information

EigenJoints-based Action Recognition Using Naïve-Bayes-Nearest-Neighbor

EigenJoints-based Action Recognition Using Naïve-Bayes-Nearest-Neighbor EigenJoints-based Action Recognition Using Naïve-Bayes-Nearest-Neighbor Xiaodong Yang and YingLi Tian Department of Electrical Engineering The City College of New York, CUNY {xyang02, ytian}@ccny.cuny.edu

More information

Action Recognition in Low Quality Videos by Jointly Using Shape, Motion and Texture Features

Action Recognition in Low Quality Videos by Jointly Using Shape, Motion and Texture Features Action Recognition in Low Quality Videos by Jointly Using Shape, Motion and Texture Features Saimunur Rahman, John See, Chiung Ching Ho Centre of Visual Computing, Faculty of Computing and Informatics

More information

Activities as Time Series of Human Postures

Activities as Time Series of Human Postures Activities as Time Series of Human Postures William Brendel and Sinisa Todorovic Oregon State University, Kelley Engineering Center, Corvallis, OR 97331, USA brendelw@onid.orst.edu,sinisa@eecs.oregonstate.edu

More information

Sparse Coding for Learning Interpretable Spatio-Temporal Primitives

Sparse Coding for Learning Interpretable Spatio-Temporal Primitives Sparse Coding for Learning Interpretable Spatio-Temporal Primitives Taehwan Kim TTI Chicago taehwan@ttic.edu Gregory Shakhnarovich TTI Chicago gregory@ttic.edu Raquel Urtasun TTI Chicago rurtasun@ttic.edu

More information

QMUL-ACTIVA: Person Runs detection for the TRECVID Surveillance Event Detection task

QMUL-ACTIVA: Person Runs detection for the TRECVID Surveillance Event Detection task QMUL-ACTIVA: Person Runs detection for the TRECVID Surveillance Event Detection task Fahad Daniyal and Andrea Cavallaro Queen Mary University of London Mile End Road, London E1 4NS (United Kingdom) {fahad.daniyal,andrea.cavallaro}@eecs.qmul.ac.uk

More information

BSIK-SVD: A DICTIONARY-LEARNING ALGORITHM FOR BLOCK-SPARSE REPRESENTATIONS. Yongqin Zhang, Jiaying Liu, Mading Li, Zongming Guo

BSIK-SVD: A DICTIONARY-LEARNING ALGORITHM FOR BLOCK-SPARSE REPRESENTATIONS. Yongqin Zhang, Jiaying Liu, Mading Li, Zongming Guo 2014 IEEE International Conference on Acoustic, Speech and Signal Processing (ICASSP) BSIK-SVD: A DICTIONARY-LEARNING ALGORITHM FOR BLOCK-SPARSE REPRESENTATIONS Yongqin Zhang, Jiaying Liu, Mading Li, Zongming

More information

Learning and Inferring Depth from Monocular Images. Jiyan Pan April 1, 2009

Learning and Inferring Depth from Monocular Images. Jiyan Pan April 1, 2009 Learning and Inferring Depth from Monocular Images Jiyan Pan April 1, 2009 Traditional ways of inferring depth Binocular disparity Structure from motion Defocus Given a single monocular image, how to infer

More information

Virtual Training Samples and CRC based Test Sample Reconstruction and Face Recognition Experiments Wei HUANG and Li-ming MIAO

Virtual Training Samples and CRC based Test Sample Reconstruction and Face Recognition Experiments Wei HUANG and Li-ming MIAO 7 nd International Conference on Computational Modeling, Simulation and Applied Mathematics (CMSAM 7) ISBN: 978--6595-499-8 Virtual raining Samples and CRC based est Sample Reconstruction and Face Recognition

More information

An Efficient Part-Based Approach to Action Recognition from RGB-D Video with BoW-Pyramid Representation

An Efficient Part-Based Approach to Action Recognition from RGB-D Video with BoW-Pyramid Representation 2013 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) November 3-7, 2013. Tokyo, Japan An Efficient Part-Based Approach to Action Recognition from RGB-D Video with BoW-Pyramid

More information

A Motion Descriptor Based on Statistics of Optical Flow Orientations for Action Classification in Video-Surveillance

A Motion Descriptor Based on Statistics of Optical Flow Orientations for Action Classification in Video-Surveillance A Motion Descriptor Based on Statistics of Optical Flow Orientations for Action Classification in Video-Surveillance Fabio Martínez, Antoine Manzanera, Eduardo Romero To cite this version: Fabio Martínez,

More information

Unsupervised learning in Vision

Unsupervised learning in Vision Chapter 7 Unsupervised learning in Vision The fields of Computer Vision and Machine Learning complement each other in a very natural way: the aim of the former is to extract useful information from visual

More information

Cross-View Action Recognition via a Transferable Dictionary Pair

Cross-View Action Recognition via a Transferable Dictionary Pair ZHENG et al:: CROSS-VIEW ACTION RECOGNITION 1 Cross-View Action Recognition via a Transferable Dictionary Pair Jingjing Zheng 1 zjngjng@umiacs.umd.edu Zhuolin Jiang 2 zhuolin@umiacs.umd.edu P. Jonathon

More information

Incremental Action Recognition Using Feature-Tree

Incremental Action Recognition Using Feature-Tree Incremental Action Recognition Using Feature-Tree Kishore K Reddy Computer Vision Lab University of Central Florida kreddy@cs.ucf.edu Jingen Liu Computer Vision Lab University of Central Florida liujg@cs.ucf.edu

More information

Sparse Coding and Dictionary Learning for Image Analysis

Sparse Coding and Dictionary Learning for Image Analysis Sparse Coding and Dictionary Learning for Image Analysis Part IV: Recent Advances in Computer Vision and New Models Francis Bach, Julien Mairal, Jean Ponce and Guillermo Sapiro CVPR 10 tutorial, San Francisco,

More information

Robotics Programming Laboratory

Robotics Programming Laboratory Chair of Software Engineering Robotics Programming Laboratory Bertrand Meyer Jiwon Shin Lecture 8: Robot Perception Perception http://pascallin.ecs.soton.ac.uk/challenges/voc/databases.html#caltech car

More information

Segmentation and Tracking of Partial Planar Templates

Segmentation and Tracking of Partial Planar Templates Segmentation and Tracking of Partial Planar Templates Abdelsalam Masoud William Hoff Colorado School of Mines Colorado School of Mines Golden, CO 800 Golden, CO 800 amasoud@mines.edu whoff@mines.edu Abstract

More information

Cervigram Image Segmentation Based On Reconstructive Sparse Representations

Cervigram Image Segmentation Based On Reconstructive Sparse Representations Cervigram Image Segmentation Based On Reconstructive Sparse Representations Shaoting Zhang 1, Junzhou Huang 1, Wei Wang 2, Xiaolei Huang 2, Dimitris Metaxas 1 1 CBIM, Rutgers University 110 Frelinghuysen

More information

Tracking. Hao Guan( 管皓 ) School of Computer Science Fudan University

Tracking. Hao Guan( 管皓 ) School of Computer Science Fudan University Tracking Hao Guan( 管皓 ) School of Computer Science Fudan University 2014-09-29 Multimedia Video Audio Use your eyes Video Tracking Use your ears Audio Tracking Tracking Video Tracking Definition Given

More information

Expanding gait identification methods from straight to curved trajectories

Expanding gait identification methods from straight to curved trajectories Expanding gait identification methods from straight to curved trajectories Yumi Iwashita, Ryo Kurazume Kyushu University 744 Motooka Nishi-ku Fukuoka, Japan yumi@ieee.org Abstract Conventional methods

More information

Total Variation Denoising with Overlapping Group Sparsity

Total Variation Denoising with Overlapping Group Sparsity 1 Total Variation Denoising with Overlapping Group Sparsity Ivan W. Selesnick and Po-Yu Chen Polytechnic Institute of New York University Brooklyn, New York selesi@poly.edu 2 Abstract This paper describes

More information

IMAGE RESTORATION VIA EFFICIENT GAUSSIAN MIXTURE MODEL LEARNING

IMAGE RESTORATION VIA EFFICIENT GAUSSIAN MIXTURE MODEL LEARNING IMAGE RESTORATION VIA EFFICIENT GAUSSIAN MIXTURE MODEL LEARNING Jianzhou Feng Li Song Xiaog Huo Xiaokang Yang Wenjun Zhang Shanghai Digital Media Processing Transmission Key Lab, Shanghai Jiaotong University

More information

A Novel Image Super-resolution Reconstruction Algorithm based on Modified Sparse Representation

A Novel Image Super-resolution Reconstruction Algorithm based on Modified Sparse Representation , pp.162-167 http://dx.doi.org/10.14257/astl.2016.138.33 A Novel Image Super-resolution Reconstruction Algorithm based on Modified Sparse Representation Liqiang Hu, Chaofeng He Shijiazhuang Tiedao University,

More information

Human Motion Detection and Tracking for Video Surveillance

Human Motion Detection and Tracking for Video Surveillance Human Motion Detection and Tracking for Video Surveillance Prithviraj Banerjee and Somnath Sengupta Department of Electronics and Electrical Communication Engineering Indian Institute of Technology, Kharagpur,

More information

Evaluation of local descriptors for action recognition in videos

Evaluation of local descriptors for action recognition in videos Evaluation of local descriptors for action recognition in videos Piotr Bilinski and Francois Bremond INRIA Sophia Antipolis - PULSAR group 2004 route des Lucioles - BP 93 06902 Sophia Antipolis Cedex,

More information

Human Action Recognition Using Silhouette Histogram

Human Action Recognition Using Silhouette Histogram Human Action Recognition Using Silhouette Histogram Chaur-Heh Hsieh, *Ping S. Huang, and Ming-Da Tang Department of Computer and Communication Engineering Ming Chuan University Taoyuan 333, Taiwan, ROC

More information

Action Recognition by Learning Mid-level Motion Features

Action Recognition by Learning Mid-level Motion Features Action Recognition by Learning Mid-level Motion Features Alireza Fathi and Greg Mori School of Computing Science Simon Fraser University Burnaby, BC, Canada {alirezaf,mori}@cs.sfu.ca Abstract This paper

More information

ACTIVE CLASSIFICATION FOR HUMAN ACTION RECOGNITION. Alexandros Iosifidis, Anastasios Tefas and Ioannis Pitas

ACTIVE CLASSIFICATION FOR HUMAN ACTION RECOGNITION. Alexandros Iosifidis, Anastasios Tefas and Ioannis Pitas ACTIVE CLASSIFICATION FOR HUMAN ACTION RECOGNITION Alexandros Iosifidis, Anastasios Tefas and Ioannis Pitas Depeartment of Informatics, Aristotle University of Thessaloniki, Greece {aiosif,tefas,pitas}@aiia.csd.auth.gr

More information

Greedy vs. L1 Convex Optimization in Sparse Coding Ren, Huamin; Pan, Hong; Olsen, Søren Ingvor; Moeslund, Thomas B.

Greedy vs. L1 Convex Optimization in Sparse Coding Ren, Huamin; Pan, Hong; Olsen, Søren Ingvor; Moeslund, Thomas B. Aalborg Universitet Greedy vs. L1 Convex Optimization in Sparse Coding Ren, Huamin; Pan, Hong; Olsen, Søren Ingvor; Moeslund, Thomas B. Creative Commons License Unspecified Publication date: 2015 Document

More information

View Invariant Movement Recognition by using Adaptive Neural Fuzzy Inference System

View Invariant Movement Recognition by using Adaptive Neural Fuzzy Inference System View Invariant Movement Recognition by using Adaptive Neural Fuzzy Inference System V. Anitha #1, K.S.Ravichandran *2, B. Santhi #3 School of Computing, SASTRA University, Thanjavur-613402, India 1 anithavijay28@gmail.com

More information

Sparse Variation Dictionary Learning for Face Recognition with A Single Training Sample Per Person

Sparse Variation Dictionary Learning for Face Recognition with A Single Training Sample Per Person Sparse Variation Dictionary Learning for Face Recognition with A Single Training Sample Per Person Meng Yang, Luc Van Gool ETH Zurich Switzerland {yang,vangool}@vision.ee.ethz.ch Lei Zhang The Hong Kong

More information

Learning Human Motion Models from Unsegmented Videos

Learning Human Motion Models from Unsegmented Videos In IEEE International Conference on Pattern Recognition (CVPR), pages 1-7, Alaska, June 2008. Learning Human Motion Models from Unsegmented Videos Roman Filipovych Eraldo Ribeiro Computer Vision and Bio-Inspired

More information

An Object Detection System using Image Reconstruction with PCA

An Object Detection System using Image Reconstruction with PCA An Object Detection System using Image Reconstruction with PCA Luis Malagón-Borja and Olac Fuentes Instituto Nacional de Astrofísica Óptica y Electrónica, Puebla, 72840 Mexico jmb@ccc.inaoep.mx, fuentes@inaoep.mx

More information

A Novel Hand Posture Recognition System Based on Sparse Representation Using Color and Depth Images

A Novel Hand Posture Recognition System Based on Sparse Representation Using Color and Depth Images 2013 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) November 3-7, 2013. Tokyo, Japan A Novel Hand Posture Recognition System Based on Sparse Representation Using Color and Depth

More information

Robust and Secure Iris Recognition

Robust and Secure Iris Recognition Robust and Secure Iris Recognition Vishal M. Patel University of Maryland, UMIACS pvishalm@umiacs.umd.edu IJCB 2011 Tutorial Sparse Representation and Low-Rank Representation for Biometrics Outline Iris

More information

Heat Kernel Based Local Binary Pattern for Face Representation

Heat Kernel Based Local Binary Pattern for Face Representation JOURNAL OF LATEX CLASS FILES 1 Heat Kernel Based Local Binary Pattern for Face Representation Xi Li, Weiming Hu, Zhongfei Zhang, Hanzi Wang Abstract Face classification has recently become a very hot research

More information

Object detection using non-redundant local Binary Patterns

Object detection using non-redundant local Binary Patterns University of Wollongong Research Online Faculty of Informatics - Papers (Archive) Faculty of Engineering and Information Sciences 2010 Object detection using non-redundant local Binary Patterns Duc Thanh

More information

AUTOMATIC OBJECT EXTRACTION IN SINGLE-CONCEPT VIDEOS. Kuo-Chin Lien and Yu-Chiang Frank Wang

AUTOMATIC OBJECT EXTRACTION IN SINGLE-CONCEPT VIDEOS. Kuo-Chin Lien and Yu-Chiang Frank Wang AUTOMATIC OBJECT EXTRACTION IN SINGLE-CONCEPT VIDEOS Kuo-Chin Lien and Yu-Chiang Frank Wang Research Center for Information Technology Innovation, Academia Sinica, Taipei, Taiwan {iker, ycwang}@citi.sinica.edu.tw

More information

Available online at ScienceDirect. Procedia Computer Science 60 (2015 )

Available online at  ScienceDirect. Procedia Computer Science 60 (2015 ) Available online at www.sciencedirect.com ScienceDirect Procedia Computer Science (15 ) 43 437 19th International Conference on Knowledge Based and Intelligent Information and Engineering Systems Human

More information

Structured Light II. Thanks to Ronen Gvili, Szymon Rusinkiewicz and Maks Ovsjanikov

Structured Light II. Thanks to Ronen Gvili, Szymon Rusinkiewicz and Maks Ovsjanikov Structured Light II Johannes Köhler Johannes.koehler@dfki.de Thanks to Ronen Gvili, Szymon Rusinkiewicz and Maks Ovsjanikov Introduction Previous lecture: Structured Light I Active Scanning Camera/emitter

More information

Multi-polarimetric SAR image compression based on sparse representation

Multi-polarimetric SAR image compression based on sparse representation . RESEARCH PAPER. Special Issue SCIENCE CHINA Information Sciences August 2012 Vol. 55 No. 8: 1888 1897 doi: 10.1007/s11432-012-4612-9 Multi-polarimetric SAR based on sparse representation CHEN Yuan, ZHANG

More information

Illumination-Robust Face Recognition based on Gabor Feature Face Intrinsic Identity PCA Model

Illumination-Robust Face Recognition based on Gabor Feature Face Intrinsic Identity PCA Model Illumination-Robust Face Recognition based on Gabor Feature Face Intrinsic Identity PCA Model TAE IN SEOL*, SUN-TAE CHUNG*, SUNHO KI**, SEONGWON CHO**, YUN-KWANG HONG*** *School of Electronic Engineering

More information

ELEG Compressive Sensing and Sparse Signal Representations

ELEG Compressive Sensing and Sparse Signal Representations ELEG 867 - Compressive Sensing and Sparse Signal Representations Gonzalo R. Arce Depart. of Electrical and Computer Engineering University of Delaware Fall 211 Compressive Sensing G. Arce Fall, 211 1 /

More information

Recognition Rate. 90 S 90 W 90 R Segment Length T

Recognition Rate. 90 S 90 W 90 R Segment Length T Human Action Recognition By Sequence of Movelet Codewords Xiaolin Feng y Pietro Perona yz y California Institute of Technology, 36-93, Pasadena, CA 925, USA z Universit a dipadova, Italy fxlfeng,peronag@vision.caltech.edu

More information

Action Recognition from a Small Number of Frames

Action Recognition from a Small Number of Frames Computer Vision Winter Workshop 2009, Adrian Ion and Walter G. Kropatsch (eds.) Eibiswald, Austria, February 4 6 Publisher: PRIP, Vienna University of Technology, Austria Action Recognition from a Small

More information

Greedy algorithms for Sparse Dictionary Learning

Greedy algorithms for Sparse Dictionary Learning Greedy algorithms for Sparse Dictionary Learning Varun Joshi 26 Apr 2017 Background. Sparse dictionary learning is a kind of representation learning where we express the data as a sparse linear combination

More information