Visual Action Recognition


Ying Wu
Electrical Engineering and Computer Science
Northwestern University, Evanston, IL

Outline
- Introduction
- Modeling Space-Time Appearance
- Modeling Action Dynamics
- Action Recognition with Depth Cameras
- Action Detection

What is an Action?
- An action is an atomic motion (or motions) that can be unambiguously distinguished (e.g., sitting down, running).
- An activity is composed of several actions performed in succession (e.g., dining, meeting a person).
- An event is a combination of activities (e.g., a football match, a traffic accident).

What is Action Recognition?
What is recognition?
- Verification: Is the walking man Michael?
- Identification: Who is the walking man?
- Recognition: What is the man doing?
Action recognition matches an observation (e.g., a video) against previously defined patterns and assigns it a label, i.e., an action type.
- Input: an action video;
- Output: an action label.

Why Do We Need Action Recognition?
- Handling the rapidly growing volume of video recordings by hand requires expensive human effort;
- Large number of potential applications:
  - visual surveillance
  - crowd behavior analysis
  - human-machine interfaces
  - sports video analysis
  - video retrieval
  - etc.

Main Challenges in Action Recognition
- Different scales: people may appear at different scales in different videos, yet perform the same action.
- View changes and camera motion.
- Background clutter: other objects/humans are present in the video frame.
- Partial occlusions.
- Human/action variation (large intra-class variation): walking movements can differ in speed and stride length.
- Etc.

Action Recognition Methods: Handling Space-Time Appearance
Focus on extracting better appearance representations from action videos:
- hand-crafted features: HOG [7], HOF [4], MBH [18], or combinations [18];
- learned features: deep neural networks [21, 5, 16].

Action Recognition Methods: Handling Dynamics
Focus on modeling the dynamics and motion in action videos:
- Deterministic models: dynamic time warping [25], maximum margin temporal warping [20], the actom sequence model [6], graphs [3], and deep neural architectures [14, 17];
- Generative models: HMMs [10], coupled HMMs [2], CRFs [22], and dynamic Bayes nets [24].

Action Recognition Methods: Kinect-Based Recognition and Detection
- Besides RGB, the depth information of an action video is available from a depth camera;
- Feature descriptors and dynamic models specific to depth input are designed to alleviate the challenges of action recognition;
- Depth significantly reduces the difficulty of estimating human motion: articulated human motion can be captured from the depth video.

Small-Scale Datasets
- The KTH Dataset [13]: 6 actions (walking, jogging, running, boxing, hand waving, and hand clapping)
- The Weizmann Dataset [1]: 10 actions (walk, run, jump, gallop sideways, bend, one-hand wave, two-hands wave, jump in place, jumping jack, and skip)
- The UCF Sports Action Dataset: 9 actions (diving, golf swinging, kicking, weightlifting, horseback riding, running, skating, swinging a baseball bat, and walking)

Large-Scale Datasets
- The IXMAS Dataset [23]: 14 actions (check watch, cross arms, scratch head, sit down, get up, turn around, walk, wave, punch, kick, point, pick up, throw over head, and throw from bottom up)
- The Hollywood Human Action Dataset [11]: 12 actions (answer phone, get out of car, handshake, hug, kiss, sit down, sit up, stand up, drive car, eat, fight, and run)
- The UCF50 Dataset [12]: 50 different actions/activities
- The HMDB51 Dataset [8]: 51 different actions/activities
- The UCF101 Dataset [15]: 101 different actions/activities

Data Samples
Figure: (a) the KTH dataset; (b) the Hollywood dataset.


Space-Time Interest Points (STIP)

Spatio-Temporal Interest Points [9]
- Motivated by the Harris and Förstner spatial interest point operators, extended into the spatio-temporal domain;
- Aim to find good spatio-temporal positions in a sequence for feature extraction;
- Distinct and stable descriptors are extracted at the detected interest points.

Spatio-Temporal Interest Points
Interest points are the points that have large variations along both the spatial and the temporal directions in local spatio-temporal volumes.
Figure: Detecting the strongest spatio-temporal interest points in a football sequence with a player heading the ball.

Spatio-Temporal Interest Point Detection
In the spatial domain, an image f^{sp} is modeled by its linear scale-space representation L^{sp}:
  L^{sp}(x, y; \sigma_l^2) = g^{sp}(x, y; \sigma_l^2) * f^{sp}(x, y)
Analogously, a video sequence f is modeled by a linear spatio-temporal scale-space representation L:
  L(\cdot; \sigma_l^2, \tau_l^2) = g(\cdot; \sigma_l^2, \tau_l^2) * f(\cdot)
with the separable spatio-temporal Gaussian kernel
  g(x, y, t; \sigma_l^2, \tau_l^2) = \exp\!\big(-(x^2 + y^2)/2\sigma_l^2 - t^2/2\tau_l^2\big) \,/\, \sqrt{(2\pi)^3 \sigma_l^4 \tau_l^2}

Spatio-Temporal Interest Point Detection
- Construct a 3x3 spatio-temporal second-moment matrix:
  \mu = g(\cdot; \sigma_i^2, \tau_i^2) * \begin{pmatrix} L_x^2 & L_x L_y & L_x L_t \\ L_x L_y & L_y^2 & L_y L_t \\ L_x L_t & L_y L_t & L_t^2 \end{pmatrix}
- The first-order derivatives are defined, for \xi \in \{x, y, t\}, as L_\xi(\cdot; \sigma_l^2, \tau_l^2) = \partial_\xi (g * f);
- Compute the three eigenvalues \lambda_1, \lambda_2, \lambda_3 of \mu; the extended Harris corner function is
  H = \det(\mu) - k \, \mathrm{trace}^3(\mu) = \lambda_1 \lambda_2 \lambda_3 - k (\lambda_1 + \lambda_2 + \lambda_3)^3
- Detect the interest points as the positive local maxima of H (see the sketch below).
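A minimal NumPy/SciPy sketch of this detector, assuming a grayscale video volume in (t, y, x) order; the scale values, the constant k, and the threshold are illustrative choices, not the settings from [9]:

```python
import numpy as np
from scipy.ndimage import gaussian_filter, maximum_filter

def stip_harris(video, sigma_l=2.0, tau_l=2.0, s=2.0, k=0.005):
    """Spatio-temporal Harris response H for a float video volume in (t, y, x) order."""
    # Scale-space representation L = g(.; sigma_l^2, tau_l^2) * f
    L = gaussian_filter(video, sigma=(tau_l, sigma_l, sigma_l))
    # First-order derivatives L_t, L_y, L_x
    Lt, Ly, Lx = np.gradient(L)
    # Entries of mu, smoothed at integration scales sigma_i = s*sigma_l, tau_i = s*tau_l
    g_i = lambda a: gaussian_filter(a, sigma=(s * tau_l, s * sigma_l, s * sigma_l))
    Mxx, Myy, Mtt = g_i(Lx * Lx), g_i(Ly * Ly), g_i(Lt * Lt)
    Mxy, Mxt, Myt = g_i(Lx * Ly), g_i(Lx * Lt), g_i(Ly * Lt)
    # H = det(mu) - k * trace(mu)^3
    det = (Mxx * (Myy * Mtt - Myt * Myt)
           - Mxy * (Mxy * Mtt - Myt * Mxt)
           + Mxt * (Mxy * Myt - Myy * Mxt))
    return det - k * (Mxx + Myy + Mtt) ** 3

def detect_stips(video, thresh=1e-8):
    """Interest points are the positive local maxima of H."""
    H = stip_harris(np.asarray(video, dtype=np.float64))
    peaks = (H == maximum_filter(H, size=3)) & (H > thresh)
    return np.argwhere(peaks)  # (t, y, x) coordinates
```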

Space-Time Interest Points: Examples
Figure: (a) action: clapping hands; (b) the detected interest points.

Spatio-Temporal Scale Adaptation
- Recall the scale-space representation L(\cdot; \sigma_l^2, \tau_l^2): the two scale factors \sigma_l^2 and \tau_l^2 strongly influence the result;
- The larger \tau_l^2 is, the more easily space-time structures with long temporal extents are detected;
- The larger \sigma_l^2 is, the more easily space-time structures with large spatial extents are detected.

Spatio-Temporal Scale Adaptation (cont.)
By finding the extrema of the scale-normalized Laplacian \nabla^2_{\mathrm{norm}} L over both the spatial and temporal scales, the scales can be determined automatically.
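For reference, a plausible reconstruction of the scale-normalized Laplacian from [9] (the normalization exponents are quoted from memory of the paper and should be treated as an assumption):

  \nabla^2_{\mathrm{norm}} L = \sigma^2 \tau^{1/2} (L_{xx} + L_{yy}) + \sigma \tau^{3/2} L_{tt}

Scale-adapted interest points are then points that are simultaneously extrema of H over position and of \nabla^2_{\mathrm{norm}} L over the scales (\sigma^2, \tau^2).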

Result
Figure: STIP detection for a zoom-in sequence of a walking person.

Result
Figure: (top) correct matches in sequences with leg actions; (bottom) correct matches in sequences with arm actions.

Recognizing Human Actions: A Local SVM Approach

Recognition Based on Local Space-Time Features [13]
Figure: Local space-time features detected for a walking pattern.

Representation of Features
- Spatio-temporal jets (4th order) are computed at each feature center, at the detection scales \sigma^2 = \sigma_i^2, \tau^2 = \tau_i^2:
  j = (L_x, L_y, L_t, L_{xx}, \ldots, L_{tttt})
  where L_{x^m y^n t^k} = \sigma^{m+n} \tau^k (\partial_{x^m y^n t^k} g) * f;
- Using k-means clustering over the jets j, a vocabulary of words h_i is created from the jet descriptors;
- Finally, a given video is represented by a histogram of occurrence counts of the features corresponding to each h_i in that video: H = (h_1, ..., h_n). A sketch of this bag-of-words step follows.
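A minimal sketch of the vocabulary and histogram construction, assuming the jet descriptors have already been extracted per video; the vocabulary size is an illustrative choice:

```python
import numpy as np
from sklearn.cluster import KMeans

def build_vocabulary(jet_lists, n_words=128, seed=0):
    """Cluster all jet descriptors (one matrix per video) into n_words words h_i."""
    all_jets = np.vstack(jet_lists)   # (N, 34): 4th-order jets in 3 variables
    return KMeans(n_clusters=n_words, random_state=seed, n_init=10).fit(all_jets)

def video_histogram(jets, vocab):
    """Represent one video as H = (h_1, ..., h_n): counts of word occurrences."""
    words = vocab.predict(jets)
    H = np.bincount(words, minlength=vocab.n_clusters).astype(np.float64)
    return H / max(H.sum(), 1.0)      # normalize so videos of any length compare
```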

Recognition by Support Vector Machines
- For action recognition, the obtained local space-time features are used for SVM classification;
- Given a set of training data from different action classes \{(H_i, y_i)\}_{i=1}^n, an SVM classifier is learned for each action class:
  f(H) = \mathrm{sgn}\Big( \sum_{i=1}^{n} \alpha_i y_i \langle H_i, H \rangle + b \Big)
- Easy to extend to a kernelized version (see the sketch below).
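A sketch of the kernelized variant using a chi-square kernel on the histograms; the kernel choice is an assumption of this sketch, and scikit-learn's SVC handles the multi-class decomposition internally (one-vs-one, whereas the slide describes per-class classifiers):

```python
from sklearn.metrics.pairwise import chi2_kernel
from sklearn.svm import SVC

def train_action_svm(H_train, y_train, gamma=1.0):
    """Multi-class SVM on histogram features H via a precomputed chi-square kernel."""
    K = chi2_kernel(H_train, H_train, gamma=gamma)
    return SVC(kernel="precomputed").fit(K, y_train)

def predict_actions(clf, H_train, H_test, gamma=1.0):
    # The test kernel compares each test histogram against the training histograms
    return clf.predict(chi2_kernel(H_test, H_train, gamma=gamma))
```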

Results
Figure: Action recognition results for different methods and scenarios on the KTH dataset.


Coupled Hidden Markov Models for Complex Actions

Coupled Hidden Markov Models [2]
- The hidden Markov model (HMM) is well suited to modeling and classifying dynamic behaviors;
- But an HMM is not suitable for multiple interacting processes, which have structure in both time and space;
- Coupled hidden Markov models (CHMMs) can model multiple interacting processes without running afoul of the Markov condition.

Limitations of HMMs
- An HMM is suitable for implicitly handling time-varying signals that satisfy the Markov properties;
- But HMMs are not appropriate for modeling systems with compositional state, e.g., multiple interacting processes that have structure in both time and space;
- Think about how to model "A gave B the C".

Coupling and Factoring HMMs
- To handle multiple interacting processes (to couple HMMs), we build a joint HMM C from two coupled HMMs A and B;
- Given states a_i, b_k with transition parameters P_{a_i \to a_j} and P_{b_k \to b_l}, the joint state is c_{ik} = \{a_i, b_k\} and its transition is
  P_{c_{ik} \to c_{jl}} = \Psi\big(P_{a_i \to a_j}, P_{b_k \to b_l}, P_{a_i \to b_l}, P_{b_k \to a_j}\big)
- P_{a_i \to b_l} and P_{b_k \to a_j} are the coupling parameters.

Coupling and Factoring HMMs (cont.)
- The joint HMM can also be projected back onto its components:
  P_{a_i \to a_j} \propto \sum_{k, l} P_{c_{ik} \to c_{jl}}, \qquad P_{a_i \to b_l} \propto \sum_{j, k} P_{c_{ik} \to c_{jl}}
- So a joint HMM can be trained via standard HMM methods. A sketch of the coupling and projection follows.
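A small NumPy sketch of these two operations; the particular mixing function Ψ used here (a normalized product of the four factors) is an illustrative assumption, since the slide leaves Ψ unspecified:

```python
import numpy as np

def couple_hmms(P_aa, P_bb, P_ab, P_ba):
    """Joint transition P[(i,k) -> (j,l)] from within-chain (P_aa, P_bb) and
    coupling (P_ab: a -> b', P_ba: b -> a') transition matrices."""
    Na, Nb = P_aa.shape[0], P_bb.shape[0]
    P_c = np.zeros((Na, Nb, Na, Nb))
    for i in range(Na):
        for k in range(Nb):
            for j in range(Na):
                for l in range(Nb):
                    # Psi as a product of the four factors (one simple choice)
                    P_c[i, k, j, l] = P_aa[i, j] * P_bb[k, l] * P_ab[i, l] * P_ba[k, j]
    # Renormalize the outgoing distribution of each joint state (i, k)
    P_c /= P_c.sum(axis=(2, 3), keepdims=True)
    return P_c

def factor_joint(P_c):
    """Project the joint HMM back onto its component transitions."""
    P_aa = P_c.sum(axis=(1, 3))   # P(a_i -> a_j), summing over k and l
    P_ab = P_c.sum(axis=(1, 2))   # P(a_i -> b_l), summing over k and j
    return (P_aa / P_aa.sum(axis=1, keepdims=True),
            P_ab / P_ab.sum(axis=1, keepdims=True))
```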

One Example
Figure: A CHMM used to represent an action performed by two hands.


Mining Actionlet Ensemble

Depth Maps and 3D Joints
- The figure shows some samples of depth maps;
- The 3D joint positions can be estimated or annotated from the depth maps.

Mining Actionlet Ensemble for Action Recognition [19]
- An actionlet ensemble model is learned from depth maps to represent each action;
- A Fourier Temporal Pyramid is proposed to represent the temporal dynamics (a sketch follows).
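A minimal sketch of a Fourier Temporal Pyramid on one joint's feature time series; keeping only the few lowest-frequency coefficients of each segment is the idea, though the pyramid depth and coefficient count here are illustrative assumptions:

```python
import numpy as np

def fourier_temporal_pyramid(x, levels=3, n_coef=4):
    """x: (T, D) time series of one joint's feature (assumes T >= 2**(levels-1)).
    At level p the series is split into 2**p segments; the magnitudes of the
    n_coef lowest-frequency FFT coefficients of each segment are concatenated."""
    T = x.shape[0]
    feats = []
    for p in range(levels):
        bounds = np.linspace(0, T, 2 ** p + 1).astype(int)
        for s in range(2 ** p):
            seg = x[bounds[s]:bounds[s + 1]]
            # Zero-pad so short segments still yield n_coef coefficients
            coef = np.fft.rfft(seg, n=max(len(seg), 2 * n_coef), axis=0)[:n_coef]
            feats.append(np.abs(coef).ravel())
    return np.concatenate(feats)
```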

Mining Discriminative Actionlets
- An actionlet is defined as a subset S of joints:
  P_S(y^{(j)} = c \mid x^{(j)}) = \prod_{i \in S} P_i(y^{(j)} = c \mid x^{(j)})
- How discriminative an actionlet is can be characterized by a large confidence Conf_S and a small ambiguity Amb_S:
  Conf_S = \max_{j \in X_c} \log P_S(y^{(j)} = c \mid x^{(j)})
  Amb_S = \sum_{j \notin X_c} \log P_S(y^{(j)} = c \mid x^{(j)})
- Observation: adding a new joint to an actionlet always reduces the confidence;
- The mining strategy keeps, for each class c, the actionlets with Amb_S \le T_{amb} and Conf_S \ge T_{conf} (a brute-force sketch follows).
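A brute-force sketch of the mining step under the definitions above; the paper prunes this search using the observation that adding joints only reduces confidence, while this sketch (with assumed thresholds and subset size) simply enumerates small subsets:

```python
import numpy as np
from itertools import combinations

def mine_actionlets(log_post_c, labels, c, t_conf, t_amb, max_size=3):
    """log_post_c: (n_joints, n_samples) array of log P_i(y^(j) = c | x^(j)).
    Keep subsets S with Conf_S >= t_conf and Amb_S <= t_amb for class c."""
    n_joints = log_post_c.shape[0]
    in_c = labels == c
    kept = []
    for size in range(1, max_size + 1):
        for S in combinations(range(n_joints), size):
            lp = log_post_c[list(S)].sum(axis=0)  # naive-Bayes sum over joints in S
            conf = lp[in_c].max()                 # Conf_S over the class samples
            amb = lp[~in_c].sum()                 # Amb_S (more negative = less ambiguous)
            if conf >= t_conf and amb <= t_amb:
                kept.append(S)
    return kept
```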

Learning the Actionlet Ensemble
- A multiple kernel learning approach is employed to learn an actionlet ensemble structure that combines the discriminative actionlets;
- For each actionlet S_k, an SVM model is defined with a linear output function:
  f_k(x, y) = \langle w_k, \Phi_k(x, y) \rangle + b_k
- The final output function is a convex combination of p kernels, one kernel per actionlet:
  f_{final}(x, y) = \sum_{k=1}^{p} \big[\beta_k \langle w_k, \Phi_k(x, y) \rangle + b_k\big]
- The resulting optimization problem is solved by alternating between optimizing \beta and (w, b). A simplified sketch follows.
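A heavily simplified sketch of the ensemble combination: the per-actionlet SVMs are assumed already trained (their outputs f_k are given), and only the simplex-constrained weights β are updated by projected gradient ascent on the mean margin. This is a stand-in for the alternating MKL optimization, not the paper's solver:

```python
import numpy as np

def fit_beta(scores, y, n_iter=100, lr=0.1):
    """scores: (p, n) outputs f_k of the fixed per-actionlet SVMs; y in {-1, +1}.
    Projected gradient ascent on the mean margin, keeping beta on the simplex."""
    p = scores.shape[0]
    beta = np.full(p, 1.0 / p)
    for _ in range(n_iter):
        grad = scores @ y / len(y)          # d(mean margin)/d(beta)
        beta = np.clip(beta + lr * grad, 0.0, None)
        beta /= beta.sum() + 1e-12          # project back onto the simplex
    return beta

def f_final(scores, beta):
    """Convex combination of the per-actionlet outputs."""
    return beta @ scores
```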


Discriminative Subvolume Search for Efficient Action Detection

Difficulties in Action Detection
- Actions can be treated as spatio-temporal patterns;
- They can be characterized by collections of spatio-temporal invariant features;
- Detecting an action means finding re-occurrences (e.g., through pattern matching) of such spatio-temporal patterns;
- Two critical issues:
  - searching the 3D video space for actions is far more time-consuming than searching a 2D image;
  - human actions involve tremendous intra-pattern variations.

Discriminative Subvolume Search for Efficient Action Detection [26]
- A discriminative pattern matching method, naive Bayes based mutual information maximization (NBMIM), for multi-class action categorization;
- A novel search algorithm locates the optimal subvolume in the 3D video space for efficient action detection.

The Proposed Idea
- Action detection is formulated as searching for the subvolume of the video that has the maximum mutual information with the action class;
- In the figure, each circle represents a spatio-temporal feature point, which contributes a vote based on its own mutual information;
- This is a new formulation for action detection!

Naive Bayes Based Mutual Information Maximization (NBMIM)
- An action is represented by a collection of spatio-temporal interest points (STIPs), Q = \{d_i\}; an appearance feature (HOG) and a motion feature (HOF) are extracted at each STIP;
- Evaluate the mutual information between a video clip Q and a specific class c (a reconstruction of the formula follows).
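The slide's formula did not survive the transcription; a reconstruction consistent with the NBMIM formulation (treat it as an assumption) is:

  MI(C = c, Q) = \log \frac{P(Q \mid C = c)}{P(Q)} = \sum_{d_q \in Q} \log \frac{P(d_q \mid C = c)}{P(d_q)} = \sum_{d_q \in Q} s^c(d_q)

where the second equality uses the naive Bayes assumption that the STIPs d_q are conditionally independent.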

NBMIM: Point-Wise Votes
- Evaluate the contribution s^c(d_q) of each d_q \in Q;
- The likelihood ratio P(d_q \mid C \neq c) / P(d_q \mid C = c) determines whether d_q votes positively or negatively;
- For C-class action categorization, C one-against-all classifiers are built (see the sketch below).
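A sketch of the point-wise voting, approximating each class-conditional likelihood with a Gaussian kernel on the nearest training descriptor; this nearest-neighbor simplification and the bandwidth are assumptions of the sketch, not the paper's exact kernel density estimate:

```python
import numpy as np
from scipy.spatial import cKDTree

def nbmim_scores(Q, pos_feats, neg_feats, sigma=1.0):
    """Per-STIP votes s^c(d_q): Q, pos_feats, neg_feats are (n, d) descriptor arrays."""
    d_pos, _ = cKDTree(pos_feats).query(Q)   # distance to nearest positive STIP
    d_neg, _ = cKDTree(neg_feats).query(Q)   # distance to nearest negative STIP
    # log P(d|C=c) - log P(d|C!=c), up to constants, for Gaussian kernels
    return (d_neg ** 2 - d_pos ** 2) / (2.0 * sigma ** 2)

def clip_score(Q, pos_feats, neg_feats):
    """The mutual information of a whole clip Q is the sum of point-wise votes."""
    return nbmim_scores(Q, pos_feats, neg_feats).sum()
```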

Action Detection in Video
- Based on the NBMIM criterion, given a video sequence V, the goal is to find the spatio-temporal subvolume (3D subvolume) V^* \subseteq V with the maximum total score;
- For a target video of size m \times n \times t, the number of 3D subvolumes is on the order of O(m^2 n^2 t^2): an exhaustive search is impossible!
- A new, efficient search strategy is needed.

Efficient Search for the Optimal 3D Subvolume
Naive 3D branch-and-bound:
- An optimal subvolume V^* has 6 parameters: the top, bottom, left, and right positions, plus the start and end times;
- The complexity of branch-and-bound grows exponentially in the number of dimensions.
The new efficient search:
- Instead of applying branch-and-bound directly in the 6D parameter space, the method decomposes it into two subspaces: a 4D spatial space and a 2D temporal space;
- The optimal 3D subvolume V^* is determined by a spatial window W^* and a temporal segment T^* with the maximum detection score:
  [W^*, T^*] = \arg\max_{W, T} \sum_{d \in W \times T} s(d)

Efficient Search for the Optimal 3D Subvolume (cont.)
Different search strategies are used in the two subspaces, alternating between W and T:
- Once W is fixed, searching for the optimal temporal segment T is a 1D max-subvector problem (see the sketch below);
- To search the spatial parameter space W, a branch-and-bound strategy with a tighter upper bound is employed (proved in the paper).
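The temporal step is the classic 1D max-subvector problem, solvable in linear time with Kadane's algorithm; a sketch, where v[t] is assumed to be the sum of the votes s(d) of the STIPs in frame t that fall inside the current window W:

```python
import numpy as np

def max_subvector(v):
    """Kadane's algorithm: best temporal segment for a fixed spatial window W.
    Returns (best score, (start frame, end frame))."""
    best, best_rng = -np.inf, (0, 0)
    cur, cur_start = 0.0, 0
    for t, x in enumerate(v):
        if cur <= 0.0:            # restart the segment when the running prefix hurts
            cur, cur_start = x, t
        else:
            cur += x
        if cur > best:
            best, best_rng = cur, (cur_start, t)
    return best, best_rng
```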

Experiment Results

References

[1] M. Blank, L. Gorelick, E. Shechtman, M. Irani, and R. Basri. Actions as space-time shapes. In IEEE International Conference on Computer Vision (ICCV), 2005.
[2] M. Brand, N. Oliver, and A. Pentland. Coupled hidden Markov models for complex action recognition. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 1997.
[3] W. Brendel and S. Todorovic. Learning spatiotemporal graphs of human activities. In IEEE International Conference on Computer Vision (ICCV), 2011.
[4] R. Chaudhry, A. Ravichandran, G. Hager, and R. Vidal. Histograms of oriented optical flow and Binet-Cauchy kernels on nonlinear dynamical systems for the recognition of human actions. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2009.
[5] G. Chéron, I. Laptev, and C. Schmid. P-CNN: Pose-based CNN features for action recognition. In IEEE International Conference on Computer Vision (ICCV), 2015.
[6] A. Gaidon, Z. Harchaoui, and C. Schmid. Actom sequence models for efficient action detection. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2011.
[7] A. Kläser, M. Marszałek, and C. Schmid. A spatio-temporal descriptor based on 3D-gradients. In British Machine Vision Conference (BMVC), 2008.
[8] H. Kuehne, H. Jhuang, E. Garrote, T. Poggio, and T. Serre. HMDB: A large video database for human motion recognition. In IEEE International Conference on Computer Vision (ICCV), 2011.
[9] I. Laptev. On space-time interest points. International Journal of Computer Vision, 64(2-3):107–123, 2005.
[10] K. Li, J. Hu, and Y. Fu. Modeling complex temporal composition of actionlets for activity prediction. In European Conference on Computer Vision (ECCV), 2012.
[11] M. Marszałek, I. Laptev, and C. Schmid. Actions in context. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2009.
[12] K. K. Reddy and M. Shah. Recognizing 50 human action categories of web videos. Machine Vision and Applications, 24(5), 2013.
[13] C. Schuldt, I. Laptev, and B. Caputo. Recognizing human actions: a local SVM approach. In International Conference on Pattern Recognition (ICPR), volume 3, 2004.
[14] K. Simonyan and A. Zisserman. Two-stream convolutional networks for action recognition in videos. In Advances in Neural Information Processing Systems (NIPS), 2014.
[15] K. Soomro, A. R. Zamir, and M. Shah. UCF101: A dataset of 101 human actions classes from videos in the wild. arXiv preprint, 2012.
[16] N. Srivastava, E. Mansimov, and R. Salakhutdinov. Unsupervised learning of video representations using LSTMs. CoRR, 2015.
[17] V. Veeriah, N. Zhuang, and G.-J. Qi. Differential recurrent neural networks for action recognition. In IEEE International Conference on Computer Vision (ICCV), 2015.
[18] H. Wang, A. Kläser, C. Schmid, and C.-L. Liu. Action recognition by dense trajectories. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2011.
[19] J. Wang, Z. Liu, Y. Wu, and J. Yuan. Mining actionlet ensemble for action recognition with depth cameras. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2012.
[20] J. Wang and Y. Wu. Learning maximum margin temporal warping for action recognition. In IEEE International Conference on Computer Vision (ICCV), 2013.
[21] L. Wang, Y. Qiao, and X. Tang. Action recognition with trajectory-pooled deep-convolutional descriptors. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015.
[22] Y. Wang and G. Mori. Hidden part models for human action recognition: Probabilistic versus max margin. IEEE Transactions on Pattern Analysis and Machine Intelligence, 33(7), 2011.
[23] D. Weinland, E. Boyer, and R. Ronfard. Action recognition from arbitrary views using 3D exemplars. In IEEE International Conference on Computer Vision (ICCV), 2007.
[24] T. Xiang and S. Gong. Beyond tracking: Modelling activity and understanding behaviour. International Journal of Computer Vision, 67(1):21–51, 2006.
[25] B. Yao and S.-C. Zhu. Learning deformable action templates from cluttered videos. In IEEE International Conference on Computer Vision (ICCV), 2009.
[26] J. Yuan, Z. Liu, and Y. Wu. Discriminative subvolume search for efficient action detection. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2009.


More information

Chapter 2 Action Representation

Chapter 2 Action Representation Chapter 2 Action Representation Abstract In this chapter, various action recognition issues are covered in a concise manner. Various approaches are presented here. In Chap. 1, nomenclatures, various aspects

More information

Leveraging Textural Features for Recognizing Actions in Low Quality Videos

Leveraging Textural Features for Recognizing Actions in Low Quality Videos Leveraging Textural Features for Recognizing Actions in Low Quality Videos Saimunur Rahman, John See, Chiung Ching Ho Centre of Visual Computing, Faculty of Computing and Informatics Multimedia University,

More information

Exploiting Spatio-Temporal Scene Structure for Wide-Area Activity Analysis in Unconstrained Environments

Exploiting Spatio-Temporal Scene Structure for Wide-Area Activity Analysis in Unconstrained Environments SUBMITTED TO IEEE TRANSACTIONS ON INFORMATION FORENSICS AND SECURITY 1 Exploiting Spatio-Temporal Scene Structure for Wide-Area Activity Analysis in Unconstrained Environments Nandita M. Nayak, Yingying

More information

Learning discriminative space-time actions from weakly labelled videos

Learning discriminative space-time actions from weakly labelled videos International Journal of Computer Vision manuscript No. (will be inserted by the editor) Learning discriminative space-time actions from weakly labelled videos Michael Sapienza Fabio Cuzzolin Philip H.S.

More information

Short Survey on Static Hand Gesture Recognition

Short Survey on Static Hand Gesture Recognition Short Survey on Static Hand Gesture Recognition Huu-Hung Huynh University of Science and Technology The University of Danang, Vietnam Duc-Hoang Vo University of Science and Technology The University of

More information

Learning Realistic Human Actions from Movies

Learning Realistic Human Actions from Movies Learning Realistic Human Actions from Movies Ivan Laptev*, Marcin Marszałek**, Cordelia Schmid**, Benjamin Rozenfeld*** INRIA Rennes, France ** INRIA Grenoble, France *** Bar-Ilan University, Israel Presented

More information

Action Recognition Using Super Sparse Coding Vector with Spatio-Temporal Awareness

Action Recognition Using Super Sparse Coding Vector with Spatio-Temporal Awareness Action Recognition Using Super Sparse Coding Vector with Spatio-Temporal Awareness Xiaodong Yang and YingLi Tian Department of Electrical Engineering City College, City University of New York Abstract.

More information

Action Recognition From Videos using Sparse Trajectories

Action Recognition From Videos using Sparse Trajectories Action Recognition From Videos using Sparse Trajectories Alexandros Doumanoglou, Nicholas Vretos, Petros Daras Centre for Research and Technology - Hellas (ITI-CERTH) 6th Km Charilaou - Thermi, Thessaloniki,

More information

Recognizing Human Actions using 3D Skeletal Information and CNNs

Recognizing Human Actions using 3D Skeletal Information and CNNs Recognizing Human Actions using 3D Skeletal Information and CNNs Antonios Papadakis 1, Eirini Mathe 2,5, Ioannis Vernikos 3, Apostolos Maniatis 4, Evaggelos Spyrou 2,4, and Phivos Mylonas 5 1 Department

More information

Discriminative human action recognition using pairwise CSP classifiers

Discriminative human action recognition using pairwise CSP classifiers Discriminative human action recognition using pairwise CSP classifiers Ronald Poppe and Mannes Poel University of Twente, Dept. of Computer Science, Human Media Interaction Group P.O. Box 217, 7500 AE

More information

Human Action Recognition from Gradient Boundary Histograms

Human Action Recognition from Gradient Boundary Histograms Human Action Recognition from Gradient Boundary Histograms by Xuelu Wang Thesis submitted to the Faculty of Graduate and Postdoctoral Studies In partial fulfillment of the requirements For the M.A.SC.

More information

Person Identity Recognition on Motion Capture Data Using Label Propagation

Person Identity Recognition on Motion Capture Data Using Label Propagation Person Identity Recognition on Motion Capture Data Using Label Propagation Nikos Nikolaidis Charalambos Symeonidis AIIA Lab, Department of Informatics Aristotle University of Thessaloniki Greece email:

More information

Space-Time Tree Ensemble for Action Recognition

Space-Time Tree Ensemble for Action Recognition To Appear in Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015 Space-Time Tree Ensemble for Action Recognition Shugao Ma Boston University shugaoma@bu.edu Leonid Sigal Disney

More information

Spatio-Temporal Optical Flow Statistics (STOFS) for Activity Classification

Spatio-Temporal Optical Flow Statistics (STOFS) for Activity Classification Spatio-Temporal Optical Flow Statistics (STOFS) for Activity Classification Vignesh Jagadeesh University of California Santa Barbara, CA-93106 vignesh@ece.ucsb.edu S. Karthikeyan B.S. Manjunath University

More information

People Detection and Video Understanding

People Detection and Video Understanding 1 People Detection and Video Understanding Francois BREMOND INRIA Sophia Antipolis STARS team Institut National Recherche Informatique et Automatisme Francois.Bremond@inria.fr http://www-sop.inria.fr/members/francois.bremond/

More information

Cross-View Action Recognition from Temporal Self-Similarities

Cross-View Action Recognition from Temporal Self-Similarities Appears at ECCV 8 Cross-View Action Recognition from Temporal Self-Similarities Imran N. Junejo, Emilie Dexter, Ivan Laptev and Patrick Pérez INRIA Rennes - Bretagne Atlantique Rennes Cedex - FRANCE Abstract.

More information

Local Descriptors for Spatio-Temporal Recognition

Local Descriptors for Spatio-Temporal Recognition Local Descriptors for Spatio-Temporal Recognition Ivan Laptev and Tony Lindeberg Computational Vision and Active Perception Laboratory (CVAP) Dept. of Numerical Analysis and Computing Science KTH, S-100

More information

Previously. Part-based and local feature models for generic object recognition. Bag-of-words model 4/20/2011

Previously. Part-based and local feature models for generic object recognition. Bag-of-words model 4/20/2011 Previously Part-based and local feature models for generic object recognition Wed, April 20 UT-Austin Discriminative classifiers Boosting Nearest neighbors Support vector machines Useful for object recognition

More information

RGBD-HuDaAct: A Color-Depth Video Database For Human Daily Activity Recognition

RGBD-HuDaAct: A Color-Depth Video Database For Human Daily Activity Recognition RGBD-HuDaAct: A Color-Depth Video Database For Human Daily Activity Recognition Bingbing Ni Advanced Digital Sciences Center Singapore 138632 bingbing.ni@adsc.com.sg Gang Wang Advanced Digital Sciences

More information

Discovering Motion Primitives for Unsupervised Grouping and One-shot Learning of Human Actions, Gestures, and Expressions

Discovering Motion Primitives for Unsupervised Grouping and One-shot Learning of Human Actions, Gestures, and Expressions IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE 1 Discovering Motion Primitives for Unsupervised Grouping and One-shot Learning of Human Actions, Gestures, and Expressions Yang Yang, Imran

More information

Computation Strategies for Volume Local Binary Patterns applied to Action Recognition

Computation Strategies for Volume Local Binary Patterns applied to Action Recognition Computation Strategies for Volume Local Binary Patterns applied to Action Recognition F. Baumann, A. Ehlers, B. Rosenhahn Institut für Informationsverarbeitung (TNT) Leibniz Universität Hannover, Germany

More information

Large-scale Video Classification with Convolutional Neural Networks

Large-scale Video Classification with Convolutional Neural Networks Large-scale Video Classification with Convolutional Neural Networks Andrej Karpathy, George Toderici, Sanketh Shetty, Thomas Leung, Rahul Sukthankar, Li Fei-Fei Note: Slide content mostly from : Bay Area

More information

Human Detection and Tracking for Video Surveillance: A Cognitive Science Approach

Human Detection and Tracking for Video Surveillance: A Cognitive Science Approach Human Detection and Tracking for Video Surveillance: A Cognitive Science Approach Vandit Gajjar gajjar.vandit.381@ldce.ac.in Ayesha Gurnani gurnani.ayesha.52@ldce.ac.in Yash Khandhediya khandhediya.yash.364@ldce.ac.in

More information

Dyadic Interaction Detection from Pose and Flow

Dyadic Interaction Detection from Pose and Flow Dyadic Interaction Detection from Pose and Flow Coert van Gemeren 1,RobbyT.Tan 2, Ronald Poppe 1,andRemcoC.Veltkamp 1, 1 Interaction Technology Group, Department of Information and Computing Sciences,

More information

Learning 4D Action Feature Models for Arbitrary View Action Recognition

Learning 4D Action Feature Models for Arbitrary View Action Recognition Learning 4D Action Feature Models for Arbitrary View Action Recognition Pingkun Yan, Saad M. Khan, Mubarak Shah Computer Vision Lab, University of Central Florida, Orlando, FL http://www.eecs.ucf.edu/

More information

Learning Human Actions with an Adaptive Codebook

Learning Human Actions with an Adaptive Codebook Learning Human Actions with an Adaptive Codebook Yu Kong, Xiaoqin Zhang, Weiming Hu and Yunde Jia Beijing Laboratory of Intelligent Information Technology School of Computer Science, Beijing Institute

More information

An evaluation of local action descriptors for human action classification in the presence of occlusion

An evaluation of local action descriptors for human action classification in the presence of occlusion An evaluation of local action descriptors for human action classification in the presence of occlusion Iveel Jargalsaikhan, Cem Direkoglu, Suzanne Little, and Noel E. O Connor INSIGHT Centre for Data Analytics,

More information

COSC160: Detection and Classification. Jeremy Bolton, PhD Assistant Teaching Professor

COSC160: Detection and Classification. Jeremy Bolton, PhD Assistant Teaching Professor COSC160: Detection and Classification Jeremy Bolton, PhD Assistant Teaching Professor Outline I. Problem I. Strategies II. Features for training III. Using spatial information? IV. Reducing dimensionality

More information

Action Recognition using Discriminative Structured Trajectory Groups

Action Recognition using Discriminative Structured Trajectory Groups 2015 IEEE Winter Conference on Applications of Computer Vision Action Recognition using Discriminative Structured Trajectory Groups Indriyati Atmosukarto 1,2, Narendra Ahuja 3, Bernard Ghanem 4 1 Singapore

More information

Video Action Detection with Relational Dynamic-Poselets

Video Action Detection with Relational Dynamic-Poselets Video Action Detection with Relational Dynamic-Poselets Limin Wang 1,2, Yu Qiao 2, Xiaoou Tang 1,2 1 Department of Information Engineering, The Chinese University of Hong Kong 2 Shenzhen Key Lab of CVPR,

More information

A Unified Method for First and Third Person Action Recognition

A Unified Method for First and Third Person Action Recognition A Unified Method for First and Third Person Action Recognition Ali Javidani Department of Computer Science and Engineering Shahid Beheshti University Tehran, Iran a.javidani@mail.sbu.ac.ir Ahmad Mahmoudi-Aznaveh

More information

G3D: A Gaming Action Dataset and Real Time Action Recognition Evaluation Framework

G3D: A Gaming Action Dataset and Real Time Action Recognition Evaluation Framework G3D: A Gaming Action Dataset and Real Time Action Recognition Evaluation Framework Victoria Bloom Kingston University London. UK Victoria.Bloom@kingston.ac.uk Dimitrios Makris Kingston University London.

More information

EVENT DETECTION AND HUMAN BEHAVIOR RECOGNITION. Ing. Lorenzo Seidenari

EVENT DETECTION AND HUMAN BEHAVIOR RECOGNITION. Ing. Lorenzo Seidenari EVENT DETECTION AND HUMAN BEHAVIOR RECOGNITION Ing. Lorenzo Seidenari e-mail: seidenari@dsi.unifi.it What is an Event? Dictionary.com definition: something that occurs in a certain place during a particular

More information

UNDERSTANDING human actions in videos has been

UNDERSTANDING human actions in videos has been PAPER IDENTIFICATION NUMBER 1 A Space-Time Graph Optimization Approach Based on Maximum Cliques for Action Detection Sunyoung Cho, Member, IEEE, and Hyeran Byun, Member, IEEE Abstract We present an efficient

More information

Semantic Segmentation. Zhongang Qi

Semantic Segmentation. Zhongang Qi Semantic Segmentation Zhongang Qi qiz@oregonstate.edu Semantic Segmentation "Two men riding on a bike in front of a building on the road. And there is a car." Idea: recognizing, understanding what's in

More information

Human activity recognition in the semantic simplex of elementary actions

Human activity recognition in the semantic simplex of elementary actions STUDENT, PROF, COLLABORATOR: BMVC AUTHOR GUIDELINES 1 Human activity recognition in the semantic simplex of elementary actions Beaudry Cyrille cyrille.beaudry@univ-lr.fr Péteri Renaud renaud.peteri@univ-lr.fr

More information

Stereoscopic Video Description for Human Action Recognition

Stereoscopic Video Description for Human Action Recognition Stereoscopic Video Description for Human Action Recognition Ioannis Mademlis, Alexandros Iosifidis, Anastasios Tefas, Nikos Nikolaidis and Ioannis Pitas Department of Informatics, Aristotle University

More information

Discriminative Figure-Centric Models for Joint Action Localization and Recognition

Discriminative Figure-Centric Models for Joint Action Localization and Recognition Discriminative Figure-Centric Models for Joint Action Localization and Recognition Tian Lan School of Computing Science Simon Fraser University tla58@sfu.ca Yang Wang Dept. of Computer Science UIUC yangwang@uiuc.edu

More information

Action Localization in Video using a Graph-based Feature Representation

Action Localization in Video using a Graph-based Feature Representation Action Localization in Video using a Graph-based Feature Representation Iveel Jargalsaikhan, Suzanne Little and Noel E O Connor Insight Centre for Data Analytics, Dublin City University, Ireland iveel.jargalsaikhan2@mail.dcu.ie

More information

Human Daily Action Analysis with Multi-View and Color-Depth Data

Human Daily Action Analysis with Multi-View and Color-Depth Data Human Daily Action Analysis with Multi-View and Color-Depth Data Zhongwei Cheng 1, Lei Qin 2, Yituo Ye 1, Qingming Huang 1,2, and Qi Tian 3 1 Graduate University of Chinese Academy of Sciences, Beijing

More information

Activities as Time Series of Human Postures

Activities as Time Series of Human Postures Activities as Time Series of Human Postures William Brendel and Sinisa Todorovic Oregon State University, Kelley Engineering Center, Corvallis, OR 97331, USA brendelw@onid.orst.edu,sinisa@eecs.oregonstate.edu

More information

Object and Action Detection from a Single Example

Object and Action Detection from a Single Example Object and Action Detection from a Single Example Peyman Milanfar* EE Department University of California, Santa Cruz *Joint work with Hae Jong Seo AFOSR Program Review, June 4-5, 29 Take a look at this:

More information

An Efficient Part-Based Approach to Action Recognition from RGB-D Video with BoW-Pyramid Representation

An Efficient Part-Based Approach to Action Recognition from RGB-D Video with BoW-Pyramid Representation 2013 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) November 3-7, 2013. Tokyo, Japan An Efficient Part-Based Approach to Action Recognition from RGB-D Video with BoW-Pyramid

More information

Revisiting LBP-based Texture Models for Human Action Recognition

Revisiting LBP-based Texture Models for Human Action Recognition Revisiting LBP-based Texture Models for Human Action Recognition Thanh Phuong Nguyen 1, Antoine Manzanera 1, Ngoc-Son Vu 2, and Matthieu Garrigues 1 1 ENSTA-ParisTech, 828, Boulevard des Maréchaux, 91762

More information

Activity Recognition in Temporally Untrimmed Videos

Activity Recognition in Temporally Untrimmed Videos Activity Recognition in Temporally Untrimmed Videos Bryan Anenberg Stanford University anenberg@stanford.edu Norman Yu Stanford University normanyu@stanford.edu Abstract We investigate strategies to apply

More information

HMDB: A Large Video Database for Human Motion Recognition

HMDB: A Large Video Database for Human Motion Recognition HMDB: A Large Video Database for Human Motion Recognition H. Kuehne Karlsruhe Instit. of Tech. Karlsruhe, Germany kuehne@kit.edu H. Jhuang E. Garrote T. Poggio Massachusetts Institute of Technology Cambridge,

More information

MoSIFT: Recognizing Human Actions in Surveillance Videos

MoSIFT: Recognizing Human Actions in Surveillance Videos MoSIFT: Recognizing Human Actions in Surveillance Videos CMU-CS-09-161 Ming-yu Chen and Alex Hauptmann School of Computer Science Carnegie Mellon University Pittsburgh PA 15213 September 24, 2009 Copyright

More information