EVENT DETECTION AND HUMAN BEHAVIOR RECOGNITION. Ing. Lorenzo Seidenari


1 EVENT DETECTION AND HUMAN BEHAVIOR RECOGNITION Ing. Lorenzo Seidenari

2 What is an Event? Dictionary.com definition: something that occurs in a certain place during a particular interval of time. Examples from various domains: Sports: shot on goal. Surveillance: entering a car. Movies: drinking.

3 Importance of Human Actions. Most videos recorded and downloadable from the web contain people; their semantics are therefore defined by people's behavior. Third-generation video-surveillance systems benefit from automatic interpretation of human actions and behaviors. Definition 1: physical body motion. Definition 2: interaction with the environment (objects or people) for a specific purpose.

4 Human action recognition challenges. Actor appearance variation: gender, clothing, body posture and size. Scale, illumination and background changes, as in object categorization. Different ways of executing the same action, which change limb trajectories and speeds. Semantically different but perceptually similar actions (e.g. running and jogging).

5 Are actions space-time objects? We already know how to detect instances of object categories in static images. How do we take advantage of time to describe dynamic concepts (i.e. human actions)?

6 Framework Overview: the same three steps as object categorization (feature extraction, dictionary formation, classification), but the feature detector and descriptor differ. Pipeline: interest point extraction, bag-of-features, visual dictionary, bag-of-words, SVM classifier. Action classes: running, walking, jogging, handwaving, handclapping, boxing.
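The bag-of-words step of this pipeline can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the visual dictionary has already been built, uses hard nearest-word assignment with Euclidean distance, and leaves out the SVM. The names `nearest_word` and `bag_of_words` and the toy data are hypothetical.

```python
import math

def nearest_word(descriptor, dictionary):
    """Return the index of the visual word closest to the descriptor (Euclidean)."""
    best_index, best_dist = 0, math.inf
    for index, word in enumerate(dictionary):
        dist = math.dist(descriptor, word)
        if dist < best_dist:
            best_index, best_dist = index, dist
    return best_index

def bag_of_words(descriptors, dictionary):
    """Build an L1-normalized histogram of visual-word occurrences for one clip."""
    histogram = [0.0] * len(dictionary)
    for descriptor in descriptors:
        histogram[nearest_word(descriptor, dictionary)] += 1.0
    total = sum(histogram)
    return [count / total for count in histogram] if total else histogram

# Toy data (assumed): a 3-word dictionary and four 2-D descriptors.
dictionary = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
descriptors = [(0.1, 0.0), (0.9, 0.1), (1.1, -0.1), (0.0, 0.8)]
clip_histogram = bag_of_words(descriptors, dictionary)
```

The resulting histogram is the fixed-length clip representation that would be passed to the classifier.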

7 Descriptor combination strategy. Strategy 1: ST patch, combined 3DGrad_HoF descriptor, one visual dictionary, BoW action representation. Strategy 2: ST patch, separate 3DGrad and HoF descriptors, separate visual dictionaries, combined 3DGrad + HoF BoW action representation.

8 Effective codebooks: Spatio-temporal descriptors span an extremely high-dimensional feature space. Our dense multi-scale sampling produces a non-uniform feature space. K-means cluster centers are attracted towards densely populated regions, so less dense zones are not represented correctly. Radius-based clustering [Jurie ICCV05] exploits mode finding to place cluster centers, which gives a more accurate coding of the feature space. Note: to reduce the uncertainty we perform soft assignment.
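The idea behind radius-based center placement and soft assignment can be illustrated with a small sketch. It is a deliberate simplification and an assumption, not the cited method: mode finding is replaced by counting neighbours within a fixed radius, and soft assignment uses a Gaussian weighting whose `sigma` is a hypothetical parameter. All function names are illustrative.

```python
import math

def radius_based_centers(points, radius, max_centers):
    """Greedy simplification: repeatedly pick the point with the most
    neighbours within `radius`, then discard every point it covers."""
    remaining = list(points)
    centers = []
    while remaining and len(centers) < max_centers:
        def neighbour_count(candidate):
            return sum(1 for p in remaining if math.dist(candidate, p) <= radius)
        center = max(remaining, key=neighbour_count)
        centers.append(center)
        # Removing covered points lets later centers land in sparser regions.
        remaining = [p for p in remaining if math.dist(center, p) > radius]
    return centers

def soft_assign(descriptor, centers, sigma):
    """Gaussian-weighted membership of one descriptor over all centers (sums to 1)."""
    weights = [math.exp(-math.dist(descriptor, c) ** 2 / (2 * sigma ** 2))
               for c in centers]
    total = sum(weights)
    return [w / total for w in weights]

# Toy data (assumed): a dense group near the origin and one isolated point.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (5.0, 5.0)]
centers = radius_based_centers(points, radius=0.5, max_centers=2)
membership = soft_assign((0.0, 0.0), centers, sigma=1.0)
```

Because covered points are removed after each pick, the isolated point still receives its own center, whereas k-means with two centers could place both near the dense group.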

9 Results: codebook performance (KTH). Words are sorted by frequency and added incrementally to the dictionary. Plot axis: codebook size. High-frequency terms are non-informative; mid-frequency terms are informative.

10 Results: codebook performance (Weizmann). Words are sorted by frequency and added incrementally to the dictionary. Plot axis: codebook size. High-frequency terms are non-informative; mid-frequency terms are informative.

11 Results: dataset. The approach is tested on two standard datasets. The Weizmann dataset is considered less challenging because of its reduced variability in shooting conditions and its smaller number of actors. KTH: 25 actors, 6 actions, 4 viewing conditions, 2931 clips. Weizmann: 9 actors, 10 actions, 1 viewing condition, 93 clips.

12 Results: comparison with the state of the art. We use the same evaluation methodology as prior work to measure the improvement with respect to the current state of the art. Methods compared on KTH and Weizmann: our method; Laptev et al., HoG ['08]; Laptev et al., HoF ['08]; Dollár et al. ['05]; Wong and Cipolla ['07]; Scovanner et al. ['07]; Niebles et al. ['08]; Liu et al. ['08]; Kläser et al. ['08]; Willems et al. ['08].

13 Real video footage. We test our detector on a sequence taken in a garage. A sliding temporal window is used to perform the segmentation. Detected actions: walking, running.
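Sliding-window temporal segmentation can be sketched as below. The window length, stride, and the `classify` callback are assumptions made for illustration; in the real system the callback would be the trained action classifier applied to the frames of each window.

```python
def sliding_windows(num_frames, window, stride):
    """Return (start, end) frame ranges, end exclusive, that fit in the sequence."""
    return [(s, s + window) for s in range(0, num_frames - window + 1, stride)]

def label_windows(num_frames, window, stride, classify):
    """Apply `classify(start, end)` to each window and keep its label."""
    return [(start, end, classify(start, end))
            for start, end in sliding_windows(num_frames, window, stride)]

def merge_segments(labelled):
    """Merge consecutive windows that share a label into one segment."""
    segments = []
    for start, end, label in labelled:
        if segments and segments[-1][2] == label:
            segments[-1] = (segments[-1][0], end, label)
        else:
            segments.append((start, end, label))
    return segments

def toy_classifier(start, end):
    """Stand-in for a real classifier: label by the window's centre frame."""
    return "walking" if (start + end) / 2 < 50 else "running"

segments = merge_segments(label_windows(100, window=20, stride=10,
                                        classify=toy_classifier))
```

With overlapping windows, adjacent segments overlap near the transition; how that boundary is resolved is a design choice this sketch leaves open.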

14 Recognizing generic video events. Applications: online video search and video indexing. Events are characterized by an evolution of scenes, objects and actions over time. 56 events are defined in LSCOM. Event examples in the news domain: Airplane Flying, Car Exiting.

15 Event Recognition: Object Tracking. A possible approach, which exploits object recognition, is to detect objects of interest, track them over time, and model their spatio-temporal dynamics. Some events are well defined by the presence and motion of an object. Pipeline: object detection and localization, tracking, inference (example: Airplane Landing). However, it is hard to detect events without explicit object motion, such as Riot.

16 Event recognition: exploit dynamic concept evolution. Global low-level features are extracted, such as edge histograms, Gabor texture descriptors and grid color moments. 108 concept detectors are trained on these features. Each frame is represented by 108 concept scores. Shot similarity is evaluated by computing the Earth Mover's Distance (EMD). Pipeline: feature extraction, concept detectors, EMD. The EMD is plugged into an RBF kernel and used in an SVM to predict the event category.
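Turning pairwise EMD values into a kernel matrix can be sketched as below. The exact kernel form is an assumption: the sketch uses K = exp(-D / A), with A defaulting to the mean off-diagonal distance, which is one common way to build an RBF-style kernel from a distance. The function name and the toy distances are hypothetical.

```python
import math

def emd_kernel(distances, scale=None):
    """Map a symmetric distance matrix D to a kernel matrix K = exp(-D / scale).

    If `scale` is None, the mean off-diagonal distance is used (an assumption).
    """
    n = len(distances)
    if scale is None:
        off_diagonal = [distances[i][j]
                        for i in range(n) for j in range(n) if i != j]
        scale = sum(off_diagonal) / len(off_diagonal)
    return [[math.exp(-d / scale) for d in row] for row in distances]

# Toy pairwise EMD values between three shots (assumed numbers).
emd = [[0.0, 1.0, 3.0],
       [1.0, 0.0, 2.0],
       [3.0, 2.0, 0.0]]
kernel = emd_kernel(emd)
```

A matrix like `kernel` can be passed to any SVM implementation that accepts a precomputed kernel.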

17 Content Representation: Mid-level Semantic Concept Scores. Concept detectors are trained on low-level features from an image database. The mid-level semantic concept feature is more robust. Columbia developed and released 374 semantic concept detectors. The detectors are available online.

18 Earth Mover's Distance (EMD): Approach. Supplier P has a given amount of goods; receiver Q has a given limited capacity. Figure: frame weights 1, 1/2 and 1/2, with ground distance d_ij between frame i of P and frame j of Q. Solved by linear programming. Temporal shift: a frame at the beginning of P can be mapped to a frame at the end of Q. Scale variations: a frame from P can be mapped to multiple frames in Q.
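For reference, the transportation problem behind the EMD can be written out as follows. This is the standard formulation, given here as a sketch; the symbols m, n, w, f_ij and the uniform frame weights are assumptions not stated on the slide.

```latex
% P has m frames with weights w_{p_i}; Q has n frames with weights w_{q_j}.
% d_{ij} is the ground distance between frame i of P and frame j of Q.
% Assumed weights: w_{p_i} = 1/m and w_{q_j} = 1/n.
\begin{align}
\min_{f_{ij}} \quad & \sum_{i=1}^{m} \sum_{j=1}^{n} f_{ij}\, d_{ij} \\
\text{s.t.} \quad & f_{ij} \ge 0, \qquad
  \sum_{j=1}^{n} f_{ij} \le w_{p_i}, \qquad
  \sum_{i=1}^{m} f_{ij} \le w_{q_j}, \\
& \sum_{i=1}^{m} \sum_{j=1}^{n} f_{ij}
  = \min\!\Bigl( \sum_{i=1}^{m} w_{p_i},\; \sum_{j=1}^{n} w_{q_j} \Bigr), \\
\mathrm{EMD}(P, Q) \;=\; &
  \frac{\sum_{i=1}^{m} \sum_{j=1}^{n} f_{ij}\, d_{ij}}
       {\sum_{i=1}^{m} \sum_{j=1}^{n} f_{ij}}
  \quad \text{evaluated at the optimal flow.}
\end{align}
```

Because the flow f_ij may link any frame of P to any frame of Q, and may split one frame's weight across several frames, the formulation tolerates both temporal shift and scale variation.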

19 Experiments: Keyframe-based feature performance. Dataset: TRECVID 2005. Evaluation metric: Average Precision. Bar chart comparing four features: concept scores, edge direction histogram, Gabor texture, color moment. Events: Car Crash, Protest, Greeting, Car Exiting, Combat, Marching, Riot, Running, Shooting, Walking, and the average over all events.
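Average Precision for one event can be sketched as below. This is the plain non-interpolated definition over a ranked list and assumes every relevant shot appears somewhere in that list; official evaluation tools may truncate the list or normalize differently. The function name and the toy ranking are hypothetical.

```python
def average_precision(ranked_relevance):
    """Non-interpolated AP: mean of precision@k over the ranks k of relevant items."""
    hits = 0
    precision_sum = 0.0
    for rank, relevant in enumerate(ranked_relevance, start=1):
        if relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0

# Toy ranking (assumed): shots sorted by classifier score, True = event present.
ap = average_precision([True, False, True, False])
```

Here the relevant shots sit at ranks 1 and 3, so the score is the mean of 1/1 and 2/3.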

20 Experiments: EMD concept performance

21 References
Laptev, I. On space-time interest points. IJCV, 2005.
Dollár, P., Rabaud, V., Cottrell, G. and Belongie, S. Behavior recognition via sparse spatio-temporal features. ICCV VS-PETS, 2005.
Ballan, L., Bertini, M., Del Bimbo, A., Seidenari, L. and Serra, G. Effective codebooks for human action recognition. ICCV VOEC, 2009.
Xu, D. and Chang, S.-F. Video event recognition using kernel methods with multilevel temporal alignment. TPAMI, 2008.
