Human Detection and Action Recognition. in Video Sequences

Size: px

Start display at page:

Download "Human Detection and Action Recognition. in Video Sequences"

Emily Poole
5 years ago
Views:

1 5*4%),-<;*<!-$!P#23:%)%-23! 5*4%),-<;*<!-*!%-*%$!#*8%)%*!P#23:%)%-23!8%)!0(23123;/%!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!"#$$%#$&#$'() "#$%&'(')$%!*+,-(.#/)0!!! *%++,-$).+,/.)0!!!!!!!!!!!!!!!!!!!"#$%&'#()*! +,*-../#%0*12#%"2%'!!"#$%&'%()'*(+%,,("&'-($'!.,/0+(,,%$.12#' 3%"424%#2"'5'3%"424%#6' "#$%&! Human Detection and Action Recognition '()*#$%&! +#,)-.%/*)&!! 0%-$#,#*123)-4,&! 5*123)-4,!673)%*8!8%)!9%#):%-,;*<&! '()123/#<!=3%$#&! :%-4J<%*!!M);??%*#):%-,!!$-,&! '()123/#<!D)J4%)B-*I&! NK!D)J4%)B-*I&! GK!D)J4%)B-*I&! OK!D)J4%)B-*I&! 9YZ CDE 9DE +DE in Video Sequences Human Character Recognition in TV-Style Movies Alexander Kläser Cordelia Schmid LEAR, INRIA Rhônes Alpes Rainer Herpers Fachhochschule Bonn-Rhein-Sieg 6. December 2006 Master Thesis Defense A. Kläser

2 Overview Introduction: goal, sample images, overview approach Approach: human detection, character identification Results: images, videos Closure: conclusion, possible extensions, acknowledgements Master Thesis Defense A. Kläser 1

3 Overview 2

4 Goal Introduction Detecting humans in video sequences and determine their identity Type of data: TV-/cinema-style videos, they provide an uncontrolled, realistic working environment Motivation Understand image content Applications such as image/video retrieval, automated video surveillance Master Thesis Defense A. Kläser 3

5 Example Images Introduction Master Thesis Defense A. Kläser 4

6 Approach Overview Introduction (1) Segment the video into shot sequences (data structuring) (2) Human detection (supervised learning) (3) Character identification (supervised learning) Master Thesis Defense A. Kläser 5

7 Human Detection 6

8 Goal Human Detection Human detection in images under the following constraints: Lateral/frontal From closeup to distant view Partial occlusion Approach: Detection of body parts (face, head, head+shoulders) Flexible assembly of detected body parts Gaussian model for geometric relation between body parts Master Thesis Defense A. Kläser 7

9 Annotation Examples Human Detection Master Thesis Defense A. Kläser 8

10 Body Part Detection Human Detection Detector introduced by Dalal and Triggs [1] is trained on body parts Detector computes SIFT-like descriptors on a fixed grid and uses SVMs for learning Characteristic structures (esp. edges in images) are abstracted and learned Altogether 6 different detectors: each frontal+lateral face, head, head+shoulders, Master Thesis Defense A. Kläser 9

11 Human Detection SIFT-like? The SIFT-Descriptor [3] Image gradients Keypoint descriptor Figure 7: A keypoint descriptor is created by fi rst computing the gradient magnitude and orientation at each image sample point in a region around the keypoint location, as shown on the left. These are weighted by a Gaussian window, indicated by the overlaid circle. These samples are then accumulated into orientation histograms summarizing the contents over 4x4 subregions, as shown on the right, with the length of each arrow corresponding to the sum of the gradient magnitudes near that direction within the region. This fi gure shows a 2x2 descriptor array computed from an 8x8 set of samples, whereas the experiments in this paper use 4x4 descriptors computed from a 16x16 sample array. (1) An image region is divided into cells (here 2 2) (2) Gradient orientation and magnitude are computed for each pixel in a cell (here 4 4 pixel) 6.1 Descriptor representation (3) All pixels (weighted with a Gaussian) vote into the Histogram of Figure 7 illustrates the computation of the keypoint descriptor. First the image gradient magnitudes and orientations are sampled around keypoint location, using the scale of the Oriented Gradients (HoG) of their cell keypoint to select the level of Gaussian blur for the image. In order to achieve orientation invariance, the coordinates of the descriptor and the gradient orientations are rotated relative Master Thesis Defense to A. the Kläser keypoint orientation. For effi ciency, the gradients are precomputed for all levels of the 10 pyramid as described in Section 5. These are illustrated with small arrows at each sample location on the left side of Figure 7. A Gaussian weighting function with σ equal to one half the width of the descriptor window is used to assign a weight to the magnitude of each sample point. This is illustrated

12 Human Detection Geometric Relation between Body Parts Model similar to work of Mikolajczyk und Schmid [2] On training data and for each body part combination, Gauss functions are learned for: relative x-/y-position, relative scale Detected body parts are assembled based on this model Yields a more robust detection Master Thesis Defense A. Kläser 11

13 Human Detection Statistics Face (Frontal) Detection -- Buffy2 Database Face (Side) Detection -- Buffy2 Database True Positive Rate True Positive Rate face frontal face frontal w/combined body parts face side face side w/combined body parts Number of False-Positives Number of False-Positives Master Thesis Defense A. Kläser 12

14 Human Detection Head (Frontal) Detection -- Buffy2 Database Head (Side) Detection -- Buffy2 Database True Positive Rate True Positive Rate head frontal head frontal w/combined body parts head side head side w/combined body parts Number of False-Positives Number of False-Positives Master Thesis Defense A. Kläser 13

15 Human Detection Head+Shoulders (Frontal) Detection -- Buffy2 Database Head+Shoulders (Side) Detection -- Buffy2 Database True Positive Rate True Positive Rate headshoulders frontalrear headshoulders frontalrear w/combined body parts headshoulders side headshoulders side w/combined body parts Number of False-Positives Number of False-Positives Master Thesis Defense A. Kläser 14

16 Character Identification 15

17 Goal Character Identification Determine the identity of a person under the following constraints: Varying poses, varying perspectives A person can change clothes in a video sequence Varying environments, varying illumination conditions Approach: Bag-of-Features (BoF) separately for each detected body part Classes for main characters + all others Combined codebooks: SIFT and color More robust identification by combining probabilistic votes of connected body parts Master Thesis Defense A. Kläser 16

18 Character Identification Example Training Images (from Buffy) Master Thesis Defense A. Kläser 17

19 Character Identification Master Thesis Defense A. Kläser 18

20 Bag of Features [4, 5] Character Identification (1) Feature extraction Feature: abstracts the local neighborhood of a point in an image (we use SIFT, and mean CIEL*U*V* color) Given: training images for different body parts (face, head, head+shoulders) and classes (main characters + others) Features are extracted from all images (of certain body part type) using dense sampling Master Thesis Defense A. Kläser 19

21 (2) Codebook generation Character Identification All features can be seen as a distribution in a high-dimensional feature space A cluster algorithm (we use k-means) groups similar features together into clusters Each cluster corresponds to a visual word, alle words together represent the codebook Master Thesis Defense A. Kläser 20

22 Character Identification Feature Extraction Clustering in the Feature Space A D C B Master Thesis Defense A. Kläser 21

23 (3) Classification Character Identification Features are extracted (in the same manner as before) from one specific training image Each feature is assigned to its closest visual word All features of an image yield a word histogram (we binarize it) Based on the histograms, an SVM learns (and predicts) the class membership = identity of a person (one-against-one multi-class SVM, non-linear with RBF kernel) Master Thesis Defense A. Kläser 22

24 Character Identification Feature Extraction Histogram Voting Based on Clusters A B C D Master Thesis Defense A. Kläser 23

25 Cluster Examples (Color) Character Identification Master Thesis Defense A. Kläser 24

26 Cluster Examples (SIFT) Character Identification Master Thesis Defense A. Kläser 25

27 Combination of Body Parts Character Identification The character identification is more robust if results of connected body parts are combined For each detected body part, a probabilistic vote (containing the membership probability for each class) is computed Combination: mean from all votes, the class with the highest probability wins Accuracy over all classes and all body parts 81.3% w/o combination 74.4%) (on Buffy data set, leave-one-out cross validation) Master Thesis Defense A. Kläser 26

28 Results 27

29 Result Visualization Master Thesis Defense A. Kläser 28

30 Results (correct) Master Thesis Defense A. Kläser 29

31 Results (incorrect) Master Thesis Defense A. Kläser 30

32 Closure 31

33 Conclusion Closure A system for human detection and character identification in TV-/cinema-style videos has been successfully developed, implemented, and tested Approach based on supervised learning methods Two main techniques: Human detection combines detection of body parts Character identification uses a bag-of-features approach Very promising results Combination of body parts improves results significantly Master Thesis Defense A. Kläser 32

34 Future Work Closure Human Detection Different detection systems for body parts (esp. for body parts that are less rigid), e.g. based on contours Use temporal information in the video sequence Additional tracking of detected body parts Character Identification Different, more versatile clustering methods No binary word histograms Feature extraction using e.g. interest point detectors Master Thesis Defense A. Kläser 33

35 Closure Different feature descriptors esp. for color (e.g. Color-SIFT), but also in general (e.g. SURF) Extension for unsupervised learning Towards image understanding Scenes and environments could be detected with a BoF-like approach Recognized scenes can help to reason on a more semantic level (e.g. about the interaction between certain characters) Master Thesis Defense A. Kläser 34

36 References Closure [1] Navneet Dalal and Bill Triggs. Histograms of oriented gradients for human detection. In International Conference on Computer Vision & Pattern Recognition, volume 2, pages , June [2] Krystian Mikolajczyk, Cordelia Schmid, and Andrew Zisserman. Human detection based on a probabilistic assembly of robust part detectors. In European Conference on Computer Vision, volume I, pages 69-81, [3] D. G. Lowe. Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision, 60(2):91-110, [4] Shivani Agarwal and Dan Roth. Learning a sparse representation for object detection. In Proceedings of the 7th European Conference on Computer Master Thesis Defense A. Kläser 35

37 Vision, volume 4, pages , Closure [5] Chris Dance, Jutta Willamowski, Lixin Fan, Cedric Bray, and Gabriela Csurka. Visual categorization with bags of keypoints. In ECCV International Workshop on Statistical Learning in Computer Vision, Master Thesis Defense A. Kläser 36

38 So far about my master thesis... 37

39 ... a big THANKS to the group :)... 38

40 ... do you have questions? 39

String distance for automatic image classification

String distance for automatic image classification Nguyen Hong Thinh*, Le Vu Ha*, Barat Cecile** and Ducottet Christophe** *University of Engineering and Technology, Vietnam National University of HaNoi,