Adaptive Gesture Recognition System Integrating Multiple Inputs

Size: px

Start display at page:

Download "Adaptive Gesture Recognition System Integrating Multiple Inputs"

Lucas Ferguson
5 years ago
Views:

1 Adaptive Gesture Recognition System Integrating Multiple Inputs Master Thesis - Colloquium Tobias Staron University of Hamburg Faculty of Mathematics, Informatics and Natural Sciences Technical Aspects of Multimodal Systems May 19, 2015 Tobias Staron 1

2 Table of contents Introduction Set-Up Multiple Inputs & Adaptivity Conclusion Tobias Staron 2

3 Introduction University of Hamburg Overview Introduction Set-Up Multiple Inputs & Adaptivity Conclusion Tobias Staron 3

4 Introduction - Gesture Recognition in General Overview Introduction Gesture Recognition in General Motivation Set-Up Multiple Inputs & Adaptivity Conclusion Tobias Staron 4

5 Introduction - Gesture Recognition in General Gesture Recognition in General several applications (more natural interaction with robots, way of communication, sign language,...) Gesture recognition is the process by which the gestures made by the user are recognized by the receiver. (Mitra & Acharya, 2007 [3]) static vs. dynamic gestures Tobias Staron 5

6 Introduction - Motivation Overview Introduction Gesture Recognition in General Motivation Set-Up Multiple Inputs & Adaptivity Conclusion Tobias Staron 6

7 Introduction - Motivation Previous Work TAMS - Master Project Intelligent Robotics ( ) vision-based system (Microsoft Kinect) for recognizing static gestures depth images and Support Vector Machines (SVMs) project paper (Paetzel & Staron, 2014 [4]) Tobias Staron 7

8 Introduction - Motivation Problems in Gesture Recognition recognition results in general context-depended applications changed circumstances, e.g. new users / users with different figures, changed environments, changed camera properties (position, calibration,...), light changes,... exploiting features of Robotics (a robot might have more than one sensor; possible interaction between user and robot) Tobias Staron 8

9 Introduction - Motivation Hypotheses use of multiple inputs improved recognition results (& context-independent systems) use of multiple inputs robustness possible interaction between user and robot ability of the system to adapt to changed circumstances possible interaction between user and robot omitting of preliminary training development of an adaptive gesture recognition system that makes use of multiple inputs Tobias Staron 9

10 Set-Up University of Hamburg Overview Introduction Set-Up Multiple Inputs & Adaptivity Conclusion Tobias Staron 10

11 Set-Up - Inputs University of Hamburg Overview Introduction Set-Up Inputs Data Evaluation Criteria k-nearest Neighbor (k-nn) Classifiers Multiple Inputs & Adaptivity Conclusion Tobias Staron 11

12 Set-Up - Inputs University of Hamburg Depth Images gray value images information about distances to the camera preprocessing (noise reduction, foreground separation, histogram equalization, grid) (Biswas & Basu, 2011 [2]) gray value binning in grid cells 520 features Tobias Staron 12

13 Set-Up - Inputs University of Hamburg Exemplary Preprocessing of a Depth Image RGB image of an exemplary gesture. Tobias Staron 13

14 Set-Up - Inputs University of Hamburg Exemplary Preprocessing of a Depth Image The corresponding depth image prior to preprocessing. Tobias Staron 13

15 Set-Up - Inputs University of Hamburg Exemplary Preprocessing of a Depth Image The depth image but with reduced noise. Tobias Staron 13

16 Set-Up - Inputs University of Hamburg Exemplary Preprocessing of a Depth Image Only the foreground of the depth image. Tobias Staron 13

17 Set-Up - Inputs University of Hamburg Exemplary Preprocessing of a Depth Image The foreground of the depth image after histogram equalization. Tobias Staron 13

18 Set-Up - Inputs University of Hamburg Exemplary Preprocessing of a Depth Image The equalized foreground of the depth image with a grid put on it. Tobias Staron 13

19 Set-Up - Inputs University of Hamburg Skeletal Information OpenNI tracker position and orientation of several joints of the human skeleton a coordinate frame for each joint transformations into target frame 8 joints 56 features Tobias Staron 14

20 Set-Up - Data University of Hamburg Overview Introduction Set-Up Inputs Data Evaluation Criteria k-nearest Neighbor (k-nn) Classifiers Multiple Inputs & Adaptivity Conclusion Tobias Staron 15

21 Set-Up - Data University of Hamburg Collecting Training and Test Data 12 gestures 10 test users different groups (users with similar/differing figures) different poses and positions (to the left or right) but no different distances to the camera different environments camera calibration and illumination remained unchanged Tobias Staron 16

22 Set-Up - Data University of Hamburg Different Environments Tobias Staron 17

23 Set-Up - Evaluation Criteria Overview Introduction Set-Up Inputs Data Evaluation Criteria k-nearest Neighbor (k-nn) Classifiers Multiple Inputs & Adaptivity Conclusion Tobias Staron 18

24 Set-Up - Evaluation Criteria Evaluation Criteria precision: proportion of test instances classified correctly recall: proportion of instances that should have been classified as a certain gesture that have actually got the respective label F 1 -score = (2 precision recall)/(precision + recall) average classification and (initial) training time nr. of training instances Tobias Staron 19

25 Set-Up - k-nearest Neighbor (k-nn) Classifiers Overview Introduction Set-Up Inputs Data Evaluation Criteria k-nearest Neighbor (k-nn) Classifiers Multiple Inputs & Adaptivity Conclusion Tobias Staron 20

26 Set-Up - k-nearest Neighbor (k-nn) Classifiers k-nearest Neighbor (k-nn) Classifier supervised learning method arbitrary number of dimensions no explicit training (computations during classification) label that occurred most among the k-nearest neighbors of a query instances is chosen distance measure (e.g. Euclidean distance) Tobias Staron 21

27 Three classes, represented by blue squares, magenta diamonds and yellow Tobias Staron pentagons respectively; the red square is the query instance. 22 University of Hamburg Set-Up - k-nearest Neighbor (k-nn) Classifiers Exemplary Dataset in the 2-Dimensional Space

28 Set-Up - k-nearest Neighbor (k-nn) Classifiers Weighted k-nn Classifier if a training example matches the query instance, its label will be chosen Generalization the nearer one of its k-nearest neighbors lies by the query instance, the higher the probability that its label is the result Tobias Staron 23

29 Set-Up - k-nearest Neighbor (k-nn) Classifiers Training of Classifiers classifiers for each kind of input, for each group of users and for each environment the same amount of training (and test) data for each classifier Tobias Staron 24

30 Multiple Inputs & Adaptivity Overview Introduction Set-Up Multiple Inputs & Adaptivity Conclusion Tobias Staron 25

31 Multiple Inputs & Adaptivity - Combining Multiple Inputs Overview Introduction Set-Up Multiple Inputs & Adaptivity Combining Multiple Inputs Adaptivity Integrated System Conclusion Tobias Staron 26

32 Multiple Inputs & Adaptivity - Combining Multiple Inputs Sensor Fusion low-level sensor fusion: fusion at signal level, one classifier high-level sensor fusion: fusion at a more symbolic level, one classifier per input, classification results are fused low-level sensor fusion does not allow for variations regarding the chosen inputs (e.g. adding or removing of sensors) without omitting previous data / retraining everything high-level sensor fusion was chosen Tobias Staron 27

33 Multiple Inputs & Adaptivity - Combining Multiple Inputs Hypotheses Verification inspired by Aldoma et al. (Aldoma et al., 2013 [1]) high-level sensor fusion approach one classifier per kind of input each classifier can generate an unspecified number of hypotheses each hypothesis is weighted hypothesis with the highest weight is chosen as recognition result Tobias Staron 28

34 Multiple Inputs & Adaptivity - Combining Multiple Inputs Weighting Cues each hypothesis is weighted by an unspecified number of weighting cues neighboring cue (in case of k-nn classifiers): all labels occurring among k-nearest neighbors as hypotheses; weights depend on nr. of examples with respective labels / on their distance to the query instance meta-features: e.g. reliability of classifiers summation of weights of a hypothesis Tobias Staron 29

35 Multiple Inputs & Adaptivity - Combining Multiple Inputs Evaluation (1) the same data were used as for testing the classifiers with depth respectively skeletal information individually k-nn classifier: best performance for the neighboring cue weighted k-nn classifier outperformed the standard one improved robustness Tobias Staron 30

36 Comparison of the individual inputs and their combination via Tobias Staron neighboring cues for the k-nn classifier, trained and tested on data from 31 University of Hamburg Multiple Inputs & Adaptivity - Combining Multiple Inputs Evaluation (1) F 1 -score depth images skeletal information neighboring cue k

37 Comparison of the individual inputs and their combination via Tobias Staron neighboring cues for the weighted k-nn classifier, trained and tested on 31 University of Hamburg Multiple Inputs & Adaptivity - Combining Multiple Inputs Evaluation (1) F 1 -score depth images skeletal information neighboring cue k

38 Multiple Inputs & Adaptivity - Combining Multiple Inputs Evaluation (1) - depth images skeletal data combined inputs F 1 -score Table: Comparison of the individual inputs and their combination via neighboring cues for the weighted 5-NN classifier, trained on data from users with similar figures and tested on data from the same users, but in a different environment. Tobias Staron 31

39 Multiple Inputs & Adaptivity - Adaptivity Overview Introduction Set-Up Multiple Inputs & Adaptivity Combining Multiple Inputs Adaptivity Integrated System Conclusion Tobias Staron 32

40 Multiple Inputs & Adaptivity - Adaptivity Online Learning goal: recognition of gestures under changed circumstances classifiers try to recognize query instances and are told the correct label afterwards to update their model no online version for SVMs (they need to be retrained every time new training are added) k-nn classifiers different points when to learn showed no apparent effects Tobias Staron 33

41 Multiple Inputs & Adaptivity - Adaptivity Evaluation (2) 5-NN classifier trained on depth images from users with similar figures and tested on depth images from the same users, but in a different environment online learning after each misclassification training data and the test data of iteration 1 the same as for previous tests similar tests in the remaining iterations, but with newly sampled test data Tobias Staron 34

42 Multiple Inputs & Adaptivity - Adaptivity Evaluation (2) F 1 -score nr. of training data nr. of retraining F 1 -score # iteration Tobias Staron 35

43 Multiple Inputs & Adaptivity - Adaptivity Evaluation (2) average classification time nr. of training data nr. of retraining time (sec) # iteration Tobias Staron 35

44 Multiple Inputs & Adaptivity - Integrated System Overview Introduction Set-Up Multiple Inputs & Adaptivity Combining Multiple Inputs Adaptivity Integrated System Conclusion Tobias Staron 36

45 Multiple Inputs & Adaptivity - Integrated System Multiple Inputs Hypotheses Verification additional weighting cues are enabled by online learning: the experience of a classifier (the number of examples it has been trained with) Tobias Staron 37

46 Multiple Inputs & Adaptivity - Integrated System Adaptivity online Learning what examples to learn previously: all misclassified ones alternative: misclassified examples as soon as the fusion result is wrong, too Tobias Staron 38

47 Multiple Inputs & Adaptivity - Integrated System Evaluation (3) 5-NN classifier trained on data from users with similar figures and tested on data from the same users, but in a different environment depth images and skeletal data combined via neighboring cue online learning after each misclassification training data and the test data of iteration 1 the same as for previous tests similar tests in the remaining iterations, but with newly sampled test data Tobias Staron 39

48 Multiple Inputs & Adaptivity - Integrated System Evaluation (3) F 1 -score learning after false classifier prediction learning after false classifier and overall prediction iteration Tobias Staron 40

49 Multiple Inputs & Adaptivity - Integrated System Evaluation (3) time (sec) learning after false classifier prediction learning after false classifier and overall prediction iteration Tobias Staron 40

50 Multiple Inputs & Adaptivity - Integrated System Evaluation (4) - Final Test weighted 5-NN classifier depth images and skeletal data combined via neighboring cue online learning after each misclassification when fusion result false, too no preliminary training test data from users with similar figures (first ten and last ten iterations), data from users with varying figures (iteration 11-20), original users, but in a different environment (iteration 21-30) and the users with the varying figures in that environment (iteration 31-40) Tobias Staron 41

51 Multiple Inputs & Adaptivity - Integrated System Evaluation (4) - Final Test F 1 -score # F score nr. of training data - depth images nr. of training data - skeletal information iteration Tobias Staron 42

52 Conclusion University of Hamburg Overview Introduction Set-Up Multiple Inputs & Adaptivity Conclusion Tobias Staron 43

53 Conclusion - Summary Overview Introduction Set-Up Multiple Inputs & Adaptivity Conclusion Summary References Tobias Staron 44

54 Conclusion - Summary Hypotheses use of multiple inputs lead to improved recognition results as well as a more robust system system is able to adapt to changed circumstances due to online learning preliminary training can be omitted because of online learning adaptive gesture recognition system that makes use of multiple inputs Tobias Staron 45

55 Conclusion - References Overview Introduction Set-Up Multiple Inputs & Adaptivity Conclusion Summary References Tobias Staron 46

56 Conclusion - References References I 1. Aitor Aldoma, Federico Tombari, Johann Prankl, Andreas Richtsfeld, Luigi Di Stefano, and Markus Vincze. Multimodal Cue Integration through Hypotheses Verification for RGB-D Object Recognition and 6DOF Pose Estimation. In Robotics and Automation (ICRA), 2013 IEEE International Conference on, pages IEEE, May K. K. Biswas and Saurav Kumar Basu. Gesture Recognition using Microsoft Kinect R. In Automation, Robotics and Applications (ICARA), th International Conference on., pages December Tobias Staron 47

57 Conclusion - References References II 3. Sushmita Mitra and Tinku Acharya. Gesture Recognition: A Survey. Systems, Man, and Cybernetics, Part C: Applications and Reviews, IEEE Transactions on, 37(3): , Maike Paetzel and Tobias Staron. Gesture Recognition. Project Paper, Technical Aspects of Multimodal Systems, Department of Informatics, MIN-Faculty, University of Hamburg, March Tobias Staron 48

58 Conclusion - University of Hamburg The End Thanks for Your Attention! Tobias Staron 49

Object Classification in Domestic Environments

Object Classification in Domestic Environments Markus Vincze Aitor Aldoma, Markus Bader, Peter Einramhof, David Fischinger, Andreas Huber, Lara Lammer, Thomas Mörwald, Sven Olufs, Ekaterina Potapova, Johann