Feature Subset Selection Utilizing BioMechanical Characteristics for Hand Gesture Recognition

Farid Parvini, Computer Science Department, University of Southern California, Los Angeles, USA
Dennis McLeod, Computer Science Department, University of Southern California, Los Angeles, USA

Abstract

Feature Subset Selection (FSS) has become the focus of much research in areas of application for Multivariate Time Series (MTS). MTS data sets are common in many multimedia and medical applications such as gesture recognition, video sequence matching and EEG/ECG data analysis. MTS data sets are high dimensional, as they consist of a series of observations of many variables at a time. The objective of feature subset selection is two-fold: providing a faster and more cost-effective process, and a better understanding of the underlying process that generated the data. We propose a subset selection approach based on bio-mechanical characteristics, a simple yet effective technique for MTS. We apply our approach to recognizing ASL static signs using a Neural Network and a Multi-Layer Neural Network, and show that we can maintain the same accuracy while selecting just 50% of the generated data.

1. Introduction

With the potential for many interactive applications, automatic hand gesture recognition has been actively investigated in the computer pattern recognition and bio-informatics communities. Hand gestures are often subtle, can happen at various timescales, and may exhibit long-range dependencies. All these issues make hand gesture recognition a challenging problem. Sometimes, using all the available features has a detrimental effect on accuracy compared to using some subset of the features. This is because some features depend on other features and are often noisy; hence, there is no need to include them. Furthermore, the more features there are, the more likely it is that some feature will randomly fit the data, and hence the probability of overfitting increases.
Thus, removing these features leads both to improved accuracy and to clearer descriptions of the learned concepts, since there are fewer features with which the user has to deal. However, feature subset selection is a difficult problem: if there are n features, then there are 2^n possible feature subsets, and searching this space for a good set of features can be extremely time-consuming [7]. We present a new method for finding the subset of the data that is well suited for hand gesture recognition. Our method is based on evaluating the data received from each sensor and calculating the range of motion for each sensor individually. Our approach is distinct and novel in the following two aspects. First, to the best of our knowledge, our approach of utilizing bio-mechanical characteristics for feature subset selection is unique among the studies that have applied FSS to Multivariate Time Series for analyzing collected raw data for gesture recognition. Second, our approach addresses the major challenges involved in FSS: accuracy and time complexity. The remainder of this paper is organized as follows. Section 2 discusses related work. We present our approach in Section 3. The results of our experiments are reported in Section 4. Finally, Section 5 concludes this paper and discusses our future research plans.

Figure 1. ASL Alphabets
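The 2^n search space mentioned above can be made concrete with a short sketch (an illustrative snippet, not part of the paper; the function name is my own). For the 22-sensor glove used later in the paper, exhaustive search would mean examining over four million candidate subsets:

```python
from itertools import combinations

def all_subsets(n_features):
    """Yield every non-empty feature subset of n features (2^n - 1 of them)."""
    for k in range(1, n_features + 1):
        yield from combinations(range(n_features), k)

# A glove with 22 sensors yields 2^22 = 4,194,304 candidate subsets,
# far too many to evaluate with a wrapper approach one by one.
print(2 ** 22)
# Even for just 10 features the count is already 2^10 - 1 = 1023:
print(sum(1 for _ in all_subsets(10)))
```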
2. Related Work

There are two possible approaches to feature subset selection: the wrapper approach and the filter approach [1]. In the wrapper approach, the learner is applied to subsets of features and tested on a hold-out set (or, if execution time is not an issue, using cross-validation). From the results of these tests, a good subset of features is selected. Yang et al. [9] used a wrapper genetic algorithm to select such subsets, achieving multi-criteria optimization in terms of generalization accuracy and costs associated with the features. Qu and Lu [10] adopted an approximation quality concept and a hierarchy structure for feature subset selection. The other approach to feature subset selection is the filter approach, which looks at correlations between features and other performance measures to decide a priori, independently of any particular learner, what a good subset of features is. Although such techniques do not perform nearly as well as the wrapper approach, they are much faster. Many approaches have been proposed in the area of Multivariate Time Series. Chakraborty [2] summarizes the current techniques for feature subset selection and classification for MTS data sets. Shahabi et al. [3] proposed a family of unsupervised methods for feature subset selection from Multivariate Time Series based on Common Principal Component Analysis. They also proposed an approach for MTS with extremely large spatial features [4]. Tucker [5] presented a method for decomposing high-dimensional MTS into mutually exclusive subsets of variables where within-group dependencies are high and between-group dependencies are low. Yang [6] utilized the properties of the principal components to retain the correlation information among the original features.

3. Feature Subset Selection Based on the Range of Motion

Gesture recognition utilizing bio-mechanical characteristics, originally proposed in [11], is inspired by the observation that all forms of hand signs include finger-joint movements from a starting posture to a final posture. We utilize the concept of range of motion from the bio-mechanical literature at each joint to abstract this movement. Range of motion (ROM) [15] is a quantity that defines the joint movement by measuring the angle from the starting position of an axis to its position at the end of its full range of movement. We compute the range of motion per joint by using the sensor values acquired by the sensory device attached to a human hand. Suppose that a user is making a sign by wearing a sensory device. The user is required to start making the sign from a starting posture toward a final posture. An example of one possible starting posture and the posture representing the ASL [16] sign L are shown in Figure 2.

Figure 2. ASL sign L and its representation with k = 4

The process of gesture recognition starts with collecting data from the sensors attached to the hand of a user. At each sensor clock, the sensory device driver captures one sample by acquiring data from all of the n sensors of the device. Each sample is stored in a tuple with n fields, where each field is associated with one sensor. We represent a sample at time t by St = (s1, s2, ..., sn), where each si is a real number indicating the value acquired from sensor i at time t. As time evolves, a data set of samples is acquired. An example of such a data set is shown in Figure 3.

Figure 3. A snapshot of MTS generated by the sensory device

Suppose St0 = (s1, ..., sn) and St = (s′1, ..., s′n) represent the samples of the start and final postures for making a sign, respectively. We calculate the range of motion tuple RΔt as follows:

RΔt = St − St0 = (r1, ..., rn), where ∀i ∈ {1, ..., n}: ri = s′i − si, and Δt = t − t0

RΔt is a tuple consisting of n positive or negative real numbers, depending on the direction of the movement.
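The range-of-motion tuple defined above is a componentwise difference between the final and starting samples. A minimal sketch in Python (the function name and toy sensor values are illustrative assumptions, not the authors' code):

```python
def range_of_motion(start_sample, final_sample):
    """Compute the range-of-motion tuple R = S_t - S_t0 componentwise.

    start_sample, final_sample: sequences of n sensor readings (floats)
    captured at the starting and final postures, respectively.
    Returns a tuple of n signed differences, one per sensor; the sign
    encodes the direction of the joint movement.
    """
    assert len(start_sample) == len(final_sample)
    return tuple(s_final - s_start
                 for s_start, s_final in zip(start_sample, final_sample))

# Toy example with n = 4 sensors (values made up for illustration):
start = (10.0, 42.0, 5.0, 7.5)
final = (35.0, 40.0, 5.0, 90.0)
print(range_of_motion(start, final))  # (25.0, -2.0, 0.0, 82.5)
```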
In order to normalize the values of the movement, we first construct R = (|r1|, ..., |rn|). The rationale behind using absolute values is that smaller values (i.e., larger negative numbers) in RΔt do not necessarily mean less movement. To capture the direction of the movement and differentiate between movements in opposite directions, for each R we calculate D, which holds the directional information, as follows:

DΔt = (d1, ..., dn), where di = 1 if ri ≥ 0, and di = −1 otherwise

Subsequently, we find the maximum and minimum values within R and represent them as M(R) and m(R), respectively. We then normalize each value in R by subtracting m(R) and dividing the result by (M(R) − m(R)). We represent the result of this normalization, which consists of values between 0 and 1, by NR. Finally, we discretize the values of NR with a given discretization parameter k (> 1). For example, if k = 2, we replace each value of NR with 0 if its value is less than 0.5, and with 1 otherwise. The resulting NR with k = 4 for all ASL alphabets is displayed in Figure 4.

Figure 4. ASL alphabets and NR presentation for each sensor

Figure 5. CyberGlove with numbered sensors

Our hypothesis is that there is a direct correlation between the range of motion and the signature that represents the sign. Consequently, subset selection needs to be based on the maximum range of motion. To this end, we propose eliminating the data received from the sensors whose ranges of motion are minimal compared to those of the other sensors.
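The direction, normalization, and discretization steps above can be sketched as follows (a hedged Python illustration of the definitions in the text; function and variable names are my own, and the sketch assumes at least one sensor moved, so that M(R) > m(R)):

```python
def normalize_and_discretize(rom, k=4):
    """Given a signed range-of-motion tuple, return (D, discretized NR).

    D  : direction tuple, 1 for non-negative movement, -1 otherwise.
    NR : absolute values of rom, min-max normalized into [0, 1], then
         discretized into k levels by mapping each v to min(int(v*k), k-1).
    Assumes M(R) > m(R), i.e. not all sensors moved by the same amount.
    """
    d = tuple(1 if r >= 0 else -1 for r in rom)
    abs_r = [abs(r) for r in rom]
    lo, hi = min(abs_r), max(abs_r)          # m(R) and M(R) in the paper
    nr = [(v - lo) / (hi - lo) for v in abs_r]
    levels = tuple(min(int(v * k), k - 1) for v in nr)
    return d, levels

d, levels = normalize_and_discretize((25.0, -2.0, 0.0, 82.5), k=4)
print(d)       # (1, -1, 1, 1)
print(levels)  # (1, 0, 0, 3)
```

Note that for k = 2 this binning reproduces the paper's example: values below 0.5 map to 0 and the rest to 1.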
Figure 6. Comprehensive Experiment Results for the Three Sets of Experiments

In the next section, we evaluate our approach and provide the results of our experiments.

4. Experimental Results and Observations

For our experiments, we used the CyberGlove [17] as a virtual reality user interface for acquiring data. The CyberGlove is a glove that provides up to 22 joint-angle measurements. It uses proprietary resistive bend-sensing technology to transform hand and finger motions into real-time digital joint-angle data. A picture of this glove and the location of each sensor are shown in Figure 5. We initiated our experiments by collecting data from ten signers wearing the glove and making all 24 ASL static signs representing letters of the English alphabet, starting from the posture shown in Figure 1. Note that the signs for the letters J and Z involve movement of the hand; they are therefore considered dynamic signs and are excluded from our experiments. Figure 1 shows that there are not enough sensors in the glove to capture the relationship between the index and middle fingers, which is essential to differentiate R from U. Thus we omitted the signs H and R from our experiments. For some data files, complete sensor values were not captured on some lines; we excluded all data lines that did not contain all 22 sensor values. For each sign, we collected 140 tuples from the starting posture to the final posture, each including 22 sensor values. For the first set of experiments, we recognized all the signs using a Neural Network [12], [13] and a Multi-Layer Neural Network [14]. We used nine data sets for training and tuning and one data set for testing the system, and repeated the experiment 10 times. The results of this set of experiments are displayed in the first two columns of Figure 6. We then sorted the sensors based on their contribution to the total range of motion while making all signs. The results are displayed in Figure 4.
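Sorting the sensors by their contribution to the total range of motion can be sketched as follows (an illustrative snippet under the paper's definitions; the toy data and helper name are assumptions, not the authors' implementation):

```python
def rank_sensors_by_rom(rom_tuples):
    """Rank sensors by cumulative absolute range of motion over all signs.

    rom_tuples: list of range-of-motion tuples, one per recorded sign,
                each holding one signed value per sensor.
    Returns sensor indices sorted from largest to smallest cumulative ROM;
    sensors at the tail of the list (minimal cumulative ROM) are the
    candidates for elimination.
    """
    n = len(rom_tuples[0])
    cumulative = [sum(abs(rom[i]) for rom in rom_tuples) for i in range(n)]
    return sorted(range(n), key=lambda i: cumulative[i], reverse=True)

# Toy data: 3 signs, 4 sensors; sensor 2 never moves and ranks last,
# mirroring the zero-ROM sensors the paper omits.
roms = [(25.0, -2.0, 0.0, 82.5),
        (10.0,  5.0, 0.0, 40.0),
        (-8.0,  1.0, 0.0, 60.0)]
print(rank_sensors_by_rom(roms))  # [3, 0, 1, 2]
```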
This figure clearly shows that, since the cumulative range of motion for sensors 3, 7, 8, 10 and 11 is zero, these sensors play no role in making the signs and can safely be omitted. So, in the next set of experiments, we omitted the data received
from these sensors and repeated exactly the same experiment as in the first step (nine data sets for training and tuning and one set for testing, repeated 10 times). The results of this set of experiments are displayed in Figure 6. As can be observed, the overall accuracy did not drop and is very close to the results obtained from the experiment that included all sensors. In the last set of experiments, we omitted 5 more sensors (12, 1, 4, 2, 9); these are the sensors that have the minimum cumulative range of motion while making the ASL signs. We again repeated the experiment as in step one, except that we omitted ten sensors from our data set. The results of this set of experiments are shown in Figure 6.

5. Conclusion & Future Work

We proposed an approach to feature subset selection for hand gesture recognition based on the bio-mechanical characteristics of the hand during the formation of the gesture. Our approach has the following advantages over previous studies:
- It is simple.
- It is application-independent.
- Its time complexity is very low.
Our approach also provides a very good understanding of the underlying process and explains intuitively why some of the sensors can be omitted, a property that most FSS techniques lack. Our experimental results on recognizing static ASL signs confirm the effectiveness of our approach when applied to MTS generated by sensors attached to human hands. We plan to extend this work in two directions. First, we intend to extend our technique to recognizing hand gestures without a sensory device, e.g., vision-based recognition. Second, we would like to show that, in general, utilizing any characteristic that defines the system at a higher level of abstraction rather than at the data layer (e.g., a bio-mechanical characteristic) provides both higher accuracy and less dependency on the data-gathering process.

References

[1] G. H. John, R. Kohavi, and K. Pfleger: Irrelevant features and the subset selection problem.
In Proceedings of the International Conference on Machine Learning, 1994, pages 121-129.
[2] B. Chakraborty: Feature Selection and Classification Techniques for Multivariate Time Series. Second International Conference on Innovative Computing, Information and Control (ICICIC 2007), p. 42, 2007.
[3] H. Yoon, K. Yang, and C. Shahabi: Feature subset selection and feature ranking for multivariate time series. IEEE Trans. Knowl. Data Eng., 17(9):1186-1198, 2005.
[4] H. Yoon, C. Shahabi: Feature Subset Selection on Multivariate Time Series with Extremely Large Spatial Features. Sixth IEEE International Conference on Data Mining Workshops (ICDM Workshops 2006), Dec. 2006, pages 337-342. DOI 10.1109/ICDMW.2006.81.
[5] A. Tucker, S. Swift, X. Liu: Variable grouping in multivariate time series via correlation. IEEE Trans. on Systems, Man, and Cybernetics, Part B, 31, 2001.
[6] K. Yang, H. Yoon, C. Shahabi: CLeVer: a feature subset selection technique for multivariate time series. Technical report, University of Southern California, 2005.
[7] E. Cantu-Paz, S. Newsam, C. Kamath: Feature Selection in Scientific Applications. Proceedings of the ACM International Conference on Knowledge Discovery and Data Mining, pages 788-793, August 22-25, 2004, Seattle, WA. UCRL-CONF-202657.
[8] J. Bi, K. Bennett, M. Embrechts, C. Breneman, and M. Song: Dimensionality reduction via sparse support vector machines. JMLR, 3:1229-1243, 2003.
[9] J. Yang, V. Honavar: Feature subset selection using a genetic algorithm. IEEE Intelligent Systems and their Applications, 13(2):44-49, Mar/Apr 1998.
[10] B.B. Qu, Y. Lu: A hierarchy reduct algorithm for feature subset selection. Proceedings of the 2004 International Conference on Machine Learning and Cybernetics, Vol. 2, 26-29 Aug. 2004, pages 1157-1161.
[11] F. Parvini, C. Shahabi: An algorithmic approach for static and dynamic gesture recognition utilising mechanical and biomechanical characteristics. International Journal of Bioinformatics Research and Applications, Vol. 3, No. 1, 2007.
[12] K. Boehm, W. Broll, and M. Sokolewicz: Dynamic gesture recognition using neural networks; a fundament for advanced interaction construction. SPIE Conference Electronic Imaging Science and Technology, San Jose, CA, 1994.
[13] K. Murakami, H. Taguchi: Gesture recognition using recurrent neural networks. CHI '91 Conference Proceedings, pages 237-242, 1991.
[14] J. Eisenstein, S. Ghandeharizadeh, L. Golubchik, C. Shahabi, D. Yan and R. Zimmermann: Device Independence and Extensibility in Gesture Recognition. IEEE Virtual Reality Conference (VR), LA, CA, 2003.
[15] N. B. Reese: Joint Range of Motion and Muscle Length Testing. Saunders, 2001. ISBN 0721689426.
[16] http://www.lifeprint.com/
[17] http://www.cyberglovesystems.com/products/cyberglove.php