Human Action Recognition Using Independent Component Analysis

Human Action Recognition Using Independent Component Analysis Masaki Yamazaki, Yen-Wei Chen and Gang Xu Department of Media echnology Ritsumeikan University 1-1-1 Nojihigashi, Kusatsu, Shiga, 525-8577, Japan Abstract Principal Component Analysis (PCA) is often used for reducing the dimensionality of input feature space. However, the eigenspace based on PCA is not always the best feature space for pattern recognition. In this paper, we use the feature space based on Independent Component Analysis (ICA) and show that the ICA representation is more effective than the PCA representation for human action recognition. he experimental results on the human action database show that the ICA approach produces more accurate recognition than the PCA approach. 1. Introduction Recognition of human actions from video images is very important in various ways, such as surveillance, video retrieval and human computer interaction. he general method for human action recognition is to extract human motion information directly from video sequences and to compare it with a human action database. he important problem for human action recognition is how to learn and classify human actions efficiently. he approaches for human action recognition from images can be classified into two types. he first type [1, 8, 9] is to use 2D motion information from background subtraction, frame differencing, optical flow segmentation, and these methods are easy to calculate. However, 2D image information changes due to lighting conditions and camera view points. he second type [2, 10] is to use 3D motion information from a marker and marker-less motion capture system. It is difficult to get 3D human motions, because it needs especial devices or complex algorithms. 2D-based methods are popular methods for human action recognition. here is a famous method that uses an eigenspace method for learning [3]. he eigenspace method regards one image as one point in eigenspace, so image sequences are point sequences of time series data. It is possible to recognize human actions by verifying the shape of locus constructed from the point sequences in eigenspace. Several successive methods using the eigenspace method for recognizing human motions have been proposed [4, 5, 15, 16, 17, 18]. he eigenspace transformation based on Principal Component Analysis (PCA) is applied to video images for reducing the dimensionality of input feature space. Several pattern classification techniques are finally performed in the lower-dimensional eigenspace for recognition. However, the eigenspace based on PCA is not always the best feature space for pattern recognition, because PCA can only reduce the dimensionality of input data. Recently, some applications of Independent Component Analysis (ICA) [14] that is a statistical data analysis method have been exploited in the field of image processing and computer vision. For example, face recognition, the blind deconvolution of a blurred image and separating reflections. he ICA generalizes the technique of PCA and has proven to be superior to PCA for face recognition [6, 7]. In this paper, we use the feature space based on ICA and show that the ICA representation is more effective than the PCA representation for recognizing human actions. We give a comparison result between ICA and PCA. Section 2 describes the related work. Section 3 presents the ICA representation of human actions. Section 4 describes our experimental results. Section 5 summaries our conclusion. 2. Related Work Human action recognition consists of three major steps, human detection and tracking, feature extraction, and training or classification. he first step is to detect and track a human figure in an image sequence. ackground subtraction is a simple method for human de-

tection and tracking. For each image sequence, a background subtraction algorithm and a simple correspondence procedure are firstly used for segmenting and tracking the moving silhouettes of a human figure. he second step is to extract human motion features from the each frame and to normalize the image size with the shape centroid. he third step is to apply PCA for reducing the dimensionality of input feature space and to recognize human actions using standard pattern classification techniques in the lower-dimensional eigenspace. he general human actions are named as follows: Walk, Run, Skip, Hand waving and so on. he experimental motion patterns of many researchers are approximately 5 ~ 10 patterns. Human action recognition is a very difficult problem, because human actions have a lot of motion patterns and there are the few open databases of human actions. In this paper, we try to improve the PCA-based method by using ICA. he overview of the eigenspace method is shown in Fig. 1. Fig. 1 (a) shows the binary silhouettes extracted from a walking motion sequence using background subtraction and normalization. Fig. 1 (b) shows the PCA projected trajectory in the threedimensional eigenspace. Recognition of human actions is done by measuring similarities between reference patterns and test samples in the lower-dimensional eigenspace. For example, there are several ways such as Spatiotemporal Correlation, Normalized Euclidean Distance and Canonical Angle. However, the eigenspace based on PCA is not the best feature space for human action recognition, because PCA can only reduce the dimensionality of input data and PCA basis vectors represent global features. For better recognizing human actions, we try to construct more effective feature space by using ICA than the feature space based on PCA, because ICA can get independent components and it represent local features efficiently. Fig. 1. Overview of the eigenspace method. 3. ICA representation of human actions 3.1. rief Introduction of ICA ICA generalizes the technique of PCA and has proven to be a good tool of feature extraction. When some mixtures of probabilistically independent source signals are observed, ICA recovers the original source signals from the observed mixtures without knowing how the sources are mixed. he general model can be described as follows: We start with the assumption that the observation vectors X = ( x,,, ) 1 x2 L xm can be represented in terms of a linear superposition of unknown independent vectors S = ( s,,, ) 1 s2 L s N. X = AS (1) where A is an unknown mixing matrix (M N). he goal of ICA is to find a matrix W, so that the resulting vectors Y = WX (2) recovers the independent vectors S, probabilistically permuted and rescaled. W is roughly the inverse matrix of A. efore performing ICA, the problem of estimating the matrix A can be simplified by a prewhitening of the vectors X. he observed vectors X is first linearly transformed to other vectors Z = MX (3) whose correlation matrix equals unity: E ( Z Z ) = I. his can be accomplished by PCA with 1/ 2 M = D V (4) where the matrix V is the eigenvector matrix of the covariance matrix of X and the matrix D is the eigenvalue matrix of the covariance matrix of X. At the same time, the dimensionality of the vectors is reduced. After this transformation we have Z = MX = MAS = S (5) where the matrix is the mixing matrix. ICA is performed on the sphered vectors Z and the estimated mixing matrix is an orthogonal matrix, since E ( Z Z ) = E( S S ) = = I. After a prewhitening of the vectors X, we can rewrite Eq.(2) to: Y = WZ (6) Several ICA algorithms have been proposed for solving W. Here we use a neural learning algorithm proposed by ell & Sejnowski [11]. he algorithm is to maximize the joint entropy by using a stochastic gradient ascent. he gradient update rule for the weight matrix W is as follows:

W = ( I + g ( Y) Y )W (7) Y where g ( Y) = 1 2 /(1 + e ) is calculated for the each component of Y. 3.2. ICA basis images from human actions From face recognition research reports, PCA basis images represent global information such as average faces and ICA basis images represent local information such as facial parts. he ICA representation is better performance than the PCA representation for face recognition. We show the ICA basis images and the PCA basis images from human walking sequences in Fig. 2 and Fig. 4. he 4 basis functions of the each algorithm are shown in Fig. 3 and Fig. 5. Fig. 3 (a) and Fig. 5 (a) are the 4 PCA eigenvectors with highest eigenvalues (left is first eigenvalue) and Fig. 3 (b) and Fig. 5 (b) are the 4 ICA basis images. From these basis images, we can see that the PCA basis images represent the frequency of body movements, which means that both arms and feet are the most frequent movement in a walking motion. In contrast, the ICA basis images represent the motion features of the each frame sequence, which means that a walking motion are repetitions such as opened feet and crossed feet at each time. In brief, ICA can get the local motion features of human motions. 3.3. Justification of ICA ICA can be applied only if the independent components obey non-gaussian distribution. In order to measure non-gaussianity of the resu1lting independent components, we calculate the kurtosis of the independent components. Since the kurtosis of gaussian distribution is equal to 0, we can measure nongaussianity by calculating the kurtosis. he normalized kurtosis [14] is defined as 4 ( xi x) i 3 (8) 2 2 ( xi x) i he kurtosis of our resulting independent components is shown in able 1. Fig. 2. Sample silhouette images of a human walking sequence. Fig. 3. PCA and ICA basis images from the walking silhouette images. Fig. 4. Sample frame difference images of a human walking sequence. ALE 1. Kurtosis of ICA representation. Kurtosis ICA 69.5 Fig. 5. PCA and ICA basis images from the walking frame difference images.

4. Experimental results o evaluate our proposed method, we used the human action database [12] containing the six types of human actions (walking, jogging, running, boxing, hand waving and hand clapping) performed several times by 25 subjects in the four different scenarios: outdoors s1, outdoors with scale variation s2, outdoors with different cloths s3 and indoors s4. In our experiments, we used only the data of s1 that is the easiest situation (see Fig. 6). Since appearance-based methods are depended on the image appearance of a camera angle, it is very difficult to learn all view patterns. We just tried to show the comparison results between PCA and ICA from the simple human action data. Since the database is limited, the recognition results were measured using a leave-one-out strategy which makes maximal use of the available data for training. In order to select the number of independent components, we used PCA reduction before applying ICA. Recognition was done on the test sequences using a minimum Euclidean distance classifier and a Cosine similarity classifier that uses as similarity the angle between a test vector and a training one. he recognition rate was computed as the ratio of the number of samples classified correctly to the total number samples. Fig. 7 shows the recognition performance that was averaged over the recognition rate of the two motion features (silhouettes, frame difference) and the two classifiers (the minimum Euclidean distance, the Cosine similarity). It can be seen that the recognition rate of ICA was higher than that of PCA. he best recognition rate was about 75% for PCA and about 80% for ICA. Fig. 8 shows the confusion matrix of PCA and ICA. It turns out from these results that the ICA representation provides more discriminability power than the PCA representation for human action recognition. In order to verify these results and gain some knowledge about the inter-class and intra-class dispersion, we measured the class separability by computing the within-class scatter matrix S W, the between-class scatter matrix S and the mixture scatter matrix S M = SW + S corresponding to the each class [13]. he criterion used for the class separability measurement was J = trace( S M ) trace( S W ). his number is large when S is dispersed or the scatter of S W is small. Our experimental results are shown in able 2. he results presented in able 2 are consistent with the results shown in Fig. 8, since the smallest J are J(Jog) and J(Run) that correspond to the worst recognition rates ( Jog and Run ) and J(ox) has a large value. he value of J(Jog) and J(Run) are small, since running style is very similar to jogging style and it is difficult to classify. As one can see, the ICA representation produces a better class separation than the PCA representation. Fig. 6. Sample frames from the action database. Fig. 7. Recognition performance of PCA and ICA.

6. References Fig. 8. Confusion matrix of PCA and ICA (%). ALE 2. Class separability measurement of PCA and ICA. Case PCA ICA J(Walk) 1.025 1.036 J(Jog) 1.014 1.019 J(Run) 1.015 1.021 J(ox) 1.026 1.039 J(Hwav) 1.023 1.031 J(Hclp) 1.017 1.023 5. Conclusions We have shown that the ICA approach is more effective than the PCA approach for human action recognition. he ICA representation produces a better class separation than the PCA representation, since ICA can get the local motion features of human actions. Further investigations will be tried to compare the other methods with more motion patterns. [1] Y. Yacoob and M. J. lack, Parameterized modeling and recognition of activities, Journal of Computer Vision and Image Understanding, vol. 73, no. 2, pp. 232-247, 1999. [2] D. M. Gavrila and L. S. Davis, 3-D model-based tracking of humans in action: a multi-view approach, Proc. IEEE Conf. Computer Vision and Pattern Recognition, pp. 73-80, 1996. [3] H. Murase and R. Sakai, Moving object recognition in eigenspace representation: Gait analysis and lip reading, Pattern Recognition Letters, vol. 17, pp. 155-162, 1996. [4] O. Masoud, and N. Papanikolopoulos, A Method for Human Action Recognition, Image and Vision Computing, vol. 21, no. 8, pp.729-743, Aug. 2003 [5] L. Wang,. an, H. Ning and W. Hu, Silhouette Analysis-ased Gait Recognition for Human Identification, IEEE rans. Pattern Analysis and Machine Intelligence, vol. 25, no. 12, pp. 1505-1518, 2003. [6] M. artlett, J. Movellan, and. Sejnowski, Face recognition by independent component analysis, IEEE rans. Neural Networks, pp. 1450-1464, 2002. [7] J. Yang, D. Zhang and J. Y. Yang, Is ICA Significantly etter than PCA for Face Recognition?, Proc. IEEE Int. Conf. Computer Vision, pp. 198-203, 2005. [8] C. Cedras and M. Shah, Motion-based recognition: a survey, Image and Vision Computing, vol. 13, no. 2, pp. 129-155, 1989. [9] D. M. Gavrila, he visual analysis of human movement: a survey, Journal of Computer Vision and Image Understanding, vol. 73, no. 1, pp. 82-98, January, 1999. [10] L. Campbell and A. obick, Recognition of human body motion using phase space constraints, Proc. IEEE Int. Conf. Computer Vision, pp. 624-630, 1995 [11] A. ell, and. Sejnowski, An informationmaximization approach to blind separation and blind deconvolution, Neural Computation 7:1129-1159, 1995. [12] C. Schuldt, I. Laptev and. Caputo, Recognizing Human Actions: A Local SVM Approach Proc. Int l Conf. Pattern Recognition, pp. 32-36, 2004. [13] K. Fukunaga, Introduction to Statistical Pattern Recognition, Academic Press, 1990. [14] A. Hyvarinen, J. Karhunen, and E. Oja, Independent Component Analysis. Wiley, New York, 2001. [15] M. lack and A. Jepson, Eigentracking: Robust matching and tracking of articulated objects using view-based representation, Int l Journal of Computer Vision, vol. 26, pp. 63-84, 1998. [16] C. enabdelkader, R. Culter, H. Nanda, and L. Davis, EigenGait: Motion-ased Recognition of People Using Image Self-Similarity, Proc. Int l Conf. Audio-and Video- ased iometric Person Authentication, pp. 284-294, 2001. [17] J. Lu, E. Zhang, Z. Zhang, and Y. Xue, Gait Recognition Using Independent Component Analysis, Proc. Advances in Neural Networks, pp. 183-188, 2005. [18] K. K. Lee and Y. Xu, Modeling Human Actions from Learning, Proc. IEEE Int. Conf. Intelligent Robots and Systems, pp. 2787-2792, 2004.