Wavelet-Based Keyframe Extraction Method from Motion Capture Data

Xin Wei*, Kunio Kondo**, Kei Tateno*, Toshihiro Konma***, Tetsuya Shimamura*
*Saitama University, **Tokyo University of Technology, ***Shobi University

Abstract
This research extracts keyframes from motion capture data. Accurate selection of keyframes is a key point in motion synthesis, editing, retargeting, and motion data compression. Our method first represents every joint curve as a simple numeric sequence based on the wavelet transform. The correlation between curves is then calculated from these numeric sequences, and the several least correlated curves are chosen for keyframe extraction. In the next step, a noise filter is applied to the selected curves. Finally, the keyframes are selected from the de-noised curves.

Keywords: Animation, Motion capture data, Keyframe extraction.

1. Introduction
Selecting keyframes accurately from motion capture data is an important technique in motion compression and motion editing. To extract keyframes, we must first reduce the dimensionality of the motion. The usual approach is to apply PCA (Principal Component Analysis) to the motion data to yield a reduced-dimension space. In this research, we instead calculate the correlation coefficient of each joint curve based on the wavelet transform, which has a lower computational complexity than PCA. Several representative curves are then selected according to their correlation coefficients. In this way, most of the redundant curves that follow a similar locomotion pattern can be removed. Figure 1 gives an overview of our algorithm; the details are explained in the following sections.

In the field of keyframe extraction for motion capture data, Vadym Voznyuk [6] applied principal component analysis (PCA) to reduce the dimension of motion capture data. Matsuda and Kondo [1] used handwriting techniques to compress motion capture data.
This paper presents a novel method for extracting keyframes from motion capture data. Compared with the handwriting techniques proposed by Matsuda and Kondo [1], our method does not need to modify every curve manually. Furthermore, because we extract keyframes from a reduced-dimension representation of the motion capture data, our method introduces fewer redundant frames. Our algorithm is based on the discrete wavelet transform, so it has a lower computational complexity than the PCA used in the research of Vadym Voznyuk [6].

In our algorithm, every joint curve is re-represented as a simple numeric sequence, which can be regarded as the curve's index. These indices are then used to calculate the correlation between the joint curves, and the several least correlated curves are chosen for keyframe extraction. In a motion database, these indices can also be used for motion queries: we can index a segment of motion and scale it to match motions in the database, and thereby find motions with a similar pattern. We also give each joint a weight. Important joints (hip, thigh, and arm) have higher weights, whereas less important joints (hand, foot, and so on) have lower weights. In this way, we can locate the joints that contribute most to the whole movement.

Motion data input -> Discrete wavelet transform -> Curve re-representation -> Correlation calculation -> Noise filtering -> Keyframe selection -> Keyframes output
Figure 1. Overview of the algorithm.

2. Implementation
The core of this keyframe extraction method is the wavelet transform. In motion data dimension reduction, the wavelet transform is faster than PCA; in curve de-noising, wavelets outperform the Fourier transform in almost every respect. Section 2.1 is a brief introduction to the wavelet transform used in this research. After that, we explain our curve selection algorithm in Section 2.2 and the noise filtering algorithm in Section 2.3.
2.1 Wavelet transform
The traditional method for analyzing time series is the Fourier transform. It is based on the simple observation that every signal can be represented by a superposition of sine and cosine waves. Wavelets can be thought of as a generalization of this idea to a larger family of functions than sines and cosines. Fourier analysis includes the Fourier series expansion, the discrete Fourier transform, and the Fourier integral transform; the counterparts in the wavelet domain are the wavelet series expansion, the discrete wavelet transform, and the continuous wavelet transform.

The wavelet series expansion maps a function of a continuous variable into a sequence of coefficients. The wavelet series expansion of a function f(x) \in L^2(R) relative to the wavelet \psi(x) and scaling function \varphi(x) is

    f(x) = \sum_k c_{j_0}(k)\,\varphi_{j_0,k}(x) + \sum_{j=j_0}^{\infty}\sum_k d_j(k)\,\psi_{j,k}(x)

where j_0 is an arbitrary starting scale. The c_{j_0}(k) are normally called the approximation or scaling coefficients; the d_j(k) are referred to as the detail or wavelet coefficients. This is because the first sum uses scaling functions to provide an approximation of f(x) at scale j_0; for each higher scale j \geq j_0 in the second sum, a finer-resolution function (a sum of wavelets) is added to the approximation to provide increasing detail. If the expansion functions form an orthonormal basis or tight frame, which is often the case, the expansion coefficients are calculated as

    c_{j_0}(k) = \langle f, \varphi_{j_0,k} \rangle = \int f(x)\,\varphi_{j_0,k}(x)\,dx

and

    d_j(k) = \langle f, \psi_{j,k} \rangle = \int f(x)\,\psi_{j,k}(x)\,dx.

Scaling function
The set of expansion functions is composed of integer translations and binary scalings of a real, square-integrable function \varphi(x); that is, the set \{\varphi_{j,k}(x)\} where

    \varphi_{j,k}(x) = 2^{j/2}\,\varphi(2^j x - k)

for all j, k \in Z and \varphi(x) \in L^2(R). Here, k determines the position of \varphi_{j,k}(x) along the x-axis, and 2^{j/2} controls its height or amplitude. Because the shape of \varphi_{j,k}(x) changes with j, \varphi(x) is called a scaling function. By choosing \varphi(x) properly, \{\varphi_{j,k}(x)\} can be made to span L^2(R), the set of all measurable, square-integrable functions.
Wavelet functions
The subspace spanned by the scaling functions for any j is denoted V_j, and can be called a scaling subspace. The scaling function must obey the four fundamental requirements of multiresolution analysis (MRA [7]). Given a scaling function that meets the MRA requirements, we can define a wavelet function \psi(x) that, together with its integer translates and binary scalings, spans the difference between any two adjacent scaling subspaces V_j and V_{j+1}. The space spanned by the wavelets, W_j, is called the wavelet function subspace. We define the set \{\psi_{j,k}(x)\} of wavelets

    \psi_{j,k}(x) = 2^{j/2}\,\psi(2^j x - k)

for all k \in Z that spans the W_j spaces. The scaling and wavelet function subspaces are related by

    V_{j+1} = V_j \oplus W_j

where \oplus denotes the union (direct sum) of spaces. W_j is the orthogonal complement of V_j in V_{j+1}, and all members of V_j are orthogonal to the members of W_j. Therefore,

    \langle \varphi_{j,k}, \psi_{j,l} \rangle = 0

for all appropriate j, k, l \in Z.

If the function being expanded is a sequence of numbers, the resulting coefficients are called the discrete wavelet transform (DWT). The series expansion then becomes the DWT transform pair

    W_\varphi(j_0, k) = \frac{1}{\sqrt{M}} \sum_x f(x)\,\varphi_{j_0,k}(x)

    W_\psi(j, k) = \frac{1}{\sqrt{M}} \sum_x f(x)\,\psi_{j,k}(x)  for j \geq j_0

    f(x) = \frac{1}{\sqrt{M}} \sum_k W_\varphi(j_0, k)\,\varphi_{j_0,k}(x) + \frac{1}{\sqrt{M}} \sum_{j=j_0}^{\infty} \sum_k W_\psi(j, k)\,\psi_{j,k}(x).

Here, f(x), \varphi_{j_0,k}(x), and \psi_{j,k}(x) are functions of the discrete variable x = 0, 1, 2, ..., M-1. Normally, we let j_0 = 0 and select M to be a power of 2, so that the summations are performed over x = 0, 1, 2, ..., M-1, j = 0, 1, 2, ..., J-1, and k = 0, 1, 2, ..., 2^j - 1.

Unlike the DFT, which takes the original signal in the time/space domain and transforms it into the frequency domain, the wavelet transform takes the original signal in the time/space domain and transforms it into the time/frequency or space/frequency domain. Since the wavelet transform gives a time-frequency localization of the signal, most of the energy of the signal can be represented by only a few DWT coefficients. Therefore, we can transform the motion capture data into a wavelet representation without losing the general information.

2.2 Curve Selection
One captured motion has around 60 joint curves. Some of these curves are highly correlated with each other.
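Before detailing the selection algorithm, the DWT pair above can be made concrete with a single level of the Haar wavelet, the wavelet later used for curve re-representation. This is a minimal sketch under our own function names, not the paper's implementation; it assumes NumPy and a signal of even length:

```python
import numpy as np

def haar_dwt(signal):
    """One level of the Haar DWT: split a length-2n signal into
    n approximation (scaling) and n detail (wavelet) coefficients."""
    s = np.asarray(signal, dtype=float)
    even, odd = s[0::2], s[1::2]
    approx = (even + odd) / np.sqrt(2.0)   # low-pass: scaled local averages
    detail = (even - odd) / np.sqrt(2.0)   # high-pass: scaled local differences
    return approx, detail

def haar_idwt(approx, detail):
    """Invert one Haar level, reconstructing the original signal exactly."""
    even = (approx + detail) / np.sqrt(2.0)
    odd = (approx - detail) / np.sqrt(2.0)
    out = np.empty(2 * len(approx))
    out[0::2], out[1::2] = even, odd
    return out
```

Because the Haar basis is orthonormal, the transform preserves the signal's energy, and a smooth joint curve concentrates most of that energy in the few approximation coefficients.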
In order to extract keyframes from the motion data, we should choose several representative curves. For this purpose, we re-represent the original data in a new form that captures the gross pattern of the original data and can be computed more efficiently. The curve re-representation process is as follows:

1. Transform the original curves (Figure 2(a)) into a jagged shape using the Haar wavelet (Figure 2(b)). The decomposition level can be set to adjust the precision of the approximation.
2. Represent each curve as a sequence of + and -. When the curve increases, set the value to +; when it decreases, set it to - (Figure 2(c)). Since we care only about the frames where the curve changes direction, the original curve is re-represented in a new form that contains no Y-axis information (Figure 2(d)). This numeric sequence of + and - can be regarded as the coefficients of the original curve.
3. Calculate the correlation coefficient of two curves by multiplying their numeric sequences frame by frame and summing the products (Figure 2(e)). If the sum is a large positive number, the two curves are considered highly correlated; if it is a large negative number, they are highly negatively correlated; and if it is close to zero, the two curves are regarded as having low correlation.
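Steps 2 and 3 above can be sketched in a few lines. This is an illustrative sketch with hypothetical function names, operating on a curve that is assumed to have already been smoothed by the Haar transform of step 1:

```python
def sign_sequence(curve):
    """Step 2: re-represent a (wavelet-smoothed) joint curve as +1/-1
    per frame interval: +1 where the curve rises, -1 where it falls."""
    return [1 if b >= a else -1 for a, b in zip(curve, curve[1:])]

def correlation_score(seq_a, seq_b):
    """Step 3: multiply corresponding coefficients and sum the products.
    Large positive -> highly correlated; large negative -> negatively
    correlated; near zero -> weakly correlated."""
    return sum(a * b for a, b in zip(seq_a, seq_b))
```

For curve selection, this score would be computed for every pair of joint curves, and the curves with scores closest to zero against all others would be kept as the least correlated representatives.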
Figure 2(e) shows an example of the correlation coefficient calculation between two curves using the re-represented form. The numeric sequences of line A and line B are:

    Line A:  + + - - + - - +
    Line B:  + - + + - + + -

Multiplying the corresponding coefficients and summing the products gives -6. Because the minimum possible value is -8, we can conclude that these two curves are negatively correlated.

Once the similarity coefficients of the joint curves have been calculated, we choose a few curves that are least correlated for keyframe extraction. The local maximum and minimum values of the selected curves are the keyframes we need. In order to select them correctly, we smooth the curves by applying a noise filter, which is introduced in the following section.

Figure 2(a). Original motion curve (the X axis is time; the Y axis is angle).

2.3 Motion curve de-noising
In this step, a wavelet noise filter is applied to reduce the noise in the motion data. Raw motion capture data often contain a considerable amount of noise, which makes the extraction process difficult, so we need to remove the noise before working on the data. In the field of signal processing, the Fourier transform is often used to de-noise one-dimensional signals. In our algorithm, we use the wavelet transform instead. Practical experience has shown that for many applications wavelet transforms are as powerful as the Fourier transform, yet without some of the latter's limitations. The de-noising procedure involves three steps:

(1) Decompose. Choose a wavelet and a level N, and compute the wavelet decomposition of the signal s at level N.
(2) Threshold detail coefficients. For each level from 1 to N, select a threshold and apply a thresholding method to the detail coefficients.
(3) Reconstruct. Compute the wavelet reconstruction using the original approximation coefficients of level N and the modified detail coefficients of levels 1 to N.
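The three de-noising steps can be sketched for a single decomposition level. This is a simplified illustration, not the paper's multi-level DB10 pipeline: it uses the one-level Haar transform and soft thresholding, with function names of our own choosing, and assumes NumPy and an even-length signal:

```python
import numpy as np

def soft_threshold(coeffs, t):
    """Shrink coefficients toward zero by t, zeroing those below t."""
    return np.sign(coeffs) * np.maximum(np.abs(coeffs) - t, 0.0)

def denoise_haar(signal, threshold):
    """Three-step de-noising sketch: (1) one-level Haar decomposition,
    (2) soft-threshold the detail coefficients, (3) reconstruct."""
    s = np.asarray(signal, dtype=float)
    approx = (s[0::2] + s[1::2]) / np.sqrt(2.0)   # (1) decompose
    detail = (s[0::2] - s[1::2]) / np.sqrt(2.0)
    detail = soft_threshold(detail, threshold)    # (2) threshold details
    out = np.empty_like(s)                        # (3) reconstruct
    out[0::2] = (approx + detail) / np.sqrt(2.0)
    out[1::2] = (approx - detail) / np.sqrt(2.0)
    return out
```

A deeper level N, as used in the paper, repeats step (1) on the approximation coefficients N times and thresholds the detail coefficients at every level before reconstructing.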
There are three common thresholding methods, as shown in Figure 3: hard thresholding, soft thresholding, and shrinkage. In general, the sophisticated shrinkage schemes are computationally very expensive, while hard thresholding exhibits artefacts in some conditions. Soft thresholding is a good compromise between computational complexity and performance, so we choose soft thresholding for curve de-noising.

Figure 2. Curve re-representation. (a) Original motion curve. (b) Curve transformed using the Haar wavelet (the blue curve is the transformed curve, the green one the original). (c) Increasing parts are labeled as positive one; decreasing parts as negative one. (d) The new form represented as a sequence of + and -. (e) Similarity between two curves calculated using the re-represented form.

Figure 3. Hard and soft thresholding and shrinkage: (a) hard thresholding; (b) soft thresholding; (c) shrinkage.
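The three thresholding rules of Figure 3 can be written as one-line functions. The hard and soft rules are standard; for shrinkage, since the paper does not specify a scheme, we use the non-negative garrote purely as one representative example of a smooth shrinkage rule:

```python
import numpy as np

def hard_threshold(x, t):
    """Keep coefficients whose magnitude exceeds t; zero the rest."""
    return np.where(np.abs(x) > t, x, 0.0)

def soft_threshold(x, t):
    """Zero small coefficients and shrink the survivors toward zero
    by t, so the rule is continuous at the threshold."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def garrote_shrinkage(x, t):
    """One example of a smooth shrinkage rule (non-negative garrote):
    shrinks large coefficients less than soft thresholding does."""
    x = np.asarray(x, dtype=float)
    safe = np.where(x == 0.0, 1.0, x)  # avoid division by zero
    return np.where(np.abs(x) > t, x - t ** 2 / safe, 0.0)
```

Comparing the rules on the same coefficient shows the compromise the paper describes: hard thresholding leaves a surviving coefficient untouched, soft thresholding shrinks it by the full threshold, and the garrote falls in between.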
The selection of the mother wavelet and its parameters has a direct effect on the performance of this keyframe extraction algorithm. The wavelet and parameters used in this paper were determined through a large number of experiments on various motions. In this research, the DB10 wavelet is used for de-noising the joint curves. Figure 4 shows the result of de-noising using the DB10 wavelet at different levels; the dashed line is the original curve and the solid one is the de-noised curve. Setting the decomposition level to 5 results in a gross, smooth approximation of the original curve, while setting the level to 2 generates a curve with more details.

Figure 4. De-noising using DB10 at different levels: (a) DB10 level 5; (b) DB10 level 2.

2.4 Keyframe selection
Once a few least correlated joint curves have been selected and smoothed, their local maximum and minimum points are extracted as keyframes. As shown in Figure 5, the two curves are the selected joint curves, the dots are the local maximum and minimum values, and the poses to which the arrows point are the keyframes.

Figure 5. Local maximum and minimum points of the selected motion curves are extracted as keyframes.

3. Conclusion
We have proposed a method of keyframe extraction from motion capture data using a joint curve re-representation algorithm and a wavelet noise filter. We have demonstrated that the proposed method gives a good approximation of the original motion capture data: the extracted keyframes cover almost all of the important features of the original motion, and at the same time introduce only a minimum of redundant keyframes. Figures 6 and 7 illustrate extracted keyframe sequences. In Figure 6, the attack-kick motion contains 32 joint curves, and 15 keyframes are selected out of the 102 frames of the original data. In Figure 7, 12 keyframes are selected from the jump motion.

Figure 6. 15 keyframes out of the 102-frame attack-kick motion.
Figure 7. 12 keyframes out of the jump motion.
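The local-extrema selection of Section 2.4 can be sketched directly: on a smoothed curve, a keyframe candidate is any frame where the slope changes sign. This is an illustrative sketch with a hypothetical function name; it also keeps the first and last frames, a common convention so the motion's endpoints survive, though the paper does not state this explicitly:

```python
def keyframes_from_curve(curve):
    """Return frame indices where a smoothed joint curve changes
    direction (local maxima and minima), plus the end frames."""
    keys = [0]  # keep the first frame (our convention)
    for i in range(1, len(curve) - 1):
        # Slope changes sign at a strict local maximum or minimum.
        if (curve[i] - curve[i - 1]) * (curve[i + 1] - curve[i]) < 0:
            keys.append(i)
    keys.append(len(curve) - 1)  # keep the last frame (our convention)
    return keys
```

Flat plateaus produce a zero product and are skipped here, which is one reason the curves are de-noised first: on a raw noisy curve, nearly every frame would register as a spurious extremum.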
References
[1] Kondo, K. and Matsuda, K. 2004. Keyframes extraction method for motion capture data. Journal for Geometry and Graphics 108, 1081-1090. Proceedings of the 10th International Conference on Geometry and Graphics (2002).
[2] Park, M. J. and Shin, S. Y. 2004. Example-based motion cloning. Computer Animation and Virtual Worlds 15, 3-4, pp. 245-257.
[3] Popivanov, I. and Miller, R. J. 2002. Similarity search over time series data using wavelets. In Proceedings of the 18th Int'l Conference on Data Engineering, San Jose, CA, Feb. 26-Mar. 1, pp. 212-221.
[4] Lotric, U. and Dobnikar, A. 2005. Neural networks with wavelet based denoising layer for time series prediction. Neural Computing and Applications 14, 11-17.
[5] Xin Wei and Kunio Kondo. 2005. Keypose extraction method from motion capture data. ADADA 2005, Proceedings of the 3rd Annual Conference of the Asia Digital Art and Design Association.
[6] Vadym Voznyuk. Constrained optimization in motion compression.
[7] Mallat, S. 1989. A theory for multiresolution signal decomposition: the wavelet representation. IEEE Trans. Pattern Anal. Mach. Intell. PAMI-11, pp. 674-693.
[8] Gonzalez, R. C. and Woods, R. E. Digital Image Processing, Second Edition. Prentice Hall.