Stylized synthesis of facial speech motions

By Yuru Pei* and Hongbin Zha

Computer Animation and Virtual Worlds 2007; 18. Published online 26 June 2007 in Wiley InterScience.

*Correspondence to: Y. Pei, State Key Laboratory of Machine Perception, Science Building 2, Peking University, No. 5 Yiheyuan Road, Haidian District, Beijing, China. peiyuru@cis.pku.edu.cn

Abstract

Stylized synthesis of facial speech motions is central to facial animation. Most synthesis algorithms emphasize the reasonable concatenation of captured motion segments, while the dynamic modeling of speech units, e.g. visemes and visyllables (the visual appearance of a syllable), has drawn little attention. In this paper, we address the fundamental issues of the stylized dynamic modeling of visyllables. A decomposable generalized model is learnt for stylized motion synthesis. The visyllable modeling has two parts: (1) a dynamic model for each kind of visyllable, learnt with a Gaussian Process Dynamical Model (GPDM); and (2) a unified mapping between the high dimensional observation space and the low dimensional latent space, based on a multilinear model. The dynamic visyllable model embeds the high dimensional motion data and simultaneously constructs the dynamic mapping in the latent space. To generalize the visyllable model over several instances, the mapping coefficient matrices are assembled into a tensor, which is decomposed into independent modes, e.g. identity and uttering style. Novel stylized motions can then be synthesized by linearly combining the components of each mode. Copyright 2007 John Wiley & Sons, Ltd.

Received: 15 May 2007; Accepted: 17 May 2007

KEY WORDS: speech animation; visyllable dynamic model; stylized synthesis; decomposable generalized mapping

Introduction

Synthesizing stylized speech motions is important for realistic human facial animation. Currently, 3D facial motions can be captured in real time with optical devices. Given the large volume of such datasets, most 3D speech motion synthesis algorithms are based on the concatenation of captured motion segments and variable transition models [1-5] to handle coarticulation. Hitherto, little effort has addressed the internal dynamics of the pronunciation unit: what the dynamics inside a visyllable are, and how the facial shape varies throughout one visyllable.

In this paper, we address a dynamic model specific to each kind of visyllable. Moreover, a decomposable generalized mapping between the low dimensional latent space and the high dimensional observation space is generated based on a multilinear model. The work rests on the assumption that the stylized mapping can be decomposed into a combination of independent components. The attributes that parameterize the mapping space are computed via the N-mode Singular Value Decomposition (SVD). A novel stylized visyllable is synthesized by a linear combination of the decomposed mode components.

Our framework incorporates the speech motion dynamics into the visyllable model. As illustrated in Figure 1, first, a GPDM is learnt for every visyllable instance; then a generalized mapping is generated based on a multilinear model; finally, stylized motions are synthesized by the linear combination of components and the mean prediction in the visyllable dynamic model.

The main idea of this paper is to represent the visyllable dynamics with a Gaussian Process. The context-dependent speaking motion transitions are largely incorporated inside the visyllable models. The model consists of an explicit mapping function for the temporal transition and a low dimensional embedding. A generalized uttering-style mapping is constructed as a tensor product of independent mode components.

Figure 1. An overview of stylized speech synthesis. The stylized motion synthesis is achieved with the mean prediction in the low dimensional space and the stylized mapping to the high dimensional observation space.

Related Work

Facial speech motions have highly repetitive motion patterns, clear meanings (uttered words), complex physically driven mechanisms (the joint work of a set of muscles), and high dimensional representations. Motion synthesis is mainly based on the concatenation of visual units [2-4,6,7]. Ezzat et al. [2] employ a variant of the Multidimensional Morphable Model (MMM) to embed mouth configurations. An HMM is employed to model a probabilistic state machine for speech animation [6]. A dynamic function of phonemes is presented in Reference [8]. Cao et al. [9] build a data structure called the Anime Graph to encapsulate a facial shape database along with the speech information. A Radial Basis Function (RBF) network is used to map Mocap speech data to 3D facial models [1], in which EM-PCA is used to learn the coarticulation model and find the expressive eigenspace from Mocap data. Hitherto, the dynamics inside speech units has not drawn much attention; we propose a framework that incorporates the speech dynamics into the visyllable model. Our approach is similar to the Motion Texture of Li et al. [10]. In their work, the repetitive motion pattern is modeled with a Linear Dynamic System (LDS), and a separate embedding mechanism handles the high dimensional motion data. Instead, we provide an integrated framework for the motion embedding and the dynamic modeling, along with a generalized style mapping.

Motion analysis has drawn attention for many years. Motion data can be obtained from videos, motion capture devices, and real-time 3D scanners. Due to the high dimensionality and the complex transition dynamics of the data, dimensionality reduction and dynamics modeling are the major issues in this field. Most work separates the data embedding from the dynamics modeling [3,4,7,10-12]. First, linear and non-linear embedding techniques are used to acquire a low dimensional representation, e.g. PCA [3] and graph-based non-linear dimensionality reduction methods such as Isomap [7,12] and LLE [11]. Then, dynamic models are generated in the low dimensional space by different algorithms, e.g. the HMM [4,6], the LDS [10], and the GMM [12]. In linear embeddings such as PCA and ICA, the mapping between the observations and the latent variables is constructed explicitly, whereas this is not the case in many non-linear embedding methods. Rahimi et al. [13] learn a semi-supervised embedding through RBF regression with Newtonian dynamics in the low dimensional space; however, probabilistic transitions are not embodied. Lawrence [14] proposes an unsupervised probabilistic embedding with a Gaussian Process, the GPLVM, which has been used in stylized motion synthesis [15]. In the GPLVM, the mapping function is kernelized with the RBF; with a Gaussian prior, the mapping weights can be marginalized to yield multivariate Gaussian data likelihoods. Wang et al. [16] propose the GPDM, which incorporates a dynamic model in the latent space and is a dynamic extension of the GPLVM. A modified GPDM [17] has been used in 3D people tracking. In this paper, we employ the GPDM to model the visyllable dynamics.

Bilinear and multilinear models have been employed in computer vision and computer graphics [11,18-21]; the attributes affecting the appearance are decoupled. Content and style separation based on the bilinear model has been used in the analysis of gait and face image ensembles. Vasilescu and Terzopoulos [19] use the multilinear model to decompose face image data into different components: head position, illumination, and expression. In 3D face analysis [20], bilinear and trilinear models are used to separate the components related to identities, expressions, and visemes. In this paper, a generalized mapping based on the multilinear model is proposed to separate orthogonal speaking-style components.

Visyllable Model Learning

Visyllable Feature Vectors

The speech motions in this paper are confined to the subregion of the human face that is directly related to pronunciation, including the lips, the chin, and the cheeks. Every visyllable is represented as a set of marker trajectories. The feature vector $y_i = [p_i, v_i]$ defines every frame in a sequence with $n$ markers, where $p_i$ is the vector of 3D coordinates of the facial markers and $v_i$ is the marker velocity, the difference between two consecutive frames, $v_i = p_i - p_{i-1}$. A visyllable with $m$ frames is represented as an $m \times 6n$ matrix whose rows are the feature vectors $y_i$.

Visyllable Model

The high dimensional visyllable data is formidable to process, so the dimensionality of the motion data has to be reduced for dynamic model learning. Instead of separating the dimensionality reduction from the dynamic model learning, the GPDM computes the dynamic mapping while finding the low dimensional latent variables. Thus, the dynamics modeling and the dimensionality reduction are integrated into the same framework, and a smooth probability density function over the latent variables is obtained. The mapping between the high dimensional feature vector $y_t$ and the latent variable $x_t$, as well as the dynamic mapping in the latent space, are both kernelized with the RBF. The dynamic model is defined as:

$$x_t = \sum_i a_i \phi_i(x_{t-1}) + n_{x,t} \qquad (1)$$

The mapping between the high dimensional feature vector and the latent variable is defined as:

$$y_t = \sum_j b_j \varphi_j(x_t) + n_{y,t} \qquad (2)$$

The weights $A = \{a_i\}$, $B = \{b_j\}$ and the basis functions $\phi_i$, $\varphi_j$ define the dynamics in the latent space and the non-linear mapping, and $n_{x,t}$ and $n_{y,t}$ are zero-mean white Gaussian noise. The learning process solves for the latent variables and the hyper-parameters of the kernel functions. With Gaussian priors on the weights, the parameters can be computed by minimizing the negative log-posterior:

$$L = -\ln p(X, Y, \bar{\alpha}, \bar{\beta}) = \frac{d}{2}\ln|K_X| + \frac{1}{2}\mathrm{tr}\left(K_X^{-1} X_{OUT} X_{OUT}^T\right) + \sum_j \ln \alpha_j - m \ln|W| + \frac{D}{2}\ln|K_Y| + \frac{1}{2}\mathrm{tr}\left(K_Y^{-1} Y W^2 Y^T\right) + \sum_j \ln \beta_j \qquad (3)$$

where $K_Y$ and $K_X$ are the kernel matrices of the mapping and the dynamic model; the matrix elements are defined with the RBF and an affine transformation (see the Appendix). For every visyllable instance, the mapping $f_Y(x)$ from the low dimensional representation to the high dimensional motion data is computed along with the dynamic mapping $f_X(x)$ over the latent variables:

$$f_Y(x) = \mu_Y + \bar{Y}^T K_Y^{-1} k_Y(x) \qquad (4)$$

$$f_X(x) = X_{OUT}^T K_X^{-1} k_X(x) \qquad (5)$$

where $\mu_Y = \frac{1}{m}\sum_i^m y_i$ and $\mu_X = \frac{1}{m}\sum_i^m x_i$ are the means of the training data, $\bar{Y}$ is the observation matrix with the mean subtracted, and $X_{OUT}$ is the output of the dynamic mapping. As shown in Figure 2, each point in the latent space corresponds to a face configuration represented by the feature vector. Warm colors indicate a small reconstruction variance and cold colors a large one; this color convention is shared by all the figures in this paper. With the explicit mapping functions, a new motion sequence can be predicted given an initial pose.

Figure 2. The reconstruction variance map of a visyllable in the latent space.
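The following numpy sketch makes the mean mappings of equations (4) and (5) concrete for a learnt model. It assumes the latent trajectory X and the observations Y are already given by the GPDM optimization, and it uses a plain RBF kernel, omitting the affine and noise terms detailed in the Appendix; all names are illustrative.

```python
import numpy as np

def rbf(a, b, inv_width=1.0):
    # Plain RBF kernel between two point sets (rows are points).
    d2 = np.sum((a[:, None, :] - b[None, :, :]) ** 2, axis=-1)
    return np.exp(-0.5 * inv_width * d2)

def fit_mean_mappings(X, Y):
    # X: latent trajectory (m x d), Y: feature vectors (m x 6n).
    mu_Y = Y.mean(axis=0)
    Y_bar = Y - mu_Y                               # mean-subtracted observations
    K_Y = rbf(X, X) + 1e-6 * np.eye(len(X))        # mapping kernel matrix
    K_X = rbf(X[:-1], X[:-1]) + 1e-6 * np.eye(len(X) - 1)
    X_out = X[1:]                                  # dynamics outputs x_2 .. x_m

    A_Y = np.linalg.solve(K_Y, Y_bar)              # K_Y^{-1} Y_bar, solved once
    A_X = np.linalg.solve(K_X, X_out)              # K_X^{-1} X_OUT

    def f_Y(x):                                    # eq. (4): latent -> features
        return mu_Y + A_Y.T @ rbf(X, x[None, :])[:, 0]

    def f_X(x):                                    # eq. (5): latent -> next latent
        return A_X.T @ rbf(X[:-1], x[None, :])[:, 0]

    return f_Y, f_X
```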

Style Separation in a Generalized Model

Due to inter-personal differences in uttering styles, instances of the same visyllable may take on different appearances. Moreover, uttering a syllable with different speeds and stresses can cause varied appearances even for the same person. Figure 3 shows the effects of stress and speed variations on marker trajectories. The goal is to build a generalized model that accommodates all the instances.

Figure 3. Trajectory variations of a marker (y-direction) due to variations in stress (left) and speed (right). The neutral speech trajectory is shown as a green dashed line for comparison.

One intuitive method is to concatenate all instances together and feed them to the visyllable model learning. However, due to the style discrepancy, the difference between instances is larger than that within a visyllable, as shown in Figure 4: the embeddings of different instances lie far away from each other in the latent space. Our system therefore employs an alternative method. First, a dynamic model specific to every instance is learnt. Second, the mapping coefficients of all the models are assembled into a tensor, and the N-mode SVD is applied for style separation.

Figure 4. The upper row shows the embeddings of two instances with different styles; the bottom shows the generalized embedding of all instances of a visyllable.

A generalized embedding has to be constructed before the tensorization of the coefficient matrices. We select an instance with the neutral style as the reference, and the latent embedding of the reference is chosen as the initial value when learning the other instances. In this way, the latent variables take similar values in the embedding space, as shown in Figure 4. The mapping between the latent space and the input motions is rewritten as:

$$f_Y^i(x) = B_0^i + B_1^i\, k_Y^{ref}(x), \quad i = 1, \ldots, N \qquad (6)$$

where $B_0^i = \mu_Y^i$ and $B_1^i = \bar{Y}^T K_Y^{-1}\, k_Y^i(x^*)/k_Y^{ref}(x^*)$. $N$ is the number of instances of one visyllable, and $k_Y^i(x^*)/k_Y^{ref}(x^*)$ is the ratio between the $i$th kernel and the reference kernel, evaluated on the $i$th instance's latent variables.

After the model learning for all instances, the mapping coefficients $B_1^i$ in $f_Y(x)$ are assembled in order into a tensor with respect to the identity and the uttering styles, and the independent mode components are separated by tensor decomposition. A concise description of the relevant multilinear algebra can be found in References [19,20]. The tensor is decomposed as the mode product of a core tensor $Z$ with a set of mode matrices $U$:

$$T = Z \times_1 U_{ID} \times_2 U_{SA} \times_3 U_{MC} \qquad (7)$$

The core tensor $Z$ encodes the structure information and controls the interaction between the mode matrices. $U_{ID}$ is the mode matrix for identity, $U_{SA}$ for uttering style, and $U_{MC}$ for the mapping coefficients $B_1$. The tensor is thus decomposed as the N-mode product of orthogonal spaces related to the different attributes. The core tensor is computed as:

$$Z = T \times_1 U_{ID}^T \times_2 U_{SA}^T \times_3 U_{MC}^T \qquad (8)$$
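A minimal numpy sketch of the N-mode SVD of equations (7) and (8), under the assumption that the coefficient vectors $B_1^i$ are stacked into a third-order tensor indexed by identity, uttering style, and mapping coefficient; the shapes are illustrative.

```python
import numpy as np

def unfold(T, mode):
    # Mode-n unfolding: rows indexed by the chosen mode.
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

def mode_product(T, M, mode):
    # n-mode product T x_n M: contract the given mode of T with the columns of M.
    out = np.tensordot(M, np.moveaxis(T, mode, 0), axes=(1, 0))
    return np.moveaxis(out, 0, mode)

def n_mode_svd(T):
    # Mode matrices from the SVD of each unfolding; core tensor from eq. (8).
    U = [np.linalg.svd(unfold(T, n), full_matrices=False)[0]
         for n in range(T.ndim)]                    # U_ID, U_SA, U_MC
    Z = T
    for n, Un in enumerate(U):                      # Z = T x_1 U_ID^T x_2 ...
        Z = mode_product(Z, Un.T, n)
    return Z, U

# Reconstruction check, eq. (7): T == Z x_1 U_ID x_2 U_SA x_3 U_MC.
# Illustrative shape: 2 identities x 9 uttering styles x 50 coefficients.
T = np.random.randn(2, 9, 50)
Z, (U_ID, U_SA, U_MC) = n_mode_svd(T)
T_rec = mode_product(mode_product(mode_product(Z, U_ID, 0), U_SA, 1), U_MC, 2)
assert np.allclose(T, T_rec)
```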

The mode matrices $U_{ID}$ and $U_{SA}$ can be seen as independent of the speaking content, and together they compose the uttering styles. The dynamic mappings in the latent space are similar across a generalized visyllable model; the mappings between the latent space and the high dimensional observation space, however, vary, and are formulated as the tensor product of the different style modes. The generalized mapping enables the linear combination of style factors for the synthesis of a novel style.

Motion Synthesis

Given a syllable script, motion synthesis builds the 3D trajectories of the facial markers. We assume the initial pose $y_0$ is predefined, e.g. the neutral facial configuration; the goal is to generate a complete motion sequence of a visyllable given an initial pose. As the probability distribution of the latent variables has been learnt, the embedded motion in the latent space can be reconstructed with the dynamic mapping $f_X(x)$. The high dimensional motions in the observation space are then computed via the stylized tensor product.

Visyllable Motion Prediction in the Latent Space

Given an initial pose in the latent space, the visyllable trajectory can be constructed with the dynamic mapping function $f_X(x)$ by Gaussian sampling with zero-mean noise added. However, when the initial pose diverges from the training data, the mean prediction by the dynamic mapping produces a sequence with a large reconstruction variance that drifts from the training data. To reduce the uncertainty in the prediction, an optimization is introduced to minimize an objective function $L_1(x, y)$ related to the likelihood of the new data:

$$L_1(x, y) = \frac{(x - f_X(x))^2}{2\sigma_X^2(x)} + \frac{d}{2} \ln \sigma_X^2(x) \qquad (9)$$

where $\sigma_X^2(x)$ is the variance of the dynamic mapping and $d$ is the dimension of the latent space.
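As an illustrative sketch of the mean prediction and its uncertainty, the loop below iterates $f_X$ from an initial latent pose and tracks a standard GP predictive variance as a stand-in for $\sigma_X^2(x)$ in Equation (9); the simplified RBF kernel and all names are assumptions, not the paper's exact formulation.

```python
import numpy as np

def rbf(a, b, inv_width=1.0):
    d2 = np.sum((a[:, None, :] - b[None, :, :]) ** 2, axis=-1)
    return np.exp(-0.5 * inv_width * d2)

def predict_sequence(x0, X, n_steps):
    # X: learnt latent trajectory (m x d); inputs/outputs of the dynamics.
    X_in, X_out = X[:-1], X[1:]
    K_inv = np.linalg.inv(rbf(X_in, X_in) + 1e-6 * np.eye(len(X_in)))
    traj, var = [x0], []
    x = x0
    for _ in range(n_steps):
        k = rbf(X_in, x[None, :])[:, 0]        # k_X(x) against training inputs
        s2 = 1.0 - k @ K_inv @ k               # predictive variance; k(x,x)=1 here
        x = X_out.T @ (K_inv @ k)              # mean prediction f_X(x), eq. (5)
        traj.append(x)
        var.append(max(s2, 0.0))               # grows far from data, cf. eq. (9)
    return np.array(traj), np.array(var)

# Hypothetical usage: start from the first training pose.
# traj, var = predict_sequence(X[0], X, n_steps=30)
```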

Style Interpolation

Synthesizing a novel style is an interesting issue in motion synthesis. A log-space interpolation scheme has been used in style-based IK [15]. In our system, with the multilinear model, the mapping in Equation (4) is reformulated as:

$$f_Y(x) = \sum_{i=1}^{N} B_0^i + Z \times_1 U_{ID} \times_2 U_{SA} \times_3 U_{MC}\, k_Y^{ref}(x) \qquad (10)$$

The novel style is synthesized with the tensor product of the style mode matrices. Each mode matrix can be composed as a linear combination of components; based on this component interpolation, the stylized mapping in the space spanned by the tensor mode components can be obtained.

Experiments

Two subjects were asked to pronounce each syllable with the neutral style three times, at three speeds, and with three stresses; thus, for each visyllable there are 18 instances. The motion-captured data are aligned in preprocessing. A dynamic model is learnt from the motion data of every syllable; some of the visyllable dynamic models used in the experiments are shown in Figure 5. A generalized model is constructed with a decomposable mapping between the latent space and the high dimensional observation space.

Figure 5. Some English and Mandarin visyllable dynamic models used in the experiments. The three dimensional latent spaces are shown with the reconstruction variance plotted. The circled blue line traces the latent variables of the reference instance, while the triangled magenta line traces the latent variables of the synthesized sequences.

When assembling the tensor, the mapping coefficients are brought to the same size via time warping. The length of the coefficient matrix of the first visyllable dynamic model is selected as the reference for the time warping, and the time scaling parameter is defined as $r_t = m_i / m_{ref}$, where $m_i$ is the frame number of the current instance and $m_{ref}$ the frame number of the reference visyllable. With the N-mode SVD, the components corresponding to each mode are computed as shown in Figure 7, where the first row gives the coefficient vectors of the uttering mode components and the first column the coefficient vectors of the identity mode components. The central part of Figure 7 shows the facial configurations for each mode component in a visyllable model (the 16th frame of each motion sequence is shown), and the trajectories of one lower-lip marker are plotted for each identity.

Figure 7. The coefficient vectors of mode components.

Visual Speech Motion Synthesis

With the first pose specified, the motion sequence for a syllable can be predicted by the dynamic mapping in the latent space. The optimization in the motion synthesis phase keeps the synthesis results consistent with the training data. Synthesis results are shown in Figures 5 and 6 and in the accompanying videos.

Figure 6. The comparison of three synthesized marker trajectories (y-direction) against the ground truth in the Euclidean coordinate system. The positions of the three markers are shown in the left image.

Stylized Motion Synthesis

We linearly combine the identity mode components $U_{ID}^i$ and the uttering mode components $U_{SA}^i$ to synthesize a novel style:

$$U_{ID} = \alpha^U U_{ID}^1 + (1 - \alpha^U) U_{ID}^2, \qquad U_{SA} = \sum_{i=1}^{9} \beta_i^U U_{SA}^i$$

where $0 \le \alpha^U \le 1$, $0 \le \beta_i^U \le 1$, and $\sum_{i=1}^{9} \beta_i^U = 1$.
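As a sketch of this interpolation, assuming the core tensor Z and the mode matrices from the N-mode SVD sketch above (two identities, nine uttering styles), the blended mode vectors can be folded back through the mode products to yield a stylized coefficient vector $B_1$; all names here are illustrative.

```python
import numpy as np

def mode_product(T, M, mode):
    # Same n-mode product as in the N-mode SVD sketch.
    out = np.tensordot(M, np.moveaxis(T, mode, 0), axes=(1, 0))
    return np.moveaxis(out, 0, mode)

def interpolate_style(Z, U_ID, U_SA, U_MC, alpha, beta):
    u_id = alpha * U_ID[0] + (1.0 - alpha) * U_ID[1]    # blend the two identities
    u_sa = beta @ U_SA                                  # convex mix of 9 styles
    B1 = mode_product(mode_product(mode_product(
        Z, u_id[None, :], 0), u_sa[None, :], 1), U_MC, 2)
    return B1.reshape(-1)                               # stylized coefficients B_1

# Hypothetical usage with the beta weights summing to 1:
# beta = np.full(9, 1.0 / 9)
# B1 = interpolate_style(Z, U_ID, U_SA, U_MC, alpha=0.5, beta=beta)
# The stylized mapping of eq. (10) then evaluates B0 + B1 * k_Y_ref(x).
```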

Figure 8. The reference and stylized synthesis sequences. The first row is the reference motion sequence; the lower three rows are stylized synthesis motion sequences of different persons.

The motion sequences are computed by the stylized mapping in Equation (10). Figure 8 shows visyllable motion synthesis with the identity mode held constant. Figure 9 shows motions synthesized with the tensor product of modified identity and uttering modes, where the first and second columns are the coefficient vectors of the uttering mode components and the identity mode components, respectively, and the third column shows the synthesized motion sequences. Due to the time warping in the generalized model learning, the factor $r_t$ is used to rescale the synthesized motion sequences.

Figure 9. Stylized visyllable motion synthesis.

Comparison With the Ground Truth

For evaluation, we compare the motion-captured data of a test sentence with the synthesis results. As shown in Figure 6, the synthesized trajectories follow the ground truth. In some parts, however, there are apparent discrepancies, as circled; these occur mainly at the short pauses in the syllable scripts, where the system cannot reconstruct the varied motions of the pause segments. In a complete motion sequence, people tend to pay little attention to the mouth shape when there is no audio signal. Therefore, even with these discrepancies, the synthesis results show visual appearances consistent with the input syllable scripts.

Mapping the Motion Data to a Novel 3D Face

Mapping the motion-captured data to a novel 3D face is not a major issue of this paper. We employ a method similar to the blendshape face of Deng et al. [1]: an RBF network is trained for the mapping between motion-captured markers and shape blending coefficients. In our system, however, the prototypes are determined automatically with the Isomap embedding and clustering in the low dimensional space, as described in the speech motion transferring system [7]. The blending coefficients $G = (g_0, g_1, \ldots, g_m)$ for the training data of the RBF network are computed via the L2 distance to the clustering centroids. The mapping between the latent variable $x_j$ and the blending coefficients $G$ is computed with RBF regression:

$$G = Q(x_j) = \sum_i w_i\, h(\|x_j - c_i\|)$$

where $x_j$ is a 3D latent variable and the $c_i$ are the training data. $h(r) = \sqrt{r^2 + e^2}$ is the Hardy multiquadrics function, with $e$ a stiffness constant that regulates the effects of the feature points. The mapping results on a 3D face model are shown in Figure 10, where the first row is the ground truth, and the second and third rows show the synthesized marker set and the feature mesh of the corresponding frames.

Figure 10. Synthesized facial motions.
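A minimal sketch of this RBF regression with the Hardy multiquadrics basis; the centers C, the target blending coefficients, and the value of the stiffness constant e are illustrative placeholders.

```python
import numpy as np

def hardy(r, e=0.1):
    # Hardy multiquadrics h(r) = sqrt(r^2 + e^2); e is the stiffness constant.
    return np.sqrt(r ** 2 + e ** 2)

def fit_rbf(C, G_train, e=0.1):
    # Solve H w = G for the weights, with H_ij = h(||c_i - c_j||).
    H = hardy(np.linalg.norm(C[:, None, :] - C[None, :, :], axis=-1), e)
    W = np.linalg.solve(H, G_train)                # (n_centers, n_coeffs)

    def Q(x):                                      # latent point -> blending coeffs
        h = hardy(np.linalg.norm(C - x, axis=-1), e)
        return h @ W
    return Q

# Hypothetical usage: 3D latent points mapped to 12 blendshape weights.
C = np.random.randn(30, 3)
G_train = np.random.rand(30, 12)
Q = fit_rbf(C, G_train)
g = Q(np.zeros(3))
```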

Conclusion and Future Work

In this paper, we present a dynamic visyllable model to represent speech motion data, and propose a stylized motion synthesis method based on the learnt probability density function over facial configurations; the stylized synthesis is achieved by the generalized mapping from the latent space to the observation space.

The number of visyllables is comparatively large, e.g. approximately 400 in Mandarin and some 900 demi-visyllables [3] in English, which means a large number of visyllable models need to be learnt. It is therefore natural to classify the visyllables. Hitherto, visyllables have been classified via the syllable definition; we think the shape analysis of visyllables deserves more attention. To classify a visyllable according to its own visual appearance, a reasonable similarity description should be defined that accounts for the style variations; moreover, the similarity description should be robust in the case of only partial matching between visyllables. Automatic visyllable annotation and classification might be interesting directions for future research.

Appendix

GPDM Learning

The likelihood of the motion-captured data is modeled with the GPDM, a latent variable dynamic model. From the Bayesian view, the weights $A = \{a_i\}$ and $B = \{b_j\}$ of the basis functions can be marginalized. Given Gaussian priors on $a_i$ and $b_j$, the marginalization over $A$ and $B$ produces multivariate Gaussian data likelihoods of the following forms:

$$p(Y \mid X, \bar{\beta}) = \frac{|W|^m}{\sqrt{(2\pi)^{mD} |K_Y|^D}} \exp\left(-\frac{1}{2}\mathrm{tr}\left(K_Y^{-1} Y W^2 Y^T\right)\right)$$

$$p(X \mid \bar{\alpha}) = \frac{p(x_1)}{\sqrt{(2\pi)^{(m-1)d} |K_X|^d}} \exp\left(-\frac{1}{2}\mathrm{tr}\left(K_X^{-1} X_{OUT} X_{OUT}^T\right)\right)$$

where $W$ contains the scaling factors for the different dimensions of the observations, and $K_Y$ and $K_X$ are the kernel matrices. The elements of the kernel matrices are defined with the RBF and an affine transformation as:

$$k_Y(x, x') = \beta_1 \exp\left(-\frac{\beta_2}{2}\|x - x'\|^2\right) + \frac{\delta_{x,x'}}{\beta_3}$$

$$k_X(x, x') = \alpha_1 \exp\left(-\frac{\alpha_2}{2}\|x - x'\|^2\right) + \alpha_3 x^T x' + \frac{\delta_{x,x'}}{\alpha_4}$$

$\bar{\beta} = \{\beta_1, \ldots, W\}$ are the hyper-parameters of the kernel functions, with $\beta_1$ and $\beta_2$ the scale and the inverse width of the basis functions, and $\beta_3$ related to the noise. $X_{OUT} = (x_2, \ldots, x_m)$ is the output of the dynamic model, with input $X = (x_1, \ldots, x_{m-1})$, where $m$ is the number of training frames. The hyper-parameters $\alpha_1$ and $\alpha_2$ are the scale and the inverse width of the RBF basis function; the linear term is included in the kernel function of the dynamic model with coefficient $\alpha_3$, and $\alpha_4$ is related to the noise. Moreover, priors $p(\bar{\alpha}) \propto \prod_i \alpha_i^{-1}$ and $p(\bar{\beta}) \propto \prod_i \beta_i^{-1}$ are placed on the hyper-parameters of the mapping and the dynamic models to avoid overfitting. Thus, the generalized model of the observations, combining the priors, the mapping, and the dynamics, is:

$$p(X, Y, \bar{\alpha}, \bar{\beta}) = p(Y \mid X, \bar{\beta})\, p(X \mid \bar{\alpha})\, p(\bar{\alpha})\, p(\bar{\beta})$$

The model is learnt by minimizing the negative log-posterior in Equation (3).
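For concreteness, the two kernels above can be assembled into kernel matrices as in the following sketch; the hyper-parameter values are placeholders, since in the paper they are obtained by minimizing the negative log-posterior of Equation (3).

```python
import numpy as np

def k_Y(X1, X2, beta):
    # k_Y(x,x') = beta1 * exp(-beta2/2 ||x-x'||^2) + delta_{x,x'} / beta3
    b1, b2, b3 = beta
    d2 = np.sum((X1[:, None, :] - X2[None, :, :]) ** 2, axis=-1)
    K = b1 * np.exp(-0.5 * b2 * d2)
    if X1 is X2:
        K += np.eye(len(X1)) / b3                  # noise term on the diagonal
    return K

def k_X(X1, X2, alpha):
    # k_X(x,x') = alpha1 * exp(-alpha2/2 ||x-x'||^2) + alpha3 x^T x'
    #             + delta_{x,x'} / alpha4          (RBF + linear term + noise)
    a1, a2, a3, a4 = alpha
    d2 = np.sum((X1[:, None, :] - X2[None, :, :]) ** 2, axis=-1)
    K = a1 * np.exp(-0.5 * a2 * d2) + a3 * (X1 @ X2.T)
    if X1 is X2:
        K += np.eye(len(X1)) / a4
    return K

# Usage on a latent trajectory X (m x d): the dynamics kernel matrix is
# built on the inputs x_1 .. x_{m-1}, matching X_OUT = (x_2, ..., x_m).
X = np.random.randn(20, 3)
K_X = k_X(X[:-1], X[:-1], alpha=(1.0, 1.0, 0.1, 100.0))
K_Y = k_Y(X, X, beta=(1.0, 1.0, 100.0))
```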
Acknowledgements

The authors would like to thank Neil Lawrence for his publicly available source code of the GPLVM, and the anonymous referees for their useful suggestions for improving this paper. This work was supported in part by National Science Foundation (China) grants NSF and National 973 Research grants 2004CB.

References

1. Deng Z, Neumann U, Lewis JP, Kim TY, Bulut M, Narayanan S. Expressive facial animation synthesis by learning speech co-articulation and expression spaces. IEEE Transactions on Visualization and Computer Graphics 2006; 12(6).
2. Ezzat T, Geiger G, Poggio T. Trainable videorealistic speech animation. ACM Transactions on Graphics 2002; 21(3).
3. Kshirsagar S, Magnenat-Thalmann N. Visyllable based speech animation. Computer Graphics Forum 2003; 22(3).
4. Bregler C, Covell M, Slaney M. Video rewrite: driving visual speech with audio. In ACM SIGGRAPH '97 Proceedings, August 1997.
5. Ma J, Cole R, Pellom B, Ward W, Wise B. Accurate visible speech synthesis based on concatenating variable length motion capture data. IEEE Transactions on Visualization and Computer Graphics 2006; 12(2).
6. Brand M. Voice puppetry. In ACM SIGGRAPH '99 Proceedings, 1999.
7. Pei Y, Zha H. Transferring of speech movements from video to 3D face space. IEEE Transactions on Visualization and Computer Graphics 2007; 13(1).
8. King SA, Parent RE. Creating speech-synchronized animation. IEEE Transactions on Visualization and Computer Graphics 2005; 11(3).
9. Cao Y, Tien WC, Faloutsos P, Pighin F. Expressive speech-driven facial animation. ACM Transactions on Graphics 2005; 24(4).
10. Li Y, Wang T, Shum H. Motion texture: a two-level statistical model for character motion synthesis. In ACM SIGGRAPH '02 Proceedings, August 2002.
11. Elgammal A, Lee C. Separating style and content on a nonlinear manifold. In CVPR '04 Proceedings, July 2004.
12. Wang Q, Xu G, Ai H. Learning object intrinsic structure for robust visual tracking. In CVPR '03 Proceedings, June 2003.
13. Rahimi A, Recht B, Darrell T. Learning appearance manifolds from video. In CVPR '05 Proceedings, June 2005.
14. Lawrence ND. Gaussian process latent variable models for visualization of high dimensional data. In NIPS Proceedings, 2004.

15. Grochow K, Martin SL, Hertzmann A, Popovic Z. Style-based inverse kinematics. In ACM SIGGRAPH '04 Proceedings, August 2004.
16. Wang J, Fleet DJ, Hertzmann A. Gaussian process dynamical models. In NIPS Proceedings, 2005.
17. Urtasun R, Fleet DJ, Fua P. 3D people tracking with Gaussian process dynamical models. In CVPR '06 Proceedings, 2006.
18. Tenenbaum J, Freeman WT. Separating style and content with bilinear models. Neural Computation 2000; 12.
19. Vasilescu MAO, Terzopoulos D. Multilinear subspace analysis of image ensembles. In CVPR '03 Proceedings, June 2003.
20. Vlasic D, Brand M, Pfister H, Popovic J. Face transfer with multilinear models. ACM Transactions on Graphics 2005; 24(3).
21. Wang Y, Huang X, Lee C, et al. High resolution acquisition, learning and transfer of dynamic 3D facial expressions. Computer Graphics Forum 2004; 23(3).

Authors' biographies:

Yuru Pei received her Ph.D. in Computer Science from Peking University, China, in 2006, her M.S. in Computer Science from Zhejiang University in 2003, and her B.S. in Computer Science from Central South University. She is now an Assistant Professor in the State Key Laboratory of Machine Perception, Peking University. Her research is mainly in character animation, especially facial animation of speech and expression. She also works on craniofacial reconstruction.

Hongbin Zha received his B.E. in Electrical Engineering from Hefei University of Technology, China, in 1983, and his M.S. and Ph.D. in Electrical Engineering from Kyushu University, Japan, in 1987 and 1990, respectively. After working as a Research Associate in the Department of Control Engineering and Science, Kyushu Institute of Technology, Japan, he joined Kyushu University in 1991 as an Associate Professor. He was also a Visiting Professor in the Centre for Vision, Speech and Signal Processing, Surrey University, UK. Since 2000, he has been a Professor in the State Key Laboratory of Machine Perception, Peking University, Beijing, China. His research interests include computer vision, 3D geometric modeling, digital museums, and robotics. He has published over 140 technical publications in journals, books, and international conference proceedings. Dr Zha received the Franklin V. Taylor Award from the IEEE Systems, Man and Cybernetics Society.


More information

Expanding gait identification methods from straight to curved trajectories

Expanding gait identification methods from straight to curved trajectories Expanding gait identification methods from straight to curved trajectories Yumi Iwashita, Ryo Kurazume Kyushu University 744 Motooka Nishi-ku Fukuoka, Japan yumi@ieee.org Abstract Conventional methods

More information

Multi-Modal Face Image Super-Resolutions in Tensor Space

Multi-Modal Face Image Super-Resolutions in Tensor Space Multi-Modal Face Image Super-Resolutions in Tensor Space Kui Jia and Shaogang Gong Department of Computer Science Queen Mary University of London London, E1 4NS, UK {chrisjia,sgg}@dcs.qmul.ac.uk Abstract

More information

Manifold Clustering. Abstract. 1. Introduction

Manifold Clustering. Abstract. 1. Introduction Manifold Clustering Richard Souvenir and Robert Pless Washington University in St. Louis Department of Computer Science and Engineering Campus Box 1045, One Brookings Drive, St. Louis, MO 63130 {rms2,

More information

Deep Generative Models Variational Autoencoders

Deep Generative Models Variational Autoencoders Deep Generative Models Variational Autoencoders Sudeshna Sarkar 5 April 2017 Generative Nets Generative models that represent probability distributions over multiple variables in some way. Directed Generative

More information

Face Recognition using Tensor Analysis. Prahlad R. Enuganti

Face Recognition using Tensor Analysis. Prahlad R. Enuganti Face Recognition using Tensor Analysis Prahlad R. Enuganti The University of Texas at Austin Final Report EE381K 14 Multidimensional Digital Signal Processing May 16, 2005 Submitted to Prof. Brian Evans

More information

3D Human Motion Tracking Using Dynamic Probabilistic Latent Semantic Analysis

3D Human Motion Tracking Using Dynamic Probabilistic Latent Semantic Analysis 3D Human Motion Tracking Using Dynamic Probabilistic Latent Semantic Analysis Kooksang Moon and Vladimir Pavlović Department of Computer Science, Rutgers University Piscataway, NJ 885 {ksmoon, vladimir}@cs.rutgers.edu

More information

Behaviour based particle filtering for human articulated motion tracking

Behaviour based particle filtering for human articulated motion tracking Loughborough University Institutional Repository Behaviour based particle filtering for human articulated motion tracking This item was submitted to Loughborough University's Institutional Repository by

More information

Client Dependent GMM-SVM Models for Speaker Verification

Client Dependent GMM-SVM Models for Speaker Verification Client Dependent GMM-SVM Models for Speaker Verification Quan Le, Samy Bengio IDIAP, P.O. Box 592, CH-1920 Martigny, Switzerland {quan,bengio}@idiap.ch Abstract. Generative Gaussian Mixture Models (GMMs)

More information

Locality Preserving Projections (LPP) Abstract

Locality Preserving Projections (LPP) Abstract Locality Preserving Projections (LPP) Xiaofei He Partha Niyogi Computer Science Department Computer Science Department The University of Chicago The University of Chicago Chicago, IL 60615 Chicago, IL

More information

CS 231. Deformation simulation (and faces)

CS 231. Deformation simulation (and faces) CS 231 Deformation simulation (and faces) 1 Cloth Simulation deformable surface model Represent cloth model as a triangular or rectangular grid Points of finite mass as vertices Forces or energies of points

More information

CSC 411: Lecture 14: Principal Components Analysis & Autoencoders

CSC 411: Lecture 14: Principal Components Analysis & Autoencoders CSC 411: Lecture 14: Principal Components Analysis & Autoencoders Richard Zemel, Raquel Urtasun and Sanja Fidler University of Toronto Zemel, Urtasun, Fidler (UofT) CSC 411: 14-PCA & Autoencoders 1 / 18

More information

Vision-based Control of 3D Facial Animation

Vision-based Control of 3D Facial Animation Eurographics/SIGGRAPH Symposium on Computer Animation (2003) D. Breen, M. Lin (Editors) Vision-based Control of 3D Facial Animation Jin-xiang Chai,1 Jing Xiao1 and Jessica Hodgins1 1 The Robotics Institute,

More information

Data fusion and multi-cue data matching using diffusion maps

Data fusion and multi-cue data matching using diffusion maps Data fusion and multi-cue data matching using diffusion maps Stéphane Lafon Collaborators: Raphy Coifman, Andreas Glaser, Yosi Keller, Steven Zucker (Yale University) Part of this work was supported by

More information

CS 231. Deformation simulation (and faces)

CS 231. Deformation simulation (and faces) CS 231 Deformation simulation (and faces) Deformation BODY Simulation Discretization Spring-mass models difficult to model continuum properties Simple & fast to implement and understand Finite Element

More information

Facial expression recognition using shape and texture information

Facial expression recognition using shape and texture information 1 Facial expression recognition using shape and texture information I. Kotsia 1 and I. Pitas 1 Aristotle University of Thessaloniki pitas@aiia.csd.auth.gr Department of Informatics Box 451 54124 Thessaloniki,

More information