Facial Animation System Based on Image Warping Algorithm


Lanfang Dong 1, Yatao Wang 2, Kui Ni 3, Kuikui Lu 4
Vision Computing and Visualization Laboratory, School of Computer Science and Technology, University of Science and Technology of China, Hefei, China
1: lfdong@ustc.edu.cn, 2: ytwang@mail.ustc.edu.cn, 3: nk@ustc.edu.cn, 4: lukui@mail.ustc.edu.cn

Abstract: This paper presents the technologies behind a facial animation application. The main contributions are as follows: 1) we summarize the classic image warping algorithms and analyze their advantages and disadvantages when used for facial animation; 2) we study the principles of the Mesh Warping algorithm and propose a novel Mesh Warping algorithm based on scan lines; experimental results show that it meets the running-time requirements of a real-time facial animation system, relaxes the constraints on constructing splines to some degree, and reduces the difficulty of image warping; 3) we introduce the MPEG-4 facial animation standard and use it to implement a speech-driven facial animation system. The system employs our scan-line Mesh Warping algorithm, producing a variety of mouth shapes and facial expressions of the speaker with more realistic animation, better real-time performance, and better synchronization with the speech.

Keywords: Facial Animation; Image Warping; MPEG-4; Viseme Interpolation

I. Introduction

The human face and speech are the two most important channels of human communication. Combining animation with speech processing, speech animation technology uses the computer to generate animation in which speech and mouth shape change synchronously, also known as "Talking Head" or "Mouth-Shape Sync". Speech animation techniques fall mainly into three types: sample-based, 3D-model-based, and single-image-based.

Sample-based speech animation generates new facial animation by reorganizing given samples. It is very realistic, but it requires a live talking video, so data acquisition is hard.
Moreover, it can only reproduce the facial animations contained in the samples. 3D-model-based facial animation first establishes a 3D face model and then drives the model to generate speech animation in which speech and mouth shape play synchronously; the talking face can show a variety of expressions. At present this approach is less realistic than sample-based animation, but data acquisition is easy (only a few images from different angles are needed), production is convenient (little or no user interaction), and it can generate realistic 3D animation.

The speech animation described here is based on image warping: given one image of a human face (or an animal or cartoon face), we locate the feature points and save their positions to data files. For each input sound file we perform speech recognition and generate a phoneme timestamp file. We then synchronize the selected image and audio files, play the speech, and simultaneously drive the face in the image to animate. The system can be applied to human-computer interaction using images that contain human, animal, or cartoon faces.

Image warping is the core technique in single-image-based speech animation. The warping techniques that work well on face images fall into two categories:

1) Warping based on scattered-point interpolation. The typical algorithm is warping based on radial basis functions (RBF) [1, 2, 3]. It makes positioning feature points convenient and can produce realistic warped images, but the RBF kernels usually chosen, such as the Gaussian, are computationally expensive, so warping is slow; in addition, it is difficult for the algorithm to guarantee a stable border in the warped image.

2) Warping based on fragments. Typical algorithms are warping based on triangulation [4, 5] and the grid distortion algorithms [6, 7]. Triangulation-based image warping can obtain good results when warping local regions of face images.
However, partitioning the image into triangular pieces is a relatively complex preprocessing step, and the quality of the partition directly affects the final warping result, so these methods are less convenient: the entire warping must be redone whenever the result is unsatisfactory and needs adjustment.

G. Wolberg proposed the distorted-grid warping algorithm, which usually uses cubic spline interpolation (Fig. 1). The grid distortion algorithm is mainly used for shape transitions (morphing) between two faces; call the two images IS and IT (the source and target images). The source image is associated with a grid MS, which specifies the coordinates of control points, or landmarks. A second grid MT specifies their corresponding positions in the target image. Together, MS and MT define a spatial transformation that maps every point of IS to IT. The mesh topology is restricted to be isomorphic, with no folding or discontinuities allowed; therefore the nodes of MT may deviate from those of MS as required, as long as they do not cause self-intersection. Moreover, for simplicity the grids are fixed at the image boundary.

978-1-4577-0321-8/11/$26.00 ©2011 IEEE

Figure 1. Meshing of image warping based on grid distortion

Lin of National Taiwan University applied this approach in a facial animation system [8], in which facial features are defined by a mesh mask and the meshing algorithm is used for image warping to generate facial animation. However, fitting a mesh to the face image is complex and not easy for user interaction, and the demands on the mesh are high; if the approach is used for speech animation, the animation process is hard to control because the system requires high precision.

II. Image Warping Based on Scan Lines

After carefully studying the Mesh Warping algorithm, we propose a novel image warping algorithm based on scan lines. Fig. 2 describes our warping method, which consists of three stages.

1) Use the feature points of the source and target images to generate the feature points of the intermediate warped image (Fig. 3): the x coordinates of the intermediate feature points come from the target feature points, and the y coordinates come from the source feature points.

2) Perform warping in the x direction.
a) Use the source feature points and the intermediate feature points to construct the source vertical splines (Fig. 4) and the target vertical splines separately; every spline is generated by linear interpolation.
b) Horizontal scan lines sweep the vertical bands row by row. Each scan line intersects two adjacent vertical splines, and the points between the two intersection (control) points are obtained by linear interpolation.

3) Perform warping in the y direction.
a) Use the source feature points and the intermediate feature points to construct the source horizontal splines (Fig. 5) and the target horizontal splines separately; every spline is generated by linear interpolation.

Algorithm 1: Image Warping Based on Scan Lines
Input: a gray-scale rectangular image (M × N) and a set of feature points distributed in the source and target images.
Output: the new coordinates of each pixel in the warped image.

Figure 2. Image warping algorithm
Figure 3. Schematic diagram of the feature points
Figure 4. Schematic diagram of the vertical splines
Figure 5. Schematic diagram of the horizontal splines
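Each pass of this scheme reduces to independent 1-D resampling along scan lines: for a given row, the source and target vertical splines provide matching control abscissae, and every pixel between two intersections is mapped by linear interpolation. The following is a minimal numpy sketch of the x-direction pass only (our own illustration, not the paper's implementation; it assumes the per-row spline intersections have already been computed by linear interpolation between feature points, and that they increase monotonically along each row):

```python
import numpy as np

def warp_rows(image, src_cols, dst_cols):
    """x-direction pass: remap each horizontal scan line independently.

    src_cols / dst_cols have shape (H, K): for every scan line y, the
    x-coordinates where the K source / target vertical splines cross
    that line (assumed precomputed and increasing; the first and last
    columns pin the image border so the boundary stays stable).
    """
    h, w = image.shape
    out = np.empty_like(image)
    xs = np.arange(w, dtype=float)
    for y in range(h):
        # Inverse mapping: for each target pixel, find the source
        # abscissa by piecewise-linear interpolation between the
        # spline intersections, then resample the source row.
        src_x = np.interp(xs, dst_cols[y], src_cols[y])
        out[y] = np.interp(src_x, xs, image[y])
    return out
```

The y-direction pass is the same operation applied column by column (e.g. on the transposed image with the horizontal-spline intersections), which is exactly what makes the two passes independent of each other.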

b) Vertical scan lines sweep the horizontal bands column by column. Each scan line intersects two adjacent horizontal splines, and the points between the two intersection (control) points are obtained by linear interpolation.

The horizontal and vertical splines described in the steps above are constructed independently and no longer form a spline grid sharing feature points. This improvement reduces the difficulty of constructing splines and makes image warping easier in speech animation. Some of the warping results are shown in Fig. 6.

Figure 6. Various mouth shapes: (a) source image; (b) mouth shape 1; (c) mouth shape 2; (d) mouth shape 3

III. 2D Facial Animation System Based on Image Warping

Facial animation in MPEG-4 is only a standard; it does not prescribe a specific solution, which leaves researchers a vast design space. Based on MPEG-4, we implement a speech-driven facial animation system composed of a face parameters modeling module, a speech recognition module, and an animation generation module; the animation generation module in turn consists of an animation parameters calculation module and an image warping module. The system block diagram is shown in Fig. 7.

The system inputs are arbitrary facial image files and audio files. The face parameters modeling module locates the coordinates of the feature points in the face image. The speech recognition module converts speech streams into visual phoneme (viseme) streams. System definition files define the correspondence between the standard face model and the visemes, including the viseme definitions, the displacement factors of the feature points, and the expression FAP (Facial Animation Parameters) definitions. Through pre-designed operations we obtain the set of FAP values corresponding to the currently playing viseme.
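The viseme definitions amount to a phoneme-to-class lookup. A sketch of one possible representation, using the 16 classes of Table 1 (the data structure and function name are ours, not the system's):

```python
# Viseme classes as grouped in Table 1 (class 15 = silence).
VISEME_CLASSES = [
    ["ae", "ax", "ah", "aa", "ay"],   # 0
    ["p", "b", "m"],                  # 1
    ["d", "t", "l"],                  # 2
    ["ey", "eh", "uh"],               # 3
    ["f", "v"],                       # 4
    ["k", "g", "h"],                  # 5
    ["y", "iy", "ih", "ix"],          # 6
    ["sh", "ch", "jh", "zh"],         # 7
    ["ao", "ow", "aw", "oy"],         # 8
    ["r"],                            # 9
    ["w", "uw"],                      # 10
    ["s", "z"],                       # 11
    ["th", "dh"],                     # 12
    ["n", "ng"],                      # 13
    ["er"],                           # 14
    ["silence"],                      # 15
]

PHONEME_TO_VISEME = {p: i for i, group in enumerate(VISEME_CLASSES)
                     for p in group}

def viseme_of(phoneme):
    """Map a recognized phoneme to its viseme class; unknown -> silence."""
    return PHONEME_TO_VISEME.get(phoneme.lower(), 15)
```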
From the FAP values and the FDP (Facial Definition Parameters) values (the coordinates of the facial feature points) we then compute the displacements of the feature points in the face image. Finally, the image warping algorithm generates the animation; by playing the speech and the animation simultaneously, the speech animation system is realized.

A. Face parameters modeling module

The face parameters modeling module marks the facial feature points. Our system selects 45 facial feature points, mostly distributed around the eyes and the mouth, to describe a frontal face image (Fig. 8). These points play the major role in generating animation and are mainly used to achieve the various mouth shapes, expressions, random blinking, and other effects. The feature points used to realize random shaking of the whole face lie lower, in the cheek area.

B. Speech recognition module

The speech recognition module extracts the visemes from the audio streams. Our system adopts the speech recognition engine SAPI 5.0 as the viseme extraction tool. SAPI 5.0 (MS Speech SDK 5.0), released by the Microsoft Corporation in October 2000, lets users easily develop speech recognition, speech synthesis, and related applications.

Figure 7. Speech-driven facial animation flow chart
Figure 8. Distribution of feature points in face parameters modeling

TABLE 1. VISEMES AND THEIR PHONEMES

Viseme #   Phonemes
0          ae ax ah aa ay
1          p b m
2          d t l
3          ey eh uh
4          f v
5          k g h
6          y iy ih ix
7          sh ch jh zh
8          ao ow aw oy
9          r
10         w uw
11         s z
12         th dh
13         n ng
14         er
15         silence

Visual phonemes (visemes) are the video parameters corresponding to phonemes; they represent the mouth shapes of particular pronunciations. The visemes used in the system are listed in Table 1.

C. Animation parameters calculation module

The system uses high-layer actions, bottom-layer FAPs, feature points, and spline points to form a four-layer control structure for facial animation, shown in Fig. 9. High-layer actions include mouth shapes and facial expressions. The bottom layer uses FAP 3-68 of the MPEG-4 standard (which defines 68 FAPs in total). A FAP is implemented by moving a set of related feature points whose locations on a neutral face model are determined according to the MPEG-4 definition. Each animation frame is finally represented as movements of a number of feature points and related spline points relative to the neutral face model. For a given intensity of a high-layer action, the displacements of the feature points and spline points are calculated as:

bottom-layer FAP strength = high-layer action strength × bottom-layer FAP weight
feature point displacement = bottom-layer FAP strength × feature point weight × FAPU
spline point displacement = feature point displacement × spline point weight

The four-layer structure reduces the dependence of the facial animation on the face mesh, making model replacement more convenient, and abstracting high-layer actions makes application and extension more convenient as well.

D. Image warping module

Our system adopts the scan-line Mesh Warping algorithm for image warping, weighing real-time performance, realism, and operational flexibility together. Constructing the splines is the relatively complicated part of this method, but it strongly influences the warping results. As mentioned earlier, the scan-line Mesh Warping algorithm can warp the image and construct the splines in the X and Y directions independently. After repeated experiments, the system settled on the following configuration: horizontal splines covering the whole image realize the warping in the Y direction (Fig. 10); vertical splines covering the mouth and eye areas separately, which reduces the difficulty of constructing splines, realize the warping in the X direction (Fig. 11); and vertical splines constructed separately for the whole image realize the head-shaking effect (Fig. 12).

In Fig. 10, Fig. 11, and Fig. 12, black rectangles represent feature points; white rectangles represent secondary feature points (computed simply from the feature points, mainly to assist spline construction and stabilize the warped image boundary); diamonds represent the warping boundary, which confines the warping to the interior of the boundary area; upright triangles represent the pixels adjacent to feature points; and inverted triangles represent the points one pixel away from feature points. These adjacent or nearby points mainly control the warping of local areas. It is worth mentioning that the feature points on the chin lie on the horizontal splines but not on the vertical splines, since the chin moves almost only up and down when talking (head shaking is handled separately). This reflects one improvement over the traditional Mesh Warping algorithm: constructing the horizontal and vertical splines independently, and warping in the X and Y directions separately.

Figure 9. Four-layer control structure
Figure 10. Horizontal splines
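The four-layer cascade can be sketched as follows. This is a hedged illustration of the formulas above: the FAP identifier, weight tables, and FAPU value used in the example are made-up placeholders, not the paper's actual data (MPEG-4 FAPs are really numbered parameters scaled by face-specific FAPUs):

```python
def frame_displacements(action_strength, action_fap_weights,
                        fap_feature_weights, fapu):
    """Cascade one high-layer action intensity down to per-feature-point
    displacements, following the four-layer formulas:

      FAP strength         = action strength * FAP weight
      feature displacement = FAP strength * feature-point weight * FAPU
    """
    disp = {}
    for fap, fap_w in action_fap_weights.items():
        fap_strength = action_strength * fap_w
        for point, pt_w in fap_feature_weights.get(fap, {}).items():
            disp[point] = disp.get(point, 0.0) + fap_strength * pt_w * fapu
    return disp

def spline_displacements(feature_disp, spline_weights):
    """Final layer: spline displacement = feature displacement * weight."""
    return {s: feature_disp[p] * w
            for s, (p, w) in spline_weights.items() if p in feature_disp}
```

For example, a "smile" action of strength 0.8 driving a hypothetical corner-lip FAP with FAP weight 1.0, feature-point weight 2.0, and FAPU 0.5 displaces that feature point by 0.8 × 1.0 × 2.0 × 0.5 = 0.8 units, and a spline point weighted at 0.5 moves half of that.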

Figure 11. Vertical splines for facial warping
Figure 12. Vertical splines for shaking the head

IV. Experimental Results

The animation process of our system is shown in Fig. 13. Using the algorithm of Section II, we implemented a speech-driven facial animation system. Fig. 14 shows some frames of the facial animation produced by the system. Its main functions can be described as follows: it generates the 16 self-defined visemes for the mouth; it provides expression labels for six common expressions; it provides random blinking, head shaking, and other effects that enhance the realism of the animation; visemes are shared, so synchronization between voice and animation is good; and the animation frame rate is 20 frames/sec.

V. Conclusion

We analyze the state of facial animation research and propose a facial animation generation method based on image warping. We construct the face model by choosing feature points on a frontal face image, and achieve realistic facial animation using the scan-line Mesh Warping technique to produce a variety of mouth shapes and facial expressions of the talker. Nevertheless, the facial animation produced by the system can be improved in many respects. The system does not model the teeth inside the mouth but uses a smearing method instead, so in some frames the mouth opening becomes larger than its real size. In addition, the animated transitions between visemes and between expressions need further improvement.

Figure 13. System animation flow chart
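One common baseline for the transition problem just noted is to interpolate feature-point positions linearly between consecutive viseme keyframes at the system's 20 frames/sec. A sketch of that standard technique (our illustration, not the system's actual transition code):

```python
def blend_points(pts_a, pts_b, t):
    """Linearly blend two viseme keyframes (lists of (x, y) feature
    points) at fraction t in [0, 1]."""
    return [(ax + t * (bx - ax), ay + t * (by - ay))
            for (ax, ay), (bx, by) in zip(pts_a, pts_b)]

def transition_frames(pts_a, pts_b, duration_s, fps=20):
    """Feature-point sets for a transition lasting duration_s seconds,
    sampled at the animation frame rate; each set then drives one
    warped frame."""
    n = max(1, round(duration_s * fps))
    return [blend_points(pts_a, pts_b, i / n) for i in range(n + 1)]
```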

Acknowledgement

The authors would like to thank the students Jiahui Chen and Meng Li of the Vision Computing and Visualization Laboratory, University of Science and Technology of China, for their assistance. This work is supported by the Image-based Speech Animation project of the Youth Innovation Fund of the University of Science and Technology of China (2011-2012) and the Intelligent Human-Machine Speech Interaction Robot Key Technology research program of Anhui Province (2009-2011) under Grant No. 09010206052.

Figure 14. Speech-driven facial animation: (a) speaking happily; (b) speaking angrily; (c) speaking with surprise

References

[1] D. Reisfeld, N. Arad, N. Dyn, et al., "Image warping by radial basis functions: Application to facial expressions," CVGIP: Graphical Models and Image Processing, 1994, 56(2), pp. 161-172.
[2] N. Arad, D. Reisfeld, "Image warping using few anchor points and radial functions," Computer Graphics Forum, 1995, 14(1), pp. 35-46.
[3] J. Noh, D. Fidaleo, U. Neumann, "Animated deformations with radial basis functions," in Proceedings of the ACM Symposium on Virtual Reality Software and Technology, Seoul, 2000, pp. 166-174.
[4] G. Zhu, B. Zhang, L. Wu, Z. Hu, "Research on metamorphosis using Delaunay triangulation," Journal of Image and Graphics, 2003, 8A(6), pp. 641-646.
[5] Y. Zhang, H. Zhao, "An image deformation algorithm based on triangle skeleton coordinates," Journal of Image and Graphics, 2001, 6A(4), pp. 365-368.
[6] S. Lee, G. Wolberg, S. Y. Shin, "Scattered data interpolation with multilevel B-splines," IEEE Transactions on Visualization and Computer Graphics, 1997, 3(3), pp. 228-244.
[7] D. B. Smythe, "A two-pass mesh warping algorithm for object transformation and image interpolation," Technical Report 1030, ILM Computer Graphics Department, Lucasfilm, San Rafael, Calif., 1998.
[8] I. Lin, C. Hung, T. Yang, M. Ouhyoung, "A speech driven talking head system based on a single face image," in Proceedings of the 7th Pacific Conference on Computer Graphics and Applications, Seoul, 1999, pp. 43-49.