Architecture of an Animation System for Human Characters

T. Pejša* and I.S. Pandžić*
*University of Zagreb, Faculty of Electrical Engineering and Computing, Zagreb, Croatia
(tomislav.pejsa, igor.pandzic)@fer.hr

Abstract: Virtual human characters are found in a broad range of applications, from movies, games and networked virtual environments to teleconferencing and tutoring applications. Such applications are available on a variety of platforms, from desktop and web to mobile devices. High-quality animation is an essential prerequisite for realistic and believable virtual characters. Though researchers and application developers have ample animation techniques for virtual characters at their disposal, implementing these techniques in an existing application tends to be a daunting and time-consuming task. In this paper we present visage SDK, a versatile framework for real-time character animation based on the MPEG-4 FBA standard. It offers a wide spectrum of features, including animation playback, lip synchronization and facial motion tracking, while facilitating rapid production of art assets and easy integration with existing graphics engines.

I. INTRODUCTION

Virtual characters have long been a staple of the entertainment industry, namely motion pictures and electronic games, but in more recent times they have also found application in numerous other areas, such as education, communications, healthcare and business, where they appear in the roles of avatars, virtual tutors, assistants, companions etc. A category of virtual characters that has been an exceptionally active topic of research is embodied conversational agents (ECAs), characters that interact with real humans in direct, face-to-face conversations. Virtual character applications are of great potential interest to the field of telecommunications.
Well-articulated human characters are a common feature in networked virtual environments such as Second Life, Google Lively and World of Warcraft, where they appear in the roles of user avatars and non-player characters (NPCs). Another potential use of virtual characters is in video conferencing, where digital avatars can replace the video streams of human participants and thus conserve bandwidth. Until recently virtual characters have been almost exclusive to desktop and browser-based network applications, but the growing processing power of mobile platforms now allows their use in mobile applications as well.

These developments have resulted in increasing demand for high-quality visual simulation of virtual humans. This visual simulation consists of two aspects: the graphical model and animation. The latter encompasses body animation (locomotion, gestures) and facial animation (expressions, lip movements, facial gestures). While many open-source and proprietary rendering solutions deliver excellent graphical quality, their animation functionality, particularly facial animation, is often limited. Moreover, they often offer limited or no tools for the production of characters and animations, requiring the user to invest a great deal of effort into setting up a suitable art pipeline. Our system seeks to address this by delivering greater animation capabilities, while being general enough to work with any 3D engine, thus facilitating development of applications with cutting-edge visuals.
Our principal contributions are:
- the design of a character animation system architecture that supports advanced animation features and provides tools for the production of new character animation assets with minimal expenditure of time and effort
- a model for decoupling animation, asset production and rendering, enabling fast and easy integration of the system with different graphics engines and application frameworks

Facial motion tracking, lip synchronization and other advanced features make visage SDK especially suited for applications such as ECAs and low-bandwidth video communications. Due to the simplicity of art asset production, our system is ideal for researchers with limited resources at their disposal. We begin with a brief summary of related work and continue with an overview of our system's features, followed by a description of the underlying architecture. Finally, we discuss our future work and planned improvements to the system.

II. RELATED WORK

Though virtual characters have been a highly active area of research for years, little effort has been made to propose a system that would integrate the various aspects of their visual simulation and be easily usable in combination with different graphics engines and for a broad range of applications. The most recent and ambitious effort is SmartBody, a modular system for animation and behavior modeling of ECAs [1]. SmartBody sports more advanced low-level animation than visage SDK, featuring hierarchies of customizable, scheduled controllers. SmartBody also supports behavior modeling through Behavior Markup Language (BML) scripts [2]. However, SmartBody lacks some of visage SDK's integrated functionality, namely face tracking, lip sync and visual text-to-speech, and has no built-in capabilities for character model production. It also features a less common method of interfacing with
the renderer, namely via TCP, whereas visage SDK is statically or dynamically linked with the main engine. The new visage SDK system builds upon the earlier visage framework for facial animation [3], introducing new features such as body animation support and facial motion tracking. It also greatly enhances integration capabilities by enabling easy integration into other graphics engines.

Engines for simulations and electronic games typically have modular and extensible architectures, and it is common for such engines to feature third-party components. Companies such as Havok and NaturalMotion even specialize in developing modular animation and physics systems intended to be integrated into existing architectures. These architectural concepts are commonly covered in non-academic literature on graphics engine design, and we found such resources to be very suitable references during the development of our system [13][14][15].

III. FEATURES

visage SDK includes the following core features:
- animation playback
- lip synchronization
- visual text-to-speech (VTTS) conversion
- facial motion tracking from video

In addition to these, visage SDK also includes functionality for automatic off-line production of character models and their preparation for real-time animation:
- face model generation from photographs
- morph target cloning

This functionality can be integrated into the user's own applications, and it is also available as full-featured stand-alone tools or plug-ins for 3D modeling software.

A. Animation playback

The visage SDK animation system is based on the MPEG-4 Face and Body Animation (FBA) standard [4][5], which defines a set of animation parameters (FBAPs) needed for detailed and efficient animation of virtual humans.
These parameters can be divided into the following categories:
- body animation parameters (BAPs), which control individual degrees of freedom (DOFs) of the character's skeleton (e.g., r_shoulder_abduct)
- low-level facial animation parameters (FAPs), which control movements of individual facial features (e.g., open_jaw or raise_l_i_eyebrow; see Fig. 1)
- expression, a high-level FAP which controls the facial expression (e.g., joy or sadness)
- viseme, a high-level FAP which controls the shape of the lips during speech (e.g., TH or aa)

Animation in MPEG-4 FBA is simply a temporal sequence of FBAP value sets. Our system is capable of loading FBA animations from the MPEG-4 standard file format and applying them, frame by frame, to the character model. How each FBAP value is applied to the model depends on the graphics engine; visage SDK doesn't concern itself with the details of FBAP implementation.

Figure 1: MPEG-4 FBA face, marked with facial definition parameters (FDPs)

Figure 2: Face model imported from FaceGen and animated in visage SDK

B. Lip synchronization

visage SDK features a lip sync component for both on-line and off-line applications. The speech signal is analyzed and classified into visemes using neural networks (NNs). A genetic algorithm (GA) is used to automatically train the NNs [6][8]. Our lip sync implementation is language-independent and has been successfully used with a number of different languages, including English, Croatian, Swedish and Japanese [7].

C. Visual text-to-speech

visage SDK features a simple visual text-to-speech (VTTS) system based on Microsoft SAPI. It converts the SAPI output into a sequence of FBA visemes [9].

D. Facial motion tracking

The facial motion tracker tracks the facial movements of a real person from a recorded or live video stream. The motion tracking algorithm is based on active appearance models (AAMs) and requires no markers or special cameras; a simple, low-cost webcam is sufficient.
Tracked motion is encoded as a sequence of FAP values and applied to the virtual character in real time. In addition to this functionality, the facial motion tracker also supports automatic feature detection in static 2D images, which can be used to further automate the process of face model generation from photographs (see next section) [10]. Potential applications of the system include human-computer interaction and teleconferencing, where it can be
used to drive 3D avatars with the purpose of replacing video streams of human participants.

E. Face model generation from photos

The face model generator can be used to rapidly generate 3D face models. It takes a collection of orthogonal photographs of the head as input and uses them to deform a generic template face, producing a face model that matches the individual in the photographs [11]. Since the resulting models always have the same topology, the cloner can automatically generate morph targets for facial animation.

F. Facial motion cloning

The cloner copies morph targets from a source face model onto a target model [12]. For arbitrary models it requires that the user map a set of feature points (FDPs) to vertices of the model, though this step can be bypassed if the target model and the source model have identical topologies. The cloner also supports fully automated processing of face models generated by Singular Inversions' FaceGen application (Fig. 2).

IV. ARCHITECTURE

A. Components

visage SDK has a multi-layered architecture and is composed of the following key components:
- scene wrapper
- animation player
- high-level components: lip sync, TTS, face tracker
- character model production libraries (face model generator, facial motion cloner)

The scene wrapper provides a common, renderer-independent interface to the character model in the scene. Its main task is to interpret animation parameter values and apply them to the model. Furthermore, it aggregates information about the character model pertinent to MPEG-4 FBA, most notably the mappings of FBAPs to skeleton joint transformations and mesh morph targets. This high-level model data can be loaded from and serialized to an XML-based file format called VCM (Visage Character Model). Finally, the scene wrapper also provides direct access to the model's geometry (meshes and morph targets) and joint transformations, permitting the model production components to work with any model irrespective of the underlying renderer.
The animation player is the core runtime component of the system, tasked with playing generalized FBA actions. These actions can be animations loaded from MPEG-4 .fba files, but also procedural actions such as gaze following. The animation player can play actions in its own thread, or it can be updated manually in every frame. High-level components include lip sync, text-to-speech and the facial motion tracker. They are implemented as FBA actions and are therefore driven by the animation player.

Figure 3: visage SDK architecture

Character model production components are meant to be used off-line and so they don't interface with the
animation player. They access the model's geometry via the common scene wrapper.

B. Integration with a graphics engine

When it comes to integration with graphics engines, visage SDK is highly flexible and places only minimal requirements on the target engine. The engine should support the basic character animation techniques of skeletal animation and mesh morphing, and the engine's API should provide the ability to manually set joint transformations and morph target blend weights. Animation is possible even if some of these requirements aren't met; for example, in the absence of morph target support, a facial bone rig can be used for facial animation. Minimal integration of the system is a trivial endeavor, amounting to subclassing and implementing a single wrapper class representing the character model. Depending on the desired functionality, certain parts of the wrapper can be left unimplemented; e.g., there is no need to provide geometry access if the developer doesn't plan to use the cloner or face model generation features in their application.

The 3D model itself is loaded and handled by the engine, while FBAP mappings and other information pertaining to MPEG-4 FBA are loaded from VCM files. VCM files are tied to visage SDK rather than the graphics engine, which means they are portable and can be reused for a character model, or even for different models with a similar structure, regardless of the underlying renderer. This greatly simplifies model production and reduces the interdependence of the art pipelines.

C. Component interactions

A simplified overview of runtime component interactions is illustrated in Fig. 3. Animation proceeds in the following manner. The application adds actions to the animation player; for example, lip sync coupled with gaze following and a set of simple repeating facial gestures (e.g., blinking). The animation player executes the animation loop.
From each action it obtains the current frame of animation as a set of FBAP values, blends all the sets together and applies them to the character model via the wrapper. The scene wrapper receives the FBAP value set and interprets the values according to the character's FBAP mappings. Typically, BAPs are converted to Euler angles and applied to bone transformations, while FAPs are interpreted as morph target blend weights. For the cloner and face model generator, interactions are even more straightforward, amounting to obtaining and updating the model's geometry via the model wrapper.

Figure 4: FBAPMapper, an OGRE-based application for mapping animation parameters

D. Art pipeline

As previously indicated, the art pipeline is very flexible. Characters are modeled in 3D modeling applications and exported to the target engine. Naturally, FBAPs need to be mapped to the joints and morph targets of the model. This is done using a special plug-in for the 3D modeling application if one is available; otherwise it is handled by a stand-alone application with appropriate 3D format support. For animations the pipeline is similar, and again a plug-in is used for export and import. We also provide stand-alone face model and morph target production applications that use our production libraries. These applications rely on intermediate file formats (currently VRML or OGRE formats, though support for others will be added in the future) to obtain the model, while results are output via the intermediate format in combination with VCM. Fig. 4 shows a screenshot of a simple application for mapping and testing animation parameters.

V. EXAMPLES

We have so far successfully integrated our system with two open-source rendering engines, with more implementations on the way. The results are presented in this section.

A. OGRE

OGRE [16] is one of the most popular open-source, cross-platform rendering engines.
Its features include a powerful object-oriented interface, support for both the OpenGL and Direct3D graphics APIs, a shader-driven architecture, material scripts, hardware-accelerated skeletal animation with manual bone control, hardware-accelerated morph target animation, etc. Despite challenges encountered in implementing a wrapper around certain features, we have achieved both face and body animation in OGRE (Fig. 5 and 6).
OGRE is also notable for its extensive art pipeline, supported by exporters for nearly every modeling suite in existence. We initially encountered difficulties in loading complex character models composed of multiple meshes, because basic OGRE doesn't support file formats capable of storing entire scenes. However, this shortcoming is rectified by the community-made DotScene loader plug-in, and a COLLADA loader is also under development by the OGRE community.

B. Irrlicht

Though Irrlicht [17] doesn't boast OGRE's power, it is nonetheless popular for its small size and ease of use. Its main shortcoming with regard to our system is its lack of support for morph target animation. However, we were able to alleviate this by creating a face model with a bone rig and parametrizing it over MPEG-4 FBAPs, with very promising results (see Fig. 8). Unlike OGRE's art pipeline, which is based on exporter plug-ins for 3D modeling applications, Irrlicht's art pipeline relies on a large number of loaders for various file formats. We found the loader for the Microsoft .x format to be the best suited to our needs and were able to successfully import several character models with both body and face rigs (Fig. 7).

C. Upcoming implementations

We are concurrently working on integrating visage SDK with several other engines. These include:
- StudierStube (StbES) [19], a commercial augmented reality (AR) kit with a 3D renderer and support for character animation
- Horde3D [18], a lightweight, open-source renderer
- Panda3D, an open-source game engine known for its intuitive Python-based API

Of these we find StbES to be the most promising, as it will enable us to deliver the power of the visage SDK animation system to mobile platforms and combine it with StbES's extensive AR features.

VI. CONCLUSIONS AND FUTURE WORK

Our system supports a variety of character animation features and facilitates rapid application development and art asset production.
Its feature set makes it suitable for research and commercial applications such as embodied agents and avatars in networked virtual environments and telecommunications, while the flexibility of its architecture means it can be used on a variety of platforms, including mobile devices. We have successfully integrated it with popular graphics engines and plan to provide more implementations in the near future, while simultaneously striving to make integration even easier.

Furthermore, we are continually working on enhancing our system with new features. An upcoming major upgrade will introduce a new system for interactive motion control based on parametric motion graphs, as well as character behavior modeling capabilities via BML. Our goal is to develop a universal and modular system for powerful, yet intuitive modeling of character behavior and to continue using it as the backbone of our research into high-level character control and applications involving virtual humans. We plan to release a substantial portion of our system under an open-source license.

Figure 5: Lip sync in OGRE

Figure 6: Body animation in OGRE

Figure 7: Body animation in Irrlicht

Figure 8: Facial animation in Irrlicht

ACKNOWLEDGMENT

This work was partly carried out within the research project "Embodied Conversational Agents as interface for networked and mobile services" supported by the Ministry of Science, Education and Sports of the Republic of Croatia. It was also partly supported by Visage Technologies. Integration of visage SDK with OGRE, Irrlicht and other engines was done by Mile Dogan, Danijel Pobi, Nikola Banko, Luka Šverko and Mario Medvedec, undergraduate students at the Faculty of Electrical Engineering and Computing in Zagreb, Croatia.

REFERENCES

[1] M. Thiebaux, A.N. Marshall, S. Marsella, M. Kallmann, "SmartBody: behavior realization for embodied conversational agents," in International Conference on Autonomous Agents, 2008, vol. 1, pp. 151-158.
[2] S. Kopp et al., "Towards a common framework for multimodal generation: The behavior markup language," in Intelligent Virtual Agents, 2006, pp. 205-217.
[3] I.S. Pandžić, J. Ahlberg, M. Wzorek, P. Rudol, M. Mošmondor, "Faces everywhere: towards ubiquitous production and delivery of face animation," in International Conference on Mobile and Ubiquitous Multimedia, 2003, pp. 49-55.
[4] I.S. Pandžić, R. Forchheimer, Eds., MPEG-4 Facial Animation - The Standard, Implementations and Applications, John Wiley & Sons, 2002.
[5] ISO/IEC 14496 MPEG-4 International Standard, Moving Picture Experts Group, www.cselt.it/mpeg
[6] G. Zorić, I.S. Pandžić, "Real-time language independent lip synchronization method using a genetic algorithm," Signal Processing, special issue on Multimodal Human-Computer Interfaces, vol. 86, issue 12, pp. 3644-3656, 2006.
[7] A. Čereković et al., "Towards an embodied conversational agent talking in Croatian," in International Conference on Telecommunications, 2007, pp. 41-47.
[8] G. Zorić, I.S. Pandžić, "A real-time lip sync system using a genetic algorithm for automatic neural network configuration," in IEEE International Conference on Multimedia & Expo, 2005, vol. 6, pp. 1366-1369.
[9] C. Pelachaud, "Visual Text-to-Speech," in MPEG-4 Facial Animation - The Standard, Implementations and Applications, I.S. Pandžić, R. Forchheimer, Eds., John Wiley & Sons, 2002.
[10] G. Fanelli, M. Fratarcangeli, "A non-invasive approach for driving virtual talking heads from real facial movements," in 3DTV Conference, 2007, pp. 1-4.
[11] M. Fratarcangeli, M. Andolfi, K. Stanković, I.S. Pandžić, "Animatable face models from uncalibrated input features," unpublished.
[12] I.S. Pandžić, "Facial Motion Cloning," Graphical Models, vol. 65, issue 6, pp. 385-404, 2003.
[13] D. Eberly, 3D Game Engine Architecture, Morgan Kaufmann, 2005.
[14] S. Zerbst, O. Duvel, 3D Game Engine Programming, Course Technology PTR, 2004.
[15] Havok Physics Animation 6.00 User Guide, Havok, 2008.
[16] OGRE Manual v1.6, 2008, http://www.ogre3d.org/docs/manual/
[17] Nikolaus Gebhardt, Irrlicht Engine 1.5 API Documentation, 2008, http://irrlicht.sourceforge.net/docu/index.html
[18] Nicolas Schulz, Horde3D Documentation, 2009, http://www.horde3d.org/docs/manual.html
[19] Christian Doppler Laboratory, Graz University of Technology, "Handheld augmented reality," 2008, http://studierstube.icg.tugraz.ac.at/handheld_ar/