Architecture of an Animation System for Human Characters

T. Pejša* and I.S. Pandžić*
* University of Zagreb, Faculty of Electrical Engineering and Computing, Zagreb, Croatia
(tomislav.pejsa, igor.pandzic)@fer.hr

Abstract: Virtual human characters are found in a broad range of applications, from movies, games and networked virtual environments to teleconferencing and tutoring applications. Such applications are available on a variety of platforms, from desktop and web to mobile devices. High-quality animation is an essential prerequisite for realistic and believable virtual characters. Though researchers and application developers have ample animation techniques for virtual characters at their disposal, implementing these techniques in an existing application tends to be a daunting and time-consuming task. In this paper we present visage SDK, a versatile framework for real-time character animation based on the MPEG-4 FBA standard. It offers a wide spectrum of features, including animation playback, lip synchronization and facial motion tracking, while facilitating rapid production of art assets and easy integration with existing graphics engines.

I. INTRODUCTION

Virtual characters have long been a staple of the entertainment industry, namely motion pictures and electronic games, but in more recent times they have also found application in numerous other areas, such as education, communications, healthcare and business, where they appear in the roles of avatars, virtual tutors, assistants, companions etc. A category of virtual characters that has been an exceptionally active topic of research are embodied conversational agents (ECAs), characters that interact with real humans in direct, face-to-face conversations.

Virtual character applications are of great potential interest to the field of telecommunications. Well-articulated human characters are a common feature of networked virtual environments such as Second Life, Google Lively and World of Warcraft, where they serve as user avatars and non-player characters (NPCs). Another potential use of virtual characters is in video conferences, where digital avatars can replace the video streams of human participants and thus conserve bandwidth. Until recently virtual characters were almost exclusive to desktop and browser-based network applications, but the growing processing power of mobile platforms now allows their use in mobile applications as well.

These developments have resulted in increasing demand for high-quality visual simulation of virtual humans. This visual simulation consists of two aspects: the graphical model and animation. The latter encompasses body animation (locomotion, gestures) and facial animation (expressions, lip movements, facial gestures). While many open-source and proprietary rendering solutions deliver excellent graphical quality, their animation functionality, particularly facial animation, is often limited. Moreover, they often offer limited or no tools for the production of characters and animations, requiring the user to invest a great deal of effort into setting up a suitable art pipeline. Our system seeks to address this by delivering greater animation capabilities, while being general enough to work with any 3D engine and thus facilitating development of applications with cutting-edge visuals.
Our principal contributions are these:

- the design of a character animation system architecture that supports advanced animation features and provides tools for producing new characters and animation assets with minimal expenditure of time and effort
- a model for decoupling animation, asset production and rendering that enables fast and easy integration of the system with different graphics engines and application frameworks

Facial motion tracking, lip synchronization and other advanced features make visage SDK especially suited for applications such as ECAs and low-bandwidth video communications. Due to the simplicity of art asset production, our system is ideal for researchers with limited resources at their disposal.

We begin with a brief summary of related work and continue with an overview of our system's features, followed by a description of the underlying architecture. Finally, we discuss our future work and planned improvements to the system.

II. RELATED WORK

Though virtual characters have been a highly active area of research for years, little effort has been made to propose a system that integrates the various aspects of their visual simulation and is easily usable in combination with different graphics engines and for a broad range of applications. The most recent and ambitious effort is SmartBody, a modular system for animation and behavior modeling of ECAs [1]. SmartBody sports more advanced low-level animation than visage SDK, featuring hierarchies of customizable, scheduled controllers. SmartBody also supports behavior modeling through Behavior Markup Language (BML) scripts [2]. However, SmartBody lacks some of visage SDK's integrated functionality, namely face tracking, lip sync and visual text-to-speech, and has no built-in capabilities for character model production. It also features a less common method of interfacing with the renderer, namely via TCP, whereas visage SDK is statically or dynamically linked with the main engine.

The new visage SDK system builds upon the earlier visage framework for facial animation [3], introducing new features such as body animation support and facial motion tracking. It also greatly enhances integration capabilities by enabling easy integration into other graphics engines.

Engines for simulations and electronic games typically have modular and extensible architectures, and it is common for such engines to incorporate third-party components. Companies such as Havok and NaturalMotion even specialize in developing modular animation and physics systems intended to be integrated into existing architectures. These architectural concepts are commonly described in non-scientific literature on graphics engine design, and we found such resources to be very suitable references during the development of our system [13] [14] [15].

III. FEATURES

visage SDK includes the following core features:

- animation playback
- lip synchronization
- visual text-to-speech (VTTS) conversion
- facial motion tracking from video

In addition to these, visage SDK also includes functionality for automatic off-line production of character models and their preparation for real-time animation:

- face model generation from photographs
- morph target cloning

This functionality can be integrated into the user's own applications, and it is also available as full-featured stand-alone tools or plug-ins for 3D modeling software.

A. Animation playback

The visage SDK animation system is based on the MPEG-4 Face and Body Animation (FBA) standard [4] [5], which defines a set of animation parameters (FBAPs) needed for detailed and efficient animation of virtual humans. These parameters can be divided into the following categories:

- body animation parameters (BAPs): control individual degrees of freedom (DOFs) of the character's skeleton (e.g., r_shoulder_abduct)
- low-level facial animation parameters (FAPs): control movements of individual facial features (e.g., open_jaw or raise_l_i_eyebrow; see Fig. 1)
- expression: a high-level FAP which controls the facial expression (e.g., joy or sadness)
- viseme: a high-level FAP which controls the shape of the lips during speech (e.g., TH or aa)

Animation in MPEG-4 FBA is nothing more than a temporal sequence of FBAP value sets. Our system is capable of loading FBA animations from the MPEG-4 standard file format and applying them, frame by frame, to the character model. How each FBAP value is applied to the model depends on the graphics engine; visage SDK does not concern itself with the details of FBAP implementation.

Figure 1: MPEG-4 FBA face, marked with facial definition parameters (FDPs)

Figure 2: Face model imported from FaceGen and animated in visage SDK
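As an illustration of this data flow, the following minimal sketch (in C++) shows how a frame of FBA animation could be represented as a set of FBAP values and applied frame by frame; the type and member names (FbaFrame, CharacterModel, applyFrame) are chosen for illustration and do not correspond to the actual visage SDK API.

    #include <map>
    #include <string>
    #include <vector>

    // Illustrative container for one frame of MPEG-4 FBA animation:
    // a set of animation parameter values, keyed by parameter name.
    struct FbaFrame {
        std::map<std::string, double> bapValues;  // e.g. "r_shoulder_abduct" -> joint angle
        std::map<std::string, double> fapValues;  // e.g. "open_jaw" -> feature displacement
        int viseme = 0;      // high-level FAP: index of the current mouth shape (e.g. the one for "aa")
        int expression = 0;  // high-level FAP: index of the facial expression (e.g. the one for "joy")
    };

    // The engine-specific character model decides how each value is applied
    // (bone rotation, morph target weight, ...); the animation system only
    // supplies the values.
    class CharacterModel {
    public:
        virtual ~CharacterModel() {}
        virtual void applyFrame(const FbaFrame& frame) = 0;
    };

    // Playback: step through a loaded animation and apply one FBAP value set
    // per frame, leaving interpretation of the parameters to the model.
    void playAnimation(const std::vector<FbaFrame>& animation, CharacterModel& model) {
        for (const FbaFrame& frame : animation) {
            model.applyFrame(frame);
            // ... wait for the next frame time, render, etc.
        }
    }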
B. Lip synchronization

visage SDK features a lip sync component for both on-line and off-line applications. The speech signal is analyzed and classified into visemes using neural networks (NNs). A genetic algorithm (GA) is used to automatically train the NNs [6] [8]. Our lip sync implementation is language-independent and has been successfully used with a number of different languages, including English, Croatian, Swedish and Japanese [7].

C. Visual text-to-speech

visage SDK features a simple visual text-to-speech (VTTS) system based on Microsoft SAPI. It converts the SAPI output into a sequence of FBA visemes [9].

D. Facial motion tracking

The facial motion tracker tracks the facial movements of a real person from a recorded or live video stream. The motion tracking algorithm is based on active appearance models (AAM) and does not require markers or special cameras; a simple, low-cost webcam is sufficient. Tracked motion is encoded as a sequence of FAP values and applied to the virtual character in real time. In addition to this functionality, the facial motion tracker also supports automatic feature detection in static 2D images, which can be used to further automate the process of face model generation from photographs (see next section) [10]. Potential applications of the system include human-computer interaction and teleconferencing, where it can be used to drive 3D avatars with the purpose of replacing the video streams of human participants.
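To illustrate how tracked motion might reach the character in real time, the sketch below reuses the FbaFrame and CharacterModel types from the previous sketch; the tracker interface (FaceTracker, trackFrame) is likewise an assumption made for illustration, not the actual tracker API.

    // Illustrative tracker interface: for each video frame it produces the
    // estimated FAP values (obtained from the AAM fit).
    class FaceTracker {
    public:
        virtual ~FaceTracker() {}
        virtual bool trackFrame(FbaFrame& outFaps) = 0;  // returns false when the stream ends
    };

    // Real-time avatar driving: tracked FAP values are applied directly to the
    // character model, so the avatar mirrors the user's facial movements.
    void driveAvatarFromVideo(FaceTracker& tracker, CharacterModel& model) {
        FbaFrame faps;
        while (tracker.trackFrame(faps)) {
            model.applyFrame(faps);  // same entry point as animation playback
        }
    }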

E. Face model generation from photos

The face model generator can be used to rapidly generate 3D face models. It takes a collection of orthogonal photographs of the head as input and uses them to deform a generic template face, producing a face model that matches the individual in the photographs [11]. Since the resulting models always have the same topology, the cloner can automatically generate morph targets for facial animation.

F. Facial motion cloning

The cloner copies morph targets from a source face model onto a target model [12]. For arbitrary models it requires that the user map a set of feature points (FDPs) to vertices of the model, though this step can be bypassed if the target model and the source model have identical topologies. The cloner also supports fully automated processing of face models generated by the Singular Inversions FaceGen application (Fig. 2).

IV. ARCHITECTURE

A. Components

visage SDK has a multi-layered architecture and is composed of the following key components:

- scene wrapper
- animation player
- high-level components: lip sync, TTS, face tracker
- character model production libraries: face model generator, facial motion cloner

The scene wrapper provides a common, renderer-independent interface to the character model in the scene. Its main task is to interpret animation parameter values and apply them to the model. Furthermore, it aggregates information about the character model pertinent to MPEG-4 FBA, most notably mappings of FBAPs to skeleton joint transformations and mesh morph targets. This high-level model data can be loaded from and serialized to an XML-based file format called VCM (Visage Character Model). Finally, the scene wrapper also provides direct access to the model's geometry (meshes and morph targets) and joint transformations, permitting the model production components to work with any model irrespective of the underlying renderer.

The animation player is the core runtime component of the system, tasked with playing generalized FBA actions. These actions can be animations loaded from MPEG-4 .fba files, but also procedural actions such as gaze following. The animation player can play the actions in its own thread, or it can be updated manually in every frame.

High-level components include lip sync, text-to-speech and the facial motion tracker. They are implemented as FBA actions and therefore driven by the animation player. Character model production components are meant to be used off-line, so they do not interface with the animation player. They access the model's geometry via the common scene wrapper.

Figure 3: visage SDK architecture
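As an illustration, a renderer-independent wrapper interface of the kind described above could be sketched as follows; all names (SceneWrapper, setJointRotation, setMorphWeight, getVertices, updateVertices) are chosen for illustration and are not the actual visage SDK classes.

    #include <string>
    #include <vector>

    // Illustrative renderer-independent view of the character model. A concrete
    // subclass translates these calls into the API of the underlying engine.
    class SceneWrapper {
    public:
        virtual ~SceneWrapper() {}

        // Applying animation parameters: BAPs typically become joint rotations,
        // FAPs become morph target blend weights (or bone offsets on a face rig).
        // Which joint or morph target a parameter maps to is read from the VCM file.
        virtual void setJointRotation(const std::string& joint,
                                      double pitch, double yaw, double roll) = 0;
        virtual void setMorphWeight(const std::string& morphTarget, double weight) = 0;

        // Geometry access used by the off-line production components
        // (face model generator, facial motion cloner).
        virtual std::vector<float> getVertices(const std::string& mesh) const = 0;
        virtual void updateVertices(const std::string& mesh,
                                    const std::vector<float>& vertices) = 0;
    };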

B. Integration with a graphics engine

When it comes to integration with graphics engines, visage SDK is highly flexible and places only minimal requirements on the target engine. The engine should support the basic character animation techniques of skeletal animation and mesh morphing, and the engine's API should provide the ability to manually set joint transformations and morph target blend weights. Animation is possible even if some of these requirements are not met; for example, in the absence of morph target support, a facial bone rig can be used for facial animation.

Minimal integration of the system is a trivial endeavor, amounting to subclassing and implementing a single wrapper class representing the character model. Depending on the desired functionality, certain parts of the wrapper can be left unimplemented, e.g. there is no need to provide geometry access if the developer does not plan to use the cloner or face model generation features in their application. The 3D model itself is loaded and handled by the engine, while FBAP mappings and other information pertaining to MPEG-4 FBA are loaded from VCM files. VCM files are tied to visage SDK rather than the graphics engine, which means they are portable and can be reused for a character model, or even for different models with a similar structure, regardless of the underlying renderer. This greatly simplifies model production and reduces the interdependence of the art pipelines.

C. Component interactions

A simplified overview of runtime component interactions is illustrated in Fig. 3. The animation process flows in the following manner:

- The application adds actions to the animation player, for example lip sync coupled with gaze following and a set of simple repeating facial gestures (e.g. blinking).
- The animation player executes the animation loop. From each action it obtains the current frame of animation as a set of FBAP values, blends all the sets together and applies them to the character model via the wrapper.
- The scene wrapper receives the FBAP value set and interprets the values depending on the character's FBAP mappings. Typically, BAPs are converted to Euler angles and applied to bone transformations, while FAPs are interpreted as morph target blend weights.

For the cloner and the face model generator, interactions are even more straightforward and amount to obtaining and updating the model's geometry via the model wrapper.
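From the application's point of view, this interaction could be sketched as follows, again reusing the FbaFrame and CharacterModel types from the earlier sketches; the class and method names (FbaAction, AnimationPlayer, addAction, update) are illustrative assumptions rather than the documented visage SDK API.

    #include <memory>
    #include <vector>

    // Illustrative FBA action: anything that can produce a frame of FBAP values,
    // e.g. a loaded .fba animation, lip sync, gaze following or a blinking gesture.
    class FbaAction {
    public:
        virtual ~FbaAction() {}
        virtual FbaFrame currentFrame(double timeSec) = 0;
    };

    // Illustrative animation player: blends the frames produced by all active
    // actions and hands the result to the character model (scene wrapper).
    class AnimationPlayer {
    public:
        void addAction(std::shared_ptr<FbaAction> action) { actions.push_back(action); }

        void update(double timeSec, CharacterModel& model) {
            FbaFrame blended;
            for (auto& action : actions)
                blend(blended, action->currentFrame(timeSec));
            model.applyFrame(blended);  // wrapper maps BAPs to bones, FAPs to morph weights
        }

    private:
        // Naive blending policy for illustration: later actions overwrite earlier values.
        static void blend(FbaFrame& acc, const FbaFrame& frame) {
            for (const auto& kv : frame.bapValues) acc.bapValues[kv.first] = kv.second;
            for (const auto& kv : frame.fapValues) acc.fapValues[kv.first] = kv.second;
            if (frame.viseme != 0)     acc.viseme = frame.viseme;
            if (frame.expression != 0) acc.expression = frame.expression;
        }

        std::vector<std::shared_ptr<FbaAction>> actions;
    };

In such a setup the application would add, say, a lip sync action, a gaze-following action and a blinking gesture to the player and then call update once per rendered frame, or let the player run in its own thread.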
D. Art pipeline

As previously indicated, the art pipeline is very flexible. Characters are modeled in 3D modeling applications and exported into the target engine. Naturally, FBAPs need to be mapped to the joints and morph targets of the model. This is done using a special plug-in for the 3D modeling application if one is available; otherwise it needs to be handled by a stand-alone application with appropriate 3D format support. For animations the pipeline is similar, and again a plug-in is used for export and import.

We also provide stand-alone face model and morph target production applications that use our production libraries. These applications rely on intermediate file formats (currently VRML or OGRE formats, though support for others will be added in the future) to obtain the model, while results are output via the intermediate format in combination with VCM. Fig. 4 shows a screenshot of a simple application for mapping and testing animation parameters.

Figure 4: FBAPMapper, an OGRE-based application for mapping animation parameters

V. EXAMPLES

We have so far successfully integrated our system with two open-source rendering engines, with more implementations on the way. The results are presented in this section.

A. OGRE

OGRE [16] is one of the most popular open-source, cross-platform rendering engines. Its features include a powerful object-oriented interface, support for both the OpenGL and Direct3D graphics APIs, a shader-driven architecture, material scripts, hardware-accelerated skeletal animation with manual bone control, hardware-accelerated morph target animation etc. Despite challenges encountered in implementing a wrapper around certain features, we have achieved both face and body animation in OGRE (Fig. 5 and 6).

Figure 5: Lip sync in OGRE

Figure 6: Body animation in OGRE
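For a flavor of the integration hook mentioned above, the snippet below shows the kind of manual bone control that OGRE 1.x exposes and on which a wrapper implementation can build; the entity and bone names are placeholders, and the snippet is a simplified illustration rather than our actual wrapper code.

    #include <Ogre.h>

    // Simplified illustration: take direct control of a jaw bone on an
    // already-loaded OGRE entity and rotate it, roughly what a wrapper does
    // when it applies an open_jaw FAP value.
    void applyJawRotation(Ogre::Entity* character, Ogre::Radian angle) {
        Ogre::SkeletonInstance* skeleton = character->getSkeleton();
        Ogre::Bone* jaw = skeleton->getBone("Jaw");  // placeholder bone name

        jaw->setManuallyControlled(true);  // exclude the bone from keyframed animation
        jaw->setOrientation(Ogre::Quaternion(angle, Ogre::Vector3::UNIT_X));
    }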

OGRE is also notable for its extensive art pipeline, supported by exporters for nearly every modeling suite in existence. We initially encountered difficulties in loading complex character models composed of multiple meshes, because basic OGRE does not support file formats capable of storing entire scenes. However, this shortcoming is rectified by the community-made DotScene loader plug-in, and a COLLADA loader is also under development by the OGRE community.

B. Irrlicht

Though Irrlicht [17] does not boast OGRE's power, it is nonetheless popular for its small size and ease of use. Its main shortcoming with regard to our system is its lack of support for morph target animation. However, we were able to work around this by creating a face model with a bone rig and parametrizing it over MPEG-4 FBAPs, with very promising results (see Fig. 8). Unlike OGRE's art pipeline, which is based on exporter plug-ins for 3D modeling applications, Irrlicht's art pipeline relies on a large number of loaders for various file formats. We found the loader for the Microsoft .x format to be the most suited to our needs and were able to successfully import several character models, with both body and face rigs (Fig. 7).

Figure 7: Body animation in Irrlicht

Figure 8: Facial animation in Irrlicht

C. Upcoming implementations

We are concurrently working on integrating visage SDK with several other engines. These include:

- StudierStube (StbES) [19], a commercial augmented reality (AR) kit with a 3D renderer and support for character animation
- Horde3D [18], a lightweight, open-source renderer
- Panda3D, an open-source game engine known for its intuitive Python-based API

Of these we find StbES to be the most promising, as it will enable us to deliver the power of the visage SDK animation system to mobile platforms and combine it with StbES's extensive AR features.

VI. CONCLUSIONS AND FUTURE WORK

Our system supports a variety of character animation features and facilitates rapid application development and art asset production. Its feature set makes it suitable for research and commercial applications such as embodied agents and avatars in networked virtual environments and telecommunications, while the flexibility of its architecture means it can be used on a variety of platforms, including mobile devices. We have successfully integrated it with popular graphics engines and plan to provide more implementations in the near future, while simultaneously striving to make integration even easier.

Furthermore, we are continually working on enhancing our system with new features. An upcoming major upgrade will introduce a new system for interactive motion control based on parametric motion graphs and add character behavior modeling capabilities via BML. Our goal is to develop a universal and modular system for powerful, yet intuitive modeling of character behavior and to continue using it as the backbone of our research into high-level character control and applications involving virtual humans. We plan to release a substantial portion of our system under an open-source license.

ACKNOWLEDGMENT

This work was partly carried out within the research project "Embodied Conversational Agents as interface for networked and mobile services", supported by the Ministry of Science, Education and Sports of the Republic of Croatia. It was also partly supported by Visage Technologies. Integration of visage SDK with OGRE, Irrlicht and other engines was done by Mile Dogan, Danijel Pobi, Nikola Banko, Luka Šverko and Mario Medvedec, undergraduate students at the Faculty of Electrical Engineering and Computing in Zagreb, Croatia.

REFERENCES

[1] M. Thiebaux, A.N. Marshall, S. Marsella, M. Kallmann, "SmartBody: behavior realization for embodied conversational agents," in International Conference on Autonomous Agents, 2008, vol. 1, pp. 151-158.
[2] S. Kopp et al., "Towards a common framework for multimodal generation: The behavior markup language," in Intelligent Virtual Agents, 2006, pp. 205-217.
[3] I.S. Pandžić, J. Ahlberg, M. Wzorek, P. Rudol, M. Mošmondor, "Faces everywhere: towards ubiquitous production and delivery of face animation," in International Conference on Mobile and Ubiquitous Multimedia, 2003, pp. 49-55.
[4] I.S. Pandžić, R. Forchheimer, Eds., MPEG-4 Facial Animation - The Standard, Implementations and Applications, John Wiley & Sons, 2002.
[5] ISO/IEC 14496 MPEG-4 International Standard, Moving Picture Experts Group, www.cselt.it/mpeg
[6] G. Zorić, I.S. Pandžić, "Real-time language independent lip synchronization method using a genetic algorithm," in special issue of Signal Processing Journal on Multimodal Human-Computer Interfaces, vol. 86, issue 12, pp. 3644-3656, 2006.
[7] A. Čereković et al., "Towards an embodied conversational agent talking in Croatian," in International Conference on Telecommunications, 2007, pp. 41-47.
[8] G. Zorić, I.S. Pandžić, "A real-time lip sync system using a genetic algorithm for automatic neural network configuration," in IEEE International Conference on Multimedia & Expo, 2005, vol. 6, pp. 1366-1369.
[9] C. Pelachaud, "Visual Text-to-Speech," in MPEG-4 Facial Animation - The Standard, Implementations and Applications, I.S. Pandžić, R. Forchheimer, Eds., John Wiley & Sons, 2002.
[10] G. Fanelli, M. Fratarcangeli, "A non-invasive approach for driving virtual talking heads from real facial movements," in 3DTV Conference, 2007, pp. 1-4.
[11] M. Fratarcangeli, M. Andolfi, K. Stanković, I.S. Pandžić, "Animatable face models from uncalibrated input features," unpublished.
[12] I.S. Pandžić, "Facial Motion Cloning," Graphical Models Journal, vol. 65, issue 6, pp. 385-404, 2003.
[13] D. Eberly, 3D Game Engine Architecture, Morgan Kaufmann, Elsevier, 2005.
[14] S. Zerbst, O. Duvel, 3D Game Engine Programming, Course Technology PTR, 2004.
[15] Havok Physics Animation 6.00 User Guide, Havok, 2008.
[16] OGRE Manual v1.6, 2008, http://www.ogre3d.org/docs/manual/
[17] Nikolaus Gebhardt, Irrlicht Engine 1.5 API Documentation, 2008, http://irrlicht.sourceforge.net/docu/index.html
[18] Nicolas Schulz, Horde3D Documentation, 2009, http://www.horde3d.org/docs/manual.html
[19] Christian Doppler Laboratory, Graz University of Technology, "Handheld augmented reality," 2008, http://studierstube.icg.tugraz.ac.at/handheld_ar/