Generation of Optimized Facial Animation Parameters


Generation of Optimized Facial Animation Parameters

By Gunnar Hovden

THESIS

Submitted in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy in Computer Engineering in the School of Engineering of Santa Clara University, 2003

Santa Clara, California

As far as the laws of mathematics refer to reality, they are not certain, and as far as they are certain, they do not refer to reality.

Albert Einstein

Generation of MPEG-4 Facial Animation Parameters
Gunnar Hovden
Department of Computer Engineering
Santa Clara University, 2003

Abstract

The MPEG-4 FBA (Face and Body Animation) standard provides a set of FAPs (Facial Animation Parameters) for animating a talking face with its moods and expressions. A face model driven by this set of FAPs can produce high quality animation at a bitrate as low as 2 kb/s. The applications for the face model range from video phone and video conferencing on wireless, portable units such as PDAs (Personal Digital Assistants) and cell phones to game development and movie production. The success of facial animation relies on being able to track facial features and to generate FAPs accurately and reliably. This thesis presents a method for extracting FAPs from a person's face in a video sequence. The goal is to generate FAPs that make the animated face resemble the original face in the video sequence. The proposed method is based on feedback from the render unit during the FAP generation process. This ensures that the animated face is as close to the original face as possible. A penalty function is derived to measure the resemblance between the animated and the original face. The FAP generation process includes optimizing the penalty function, which consists of a match function and some barrier functions. The match function compares how well an animated face matches the original face in the video sequence. Knowledge about the appearance of a normal looking face is contained in the barrier functions. Each barrier function indicates the level of distortion from a normal looking face for a certain part of the face, and advises the optimizer. Unnecessary FAPs are eliminated and the search space is partitioned into smaller, independent subspaces to speed up the optimization process. The Steepest Descent Method, the Cyclic Coordinate Method, Linear Search and Golden Section Line Search are applied to obtain an optimal solution. The results show that the generated FAPs are accurate and the proposed method is very robust. The generated FAPs can drive animations that are lifelike and truthful to the original sequence, making them suitable for very high quality applications.

Table of Contents

Abstract
Table of Contents
List of Figures
List of Tables

Chapter 1 Introduction
  1.1 MPEG-4 FBA (Face and Body Animation)
  1.2 Applications
  1.3 Motivation
  1.4 Literature Survey
  1.5 Contribution

Chapter 2 Face Encoding in MPEG-4
  2.1 Facial Definition Parameters (FDPs) and Facial Animation Parameters (FAPs)
  2.2 Facial Animation Parameter Units (FAPUs)
  2.3 Encoding of FAP Stream
    2.3.1 DCT (Discrete Cosine Transform)
    2.3.2 Arithmetic Coding

Chapter 3 FAP Generation Process
  3.1 FAP Generation as an Optimization Problem
  3.2 Proposed Solution for FAP Generation
  3.3 Video Sequences
  3.4 Face Model
  3.5 Render Unit
    3.5.1 OpenGL
    3.5.2 Resolution
  3.6 Penalty Function
    3.6.1 Developing the Penalty Function
    3.6.2 Convexity
    3.6.3 First and Second Order Partial Derivatives

Chapter 4 Search Strategies
  4.1 Elimination of Unnecessary FAPs
  4.2 Partitioning of the Search Space
  4.3 Search Order
  4.4 Transformation
    4.4.1 Translating Point of Rotation
    4.4.2 Condition Number
    4.4.3 Transforming Outer Lip
  4.5 Resulting Optimization Problem

Chapter 5 Solutions to Our Nonlinear Optimization Problem
  5.1 Algorithms for Solving Nonlinear Optimization Problems
    5.1.1 Steepest Descent Method
    5.1.2 Cyclic Coordinates Method
    5.1.3 Golden Section Line Search
    5.1.4 Linear Line Search
  5.2 Stopping Criteria
  5.3 Justification of Choice of Optimization Methods
  5.4 The Problem of Non-Convexity

Chapter 6 Masking and Weighting Functions
  6.1 Masking the Face
  6.2 Masking the Nostrils
  6.3 Weighting the Inner Lips
  6.4 Masking the Eye Brows
  6.5 Masking the Iris

Chapter 7 Anatomical Constraints
  7.1 Barrier Function
  7.2 Thickness of Lip
  7.3 Upper Lip Above Lower Lip
  7.4 Constraints on Shape of Lips
  7.5 Constraints on Height of Left and Right Corner of Inner Lip
  7.6 Resulting Optimization Problem

Chapter 8 FAP Filtering
  8.1 Filtering Head Pose
  8.2 Filtering Other FAPs

Chapter 9 Improvement of the Face Model
  9.1 Missing Depth Information
  9.2 Estimating Depth Information

Chapter 10 Results
  10.1 Erin
  10.2 Brennan
  10.3 Cori

Chapter 11 Conclusion and Future Research

Acknowledgment
Appendix A - FAP Definitions, Units, Groups, Subgroups, Directions and Quantization Step Sizes
References
Publications

List of Figures

Figure 1.1 Overview of typical MPEG-4 facial animation system
Figure 2.1 FDPs (Facial Definition Parameters) and FAPs (Facial Animation Parameters) in MPEG-4 [1]
Figure 2.2 Facial Animation Parameter Units [1]
Figure 3.1 Overview of the FAP generation process
Figure 3.2 Typical camera setup for acquiring video sequences
Figure 3.3 A few examples of face models. From left to right: a real person, a fictional character, a talking blender and a parrot [30]
Figure 3.4 (a) A rendered face (b) The corresponding mask
Figure 3.5 Penalty as a function of fap7 and fap13
Figure 3.6 Penalty as a function of fap7 with fap13 = 6
Figure 3.7 First order derivative of penalty with respect to fap7
Figure 3.8 Second order derivative of penalty with respect to fap7
Figure 3.9 Filtered first order derivative of penalty with respect to fap7
Figure 4.1 The effect of head roll on translation of tip of nose
Figure 4.2 The effect of head pitch on translation of tip of nose
Figure 4.3 The effect of head yaw on translation of tip of nose
Figure 4.4 Contour plot of f_pose(loc1, loc2)
Figure 4.5 Contour plot of f_pose(loc1, fap50). The center of rotation is the top of the spine
Figure 4.6 Contour plot of f_pose(y1, fap50). The center of rotation is the tip of the nose
Figure 4.7 Penalty plotted as a function of loc1, rendered at high resolution
Figure 4.8 First order derivative of f_pose(...) with respect to loc1
Figure 4.9 Second order derivative of f_pose(...) with respect to loc1
Figure 4.10 Step function with ripple effect
Figure 4.11 Window function that limits the ripple effect
Figure 4.12 Filtered first order derivative
Figure 4.13 Second order derivative after filtering
Figure 5.1 Illustration of the Golden Section Line Search
Figure 5.2 Example of Linear Search
Figure 5.3 Example of minimization problem with condition number greater than 1 being solved with the Cyclic Coordinate Method
Figure 5.4 Example of minimization problem with condition number greater than 1 being solved with the Steepest Descent Method
Figure 5.5 Example of minimization problem with condition number equal to 1 being solved with the Cyclic Coordinate Method
Figure 5.6 Example of minimization problem with condition number equal to 1 being solved with the Steepest Descent Method
Figure 5.7 Face from the video sequence with open mouth
Figure 5.8 Detail from the mouth region with open mouth
Figure 5.9 Penalty as a function of mouth opening in the linear search
Figure 6.1 An animated face and the corresponding mask
Figure 6.2 Mask used when finding the location of the head
Figure 6.3 Mask used when optimizing the inner lip
Figure 6.4 Mask used when optimizing the eyebrows
Figure 7.1 Original image from frame 314 in the Erin sequence
Figure 7.2 The FAP optimizer makes the lower lip thick in an attempt to match the tongue in the original image in frame 314 from the Erin sequence
Figure 7.3 Result after controlling the lip thickness on frame 314 in the Erin sequence
Figure 7.4 Original image from frame 715 in the Erin sequence
Figure 7.5 Frame 715 in the Erin sequence. The lips look unsatisfactory
Figure 7.6 Frame 715 from the Erin video. The border of the upper, inner lip is marked with yellow. The border of the lower, inner lip is marked with blue
Figure 7.7 The result of applying constraints on the shape of the lips in frame 715 in the Erin sequence
Figure 7.8 The result of applying constraints on the shape of the lips in frame 715 from the Erin sequence. The border of the upper lip is marked with yellow, the lower one with blue
Figure 7.9 Original image from frame 498 in the Erin sequence
Figure 7.10 Frame 498 from the Erin sequence. The upper lip looks torn close to the left lip corner
Figure 7.11 Feature points for inner lip. Feature point 2.6 and feature point 2.7 are too low
Figure 7.12 Frame 498 from the Erin sequence. The border of the upper lip is marked with yellow. The border of the lower lip is marked with blue
Figure 7.13 The result of applying constraints on the shape of the lips in frame 498 in the Erin sequence
Figure 7.14 The result of applying constraints on the shape of the lips in frame 498 from the Erin sequence. The border of the upper lip is marked with yellow, the lower one with blue
Figure 7.15 Original image from frame 1233 in the Erin sequence
Figure 7.16 The corner of the left lip is at a different height from the corner of the right lip in frame 1233 in the Erin sequence
Figure 7.17 Frame 1233 from the Erin sequence after applying the barrier function
Figure 8.1 FAP values for head pose before filtering
Figure 8.2 FAP values for head pose after filtering
Figure 8.3 FAP values for fap3, fap4 and fap34 before filtering
Figure 8.4 FAP values for fap3, fap4 and fap34 after filtering
Figure 9.1 Original face from frame 790 in the Erin sequence
Figure 9.2 Animated face from frame 790 in the Erin sequence
Figure 9.3 Animated face from frame 790 in the Erin sequence with corrected values for the bottom point of the upper teeth
Figure 10.1 Neutral face generated by the render unit for Erin
Figure 10.2 Original face (left column) and the corresponding animated face (right column) for two different frames in the video sequence of Erin
Figure 10.3 Original face (left column) and the corresponding animated face (right column) for two different frames in the video sequence of Erin
Figure 10.4 Neutral face generated by the render unit for Brennan
Figure 10.5 Original face (left column) and the corresponding animated face (right column) for two different frames in the video sequence of Brennan
Figure 10.6 Original face (left column) and the corresponding animated face (right column) for two different frames in the video sequence of Brennan
Figure 10.7 Neutral face generated by the render unit for Cori
Figure 10.8 Original face (left column) and the corresponding animated face (right column) for two different frames in the video sequence of Cori
Figure 10.9 Original face (left column) and the corresponding animated face (right column) for two different frames in the video sequence of Cori

List of Tables

Table 2.1 Definition of FAP Units
Table 4.1 FAPs used during the optimization process
Table 5.1 Condition numbers for sub-problems
Table 7.1 Values for constants c1, ...
Table 7.2 Values for constants c11, c12 and ...
Table 7.3 Values for constants c14, c15, c16 and ...
Table 7.4 Values for constants c1, ...
Table 8.1 Filters applied to FAPs

Chapter 1 Introduction

1.1 MPEG-4 FBA (Face and Body Animation)

The MPEG-4 FBA (Face and Body Animation) standard [1, 2] includes a model for coding human faces at very low bit rates [4, 5]. By using Facial Definition Parameters (FDPs) and Facial Animation Parameters (FAPs) a face can be defined and animated. The FDPs define a static, neutral face, and the FAPs describe the changes during animation to portray eye movements, lip movements, moods, expressions, etc. The use of FAPs to animate a face allows for very high compression compared to coding a face with conventional techniques like those found in MPEG-1, MPEG-2, and MPEG-4 Natural Visual [1, 2].

Figure 1.1 shows a typical system for facial animation. Frames from a natural video are processed by the FAP Generation Unit, which extracts the necessary information from the natural video and generates FAPs. The FAPs can be sent over a network or phone line, or they can be stored on a storage device. The Render Unit receives the FAPs and animates a face based on the FAPs. The animated face can be a stand-alone animation, or it can be included as part of a more complex scene where multiple animated faces appear together with other objects and a background. The other objects and the background can be either synthetic or natural. The MPEG-4 standard offers a rich variety of coding schemes for both natural and synthetic video.

Figure 1.1 Overview of typical MPEG-4 facial animation system

1.2 Applications

A compressed stream of FAPs can be sent over a network with a bandwidth as low as 2 kb/s. The very low bandwidth required to send FAPs makes facial animation suitable for many wireless applications. The high quality animation that can be obtained from accurate and reliable FAPs expands the applications far beyond wireless and low bandwidth devices. Some possibilities include:

Video phone, video conferencing and video chat rooms

Many attempts have been made to design a video phone that will work on an analog phone line. The bandwidth on analog phone lines is limited to ... kb/s. With conventional compression algorithms, as employed by MPEG-1, MPEG-2, MPEG-4 Natural Visual, H.263 and H.264, the spatial and temporal resolution is too low to yield good quality at this bandwidth. Consumers have so far been less than enthusiastic due to the poor quality of video phones. The bandwidth required to transmit a stream of FAPs is well below what is available on an analog phone line. MPEG-4 facial animation will make it possible to design a video phone system with high quality animation. Existing video conferencing systems use MPEG-1, MPEG-2, H.261, H.263, H.264 or a similar compression standard. The video is transferred on a network with a bandwidth sufficient to yield good quality. By using MPEG-4 facial animation, bandwidth and cost can be significantly reduced.

Chat rooms have become increasingly popular in recent years. The concept of a chat room is similar to that of a news group, where somebody decides to create a place on the Internet for a special topic. Anyone interested in that topic can exchange ideas and experiences. A chat room is a web page which can display typed messages in real time. When several people are logged on to the same chat page via the Internet, each can type short messages (often with accepted abbreviations) onto an area in the web page. Each message, and the replies from the others in the chat room, are immediately visible to all the participants as sequential lines with identification of the writer. Similar to the Internet chat room are the IRC (Internet Relay Chat) chat rooms. An IRC client is a program that is separate from the internet browser and connects to the IRC network via a server. Servers pass messages from user to user over the IRC network. Using an IRC client enables the user to hear and play sounds, receive and transfer files, and chat with people from all over the world. In addition to writing messages, facial animation can enable users to see and be seen in the chat room.

Conversation based solely on text can be difficult in certain respects. Emotions are often conveyed through facial expressions and the voice. Misunderstandings can easily happen when neither facial expressions nor the voice is available. Users of chat rooms have partly compensated by adding symbols to describe their emotions as they write. For example, :-) (a smiling face) means happiness, :-( (a sad face) means sadness or anger, \-o (a yawning face) means boredom, and ;) (a winking face) means the writer is joking or not telling the truth. These "emotion symbols" can be used to express only a limited range of emotions, and only those emotions that the writer wants to expose. Furthermore, not all readers in a chat room, in particular newcomers, will understand all the emotions expressed through symbols. Adding video of the users of chat rooms can prevent misunderstandings and enhance the chat room experience. However, the most important reason for adding video of the users to chat rooms may be because "it's cool". As most users of chat rooms are connected to the Internet through either an analog phone line or an ISDN line, the bandwidth is low and they face the same problems as current video phone systems. The use of MPEG-4 facial animation has the potential to make video chat rooms popular.

MPEG-4 can also secure one's anonymity in chat rooms. Any MPEG-4 compliant face model can receive any MPEG-4 compliant FAP stream. The FAP stream does not follow the face model or vice versa. The FAP stream generated for a user of a chat room can therefore be used to drive a model of a different, possibly fictitious, person. The visual anonymity currently present in chat rooms can therefore be maintained. Parents of kids on the net will certainly demand anonymity for their children, as nobody can control who is reading and participating in a chat room.

Internet Agents

Internet agents are virtual humans who guide the consumer through search engines, e-commerce transactions, and government and other informative sites [16]. Internet agents make the internet experience easier and more personal if used correctly by the sites that feature them. Internet agents need to be animated, and the MPEG-4 facial model is very suitable for animating internet agents. The low bandwidth needed to transmit MPEG-4 facial animation is also important.

Video Games

Game developers will benefit from the realism provided by high quality animation of close-up faces. Automatic FAP generation is necessary to keep the cost down and to meet the short deadlines in the game development industry.

Movies

Very high quality FAP generation can also be used when producing animation movies. In animation movies, it is labor-intensive, time consuming and costly to draw the facial expressions of the characters. Realistic facial expressions can be obtained quickly and cost effectively by letting an actor perform the part of the animated figure. Once the FAPs for the actor have been generated, they can be used to drive the animated character. More of a curiosity is the possibility of using the FAPs based on one actor to drive a model of a different actor. Someday, some clever head in Hollywood may decide to re-release "Titanic" starring Vivien Leigh and Clark Gable.

Wireless applications

Compression becomes increasingly important as more portable devices are sold with internet access. The bandwidth of wireless devices like cell phones and PDAs (Personal Digital Assistants) is comparable to or slightly higher than that of analog phone lines. Currently GPRS (General Packet Radio Service) enables cell phones and PDAs to connect to the Internet at a theoretical speed of ... kb/s, but in most cases the actual speed is close to 45 kb/s. Applications mentioned earlier, like video phone and animation of characters, may also find their way to portable devices with internet access, and they will certainly benefit from the very low bandwidth.

1.3 Motivation

The success of facial animation relies on being able to track facial features and to generate FAPs accurately and reliably. High quality animation, as needed by game developers and movie companies, has put higher demands on the accuracy and reliability of the FAP generation. This thesis presents a method based on feedback from a render unit to ensure accuracy and realism for the generated FAP stream.

1.4 Literature Survey

The MPEG-4 documents [1, 2] describe in detail the syntax and semantics of the MPEG-4 bit stream. They do not describe the encoding process. The MPEG-4 face model is loosely based on the work presented in [3]. An overview of MPEG-4 facial animation is found in [4, 5].

The decoding of the bit stream and animation of a face model are shown in [3-8]. Face processing and FAP generation cover many different tasks, among them detecting and locating one or more faces in an image or a video sequence. Before the FAPs can be generated, the location of the face in an image or in a video sequence must be known. Face detection and tracking are covered in [9-14].

Several techniques have been applied in attempts to recognize the movements of facial features and to generate FAPs based on a video sequence. The techniques include tracking of fiducial marks in the face [15]. Fiducial marks are artificial feature points that are easy for a computer to recognize and track. This typically requires the user to paint small dots of different colors on his/her face at specific locations before the video session begins. Most users find this to be inconvenient. Tracking of facial features that appear naturally in faces has been tried in many variations. Tian et al. [17] use templates and edge detection to acquire facial information. A three-layer neural network with one hidden layer is then used to produce 13 different facial action units for the lower part of the face. Ahlberg [18] uses eigenspaces and deformable graphs, whereas Sarris et al. [19] use segmentation and known geometrical characteristics, like the symmetry of a face, to find feature points. Color snakes are the basis for the work by Seo et al. [20]. Ahn and Kim [21] track colors in the image. They use statistical properties and a face mesh model to generate expression parameters in MPEG-4 format. Kim et al. [22] use edge detection in combination with templates. Each template is a small part of a face used to find a certain feature point. Bernoegger et al. [23] expand on the techniques in [22]. Otsuka and Ohya [24] use optical flow and a Fourier transform for feature extraction and hidden Markov models for feature recognition. Known difficulties with the above methods include lack of robustness and poor accuracy.

1.5 Contribution

The contribution of this thesis is to introduce a new method for automatic FAP generation and to demonstrate that the method works in a real-world system. Unlike all previous methods in the literature, the introduced method utilizes feedback from the render unit to generate FAPs that make animations accurate, lifelike, and truthful to the original face.

This thesis also presents a method for estimating three-dimensional coordinates based on a sequence of two-dimensional images. The face models used in this thesis lack accurate depth information because of the way the face models are acquired. Three-dimensional coordinates for feature points are estimated based on a video sequence. The face model itself is improved by correcting the inaccurate depth values. This leads to better results, as the FAP generation process is affected by inaccuracies in the face model.

Chapter 2 Face Encoding in MPEG-4

2.1 Facial Definition Parameters (FDPs) and Facial Animation Parameters (FAPs)

The MPEG-4 standard [1, 2] includes techniques designed for specifying and modeling the human face. A face is described with a total of 80 FDPs (Facial Definition Parameters). Each FDP is a point on the face (for example, the tip of the nose, the left corner of the left eye, the top of the right ear, etc.) and is defined with a three-dimensional vector giving the exact location in space. The FDPs refer to a neutral face, i.e., a face with no particular expression, eyes open, mouth closed and other constraints as defined in MPEG-4. The constraints are as follows [1]:

- The coordinate system is right-handed; head axes are parallel to the world axes.
- Gaze is in the direction of the z-axis.
- All face muscles are relaxed.
- Eyelids are tangent to the iris.
- The diameter of the pupil is one third of the diameter of the iris.
- Lips are in contact; the line of the lips is horizontal and at the same height at the lip corners.
- The mouth is closed and the upper teeth touch the lower ones.
- The tongue is flat and horizontal, with the tip of the tongue touching the boundary between the upper and lower teeth (feature point 6.1 touching feature point 9.11) (see Figure 2.1).

The FDPs are specified and transmitted from the encoder to the decoder prior to the animation. If the encoder does not transmit FDPs to the decoder, the decoder will use a default set of FDPs. Once the face model is established at the decoder, either by FDPs from the encoder or by default FDPs, the movements, moods and expressions of the face can be animated by a stream of FAPs. The FAPs are independent from the FDPs in the sense that a set of FAPs can be used to animate any face based on any set of FDPs. An actor can therefore act a part from which FAPs are extracted and later used to drive a model of a different actor, or even a model of an animal or a cartoon character.

The FAPs can be categorized into three different types:

High level FAPs: A high level FAP can be either a viseme or an expression. A viseme describes the look of the mouth and lips when a particular sound (phoneme) is pronounced. Two visemes can be combined into a compound viseme. An expression describes an expression in a face, for example joy, anger, disgust, etc. Two expressions can be combined into a compound expression, for example 20% disgust and 40% surprise.

High-level FAPs are not considered in this work. Methods for recognizing and quantifying facial expressions are described in [25] and [26].

Displacement FAP: Each displacement FAP refers to a particular FDP and is a single value describing the displacement of that particular FDP compared to a neutral face. The displacement can be along any of the three axes in space. Examples of displacement FAPs are the horizontal displacement of the outer, left corner of the lip, or the vertical displacement of the right end of the right eyebrow.

Angle FAP: An angle FAP describes rotation around an axis, for example the angle by which the head is tilted forward around the top of the spine.

Appendix A lists all the FAPs defined by MPEG-4. Many features on a face do not change, for example the distance between the ears, or the length of the nose. Therefore, relatively few FAPs are necessary to drive an animation. Whereas 80 FDPs, which are all three-dimensional vectors, are needed to specify a face, only 66 FAPs (if we do not consider the two high-level FAPs), which are all single values, are needed to portray the moods and expressions of a talking face. Glasses, mustaches, beards, etc., are not specifically addressed by the MPEG-4 standard, but can be accomplished by using a face model with texture mapping.

Figure 2.1 shows the FDPs in MPEG-4 FBA [1, 2]. The solid dots (●) show FDPs that are directly affected by one or more FAPs. The open dots (○) show FDPs that are not affected by, or only indirectly affected by, FAPs. For example, feature point 8.1 is marked with a solid dot because it is directly affected by FAP # 51. Feature point 8.9 and feature point 8.10 are marked with open dots, because there are no FAPs that affect those points directly, but a good face model will move those points along with feature point 8.1.

Figure 2.1 FDPs (Facial Definition Parameters) and FAPs (Facial Animation Parameters) in MPEG-4 [1]

The table in Appendix A lists all the FAPs. For each FAP, the table contains, from left to right:

- The name of the FAP
- A short description of the FAP
- The unit used to represent the FAP (see Section 2.2)
- Whether the FAP is unidirectional (can only have a positive value) or bidirectional (can have a positive or negative value). For example, fap3 (open_jaw) is unidirectional, because the jaw is fully closed when fap3 = 0 and open if fap3 > 0. fap15 (shift_jaw) is bidirectional, because the jaw is centered in the face when fap15 = 0 and shifts to the left or right for negative or positive values of fap15.
- The direction of the movement for a positive FAP value. The direction is along one of the three axes x, y or z.
- The FDP group number
- The FDP subgroup number
- The quantization step size (see Section 2.3)

2.2 Facial Animation Parameter Units (FAPUs)

All the low level FAPs (angle and displacement FAPs) are expressed in terms of FAPUs (Facial Animation Parameter Units). FAPUs correspond to fractions of distances between some key facial features.

The fractions are represented with fixed precision. The use of FAPUs allows interpretation of FAPs on any face model in a consistent way, producing reasonable results in terms of expression and movement. Table 2.1 lists the six units used when representing FAPs in MPEG-4. For each FAPU, the definition based on the distances between feature points in the face, a short description, and the FAPU value are listed. The notation 3.1.y means the y-value of feature point 3.1 in the neutral face (see Figure 2.1). IRISD, ES, ENS, MNS, MW and AU are the FAP units. IRISD0, ES0, ENS0, MNS0 and MW0 are intermediate variables used only to define the FAP units. Figure 2.2 shows a face where the FAP units are indicated. The table in Appendix A lists, for each FAP, which FAPU is used when coding the FAP values.

Table 2.1 Definition of FAP Units

Definition                               Description                                          FAPU value
IRISD0 = 3.1.y - 3.3.y = 3.2.y - 3.4.y   Iris diameter (by definition it is equal to the      IRISD = IRISD0 / 1024
                                         distance between upper and lower eyelid) in a
                                         neutral face
ES0 = 3.5.x - 3.6.x                      Eye separation                                       ES = ES0 / 1024
ENS0 = 3.5.y - 9.15.y                    Eye - nose separation                                ENS = ENS0 / 1024
MNS0 = 9.15.y - 2.2.y                    Mouth - nose separation                              MNS = MNS0 / 1024
MW0 = 8.3.x - 8.4.x                      Mouth width                                          MW = MW0 / 1024
AU                                       Angle Unit                                           10^-5 rad

Figure 2.2 Facial Animation Parameter Units [1]

As an example, let us consider FAP # 20 (close_t_r_eyelid), which moves feature point 3.2. From the table in Appendix A, we see that IRISD is the right FAPU for coding FAP # 20. From Table 2.1 we see that IRISD = IRISD0/1024 = (3.2.y - 3.4.y)/1024. If we assume that feature point 3.2 is located at (-150, 200, 50) and feature point 3.4 is located at (-150, 180, 50) in the neutral face, then we get IRISD = (200 - 180)/1024 = 20/1024. If we want the eyelid half closed, i.e., we want to move feature point 3.2 to position (-150, 190, 50), the distance we need to move is 200 - 190 = 10. We then encode FAP # 20 with the value 512, because 512 · IRISD = 512 · 20/1024 = 10, which is exactly the desired displacement.
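The same calculation can be written as a few lines of code. The following is a minimal sketch, not taken from the thesis, that reproduces the worked example above; the feature-point coordinates are the assumed neutral-face values from the text, and the variable names are hypothetical.

```c
/* Sketch: encoding a desired eyelid displacement as a FAP value using
 * the IRISD FAPU, following the worked example in the text. */
#include <math.h>
#include <stdio.h>

int main(void) {
    /* Neutral-face y-coordinates of feature points 3.2 (upper eyelid)
       and 3.4 (lower eyelid), as assumed in the example. */
    double fp3_2_y = 200.0;
    double fp3_4_y = 180.0;

    /* IRISD0 is the iris diameter; IRISD is the FAPU (Table 2.1). */
    double irisd0 = fp3_2_y - fp3_4_y;   /* 20 */
    double irisd  = irisd0 / 1024.0;     /* 20/1024 */

    /* Desired new position of feature point 3.2: eyelid half closed. */
    double desired_y     = 190.0;
    double displacement  = fp3_2_y - desired_y;   /* 10 */

    /* FAP #20 (close_t_r_eyelid) is expressed in IRISD units. */
    int fap20 = (int)lround(displacement / irisd);

    printf("IRISD = %f, fap20 = %d\n", irisd, fap20);   /* fap20 = 512 */
    return 0;
}
```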

2.3 Encoding of FAP Stream

FAP values transmitted by the encoder are compressed either by DCT (Discrete Cosine Transform) or by arithmetic coding. The DCT compression scheme has lower complexity than arithmetic coding, but it introduces a 16-frame delay in the decoder, as a 16-point DCT is used to compress the FAPs from 16 consecutive frames. The arithmetic coding scheme does not introduce delay. DCT and arithmetic coding of FAPs are briefly discussed in Sections 2.3.1 and 2.3.2.

2.3.1 DCT (Discrete Cosine Transform)

For any given frame, FAPs can be derived at the encoder side in one of three ways, as specified by the bit stream:

1. The FAP values are transmitted by the encoder. The FAP values for a sequence of 16 consecutive frames are compressed with a 16-point DCT and quantization. The DC component of the transform coefficients is predicted from the previous sequence of 16 frames, if such frames exist. The prediction error and the AC coefficients are compressed and coded with Huffman codes and run length codes.

2. The FAPs retain the values previously sent by the encoder. This is useful if some FAPs are unchanged from the previous frame. No coding is needed in this case.

3. The FAP values have to be estimated by the decoder. For example, the encoder may transmit the inner lip, but not the outer lip. The decoder is then expected to estimate good values for the outer lip based on the inner lip.

The 16-point DCT cannot be completed until the 16 consecutive frames are known. Hence, a 16-frame delay, about 0.5 second to 1 second for frame rates between 15 and 30 frames per second, is introduced, which makes DCT coding unsuitable for real-time, interactive systems.

2.3.2 Arithmetic Coding

For any given frame, FAPs can be derived at the encoder side in one of three ways, as specified by the bit stream:

1. The FAP values are transmitted by the encoder. In this case, the FAP values are encoded with arithmetic coding.

2. The FAPs retain the values previously sent by the encoder. This is useful if some FAPs are unchanged from the previous frame. No coding is needed in this case.

3. The FAP values have to be estimated by the decoder. For example, the encoder may transmit the inner lip, but not the outer lip. The decoder is then expected to estimate good values for the outer lip based on the inner lip.

The precision (quantization) of the FAP values is chosen to accommodate sufficient accuracy without wasting bits. Although encoding FAP values with arithmetic codes has higher complexity than encoding with DCT, arithmetic encoding does not introduce delay. Arithmetic coding is therefore suitable for real-time, interactive systems like video phone and teleconferencing.
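To make the DCT-based scheme concrete, the sketch below applies a plain 16-point DCT-II to the values of one FAP over 16 consecutive frames. It is an illustration only: the quantization, DC prediction and Huffman/run-length coding described above are omitted, the sample FAP values are hypothetical, and the normalization shown is the textbook DCT-II rather than the exact convention of the MPEG-4 specification.

```c
/* Sketch: 16-point DCT-II of one FAP track over 16 consecutive frames. */
#include <math.h>
#include <stdio.h>

#ifndef M_PI
#define M_PI 3.14159265358979323846
#endif

#define N_FRAMES 16

void dct16(const double fap_values[N_FRAMES], double coeff[N_FRAMES]) {
    for (int k = 0; k < N_FRAMES; k++) {
        double sum = 0.0;
        for (int n = 0; n < N_FRAMES; n++)
            sum += fap_values[n] * cos(M_PI * (n + 0.5) * k / N_FRAMES);
        coeff[k] = sum;   /* coeff[0] is the DC component */
    }
}

int main(void) {
    /* Hypothetical fap3 (open_jaw) values for 16 consecutive frames. */
    double fap3[N_FRAMES] = { 0, 10, 25, 40, 55, 60, 58, 50,
                              40, 30, 22, 15, 10, 6, 3, 0 };
    double coeff[N_FRAMES];
    dct16(fap3, coeff);
    for (int k = 0; k < N_FRAMES; k++)
        printf("C[%2d] = %8.2f\n", k, coeff[k]);
    return 0;
}
```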

Chapter 3 FAP Generation Process

The problem of FAP generation is to determine the value of each of the 66 FAPs so that the animated face looks as close to the original face as possible. We will consider the 66 FAPs as a 66-dimensional space, in which we need to find the point that provides the best match between the original face and the animated face.

3.1 FAP Generation as an Optimization Problem

Consider a penalty function penalty = f(fap3, fap4, fap5, ..., fap68), where fap3, fap4, fap5, ..., fap68 are the values of the 66 low-level FAPs (fap1 and fap2 are reserved for viseme and expression) and penalty indicates how well the animated face matches the original face.

A higher value means a poorer match. If penalty = 0, then the original face and the animated face are identical. The range of f, penalty, is real and non-negative. The 66 FAP values constitute the domain of the penalty function, f(...). The FAPs are all coded as integers. Hence, the domain of f(...) is a 66-tuple of discrete values. The FAPs are subject to various constraints. For example, the left corner of the lips should be to the left of the right corner of the lips. The problem of generating FAPs is now transformed into the problem of minimizing f(...), given the constraints on the FAPs. The solution to this optimization problem is the set of FAPs that best matches the rendered face to the original face.

3.2 Proposed Solution for FAP Generation

Figure 3.1 shows an overview of the FAP generation process. A video sequence contains frames of a talking person's face. The render unit generates an animated face based on an initial set of FAPs. The animated face is compared to the original face from the video sequence and the penalty for the animated face is computed. The penalty is fed back to the FAP optimizer, where it is used to guide the search for a better set of FAPs. Determining the values for the FAPs is an iterative process. The process continues until some stopping criterion is fulfilled, and the FAPs are finalized.
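In outline, the process forms a render-compare-update loop around the render unit. The following C sketch shows only that structure; it is not the thesis implementation, and render_face(), compute_penalty() and propose_next_faps() are hypothetical placeholders for the render unit, the penalty evaluation and one optimizer step.

```c
/* Structural sketch of the feedback loop in Figure 3.1 (not the thesis code).
 * The three extern functions are hypothetical placeholders that must be
 * supplied: the render unit, the penalty evaluation and one optimizer step. */
#define N_FAPS 66

typedef struct { int value[N_FAPS]; } FapSet;

extern void   render_face(const FapSet *faps, unsigned char *animated_frame);
extern double compute_penalty(const unsigned char *original_frame,
                              const unsigned char *animated_frame);
extern FapSet propose_next_faps(const FapSet *current, double penalty);

FapSet generate_faps_for_frame(const unsigned char *original_frame,
                               unsigned char *animated_frame,
                               FapSet faps, int max_iterations, double tolerance)
{
    double previous = 1e30;
    for (int i = 0; i < max_iterations; i++) {
        render_face(&faps, animated_frame);                     /* render unit   */
        double penalty = compute_penalty(original_frame,
                                         animated_frame);       /* compare faces */
        if (previous - penalty < tolerance)                     /* stopping rule */
            break;
        previous = penalty;
        faps = propose_next_faps(&faps, penalty);               /* FAP optimizer */
    }
    return faps;
}
```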

Figure 3.1 Overview of the FAP generation process

3.3 Video Sequences

The video sequences used in this work are all shot with a digital video camera with a spatial resolution of ... pixels and a temporal resolution of 30 half-frames per second. The video sequences are all interlaced (see Section 3.5.2). Each pixel is represented with 8 bits for each of the three colors red, green and blue. The real-world camera setup is shown in Figure 3.2. The camera is placed in front of the person and at the same height as the person's face. The distance between the face and the camera is about 150 cm. The lighting should be sufficient to make the face appear clear and with good color saturation. The face should not be underexposed.

Figure 3.2 Typical camera setup for acquiring video sequences

3.4 Face Model

The render unit used in this work is based on a three-dimensional face model with texture mapping [30]. The face model must be fitted to the face for which we want to generate FAPs prior to FAP generation. This process involves fitting a texture map from a frontal view of the face to the model and defining the positions of the FDPs. The face model contains 274 triangles ordered in a Delaunay mesh. Figure 3.3 shows a few examples of the capabilities of the face model. Real people, fictional characters and animals can all be animated with the face model.

Figure 3.3 A few examples of face models. From left to right: a real person, a fictional character, a talking blender and a parrot [30].

3.5 Render Unit

The face model is rendered by a C program that makes calls to the OpenGL [27] graphics library. The render unit delivers high quality, lifelike animations based on the face model and the FAPs. Orthogonal view is used during rendering, as perspective view does not show any noticeable improvement of the results over orthogonal view.

3.5.1 OpenGL

OpenGL was introduced in 1992 as a 2D and 3D graphics API (Application Programming Interface). Today it is widely used and supported and is available for several different operating systems, including UNIX, Linux, Macintosh OS and Microsoft Windows. OpenGL is callable from Ada, C, C++, Fortran, Python, Perl and Java. Most high quality graphics cards have hardware support for OpenGL calls, making rendering very fast and freeing up CPU time. The feature particularly interesting for our purpose is OpenGL's ability to handle 3D triangles with texture mapping. OpenGL takes care of texture, scaling, light sources, anti-aliasing, lighting, shading, transformation and Z-buffering (for computing visibility and occlusions of objects).

The graphics card used in this work only supports hardware acceleration for OpenGL on windows that are visible on the screen. Although it is possible to create and render the face in a virtual, non-visible window, it is very slow because the OpenGL calls are then executed in software with no hardware support. Hence, we have chosen to show all OpenGL windows on the screen during the optimization process.

3.5.2 Resolution

To reduce the artifacts that result from a finite resolution, it is desirable to keep the resolution as large as possible, up to the resolution of the original video. The original video is ... pixels and interlaced. Being interlaced means that all even lines are shot 1/60 s after all the odd lines. In other words, the lines immediately above and below a line are shot 1/60 s before or after the line itself. Scaling down the video with factors other than 1/2, 1/4, 1/8, ..., makes the artifacts resulting from the interlacing very visible.

We have chosen a resolution of ... pixels. The vertical resolution is the same as for the original video to limit the artifacts. The far left side and the far right side of the original video are removed, and the 420 pixels in the middle are preserved, because that is where the face is. The horizontal resolution is set to 420, because then three windows can be placed side by side, and 420 pixels horizontally are sufficient to show the face. The three windows are used to show the original face from the video, the animated face from the render unit and a utility window for helpful information. Each window has a 3 pixel wide border (decided by the operating system) on each side. The total width of the three windows is 3 · (420 + 2 · 3) = 1278 pixels, which fits a screen resolution of ... pixels. True color is used, i.e., 8 bits for each of the three colors red, green and blue.

3.6 Penalty Function

The success of the method described in this work relies heavily on the penalty function. The penalty function needs to accurately describe, with one single value, how well two faces match.

3.6.1 Developing the Penalty Function

The complexity of the penalty function depends on the quality of the face model and the render unit. If the image from the render unit is lifelike, then a simple penalty function can be used with great success.

If the image is of low quality, a complex penalty function must be derived to compensate. A penalty function with psycho-visual properties, based on how humans interpret images in general and faces in particular, may be necessary. We are fortunate to have a very high quality render unit, giving very lifelike results. A pixel-by-pixel penalty function will therefore suffice for now. (In Chapter 7 we will evolve the penalty function to include knowledge about the appearance of human faces.) Each pixel in the original image will be compared with the pixel at the corresponding location in the animated image.

We are using the RGB color space. Colors are represented with a three-tuple of real numbers (r, g, b) ranging from 0 to 1. A match function, match(r1, g1, b1, r2, g2, b2), tells how well the colors match. The following characteristics are required of the match function:

- The match function should return values ranging from 0 to 1; a higher number indicates a better match.
- The match function should return 1 if the two colors (including their intensities) are identical.
- The match function should return a value smaller than 1 if the two colors (including their intensities) are not identical.
- The match function should not be color blind, i.e., it should return 0 if the colors are distinctly different even if their intensities (grey levels) are the same. For example, match(1, 0, 0, 0, 1, 0) should return 0 because red (1, 0, 0) is distinctly different from green (0, 1, 0).
- The match function should distinguish between different intensity levels. For example, match(0.4, 0.4, 0.4, 0.6, 0.6, 0.6) should return a value smaller than 1, because the grey (0.4, 0.4, 0.4) has a lower intensity than the grey (0.6, 0.6, 0.6), even though they are of the same color. The skin on the face is mostly of the same color; different intensity levels are thus important for matching.

We have derived the following function, which satisfies all the requirements above:

match(r1, g1, b1, r2, g2, b2) = (1 - abs(r1 - r2)) · (1 - abs(g1 - g2)) · (1 - abs(b1 - b2))    (3.1)

where abs(...) is the absolute value.

Now that we have a function that tells us how well two colors match, the penalty can be computed in two simple steps:

1. Create a mask that preserves the pixels inside the face and ignores the pixels outside the face (the background). Figure 3.4(a) shows a sample face. Figure 3.4(b) shows the corresponding mask, where the black area is masked out and the white area is preserved when calculating the penalty.

2. Calculate the sum of 1 - match(...) over all pixels inside the face according to (3.2). The variables ani(x, y) and ori(x, y) are the pixel values at position (x, y) for the animated face and for the original face, respectively. The subscripts r, g and b denote the red, green and blue color components, respectively. weight(...) in (3.2) serves as the masking function.
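The match function of (3.1) translates directly into code. The short C program below is a straightforward transcription; the test values in main() are the examples discussed above.

```c
/* Direct transcription of the match function in (3.1); colors are
 * (r, g, b) triples with each component in [0, 1]. */
#include <math.h>
#include <stdio.h>

double match(double r1, double g1, double b1,
             double r2, double g2, double b2) {
    return (1.0 - fabs(r1 - r2)) *
           (1.0 - fabs(g1 - g2)) *
           (1.0 - fabs(b1 - b2));
}

int main(void) {
    printf("%f\n", match(1, 0, 0, 0, 1, 0));             /* distinct colors: 0   */
    printf("%f\n", match(0.4, 0.4, 0.4, 0.4, 0.4, 0.4)); /* identical colors: 1  */
    printf("%f\n", match(0.4, 0.4, 0.4, 0.6, 0.6, 0.6)); /* grey levels: below 1 */
    return 0;
}
```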

penalty = f(fap3, ..., fap68)
        = Σ_x Σ_y weight(x, y) · [1 - match(ori_r(x, y), ori_g(x, y), ori_b(x, y), ani_r(x, y), ani_g(x, y), ani_b(x, y))]
          / Σ_x Σ_y weight(x, y)    (3.2)

where

weight(x, y) = 1, if the pixel at (x, y) is included in the face
               0, if the pixel at (x, y) is not included in the face

Figure 3.4 (a) A rendered face (b) The corresponding mask

It is important to divide the numerator in (3.2) by the number of pixels in the rendered face (given by the sum of the weights) to prevent the FAP optimizer unit from minimizing the penalty by making the head small. The masking function, weight(x, y), can be made more sophisticated by not limiting its values to only 0 and 1. It can then serve as a weighting function to emphasize important parts of the face. This will be discussed in more detail in Chapter 6.
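A direct implementation of (3.2) is sketched below. The image layout (separate row-major arrays of r, g and b components in [0, 1]) and the function signature are assumptions made for this sketch; only the weighted sum of 1 - match(...) and the division by the sum of the weights follow the equation. It reuses the match() function from the previous sketch.

```c
/* Sketch of the penalty evaluation in (3.2): a weighted average of
 * 1 - match(...) over all pixels, divided by the sum of the weights so
 * that shrinking the rendered head does not reduce the penalty. */
double match(double r1, double g1, double b1,
             double r2, double g2, double b2);   /* as defined in (3.1) */

double penalty(const double *ori_r, const double *ori_g, const double *ori_b,
               const double *ani_r, const double *ani_g, const double *ani_b,
               const double *weight,   /* 1 inside the rendered face, 0 outside */
               int width, int height) {
    double num = 0.0, den = 0.0;
    for (int y = 0; y < height; y++) {
        for (int x = 0; x < width; x++) {
            int i = y * width + x;
            num += weight[i] * (1.0 - match(ori_r[i], ori_g[i], ori_b[i],
                                            ani_r[i], ani_g[i], ani_b[i]));
            den += weight[i];
        }
    }
    return den > 0.0 ? num / den : 0.0;
}
```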

The mask is based on the rendered face rather than on the original face, since the area occupied by the rendered face, as opposed to the original face, can be determined very easily. Distinguishing between the face and the background in the original video is difficult.

Although (3.2) defines f(fap3, ..., fap68) in terms of pixels, color matches and weights, it is important to remember that this merely provides a way of evaluating the function f(fap3, ..., fap68). The function f(fap3, ..., fap68) is, as stated before, a function of the 66 FAP values given by fap3, ..., fap68. It satisfies the requirements of a function: f(fap3, ..., fap68) returns only one value for any argument, and it returns the same value every time it is evaluated with the same argument.

3.6.2 Convexity

Most algorithms for solving nonlinear optimization problems assume strict convexity. Let E^n denote the n-dimensional Euclidean space. A function q(x), where x = (x_1, ..., x_n) ∈ E^n, is said to be strictly convex in E^n if and only if

q(λx¹ + (1 - λ)x²) < λq(x¹) + (1 - λ)q(x²)

for all x¹, x² ∈ E^n with x¹ ≠ x², and for all λ ∈ (0, 1). Strict convexity of q(x) implies that every local minimum of q(...) is identical to the unique global minimum. Before attempting to minimize the penalty function, we need to ensure that the penalty function is convex.

Figure 3.5 is a graph showing the penalty of our test face as a function of fap7 and fap13 only.

fap7 and fap13 are the horizontal and vertical displacements of the right corner of the inner lip. The penalty varies between ... (shown as white) and ... (shown as black). The graph shows that the penalty function of fap7 and fap13 is convex, possibly with some noise, within the shown range. The other FAPs exhibit similar behavior. Although the penalties for the FAPs studied have been shown to be convex within some range, we cannot guarantee convexity for all FAPs, for all frames and for all ranges. In the event that the penalty is not convex for a certain FAP in a certain frame within a certain range, the optimization may fail for that FAP in that frame. Luckily, the occurrence of a failed optimization is extremely rare. Indeed, although the penalty function is convex within a certain range of the FAPs, examples can be found where it is not convex for certain FAPs, in certain frames, over certain ranges. Section 5.4 discusses non-convexity in more detail.

Figure 3.5 Penalty as a function of fap7 and fap13

3.6.3 First and Second Order Partial Derivatives

Many nonlinear optimization methods employ first and second order partial derivatives. It is desirable to be able to calculate derivatives of our penalty function. The first order partial derivative of a function q(x_1, ..., x_n) is defined as

∂q(x_1, ..., x_n)/∂x_i = lim_{Δx_i → 0} [q(x_1, ..., x_i + Δx_i, ..., x_n) - q(x_1, ..., x_i, ..., x_n)] / Δx_i    (3.3)

and can be approximated by letting Δx_i take a small value greater than 0:

∂q(x_1, ..., x_n)/∂x_i ≈ [q(x_1, ..., x_i + Δx_i, ..., x_n) - q(x_1, ..., x_i, ..., x_n)] / Δx_i,  for Δx_i small    (3.4)

We obtain the second order partial derivative as follows:

∂²q(x_1, ..., x_n)/(∂x_i ∂x_j) = ∂/∂x_i (∂q(x_1, ..., x_n)/∂x_j)
  ≈ [q(x_1, ..., x_i + Δx_i, ..., x_j + Δx_j, ..., x_n) - q(x_1, ..., x_i, ..., x_j + Δx_j, ..., x_n)
     - q(x_1, ..., x_i + Δx_i, ..., x_j, ..., x_n) + q(x_1, ..., x_i, ..., x_j, ..., x_n)] / (Δx_i Δx_j),
  for Δx_i, Δx_j small    (3.5)

If i = j, we can simplify (3.5) to

∂²q(x_1, ..., x_n)/∂x_i² ≈ [q(x_1, ..., x_i + 2Δx_i, ..., x_n) - 2q(x_1, ..., x_i + Δx_i, ..., x_n) + q(x_1, ..., x_i, ..., x_n)] / Δx_i²,  for Δx_i small    (3.6)

The useful search range is from -500 to 500 for most displacement FAPs, and from ... to ... for angle FAPs, so Δx_i = 1 is considered small. Figure 3.6 shows the penalty as a function of fap7 while setting fap13 = 6. From a geometrical point of view, the penalty curve in Figure 3.6 is the intersection of the plane given by fap13 = 6 and the convex graph in Figure 3.5. The noise seen in the signal in Figure 3.6 is due to the discrete nature of FAPs and the limitation in the image resolution. Figure 3.7 shows the derivative of the penalty with respect to fap7, calculated according to (3.4). As expected, the minimum of the graph in Figure 3.6 coincides with the zero-crossing of the first order derivative shown in Figure 3.7. Figure 3.8 shows the second order derivative of the penalty with respect to fap7, calculated according to (3.6). It is well known that calculating derivatives amplifies noise, which is also the case here. The first order derivative is somewhat noisy, but may still be used as a basis for optimization unless the optimization method is very sensitive to the accuracy of the first order derivatives.

Figure 3.6 Penalty as a function of fap7 with fap13 = 6

The second order derivative is clearly too noisy and has no practical interest in its current form. First and second order derivatives for other FAPs exhibit similar behavior.

It is tempting to process the derivatives to remove noise. Removal of noise involves some kind of filtering. Filtering requires multiple evaluations of the penalty function. Evaluation of the penalty function is time consuming, because each evaluation corresponds to one rendering of the face. Hence, the use of derivatives should be avoided or kept at a minimum so that the optimization process is not prohibitively slow. Figure 3.9 shows the result of applying a 5-point Gaussian filter to the graph in Figure 3.7. From (3.4) we see that calculating one first order derivative requires two function evaluations. A 5-point filter requires five derivatives, but because each derivative shares one function evaluation with its neighbor, only six function evaluations are required to calculate the first order derivative with a 5-point Gaussian filter.

Figure 3.7 First order derivative of penalty with respect to fap7

The filtered first order derivatives are used for the Steepest Descent Method described in Section 5.1.1. Second order derivatives are not used because of the excessive amount of noise. We will, however, in Section 4.4.2, take a closer look at second order derivatives in order to justify our choices of search methods and search strategies.
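As an illustration of that cost argument, the sketch below computes one Gaussian-filtered first order derivative of the penalty with respect to a single FAP using exactly six penalty evaluations. It is an assumption-laden sketch rather than the thesis code: penalty_for_fap_value() is a hypothetical placeholder for one rendering plus penalty evaluation, and the window weights (1, 4, 6, 4, 1)/16 are one common choice of 5-point Gaussian weights, which the thesis does not specify.

```c
/* Sketch: Gaussian-filtered first order derivative of the penalty with
 * respect to one FAP. Five forward differences, one FAP step apart, are
 * combined with a 5-point window; six penalty evaluations suffice since
 * neighboring differences share one evaluation. */

/* Hypothetical placeholder: one rendering + penalty evaluation with the
 * given value for the FAP being optimized (all other FAPs held fixed). */
extern double penalty_for_fap_value(int fap_value);

double filtered_derivative(int fap_value) {
    static const double w[5] = { 1.0/16, 4.0/16, 6.0/16, 4.0/16, 1.0/16 };
    double f[6];                      /* six penalty evaluations          */
    for (int k = 0; k < 6; k++)
        f[k] = penalty_for_fap_value(fap_value - 2 + k);
    double d = 0.0;
    for (int k = 0; k < 5; k++)       /* five forward differences, step 1 */
        d += w[k] * (f[k + 1] - f[k]);
    return d;
}
```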

Figure 3.8 Second order derivative of penalty with respect to fap7

Figure 3.9 Filtered first order derivative of penalty with respect to fap7

Chapter 4 Search Strategies

We have defined the penalty function, its domain and its range. In theory, we could just feed our optimization problem and a suitable algorithm for solving nonlinear optimization problems to the computer and await the amazing results. The reality is a little different, however. Searching in a 66-dimensional space is hard. When we consider noise and the problem of local versus global minima, the task becomes overwhelming. Hence, we need search strategies to make the search successful. These are described in Sections 4.1 through 4.5.

4.1 Elimination of Unnecessary FAPs

Not all of the FAPs defined in the MPEG-4 FBA standard are necessary to make a lifelike animation that is truthful to the original face.

For example, fap14 (thrust_jaw), fap16 (push_b_lip) and fap17 (push_t_lip) control the depth values of the jaw, the bottom lip and the upper lip, respectively. Changes in depth values are not very noticeable when a mostly frontal view is used. Therefore, fap14, fap16 and fap17 are omitted in the FAP generation process. The upward and compressing movement of the chin (feature point 2.10) is affected by fap18 (depress_chin), and this FAP has also been omitted. Furthermore, fap27 (thrust_l_eyeball) and fap28 (thrust_r_eyeball) adjust the depth of the two eyeballs. We have yet to see a person being able to adjust the depth of his or her eyeballs, so those FAPs are not used in this work. fap29 and fap30, which determine the size of the pupils, are also omitted. The tongue is controlled by fap43, fap44, fap45, fap46 and fap47 in the MPEG-4 FBA standard [1]. The face model we are using does not include the tongue. Hence, FAPs for the tongue are not generated. fap61, fap62, fap63 and fap64 are used to stretch and bend the nose. The face model does not support these FAPs, and they are therefore not used in this work. Finally, fap65, fap66, fap67 and fap68, which raise and pull the ears, are not used in this work. Of the 66 low-level FAPs defined in the MPEG-4 FBA standard, we exclude 21 FAPs and preserve 45 FAPs.

There are no FAPs for controlling the head's movement vertically, horizontally, or back and forth. The MPEG-4 FBA standard controls the head's movement along the three axes by BAPs (Body Animation Parameters). A body is defined and connected to the head at feature point 7.1, which is the top of the spine. The body's movement vertically, horizontally and back and forth is controlled by BAP # 181, BAP # 182 and BAP # 183, respectively [29]. Prior to optimizing the features of the face, it is necessary to know the exact location of the face on the screen and the size of the head. The location depends on the head's movement in the three directions: vertical, horizontal, and back and forth. In our case, because we are using an orthogonal projection of the 3-dimensional face to the screen, the last direction (back and forth) is used to control the scale, or size, of the face. We need to include three variables controlling the location and size of the face in our search. Since BAP # 181, BAP # 182 and BAP # 183 do not necessarily specify the head's location in space, because the location of the top of the spine, and hence the head, also depends on the angles of all the joints connecting the head to the feet, we will instead introduce the three "location" parameters loc1, loc2 and loc3. The head's horizontal and vertical movement is controlled by loc1 and loc2, respectively. The last parameter, loc3, is a scaling factor used to control the size of the head on the screen. These three additional parameters augment the search space from 45 dimensions to 48 dimensions.

We have selected the FAPs that are necessary to produce realistic and lifelike animations of high quality. Table 4.1 lists all the 45 FAPs and the 3 LOCs used in our experiment. The solution to our minimization problem is now defined as the set of values (loc1, loc2, loc3, fap3, ..., fap60) that minimizes penalty, as expressed in (4.1).

penalty = min f(loc1, loc2, loc3, fap3, fap4, fap5, fap6, fap7, fap8, fap9, fap10, fap11, fap12, fap13, fap15, fap19, fap20, fap21, fap22, fap23, fap24, fap25, fap26, fap31, fap32, fap33, fap34, fap35, fap36, fap37, fap38, fap39, fap40, fap41, fap42, fap48, fap49, fap50, fap51, fap52, fap53, fap54, fap55, fap56, fap57, fap58, fap59, fap60)    (4.1)

Table 4.1 FAPs used during the optimization process

FAP/LOC   FAP name                FAP/LOC description

Head location
loc1      -                       Horizontal movement of head
loc2      -                       Vertical movement of head
loc3      -                       Size of head

Jaw
fap3      open_jaw                Vertical jaw displacement (does not affect mouth opening)
fap15     shift_jaw               Side to side displacement of jaw

Inner lip
fap4      lower_t_midlip          Vertical top middle inner lip displacement
fap5      raise_b_midlip          Vertical bottom middle inner lip displacement
fap6      stretch_l_cornerlip     Horizontal displacement of left inner lip corner
fap7      stretch_r_cornerlip     Horizontal displacement of right inner lip corner
fap8      lower_t_lip_lm          Vertical displacement of midpoint between left corner and middle of top inner lip
fap9      lower_t_lip_rm          Vertical displacement of midpoint between right corner and middle of top inner lip
fap10     raise_b_lip_lm          Vertical displacement of midpoint between left corner and middle of bottom inner lip
fap11     raise_b_lip_rm          Vertical displacement of midpoint between right corner and middle of bottom inner lip
fap12     raise_l_cornerlip       Vertical displacement of left inner lip corner
fap13     raise_r_cornerlip       Vertical displacement of right inner lip corner

Eyelids
fap19     close_t_l_eyelid        Vertical displacement of top left eyelid
fap20     close_t_r_eyelid        Vertical displacement of top right eyelid
fap21     close_b_l_eyelid        Vertical displacement of bottom left eyelid
fap22     close_b_r_eyelid        Vertical displacement of bottom right eyelid

Eyeballs
fap23     yaw_l_eyeball           Horizontal orientation of left eyeball
fap24     yaw_r_eyeball           Horizontal orientation of right eyeball
fap25     pitch_l_eyeball         Vertical orientation of left eyeball
fap26     pitch_r_eyeball         Vertical orientation of right eyeball

Eyebrows
fap31     raise_l_i_eyebrow       Vertical displacement of left inner eyebrow
fap32     raise_r_i_eyebrow       Vertical displacement of right inner eyebrow
fap33     raise_l_m_eyebrow       Vertical displacement of left middle eyebrow
fap34     raise_r_m_eyebrow       Vertical displacement of right middle eyebrow
fap35     raise_l_o_eyebrow       Vertical displacement of left outer eyebrow
fap36     raise_r_o_eyebrow       Vertical displacement of right outer eyebrow
fap37     squeeze_l_eyebrow       Horizontal displacement of left eyebrow
fap38     squeeze_r_eyebrow       Horizontal displacement of right eyebrow

Cheeks
fap39     puff_l_cheek            Horizontal displacement of left cheek
fap40     puff_r_cheek            Horizontal displacement of right cheek
fap41     lift_l_cheek            Vertical displacement of left cheek
fap42     lift_r_cheek            Vertical displacement of right cheek

Head pose
fap48     head_pitch              Head pitch angle from top of spine
fap49     head_yaw                Head yaw angle from top of spine
fap50     head_roll               Head roll angle from top of spine

Outer lip
fap51     lower_t_midlip_o        Vertical top middle outer lip displacement
fap52     raise_b_midlip_o        Vertical bottom middle outer lip displacement
fap53     stretch_l_cornerlip_o   Horizontal displacement of left outer lip corner
fap54     stretch_r_cornerlip_o   Horizontal displacement of right outer lip corner
fap55     lower_t_lip_lm_o        Vertical displacement of midpoint between left corner and middle of top outer lip
fap56     lower_t_lip_rm_o        Vertical displacement of midpoint between right corner and middle of top outer lip
fap57     raise_b_lip_lm_o        Vertical displacement of midpoint between left corner and middle of bottom outer lip
fap58     raise_b_lip_rm_o        Vertical displacement of midpoint between right corner and middle of bottom outer lip
fap59     raise_l_cornerlip_o     Vertical displacement of left outer lip corner
fap60     raise_r_cornerlip_o     Vertical displacement of right outer lip corner

57 4.2 Partitioning of the Search Space The complexity of the optimization problem can be reduced by partitioning the 48-tuple of LOC and FAP values into smaller, independent optimization problems. It is considerably easier (and faster) to partition a search space into independent spaces with lower dimension and search in each smaller space than to search in the original space. Dissimilarities between the original and the animated face in the upper part of the face (eyes and eyebrows) do not affect the dissimilarities in the lower part of the face (nose, lips, cheek, and chin), and vice versa. We can therefore search for an optimal solution for FAPs related to the upper face independently from the FAPs related to the lower face. Further partitioning is possible, an example is to partition the left side from the right side of the face. By studying the face model we have determined that the following constitutes a set of independent partitions of the search space (see Table 4.1 for a definition of each FAP): Head location and head pose (loc1, loc2, loc3, fap48, fap49 and fap50) Jaw (fap3 and fap15) Inner and outer lips (fap4, fap5, fap6, fap7, fap8, fap9, fap10, fap11, fap12, fap13, fap51, fap52, fap53, fap54, fap55, fap56, fap57, fap58, fap59 and fap60) Left cheek (fap39 and fap41) Right cheek (fap40 and fap42) Left top eyelid (fap19) Right top eyelid (fap20) 43

58 Left bottom eyelid (fap21) Right bottom eyelid (fap22) Left eyeball (fap23 and fap25) Right eyeball (fap24 and fap26) Left eyebrow (fap31, fap33, fap35 and fap37) Right eyebrow (fap32, fap34, fap36 and fap38) Due to the independence of the partitions, we can rewrite the penalty function, f(...), as a sum of many functions with fewer arguments: f(loc1, loc2, loc3, fap3, fap4, fap5, fap6, fap7, fap8, fap9, fap10, fap11, fap12, fap13, fap15, fap19, fap20, fap21, fap22, fap23, fap24, fap25, fap26, fap31, fap32, fap33, fap34, fap35, fap36, fap37, fap38, fap39, fap40, fap41, fap42, fap48, fap49, fap50, fap51, fap52, fap53, fap54, fap55, fap56, fap57, fap58, fap59, fap60) f pose (loc1, loc2, loc3, fap48, fap49, fap50) f jaw (fap3, fap15) f lips (fap4, fap5, fap6, fap7, fap8, fap9, fap10, fap11, fap12, fap13, fap51, fap52, fap53, fap54, fap55, fap56, fap57, fap58, fap59, fap60) f top left eyelid (fap19) f top right eyelid (fap20) f bottom left eyelid (fap21) f bottom right eyelid (fap22) f left eyeball (fap23, fap25) f right eyeball (fap24, fap26) f left eyebrow (fap31, fap33, fap35, fap37) f right eyebrow (fap32, fap34, fap36, fap38) f left cheek (fap39, fap41) f right cheek (fap40, fap42) (4.2) 44

Using (4.2), our minimization problem can now be written as follows:

penalty = min f(loc1, loc2, loc3, fap3, fap4, fap5, fap6, fap7, fap8, fap9, fap10, fap11, fap12, fap13, fap15, fap19, fap20, fap21, fap22, fap23, fap24, fap25, fap26, fap31, fap32, fap33, fap34, fap35, fap36, fap37, fap38, fap39, fap40, fap41, fap42, fap48, fap49, fap50, fap51, fap52, fap53, fap54, fap55, fap56, fap57, fap58, fap59, fap60)
= min [ f_pose(loc1, loc2, loc3, fap48, fap49, fap50) + f_jaw(fap3, fap15) + f_lips(fap4, fap5, fap6, fap7, fap8, fap9, fap10, fap11, fap12, fap13, fap51, fap52, fap53, fap54, fap55, fap56, fap57, fap58, fap59, fap60) + f_left_cheek(fap39, fap41) + f_right_cheek(fap40, fap42) + f_top_left_eyelid(fap19) + f_top_right_eyelid(fap20) + f_bottom_left_eyelid(fap21) + f_bottom_right_eyelid(fap22) + f_left_eyeball(fap23, fap25) + f_right_eyeball(fap24, fap26) + f_left_eyebrow(fap31, fap33, fap35, fap37) + f_right_eyebrow(fap32, fap34, fap36, fap38) ]   (4.3)

And finally, we can rewrite (4.3) and state our optimization problem as:

penalty = min f_pose(loc1, loc2, loc3, fap48, fap49, fap50)
+ min f_jaw(fap3, fap15)
+ min f_lips(fap4, fap5, fap6, fap7, fap8, fap9, fap10, fap11, fap12, fap13, fap51, fap52, fap53, fap54, fap55, fap56, fap57, fap58, fap59, fap60)
+ min f_top_left_eyelid(fap19)
+ min f_top_right_eyelid(fap20)
+ min f_bottom_left_eyelid(fap21)
+ min f_bottom_right_eyelid(fap22)
+ min f_left_eyeball(fap23, fap25)
+ min f_right_eyeball(fap24, fap26)
+ min f_left_eyebrow(fap31, fap33, fap35, fap37)
+ min f_right_eyebrow(fap32, fap34, fap36, fap38)
+ min f_left_cheek(fap39, fap41)
+ min f_right_cheek(fap40, fap42)   (4.4)

Our original search space of 48 dimensions is now reduced to 13 independent search spaces with between 1 and 20 dimensions, most of which have dimension 6 or less. The partition

containing the inner and outer lips has the highest dimension. As expected, it is also the most time-consuming subspace to search.

4.3 Search Order

As stated in the previous section and as implied by the notation in (4.4), each of the subproblems can be solved independently of the others. This is not entirely true: the order in which the FAPs are found matters. For example, the FAPs for the lips cannot be found before the location and pose of the head are determined. If the head is incorrectly displaced to the left, the optimization algorithm may incorrectly displace the lips to the right in the animated face in an attempt to match the features of the animated face with those of the original face. The same is true for the other FAPs. The correct head location and pose must therefore be found before any other FAPs are generated; the remaining FAPs can then be found in any order, with no dependencies among them. We will continue to use the notation in (4.4) for convenience, keeping in mind that it hides these dependencies.
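To make the partitioned search concrete, the sketch below shows one way the thirteen sub-problems of (4.4) could be dispatched, with head location and pose solved first as required above. The names render_penalty and minimize_subspace are placeholders for the render-and-compare penalty and for any of the search methods of Chapter 5; this is an illustration, not the implementation used in this work.

```python
# Hypothetical driver for the partitioned search.  The grouping mirrors (4.4);
# the partition names and the dictionary interface are illustrative only.
PARTITIONS = [
    ("pose",          ["loc1", "loc2", "loc3", "fap48", "fap49", "fap50"]),
    ("jaw",           ["fap3", "fap15"]),
    ("lips",          ["fap4", "fap5", "fap6", "fap7", "fap8", "fap9", "fap10",
                       "fap11", "fap12", "fap13", "fap51", "fap52", "fap53", "fap54",
                       "fap55", "fap56", "fap57", "fap58", "fap59", "fap60"]),
    ("top_l_eyelid",  ["fap19"]), ("top_r_eyelid",  ["fap20"]),
    ("bot_l_eyelid",  ["fap21"]), ("bot_r_eyelid",  ["fap22"]),
    ("l_eyeball",     ["fap23", "fap25"]), ("r_eyeball", ["fap24", "fap26"]),
    ("l_eyebrow",     ["fap31", "fap33", "fap35", "fap37"]),
    ("r_eyebrow",     ["fap32", "fap34", "fap36", "fap38"]),
    ("l_cheek",       ["fap39", "fap41"]), ("r_cheek", ["fap40", "fap42"]),
]

def solve_frame(render_penalty, minimize_subspace, initial):
    """Solve the 13 independent sub-problems; head location and pose come first."""
    params = dict(initial)
    for name, variables in PARTITIONS:      # "pose" is deliberately listed first
        # Search only over `variables`, holding every other parameter fixed.
        params.update(minimize_subspace(render_penalty, params, variables))
    return params
```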

61 4.4 Transformation Let us consider a function q(x 1,..., x n ) to minimize with respect to x 1,..., x n. In some cases it may be more efficient to minimize q(r 1 (y 1,..., y n ),..., r n (y 1,..., y n )) with respect to y 1,..., y n and then substitute x 1 =r 1 (y 1,..., y n ),..., x n =r n (y 1,..., y n ). The substitution does not alter the minimum or the location of the minimum, but if the substitution is chosen carefully, it may help the search algorithm to approach the optimal point faster. The following sections will discuss various substitutions Translating Point of Rotation In the MPEG-4 FBA standard, fap48, fap49 and fap50 adjust the head s pitch, yaw and roll, respectively, around the top of the spine (feature point 7.1). In our work, it has been shown to be beneficial to use the before mentioned substitution to translate the center of rotation from the top of the spine to the tip of the nose. This translation makes the head pose less vulnerable to changes in head position and vice versa. We will discuss the substitution itself in this section and devote the next section to justify the substitution. From (4.4) we see that the head pose and location are determined by minimizing f pose (loc1, loc2, loc3, fap48, fap49, fap50), where loc1 and loc2 are the displacement along the x- and y-axes, respectively, loc3 is the scale factor of the face, and fap48, fap49, fap50 are the head pitch (rotation around the x-axis), head yaw (rotation around the y-axis) and head roll (rotation 47

around the z-axis), respectively. Figure 2.1 defines the x-, y- and z-axes with respect to the face model. We will use the following substitution:

f_pose(loc1, loc2, loc3, fap48, fap49, fap50) = f_pose(g_1(y_1, ..., y_6), ..., g_6(y_1, ..., y_6))   (4.5)

where

loc1 = g_1(y_1, ..., y_6)
loc2 = g_2(y_1, ..., y_6)
loc3 = g_3(y_1, ..., y_6)
fap48 = g_4(y_1, ..., y_6)
fap49 = g_5(y_1, ..., y_6)
fap50 = g_6(y_1, ..., y_6)   (4.6)

Our transformation is not affected by, and does not affect, loc3, which is the size of the head. We therefore choose to define y_3 = loc3. We need to change loc1 and loc2 to compensate for changes in fap48, fap49 and fap50, which in turn are not affected by loc1 or loc2. We can therefore define y_4 = fap48, y_5 = fap49 and y_6 = fap50. The equation in (4.6) then becomes:

loc1 = g_1(y_1, y_2, fap48, fap49, fap50)
loc2 = g_2(y_1, y_2, fap48, fap49, fap50)
loc3 = g_3(loc3) = loc3
fap48 = g_4(fap48) = fap48
fap49 = g_5(fap49) = fap49
fap50 = g_6(fap50) = fap50   (4.7)

fap50 adjusts the roll of the face around feature point 7.1. In order to change the center of rotation to feature point 9.3, which is the tip of the nose, we need to adjust for the change in translation of feature point 9.3 as a result of rotation around feature point 7.1. Figure 4.1 illustrates the translation. The MPEG-4 FBA standard specifies that for a neutral face, which

is obtained by setting all FAPs to zero, the head should face directly towards the camera. We can therefore assume that feature point 9.3 and feature point 7.1 both lie in the xy-plane, as indicated by Figures 4.1 and 4.3. Let us define y_1 to be the horizontal displacement of the tip of the nose and y_2 to be the vertical displacement of the tip of the nose. Feature point 9.3 (tip of nose) will remain unaffected by head roll (fap50) if we define g_1 and g_2 as follows:

g_1(y_1, y_2, loc3, fap48, fap49, fap50) = y_1 + u_1   (4.8)
g_2(y_1, y_2, loc3, fap48, fap49, fap50) = y_2 + v_1   (4.9)

where

u_1 = h sin(fap50) = (9.3.y − 7.1.y) sin(fap50)   (4.10)
v_1 = h (1 − cos(fap50)) = (9.3.y − 7.1.y) (1 − cos(fap50))   (4.11)

where we adopt the notation that 7.1.y represents the y-coordinate of feature point 7.1. The adjustment of pitch (fap48) has an effect on vertical translation only (see Figure 4.2), and g_2 changes to:

g_2(y_1, y_2, loc3, fap48, fap49, fap50) = y_2 + v_1 + v_2   (4.12)

where

v_2 = l (cos(α + fap48) − cos(α)) = sqrt((9.3.y − 7.1.y)² + (9.3.z − 7.1.z)²) (cos(α + fap48) − cos(α))   (4.13)
α = atan((9.3.z − 7.1.z)/(9.3.y − 7.1.y))   (4.14)

64 Figure 4.1 The effect of head roll on translation of tip of nose The adjustment of yaw (fap49) has an effect on the horizontal translation only (see Figure 4.3) and g 1 changes to: g 1 (y 1, y 2, loc3, fap48, fap49, fap50) y 1 u 1 u 3 (4.15) where u 3 d (1 cos( fap )) (9.3.z 7.1.z) (1 cos( fap )) (4.16) By combining (4.8)-(4.16) we obtain the following function: 50

Figure 4.2 The effect of head pitch on translation of tip of nose

f_pose(loc1, loc2, loc3, fap48, fap49, fap50) = f_pose(g_1(y_1, fap49, fap50), g_2(y_2, fap48, fap50), loc3, fap48, fap49, fap50)   (4.17)

where

g_1(y_1, y_2, loc3, fap49, fap50) = y_1 + u_1 + u_3 = y_1 + (9.3.y − 7.1.y) sin(fap50) + (9.3.z − 7.1.z) (1 − cos(fap49))   (4.18)

g_2(y_1, y_2, loc3, fap48, fap50) = y_2 + v_1 + v_2 = y_2 + (9.3.y − 7.1.y) (1 − cos(fap50)) + sqrt((9.3.y − 7.1.y)² + (9.3.z − 7.1.z)²) (cos(arctan((9.3.z − 7.1.z)/(9.3.y − 7.1.y)) + fap48) − cos(arctan((9.3.z − 7.1.z)/(9.3.y − 7.1.y))))   (4.19)

which we need to minimize with respect to y_1, y_2, loc3, fap48, fap49 and fap50.
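A minimal sketch of the compensation terms in (4.18) and (4.19) follows. The conversion from the angle FAP values to radians (ANGLE_UNIT) is an assumption flagged in the code, and the feature point positions are taken as inputs; the routine illustrates the substitution rather than reproducing the renderer's own code.

```python
import math

ANGLE_UNIT = 1e-5   # assumed FAP-to-radian conversion for the angle FAPs

def spine_translation_from_nose_tip(y1, y2, fap48, fap49, fap50, fp93, fp71):
    """Illustration of g1 and g2 in (4.18)-(4.19): given the desired displacement
    (y1, y2) of the tip of the nose and the head rotation FAPs, return loc1, loc2.
    fp93 and fp71 are (x, y, z) coordinates of feature points 9.3 and 7.1."""
    roll, yaw, pitch = fap50 * ANGLE_UNIT, fap49 * ANGLE_UNIT, fap48 * ANGLE_UNIT
    dy = fp93[1] - fp71[1]                     # 9.3.y - 7.1.y
    dz = fp93[2] - fp71[2]                     # 9.3.z - 7.1.z
    alpha = math.atan2(dz, dy)                 # (4.14)
    l = math.hypot(dy, dz)
    u1 = dy * math.sin(roll)                   # (4.10): roll moves the nose tip sideways
    u3 = dz * (1.0 - math.cos(yaw))            # (4.16): yaw compensation
    v1 = dy * (1.0 - math.cos(roll))           # (4.11)
    v2 = l * (math.cos(alpha + pitch) - math.cos(alpha))   # (4.13)
    loc1 = y1 + u1 + u3                        # g1
    loc2 = y2 + v1 + v2                        # g2
    return loc1, loc2
```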

Figure 4.3 The effect of head yaw on translation of tip of nose

4.4.2 Condition Number

Transforming the search problem as described in the previous section has been shown to make the search for the correct pose and location considerably easier and faster, but we seek to explain this from a mathematical perspective. A contour plot can reveal much about the level of difficulty of a search problem. The contour plot for an easy problem consists of regular circles. The contour plot for a more difficult problem consists of ellipses. The wider the ellipses are, the more difficult the problem is. Figure 4.4 shows the contour plot of f_pose(...), plotted with respect to loc1 and loc2. As we can

see, the contour plot is circular, indicating that the function f_pose(...) is easy to optimize with respect to loc1 and loc2, but only if the function is to be optimized with respect to loc1 and loc2 and no other variables. Figure 4.5 shows the contour plot for f_pose(...) plotted with respect to loc1 and fap50. This contour plot shows very wide ellipses, indicating a hard optimization problem. In Section 4.4.1 we transformed the variables for f_pose(...) to make the search faster. Figure 4.6 shows the contour plot of f_pose(...) plotted with respect to y_1 and fap50. As we can see from Figure 4.6, the contour lines are almost circular. By comparing Figure 4.5 with Figure 4.6, we expect the search problem after the transformation to be much easier to solve than the problem before the transformation. We have plotted and compared f_pose(...) before and after the transformation for two variables. There are 6·5/2 = 15 ways of choosing two variables out of the six variables in f(y_1, y_2, loc3, fap48, fap49, fap50), and hence 30 contour plots before and after the transformation. It is not convenient to plot and compare all 30 graphs. Limited to showing contour plots of only two variables at a time, we would like an alternative way to compare the search problem before and after the transformation. The "condition number" for a function describes the "width" of the ellipses in the contour plot of the function. The condition number can be calculated for functions with two or more variables. In order to find the condition number, we first need to define the Hessian matrix for a function. The Hessian matrix, H, for a function q(x_1, ..., x_n) is defined as:

Figure 4.4 Contour plot of f_pose(loc1, loc2)

H(q(x_1, ..., x_n)) =
[ ∂²q/∂x_1²       ∂²q/∂x_1∂x_2    ...   ∂²q/∂x_1∂x_n
  ∂²q/∂x_2∂x_1    ∂²q/∂x_2²       ...   ∂²q/∂x_2∂x_n
  ...             ...             ...   ...
  ∂²q/∂x_n∂x_1    ∂²q/∂x_n∂x_2    ...   ∂²q/∂x_n²   ]   (4.20)

The Hessian matrix, H(q(x_1, ..., x_n)), tells us about the curvature at the point (x_1, ..., x_n) for the function q(...), which is exactly what we need to quantify the "width" of the ellipses in the

69 Figure 4.5 Contour plot of f pose (loc1, fap50). The center of rotation is top of spine. contour plot of a function of many variables. The condition number of a matrix, q(...), is defined as the ratio of the largest eigenvalue to the smallest eigenvalue of the matrix H(q(...)). If the condition number for a function is 1, then the contour lines are circular. The higher the condition number is, the wider the ellipses in the contour lines are. For example the condition number of f(y1, fap50), depicted with the contour plot in Figure 4.6, is on average We would expect the number to be even closer to 1, as the contour lines are very close to being circular. The reason for the condition number not being closer to 1 is most likely the noise in 55

70 Figure 4.6 Contour plot f pose (y1, fap50). The center of rotation is tip of nose. the signal. The condition number of f(loc1, fap50), depicted with the contour plot in Figure 4.5, is on average , in other words, much higher than the condition number for the graph depicted in Figure 4.6. We will use the condition number as a measure of the success of transforming the point of rotation to make the search problem simpler. From (4.20) we see that we need to compute second order, partial derivatives of the penalty function in order to find the Hessian matrix, which is needed to compute the condition 56

71 number. From Section 3.6.3, we remember that the second order derivative has excessive amount of noise. After applying several different filters with little or no success, we ended up using a cosine transform and then applying a filter in the frequency domain. As an example we will study the second order derivative of f pose (y 1 ), where y 2 =0 (face is vertically centered on the screen), loc3=1 (scale is 100%), fap48=fap49=fap50=0 (neutral head pose). We saw in Section that our attempt to calculate second order derivatives failed because of excessive amount of noise. One of the sources of noise is the limited resolution at which the image is rendered. For the purpose of calculating the condition number we double the resolution both vertically and horizontally, from 420 by 480 pixels to 840 by 960 pixels. Figure 4.7, 4.8 and 4.9 show the function, the derivative and the second order derivative of the f pose (y 1 ) rendered at 842 by 960 pixels. Figure 4.7 Penalty plotted as function of loc1, rendered at high resolution 57

72 Figure 4.8 First order derivative of f pose (...) with respect to loc1 Figure 4.9 Second order derivative of f pose (...) with respect to loc1 58

Many time domain filters, including Gaussian filters of different lengths, have been applied to the signal without being able to remove a sufficient amount of noise. We decide to do the filtering in the frequency domain instead. The N-point DCT (Discrete Cosine Transform) of a function q(x), given by

Q(u) = k(u) Σ_{x=0}^{N−1} cos((2x + 1)uπ / (2N)) q(x),   where k(u) = sqrt(1/N) for u = 0, sqrt(2/N) for u > 0   (4.21)

is applied to the first order derivative ∂f_pose/∂y_1 (shown in Figure 4.8). The DC value and the low frequency values are found in Q(u) for u close to 0, and the high frequency values are found in Q(u) for u close to N−1. The noise is located in the high frequencies. We can remove the high frequencies from our signal by setting them to 0 with a window function. One possible window function is the step function shown in Figure 4.10, but this produces undesirable ripples around the cut-off frequency. Instead, we apply the window function given by

w(u) = cos(2πu/N) for 0 ≤ u < N/4,   0 for N/4 ≤ u < N   (4.22)

(shown in Figure 4.11). After the signal is inverse transformed back according to

q(x) = Σ_{u=0}^{N−1} k(u) cos((2x + 1)uπ / (2N)) Q(u),   where k(u) = sqrt(1/N) for u = 0, sqrt(2/N) for u > 0   (4.23)

the result is a lot more pleasing (see Figure 4.12). By computing the second order derivative based on the filtered signal in Figure 4.12, we get the result shown in Figure 4.13. We conclude that we have removed a sufficient amount of noise, while retaining enough of the true signal to calculate the Hessian matrix. The above-described method is applied to all the FAPs needed to determine the Hessian matrix.

Figure 4.10 Step function with ripple effect
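As an illustration of (4.21)-(4.23), the routine below applies an orthonormal DCT, the quarter-length cosine window and the inverse transform to a sampled derivative. It is a sketch only; the sampling of the derivative and the choice of N are assumed to be supplied by the caller.

```python
import numpy as np

def dct_lowpass(samples):
    """Smooth a sampled derivative in the DCT domain, following (4.21)-(4.23):
    forward DCT, multiplication by the cosine window, inverse DCT."""
    q = np.asarray(samples, dtype=float)
    n = len(q)
    idx = np.arange(n)
    k = np.where(idx == 0, np.sqrt(1.0 / n), np.sqrt(2.0 / n))
    # basis[u, x] = cos((2x + 1) * u * pi / (2N))
    basis = np.cos((2 * idx[None, :] + 1) * idx[:, None] * np.pi / (2 * n))
    Q = k * (basis @ q)                                        # forward DCT, eq. (4.21)
    w = np.where(idx < n // 4, np.cos(2 * np.pi * idx / n), 0.0)   # window, eq. (4.22)
    return basis.T @ (k * (Q * w))                             # inverse DCT, eq. (4.23)
```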

75 Figure 4.11 Window function that limits ripple-effect Figure 4.12 Filtered first order derivative 61
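Once a sufficiently clean second order derivative is available, the condition number itself is a small computation: the ratio of the extreme eigenvalues of the Hessian. The sketch below uses a plain central-difference Hessian of a callable penalty; in practice the filtered derivatives described above would be used instead, so this is an illustration rather than the procedure of this work.

```python
import numpy as np

def finite_difference_hessian(f, point, step=1.0):
    """Central-difference Hessian of a callable penalty f at `point` (a sketch;
    the step size is a placeholder)."""
    p = np.asarray(point, dtype=float)
    n = len(p)
    h = np.zeros((n, n))
    eye = np.eye(n) * step
    for i in range(n):
        for j in range(n):
            h[i, j] = (f(p + eye[i] + eye[j]) - f(p + eye[i] - eye[j])
                       - f(p - eye[i] + eye[j]) + f(p - eye[i] - eye[j])) / (4 * step ** 2)
    return h

def condition_number(hessian):
    """Ratio of the largest to the smallest eigenvalue of the (symmetric) Hessian."""
    eig = np.linalg.eigvalsh(np.asarray(hessian, dtype=float))
    return eig.max() / eig.min()
```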

76 Figure 4.13 Second order derivative after filtering We expect little variation in the condition numbers for f pose (loc1, loc2, loc3, fap48, fap49, fap50) and f pose (y1, y2, loc3, fap48, fap49, fap50) as the contour plots in Figure 4.5 and Figure 4.6 are very regular. We will check the condition numbers on three different points, which are at the end of the last three iterations for f pose (...). The condition number of f pose (loc1, loc2, loc3, fap48, fap49, fap50) for these three points are 74.64, and before transformation. The reason the condition numbers at these three points are not equal is the residual noise still present in the signal and the slight variation of the curvature of the penalty function. The condition numbers for f pose (y1, y2, loc3, fap48, fap49, fap50) for the same three points are 5.52, 2.97 and In other words, the condition number has been significantly reduced as a result of the transformation given by (4.17) to (4.19). We can therefore rest assured that the search 62

77 problem after transformation is significantly easier than the original search problem, and the search is much faster. This confirms what we experienced when optimizing f pose (...) Transforming Outer Lip The outer lip is defined independently from the inner lip in MPEG-4 FBA. If the model is to open its mouth, it is necessary to adjust both the inner lip and the outer lip to achieve the desired effect. The outer lip in the MPEG-4 standard denotes the outer edge of the lip, and the inner lip denotes the inner edge of the lip (see Figure 2.1). From an optimizing point of view, this is not convenient. Our penalty function does not rely on edges or edge detection. It relies on colors over a certain area. The area occupied by the lips would increase or diminish if we adjusted the outer lips and not the inner lips, or vice versa. We solve this problem by adjusting the shape of the lips with one set of variables and the width of the lips with a different set of variables. The following substitution is used: f lips (fap4, fap5, fap6, fap7, fap8, fap9, fap10, fap11, fap12, fap13, fap51, fap52, fap53, fap54, fap55, fap56, fap57, fap58, fap59, fap60) f lips (g 7 (y 7,...,y 26 ),...,g 26 (y 7,...,y 26 )) (4.24) where 63

fap4 = g_7(y_7, ..., y_26)
...
fap13 = g_16(y_7, ..., y_26)
fap51 = g_17(y_7, ..., y_26)
...
fap60 = g_26(y_7, ..., y_26)   (4.25)

We then define

fap4 = g_7(y_7) = y_7
fap5 = g_8(y_8) = y_8
fap6 = g_9(y_9) = y_9
fap7 = g_10(y_10) = y_10
fap8 = g_11(y_11) = y_11
fap9 = g_12(y_12) = y_12
fap10 = g_13(y_13) = y_13
fap11 = g_14(y_14) = y_14
fap12 = g_15(y_15) = y_15
fap13 = g_16(y_16) = y_16
fap51 = g_17(y_7, y_17) = y_7 + y_17
fap52 = g_18(y_8, y_18) = y_8 + y_18
fap53 = g_19(y_9, y_19) = y_9 − y_19
fap54 = g_20(y_10, y_20) = y_10 − y_20
fap55 = g_21(y_11, y_21) = y_11 + y_21
fap56 = g_22(y_12, y_22) = y_12 + y_22
fap57 = g_23(y_13, y_23) = y_13 + y_23
fap58 = g_24(y_14, y_24) = y_14 + y_24
fap59 = g_25(y_15, y_25) = y_15 + y_25
fap60 = g_26(y_16, y_26) = y_16 + y_26   (4.26)
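A small sketch of this substitution is given here; it folds in the sign convention for y_19 and y_20 that is explained in the next paragraph. The dictionary-based interface is illustrative only.

```python
def lip_faps_from_shape_and_thickness(y):
    """Illustration of (4.26)/(4.28): y7..y16 give the lip shape (copied into
    fap4..fap13) and y17..y26 give the deviation from the default lip thickness.
    `y` is a dict keyed "y7".."y26"; the minus sign for y19/y20 reflects the
    outer lip corner convention discussed in the following paragraph."""
    inner = ["fap4", "fap5", "fap6", "fap7", "fap8",
             "fap9", "fap10", "fap11", "fap12", "fap13"]
    outer = ["fap51", "fap52", "fap53", "fap54", "fap55",
             "fap56", "fap57", "fap58", "fap59", "fap60"]
    faps = {name: y["y%d" % (7 + i)] for i, name in enumerate(inner)}
    for i, name in enumerate(outer):
        sign = -1 if name in ("fap53", "fap54") else 1   # outer lip corners
        faps[name] = y["y%d" % (7 + i)] + sign * y["y%d" % (17 + i)]
    return faps
```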

The effect of (4.26) is to let y_7, ..., y_16 adjust the shape of the lips and y_17, ..., y_26 adjust the thickness of the lips. Negative values for y_17, ..., y_26 make the lips thinner than the default thickness (all FAPs are zero) and positive values make the lips thicker. By noting that fap4, ..., fap13 are identical to y_7, ..., y_16, we can write (4.24) as

f_lips(fap4, fap5, fap6, fap7, fap8, fap9, fap10, fap11, fap12, fap13, fap51, fap52, fap53, fap54, fap55, fap56, fap57, fap58, fap59, fap60) = f_lips(fap4, fap5, fap6, fap7, fap8, fap9, fap10, fap11, fap12, fap13, g_17(fap4, ..., fap13, y_17, ..., y_26), ..., g_26(fap4, ..., fap13, y_17, ..., y_26))   (4.27)

where

fap51 = g_17(fap4, y_17) = fap4 + y_17
fap52 = g_18(fap5, y_18) = fap5 + y_18
fap53 = g_19(fap6, y_19) = fap6 − y_19
fap54 = g_20(fap7, y_20) = fap7 − y_20
fap55 = g_21(fap8, y_21) = fap8 + y_21
fap56 = g_22(fap9, y_22) = fap9 + y_22
fap57 = g_23(fap10, y_23) = fap10 + y_23
fap58 = g_24(fap11, y_24) = fap11 + y_24
fap59 = g_25(fap12, y_25) = fap12 + y_25
fap60 = g_26(fap13, y_26) = fap13 + y_26   (4.28)

The variables y_19 and y_20 both have negative coefficients in (4.26) and (4.28). The positive direction of all feature points in group 8 (outer lip position), with the exception of the outer lip corners (feature point 8.3 and feature point 8.4), is towards the inner lip (see Appendix A). The positive direction of the outer lip corners (feature point 8.3 and feature point 8.4) is

80 away from the inner lips. The variables in (4.26) and (4.28) that affect the outer lip corners (y 19 and y 20 ) have negative coefficients so that negative values for y 17,..., y 26 make the lips thinner than the default thickness (all FAPs are zero) and positive values make the lips thicker. 4.5 Resulting Optimization Problem The substitutions discussed in the above sections reduce the number of iterations, and hence, the number of function evaluations needed to reach the optimal set of FAPs. The substitution also makes the search more robust. Our minimization problem as stated in (4.4), is stated one more time to reflect the transformations. The FAPs are found by minimizing the following function: penalty min f pose (loc1, loc2, loc3, fap48, fap49, fap50) min f jaw (fap3, fap15) min f lips (fap4, fap5, fap6, fap7, fap8, fap9, fap10, fap11, fap12, fap13, fap51, fap52, fap53, fap54, fap55, fap56, fap57, fap58, fap59, fap60) min f top left eyelid (fap19) min f top right eyelid (fap20) min f bottom left eyelid (fap21) min f bottom right eyelid (fap22) min f left eyeball (fap23, fap25) min f right eyeball (fap24, fap26) min f left eyebrow (fap31, fap33, fap35, fap37) min f right eyebrow (fap32, fap34, fap36, fap38) min f left cheek (fap39, fap41) min f right cheek (fap40, fap42) (4.4) 66

81 with respect to y 1, y 2, loc3, fap48, fap49, fap50, fap3, fap4, fap5, fap6, fap7, fap8, fap9, fap10, fap11, fap12, fap13, fap15, fap19, fap20, fap21, fap22, fap23, fap24, fap25, fap26, fap31, fap32, fap33, fap34, fap35, fap36, fap37, fap38, fap39, fap40, fap41, fap42, y 17, y 18, y 19, y 20, y 21, y 22, y 23, y 24, y 25, y 26 where g 1 (y 1, y 2, loc3, fap49, fap50) y 1 u 1 u 3 y 1 (9.3.y 7.1.y) sin( fap50 fap49 ) (9.3.z 7.1.z)) (1 cos( )) (4.18) g 2 (y 1, y 2, loc3, fap48, fap50) y 2 v 1 v 2 y 2 (9.3.y 7.1.y) (1 cos( fap )) (9.3.y 7.1.z) 2 (9.3.z 7.1.z) z 7.1.z (cos(arctan( 9.3.y 7.1.y ) fap z 7.1.z ) cos(arctan( y 7.1.y ))) (4.19) g 17 (fap4, y 17 ) fap4 y 17 g 18 (fap5, y 18 ) fap5 y 18 g 19 (fap6, y 19 ) fap6 y 19 g 20 (fap7, y 20 ) fap7 y 20 g 21 (fap8, y 21 ) fap8 y 21 g 22 (fap9, y 22 ) fap9 y 22 (4.28) g 23 (fap10, y 23 ) fap10 y 23 g 24 (fap11, y 24 ) fap11 y 24 g 25 (fap12, y 25 ) fap12 y 25 g 26 (fap13, y 26 ) fap13 y 26 The next chapter will discuss our choice of optimization algorithms for solving the above optimization problems. 67

82 Chapter 5 Solutions to Our Nonlinear Optimization Problem The chapter discusses different methods for solving nonlinear optimization problems. It includes discussions of particular problems related to the minimization of our penalty function and it attempts to justify the choices of optimization methods. 5.1 Algorithms for Solving Nonlinear Optimization Problems We use a variety of different search techniques. They all have different characteristic, making them desirable from different aspects: 68

83 Steepest Descent Method: We have chosen the Steepest Descent Method because it approaches the minimum with reasonably few iterations and because it does not use second order derivatives. As we recall from Section 3.6.3, the second order partial derivatives of our penalty function contain large amounts of noise. Not being able to calculate reliable second order derivatives prohibits us from using optimization algorithms with better convergence rate than the Steepest Descent. The Steepest Descent Method relies on a line search algorithm. Cyclic Coordinates Method: The last step in the FAP generation uses two iterations of the Cyclic Coordinates Method to refine the FAPs before they are finalized. When the optimization process reaches a point close to the optimal point, the small amount of noise remaining in the filtered first order partial derivatives (see Section 3.6.3) makes it difficult for the optimization process to further improve the solution. Even though Cyclic Coordinates Method converges slower than the Steepest Descent method, it does not use derivatives and for that reason is suitable for refining the solution before they are finalized. Cyclic Coordinate Method relies on a line search algorithm. Linear search: A coarse linear search is used to overcome the problem where the penalty function is not convex for the appropriate range of the FAPs. A linear search does not use derivatives and does not rely on a line search algorithm. Golden Section Line Search: We choose the Golden Section Line Search because it uses neither the first nor the second order derivatives. By avoiding the use of derivatives for the line search, the Steepest Descent method is prevented from going away from the optimal solution in the cases where the Steepest Descent is misled by noisy first order partial derivatives. 69

The above combination of search techniques has been shown to perform well for our optimization problem.

5.1.1 Steepest Descent Method

Consider a function q(x_1, ..., x_n) to be minimized. The gradient vector ∇q(x_1, ..., x_n) is defined as:

∇q(x_1, ..., x_n) = [ ∂q/∂x_1, ..., ∂q/∂x_n ]^T   (5.1)

The resulting vector has the very useful property that its negative points in the direction of steepest descent of the function q(...), hence the name of the method. It seems worthwhile to follow the direction of steepest descent to find the minimum of the function. It remains to decide how far to go. We want to follow the direction of steepest descent until we reach the minimal value in that direction and then calculate the gradient at this point for the next iteration. How to find the minimal value along the vector −∇q(x_1, ..., x_n) is discussed in Section 5.1.3. Even if the computed gradient is occasionally inaccurate due to the noisy derivatives, the line search described in Section 5.1.3 prevents the optimization algorithm from going away from the optimal point. Hence, the line search compensates for the inaccuracy introduced by the noisy derivatives used in the Steepest Descent Method.
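A minimal sketch of the method follows, with a forward-difference estimate of the gradient and a derivative-free line search supplied by the caller (for example the Golden Section search of Section 5.1.3); the step size and iteration count are placeholders, not the values used in this work.

```python
import numpy as np

def gradient(f, x, step=1.0):
    """Forward-difference estimate of (5.1); the unit step is a placeholder."""
    fx = f(x)
    g = np.zeros(len(x))
    for i in range(len(x)):
        e = np.zeros(len(x))
        e[i] = step
        g[i] = (f(x + e) - fx) / step
    return g

def steepest_descent(f, x0, line_search, iterations=10):
    """Follow the negative gradient; a derivative-free line search picks the step."""
    x = np.asarray(x0, dtype=float)
    for _ in range(iterations):
        d = -gradient(f, x)
        if not np.any(d):
            break
        x = line_search(f, x, d)        # minimizer of f along x + t*d
    return x
```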

The number of function evaluations for each iteration of the Steepest Descent Method is the sum of the number of function evaluations for calculating the gradient vector and the number of function evaluations needed for the line search.

5.1.2 Cyclic Coordinates Method

Possibly the simplest method for solving a nonlinear minimization problem, the Cyclic Coordinates Method has nevertheless been shown to possess some desirable characteristics for our purpose. Let us consider a function q(x_1, ..., x_n) to be minimized. In each iteration we minimize along each of the directions x_1, ..., x_n, starting with x_1. For a strictly convex function, we are guaranteed that each iteration will bring us closer to the minimal point. The number of function evaluations is equal to the product of n (the dimension of the search space) and the number of iterations.

5.1.3 Golden Section Line Search

The Steepest Descent Method and the Cyclic Coordinates Method rely on a line search algorithm. Due to the difficulty of calculating reliable first and second order derivatives, we have chosen a line search that does not utilize first or second order derivatives, namely the Golden Section Line Search [28]. Assume that a function q(x) is convex in the interval [a, b],

i.e., d²q/dx²(x) > 0 for all x ∈ [a, b]. Choose two points α and β so that a < α < β < b. There are now two possible cases:

Case 1: If q(α) > q(β) then the minimum must be in the interval [α, b] and we do the next iteration inside this new interval.

Case 2: If q(α) < q(β) then the minimum must be in the interval [a, β] and we do the next iteration inside this new interval.

The points α and β at which to evaluate the function q(x) are chosen so that in the next iteration either the new α will coincide with the old β (Case 1) or the new β will coincide with the old α (Case 2). Thus, the number of function evaluations is limited to one per iteration. Figure 5.1 explains this principle graphically.

Figure 5.1 Illustration of the Golden Section Line Search

The values for α and β are chosen so that

(b − β)/(β − a) = (β − a)/(b − a),   (α − a)/(b − α) = (b − α)/(b − a)   (5.2)

We can calculate the reduction of the length of the search interval for each iteration using (5.2):

(b − β)/(β − a) = (β − a)/(b − a)
((b − a) − (β − a))/(β − a) = (β − a)/(b − a)
[(b − a) − (β − a)](b − a) = (β − a)²
(β − a)² + (b − a)(β − a) − (b − a)² = 0
(β − a) = [−(b − a) ± sqrt((b − a)² + 4(b − a)²)] / 2 = ((−1 ± √5)/2)(b − a)

We ignore the negative solution and get that

(β − a) = ((√5 − 1)/2)(b − a) ≈ 0.618 (b − a)   (5.3)

A similar derivation can be done to show that (b − α) = 0.618 (b − a). In other words, the length of the new interval is reduced to 0.618 of the length of the initial interval in each iteration. The iterations stop when the interval [a, b] is sufficiently small. The stopping criterion is described in Section 5.2.
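The sketch below implements the search as described, reusing the previously evaluated interior point so that each iteration costs one new function evaluation; the parametrization of the line by t in [t_lo, t_hi] and the 0.5% stopping fraction are assumptions made for the example.

```python
import numpy as np

TAU = (np.sqrt(5.0) - 1.0) / 2.0        # 0.618..., the reduction factor of (5.3)

def golden_section(f, x, d, t_lo=0.0, t_hi=1.0, shrink=0.005):
    """Minimize f(x + t*d) for t in [t_lo, t_hi] without derivatives, stopping
    when the interval has shrunk to `shrink` of its initial length."""
    a, b = float(t_lo), float(t_hi)
    width0 = b - a
    alpha, beta = b - TAU * (b - a), a + TAU * (b - a)
    f_alpha, f_beta = f(x + alpha * d), f(x + beta * d)
    while (b - a) > shrink * width0:
        if f_alpha > f_beta:            # Case 1: the minimum lies in [alpha, b]
            a, alpha, f_alpha = alpha, beta, f_beta
            beta = a + TAU * (b - a)
            f_beta = f(x + beta * d)    # only one new evaluation per iteration
        else:                           # Case 2: the minimum lies in [a, beta]
            b, beta, f_beta = beta, alpha, f_alpha
            alpha = b - TAU * (b - a)
            f_alpha = f(x + alpha * d)
    return x + 0.5 * (a + b) * d
```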

5.1.4 Linear Line Search

Unlike the Golden Section Line Search, a linear search does not require the function to be convex in the search interval. This is a very useful property, as we will see in Section 5.4. Consider a function q(x_1, ..., x_n) to be minimized along a line from a = (a_1, ..., a_n) to b = (b_1, ..., b_n). We evaluate the function at q(a + λ(b − a)/m) for λ = 0, 1, 2, ..., m to find the minimum, so m + 1 function evaluations are required. For example, let us consider a function q(x, y) to be minimized along the line from a = (a_1, a_2) to b = (b_1, b_2) and let m = 6. Figure 5.2 shows the m + 1 = 7 points, including a and b, at which the function q(...) must be evaluated. The coarse estimate of the minimal point is the point where q(...) has the lowest value.

Figure 5.2 Example of Linear Search
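A corresponding sketch of the coarse linear search; the number of points m is a parameter, matching the 30-, 20- and 16-point searches used later for the lips in Section 5.4.

```python
import numpy as np

def linear_search(f, a, b, m=30):
    """Evaluate f at the m + 1 points a + lam*(b - a)/m, lam = 0..m, and return
    the best one; a coarse, derivative-free search that tolerates non-convexity."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    points = [a + lam * (b - a) / m for lam in range(m + 1)]
    values = [f(p) for p in points]
    return points[int(np.argmin(values))]
```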

5.2 Stopping Criteria

The Golden Section line search stops when the search interval is reduced to 0.5% of the initial search interval. For each iteration, the search interval is reduced to 0.618 of its previous length. By solving

λⁿ ≤ 0.005  ⇒  n ln(λ) ≤ ln(0.005)  ⇒  n ≥ ln(0.005)/ln(λ) = ln(0.005)/ln(0.618) ≈ 11   (5.4)

we see that eleven iterations are required to reach 0.5%. Experiments have shown that a fixed number of iterations works well for the other optimization methods.

5.3 Justification of Choice of Optimization Methods

The Cyclic Coordinates Method is known to be one of the simplest and slowest methods, in terms of number of iterations, for solving a non-linear optimization problem. The Steepest Descent Method generally performs better than the Cyclic Coordinate Method, but it is also considered a "slow" method compared to many other methods. The choice of these two methods can be justified by analyzing the optimization problem we are trying to solve. Recall from Section 4.4.2 our discussion of contour plots and condition numbers. Let us consider a function, q(x, y), which has a condition number much greater than 1. Figure 5.3

90 illustrates how the Cyclic Coordinate Method solves the optimization problem. x 0 denotes the initial value and x 1, x 2 are the resulting points after the first and second iteration. In the first iteration, the Cyclic Coordinate Method finds the minimum along the x-axis, and then along the y-axis. This is repeated in subsequent iterations, and the method zig-zags towards the minimum. A large number of iterations are needed until the method reaches a point sufficiently close to the true minimum. Figure 5.3 Example of minimization problem with condition number greater than 1 being solved with Cyclic Coordinate Method Figure 5.4 shows how the Steepest Descent Method solves the optimization problem. The numbers on the graph denotes the iteration. In the first iteration, the Steepest Descent Method finds the minimum along the direction that is perpendicular to the contour line at the starting point. This direction is given by the gradient q(x, y). This is repeated in subsequent 76

91 iterations, and the method zig-zags towards the minimum. A large number of iterations are needed until the method reaches a point sufficiently close to the true minimum. Figure 5.4 Example of minimization problem with condition number greater than 1 being solved with Steepest Descent Method Let us consider another function, r(x, y), with condition number equal to 1. Figure 5.5 shows the progress for the optimization using the Cyclic Coordinate Method. x 0 denotes the initial point and x 1 is the resulting point after one iteration. In the first iteration, the minimum along the x-axis is found, followed by the minimum along the y-axis. After only one iteration, the minimum is found, even with a method as little sophisticated as the Cyclic Coordinate Method, because the optimization problem is particularly simple. The result for the Steepest Descent Method is similar to the Cyclic Coordinate Method (see figure 5.6). The gradient r(x, y) is perpendicular to the contour lines, which are circular, and points directly towards the center of the circle, which is the optimal point we are seeking. 77

92 Figure 5.5 Example of minimization problem with condition number equal to 1 being solved with Cyclic Coordinate Method Figure 5.6 Example of minimization problem with condition number equal to 1 being solved with Steepest Descent Method Only one iteration is required for the Steepest Descent method to reach the minimum when the condition number is 1. 78

93 There are methods for solving non-linear optimization problems that will perform well on problems with a high condition number, but they all rely on the Hessian matrix in some way. One example is Newton s Method, which relies on the Hessian matrix and the gradient vector to choose the search direction and the distance to go in the search direction. It is intuitive that methods that are good for solving optimization problems with high condition number use the Hessian matrix, as it is the Hessian matrix that gives us information about the curvature of the surface (possible a high-dimensional surface) for the function we are optimizing. In our case, it is not an option to use optimization methods that employ the Hessian matrix because of the vast processing power needed to compute reasonably noise-free second order derivatives. The point we are trying to make is that both Steepest Descent Method and Cyclic Coordinate Method perform well if the problem has a condition number close to 1. In Section we studied the condition number for f pose (...) and found that after the transformation, the condition number was close to 1. Applying the same techniques, involving increased resolution and filtering, we will study the remaining sub problems to verify that they all have condition numbers close to 1, making the Steepest Descent Method and the Cyclic Coordinate Method good choices. The average condition number for each of the sub problems was computed for 10 consecutive frames of the Erin video sequence. It is desirable to compute the condition number for an even larger number of frames, but this has not been done due to the processing power and time needed to compute condition numbers for a large number of frames. The condition numbers are computed right before the last iteration for each of the sub problems. Table 5.1 shows the 79

94 average condition number for each sub problem. The functions f top left eyelid (fap19), f top right eyelid (fap20), f bottom left eyelid (fap21) and f bottom right eyelid (fap22) do not have condition numbers because they are all functions of a single variable. Table 5.1 Condition numbers for sub problems Function Condition number f jaw (fap3, fap15) f lips (fap4, fap5, fap6, fap7, fap8, fap9, fap10, fap11, fap12, fap13, fap51, fap52, fap53, fap54, fap55, fap56, fap57, fap58, fap59, fap60) f top left eyelid (fap19) f top right eyelid (fap20) f bottom left eyelid (fap21) f bottom right eyelid (fap22) N/A N/A N/A N/A f left eyeball (fap23, fap25) f right eyeball (fap24, fap26) f left eyebrow (fap31, fap33, fap35, fap37) f right eyebrow (fap32, fap34, fap36, fap38) f left cheek (fap39, fap41) f right cheek (fap40, fap42) As we can see from Table 5.1, the condition number for each sub problem is fairly low. The highest condition number is for the lips, f lips (...), which is as expected because this sub problem has the search spaces of highest dimensions. The condition number for a problem can increase, but never decrease, as the dimension increases. We therefore expect the condition number to increase as the dimension increases. All condition numbers in Table 5.1 are reasonably close 80

95 to 1, and we will let this be the first half of our justification that the Steepest Descent Method and the Cyclic Coordinate Method are working well for our search problem. The second part of the justification was mentioned earlier in this section. It states that any method that will compensate for the high condition numbers, needs to know the curvature of the function to optimize. Knowing the curvature of the function requires the Hessian matrix. Calculating the Hessian matrix requires second order, partial derivatives. Second order, partial derivatives are extremely noisy and removing the noise requires filtering. The filtering requires a large number of function evaluations, which make the optimization prohibitively slow. Hence, Steepest Decent Method and Cyclic Coordinates Method are good choices for solving our minimization problem. 5.4 The Problem of Non-Convexity We saw in Section that the penalty function is convex, possibly with some noise, within certain ranges. As we will see there are FAPs that are outside the ranges where the penalty function is convex, and they will cause problems for any search algorithm that assumes convexity. As an example, we will study how the penalty varies as the mouth is opened. Figure 5.7 shows a frame where the mouth is open. The detail from the mouth region is shown in Figure 5.8. Figure 5.9 shows a graph of penalty=f pose (a+λ(b-a)/m) for λ=0, 1, 2,..., m=300, where a is the point where fap4=fap5=fap6=fap7=fap8=fap9=fap10=fap11=fap12=fap13=0, y i =0 for i=17,..., 26 and b is the point where fap4=fap6=fap7=fap8=fap9=0, fap5=600, fap10=fap11=500, fap12=fap13=75, y i =0 for i=17,..., 26. This corresponds to gradually 81

96 opening the lower part of the mouth. Figure 5.9 also shows the mouth captured at seven different points between a and b. Figure 5.7 Face from the video sequence with open mouth Figure 5.8 Detail from the mouth region with open mouth As we can see from Figure 5.9 the penalty is not convex for the range graphed. On the other hand, it is very evident that the function has a global minimum. The local minimum around λ=120 occurs when the red, lower lip in the animated face is matched with the red tongue in 82

97 Figure 5.9 Penalty as function of mouth opening in the linear search 83

98 the original image. In our FAP generation process, we use a 30-point linear search to find a point close to the global minimum for the lower lip (affecting fap5, fap10, fap11, fap12 and fap13), a 20-point linear search for the upper lip (affecting fap4, fap8, fap9, fap12 and fap13) and a 16-point search for the width of the mouth (affecting fap6 and fap7). The three linear searches are followed by a Steepest Descent search to find the minimum more accurately. The initial linear search takes relatively few function evaluations (30 for the lower lip, 20 for the upper lip and 16 for the width of the lips, which totals 66 function evaluations) and provides the Steepest Descent Method with a good starting point. The rising of the left eyebrow (affecting fap31, fap33 and fap35) and the right eyebrow (affecting fap32, fap34 and fap36) are done in the same way first a coarse linear search, followed by a more accurate Steepest Descent search. The upper and lower eyelids on both left and right side use the same procedure. After the linear search and the Steepest Descent search are employed, we add two iterations of the Cyclic Coordinate Method to refine the FAPs before they are finalized. The Cyclic Coordinate Method has the advantage over the Steepest Descent Method that it does not rely on gradients or derivatives. In practice, the use of the Cyclic Coordinate Method to refine the FAPs has shown to be very successful. 84
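The final refinement step can be sketched as follows; the line search is assumed to be the Golden Section search, and the bracketing interval for each coordinate is left to the caller, so this is an illustration of the procedure rather than the exact implementation.

```python
import numpy as np

def cyclic_coordinates(f, x0, line_search, iterations=2):
    """Two iterations of the Cyclic Coordinates Method, as used for the final
    refinement: minimize along each coordinate axis in turn, without derivatives.
    The bracketing interval of the line search is assumed to be set per FAP."""
    x = np.asarray(x0, dtype=float)
    for _ in range(iterations):
        for i in range(len(x)):
            d = np.zeros(len(x))
            d[i] = 1.0
            x = line_search(f, x, d)    # e.g. the Golden Section search of Section 5.1.3
    return x
```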

99 Chapter 6 Masking and Weighting Functions Although this chapter may be considered a part of, or an extension to Search Strategies, we have chosen to separate it and discuss the masking and weighting functions after we have described the search algorithms in the previous chapter. In this way we can demonstrate the problems we encounter when optimizing the penalty function with the previously described search algorithms and show how to solve those problems. As discussed in Section 3.6.1, the weighting function in (3.2) is used to mask out the background. The weighting function can also be used to emphasize certain parts of the face. For example, when finding the correct head position, the area around the nose and nostrils has a heavier weight than those for the rest of the face. The nostrils have sharp edges and high 85

100 contrast, which makes them very suitable for matching. The detailed masking and weighting scheme is discussed in the following sections. 6.1 Masking the Face The background is always masked out, regardless of which FAPs are to be determined. The mask is based on the animated face, as opposed to the original face, because it is trivial to determine the area occupied by the animated face. Any pixel not written to during the rendering of the animated face is considered background. Figure 6.1 shows the animated face to the left and the corresponding mask to the right. The black area is masked out and the white area is preserved when calculating the penalty. Figure 6.1 An animated face and the corresponding mask 86
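A sketch of the background mask and of a weighted pixel comparison in the spirit of the match function follows. It assumes an RGBA framebuffer whose alpha channel is zero wherever the render unit did not draw; the exact penalty of (3.2) is not reproduced here.

```python
import numpy as np

def face_mask(rendered_rgba):
    """Mask of Section 6.1: pixels the render unit never wrote (alpha == 0 in this
    assumed RGBA framebuffer) are background and get weight 0; the face gets 1."""
    alpha = np.asarray(rendered_rgba)[..., 3]
    return (alpha > 0).astype(np.float32)

def weighted_difference(original_rgb, animated_rgb, weight):
    """Weighted sum of squared pixel differences over the unmasked area, in the
    spirit of the match function."""
    diff = np.asarray(original_rgb, float) - np.asarray(animated_rgb, float)
    return float(np.sum(weight[..., None] * diff ** 2))
```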

101 6.2 Masking the Nostrils The horizontal and vertical locations of the face (loc1 and loc2) are found in two steps. The first step relies on the complete face with the mask described in Section 6.1. This step gives a coarse estimate of the face location. The second step improves the accuracy and uses only the nostrils, as the nostrils have sharp edges and high contrast, which makes the search very accurate and robust. Furthermore, the nostrils are stationary with respect to the face, making them very suitable for finding the accurate head location. The mask for the nostrils is an ellipse with center at point The width and height are based on feature point 9.4 and feature point 9.5. Figure 6.2 shows an example of the mask. The light area is masked out and is not used for calculating the penalty. The area with normal exposure has mask equal to 1 and is used for calculating the penalty. Figure 6.2 Mask used when finding location of head 87

6.3 Weighting the Inner Lips

The inner lip is considered more important than the outer lip for the purpose of animation. We would like to determine the lip so that the animated inner lip matches the original inner lip first. Then we adjust the outer edge of the animated lip to match the original lip. However, we need the aid provided by both the inner and the outer part of the lip to maintain the robustness of the optimization. We have chosen to weight the inner part of the lip by 3 and the outer lip by 2. Thus, the inner lip is the more influential on the optimization, while we preserve the guidance provided by the outer edge of the lip. The inner part of the lip is defined by the polygon described by the points midway between 2.2 and 8.1, 2.6 and 8.5, 2.4 and 8.3, 2.8 and 8.7, 2.3 and 8.2, 2.9 and 8.8, 2.5 and 8.4, 2.7 and 8.6. The corners of the outer lips are of special importance for obtaining realistic and truthful animation of the face. A small circular area around each of the outer lip corners (feature point 8.3 and feature point 8.4) therefore has a weight of 5, which is heavier than the rest of the lips. Figure 6.3 shows an example of the mask. Light color indicates weight = 0 (masked out). Normal colors indicate weight = 2, darker color indicates weight = 3 and the darkest color indicates weight = 5.

6.4 Masking the Eyebrows

The area under the eyebrow, over the eyelid, is often dark because the lighting usually comes from above. This may confuse the optimizer unit into believing the dark area over the

103 Figure 6.3 Mask used when optimizing inner lip eyelid is part of the eyebrow. We solve this problem by masking out the area under the eyebrows. The only area used is from above the bottom of the eyebrows to the hair line. The masked area is the polygon defined by feature point 11.1, feature point 11.3, feature point 4.5, feature point 4.3, feature point 4.1, feature point 11.1, feature point 4.2, feature point 4.4, feature point 4.6 and feature point Figure 6.4 shows an example of this mask. 6.5 Masking the Iris The color and intensity of the white in the eyes can vary a lot between the face model and the face in the original video sequence, because the color or intensity in that area is highly dependent on location of light sources, head pose and whether the eye lids are wide open or almost closed. The white in the eyes is therefore masked out when the iris is found. No figure 89

104 Figure 6.4 Mask used when optimizing eyebrows is shown as example of the mask for the iris, as the iris itself is already very dark and difficult to distinguish by variation in intensity. 90

105 Chapter 7 Anatomical Constraints The penalty function, as given by (3.2), has no knowledge about the appearance of a normal face. What may seem to be a good fit according to the penalty may look exaggerated, twisted and distorted to a human. We need to guide the FAP optimizer to avoid exaggerated, twisted and distorted faces. Several problems that have been encountered and how to guide the FAP optimizer to avoid these problems are described in Sections 7.2 through 7.5. We will start by looking at the general strategy for guiding the FAP optimizer towards natural looking faces. 7.1 Barrier Function So far we have attempted to generate FAPs by minimizing the penalty function, f(...). We now add one term, barrier(...), which conveys knowledge about what normal and non-distorted 91

106 faces look like. The problem then becomes to minimize f(loc1,..., fap60) + barrier(loc1,..., fap60). The purpose of the function barrier(...) is to tell the FAP optimizer unit if a set of FAPs generates a distorted or unnatural face. If barrier(...) = 0 then the set of FAP generates a normal looking, undistorted face. A set of FAPs resulting in barrier(...) > 0 indicates that the FAPs represent an unnatural or distorted face. The higher the value of barrier(...) is, the more distorted the face is. The name of the function is chosen because it serves as a barrier that is difficult for the optimization algorithms to cross or overcome, hence, forcing the optimization unit to generate FAPs that do not cause the face to look distorted or unnatural. The barrier function does not prohibit any set of FAP values. It simply advises the optimizer unit on how a natural face looks. Even unnatural or unusual facial expressions are permitted if the minimal point in the penalty function is very dominant. In Section 4.2 the penalty function, f(...), was separated into independent parts. It is natural to do the same with the barrier function, so that each part of the penalty function, f(...), has its own barrier working only on the particular part of the face that the particular f(...) is working on. 7.2 Thickness of Lip The thickness of the lips varies because the amount of lip visible to the viewer depends on the shape, angle and stretch of the lip. The thickness of the lip is nevertheless subject to anatomical constraints. 92
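Before the individual constraints are described, the way a barrier term combines with the match penalty for one partition can be sketched as follows; match_penalty and the entries of barriers stand in for the lip match function and the barrier functions introduced in the following sections, and the interface is illustrative only.

```python
def guided_penalty(match_penalty, barriers, faps):
    """Objective for one partition in the sense of Section 7.1: the render-and-
    compare match penalty plus the anatomical barrier terms for the same partition.
    Each barrier returns 0 for a normal-looking configuration and grows with the
    degree of distortion, steering the optimizer without forbidding anything."""
    return match_penalty(faps) + sum(barrier(faps) for barrier in barriers)
```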

107 If the tongue is very visible in the mouth and receives good lighting, the FAP optimizer unit may mistakenly take it for being part of the lip. If there are no constraints on the thickness of the lip, the optimizer unit may increase the lip thickness until it covers both the real lip and the tongue in the original image. Figure 7.1 shows an example where it is difficult to distinguish between the lower lip and the tongue. The result after generating FAPs for the frame shown in Figure 7.1 is shown in Figure 7.2. Feature points 2.3, 2.8 and 2.9 are too far up, resulting in a thick lip that also covers the area occupied by the tongue in the original frame. Figure 7.1 Original image from frame 314 in the Erin sequence We recall from section that the outer lip is transformed so that it depends on the inner lip and is given by the deviation with respect to the thickness of the lip. If y i <0 for i=17,..., 26, then the lip is thinner than the default thickness for the lip. If y i >0 for i=17,..., 26, then the lip is thicker than the default thickness for the lip. The default thickness is defined as the 93

Figure 7.2 The FAP optimizer makes the lower lip thick in an attempt to match the tongue in the original image in frame 314 from the Erin sequence

thickness for the lips in the neutral face, i.e., all FAPs are set to 0. The barrier function used to advise the FAP optimizer unit with regard to lip thickness is defined as:

barrier_lips1(fap4, ..., fap13, y_17, ..., y_26) = c_1 |y_17| + c_2 |y_18| + c_3 |y_19| + c_4 |y_20| + c_5 |y_21| + c_6 |y_22| + c_7 |y_23| + c_8 |y_24| + c_9 |y_25| + c_10 |y_26|   (7.1)

In our work, the values for c_i listed in Table 7.1 have been shown to provide good guidance to the FAP optimizer unit on lip thickness. Figure 7.3 shows the result of controlling the thickness of the lip in frame 314 of the Erin sequence shown in Figure 7.1.

109 Table 7.1 Values for constants c 1,..., c 10 Constant Description Value c 1 c 2 c 3 c 4 c 5 c 6 c 7 c 8 c 9 c 10 Vertical distance between top, middle outer lip and top, middle inner lip Vertical distance between bottom, middle outer lip and bottom, middle inner lip Horizontal distance between left, outer lip corner and left inner lip corner Horizontal distance between right, outer lip corner and right inner lip corner Vertical distance between midpoint between left corner and middle of top outer lip and midpoint between left corner and middle of top inner lip Vertical distance between midpoint between right corner and middle of top outer lip and midpoint between right corner and middle of top inner lip Vertical distance between midpoint between left corner and middle of bottom outer lip and midpoint between left corner and middle of bottom inner lip Vertical distance between midpoint between right corner and middle of bottom outer lip and midpoint between right corner and middle of bottom inner lip Vertical distance between left, outer lip corner and left, inner lip corner Vertical distance between right, outer lip corner and right, inner lip corner Upper Lip Above Lower Lip It is difficult to distinguish between the upper and the lower lip when the mouth is closed. Sometimes the optimizer unit will position the lower part of the upper lip below the upper part of the lower lip. An example is shown in Figures 7.4, 7.5 and

110 Figure 7.3 Result after controlling the lip thickness on frame 314 in the Erin sequence Figure 7.4 Original image from frame 715 in the Erin sequence We can solve this problem by insisting that the lower part of the upper lip should always be above the upper part of the lower lip, i.e., feature point 2.7 should be above feature point 2.9, 96

111 Figure 7.5 Frame 715 in the Erin sequence. The lips look unsatisfactory. Figure 7.6 Frame 715 from the Erin video. The border of the upper, inner lip is marked with yellow. The border of the lower, inner lip is marked with blue. feature point 2.2 should be above feature point 2.3 and feature point 2.6 should be above 97

feature point 2.8. The following barrier function provides the optimizer unit with the necessary guidance:

barrier_lips2(fap4, ..., fap13, y_17, ..., y_26) =
    { 0, if −fap4 ≥ fap5
    { c_11 (fap4 + fap5), if −fap4 < fap5
  + { 0, if −fap8 ≥ fap10
    { c_12 (fap8 + fap10), if −fap8 < fap10
  + { 0, if −fap9 ≥ fap11
    { c_13 (fap9 + fap11), if −fap9 < fap11   (7.2)

It is necessary to negate fap4 when comparing it with fap5 because the positive direction of fap4 is down and the positive direction of fap5 is up. The same is true for fap8 and fap10 and for fap9 and fap11. The values for the constants c_11, c_12 and c_13 are much higher than the values restricting the thickness of the lips (c_1, ..., c_10) because we insist that the upper lip shall be above the lower lip, while we merely make a recommendation about the thickness of the lips. The values for c_11, c_12 and c_13 are listed in Table 7.2. The result after applying the barrier to frame 715 of the Erin video shown in Figure 7.4 is shown in Figure 7.7 and Figure 7.8.

113 Table 7.2 Values for constants c 11, c 12 and c 13 Constant Value c c c Figure 7.7 The result of applying constraints on the shape of the lips in frame 715 in the Erin sequence. 7.4 Constraints on Shape of Lips Sometimes bad things happen to good lips, especially if the mouth is almost, but not fully, closed. An example is shown in Figure 7.9 and 7.10, where feature point 2.7 is too close to feature point 2.9 and feature point 2.6 is too close to feature point

Figure 7.8 The result of applying constraints on the shape of the lips in frame 715 from the Erin sequence. The border of the upper lip is marked with yellow, the lower one with blue.

Figure 7.9 Original image from frame 498 in the Erin sequence

The shape of the upper lip can be approximated with a sine function with the argument ranging from 0 for the right, inner lip corner (feature point 2.5) to π for the left, inner lip

115 Figure 7.10 Frame 498 from the Erin sequence. The upper lip looks torn close to the left lip corner. Figure 7.11 Feature points for inner lip. Feature point 2.6 and feature point 2.7 are too low. corner (feature point 2.4), where the magnitude is given by the mid, inner lip (feature point 2.2). Figure 7.11 shows the feature points for the inner lip. Feature point 2.6 and feature point 2.7 are shown close to the bottom lip. The dotted curve from feature point 2.4, through feature point 2.2, to feature point 2.5 shows the sine function with magnitude given by feature point 2.2. We will use a barrier function to suggest to the optimizer unit that the shape of a naturally looking upper lip shall resemble a sine function from 0 to π. The upper lip is certainly allowed 101

Figure 7.12 Frame 498 from the Erin sequence. The border of the upper lip is marked with yellow. The border of the lower lip is marked with blue.

to deviate from the shape of a sine function to ensure accuracy with respect to the original lip. The barrier function is simply a guide to a natural looking lip. We need to take into account the orientation of the head when calculating the sine function for our barrier function. Figure 7.11 shows the inner lip with the sine function marked with a dashed curve. The distance, d_1, from the point C to the line AE in Figure 7.11 is given by

d_1 = |AC × AE| / |AE|   (7.3)

and the sine function from the right lip corner to the left lip corner is given by

sx_1(t) = 2.5.x + t (2.4.x − 2.5.x)
sy_1(t) = 2.5.y + t (2.4.y − 2.5.y) + d_1 sin(π t)
sz_1(t) = 2.5.z + t (2.4.z − 2.5.z)   (7.4)

The two points t_1 and t_2 are found by projecting feature point 2.7 and feature point 2.6 onto the line AE:

t_1 = (AB · AE) / |AE|²,   t_2 = (AD · AE) / |AE|²   (7.5)

And finally, the barrier function is given by:

barrier_lips3(fap4, ..., fap13, y_17, ..., y_26) = c_14 sqrt((2.7.x − sx_1(t_1))² + (2.7.y − sy_1(t_1))² + (2.7.z − sz_1(t_1))²) + c_15 sqrt((2.6.x − sx_1(t_2))² + (2.6.y − sy_1(t_2))² + (2.6.z − sz_1(t_2))²)   (7.6)

The result after applying (7.6) to the FAP generation for frame 498 of the Erin sequence is shown in Figure 7.13 and Figure 7.14.

Figure 7.13 The result of applying constraints on the shape of the lips in frame 498 in the Erin sequence.

Figure 7.14 The result of applying constraints on the shape of the lips in frame 498 from the Erin sequence. The border of the upper lip is marked with yellow, the lower one with blue.

The lower lip can be approximated similarly to the upper lip. A sine function is used for the lower lip as well. The only difference is that the coefficient of the sine is negative (unless she is smiling). The constraint is formulated as follows:

    d2 = |AG × AE| / |AE|                                                                     (7.7)

    sx2(t) = 2.5.x + t (2.4.x - 2.5.x)
    sy2(t) = 2.5.y + t (2.4.y - 2.5.y) - d2 sin(πt)                                           (7.8)
    sz2(t) = 2.5.z + t (2.4.z - 2.5.z)

    t3 = (AF · AE) / |AE|²,   t4 = (AH · AE) / |AE|²                                          (7.9)

    barrier_lips4(fap4, ..., fap13, y17, ..., y26) =
        c16 [(2.9.x - sx2(t3))² + (2.9.y - sy2(t3))² + (2.9.z - sz2(t3))²]
      + c17 [(2.8.x - sx2(t4))² + (2.8.y - sy2(t4))² + (2.8.z - sz2(t4))²]                    (7.10)

The values for c14, c15, c16 and c17 are listed in Table 7.3.

Table 7.3 Values for constants c14, c15, c16 and c17

7.5 Constraints on Height of Left and Right Corner of Inner Lip

When the mouth is wide open, it is sometimes difficult to find the correct positions of the corners of the inner lips. Figure 7.16 shows an example where the right, inner corner of the lip is too high compared to the left, inner corner of the lip. The original frame is shown in Figure 7.15. In general, the corner of the right, inner lip (feature point 2.5) should be at the same level as the corner of the left, inner lip (feature point 2.4), adjusted for head roll (rotation around the z-axis).

We start by finding the angle of a line going through the two feature points 2.4 and 2.5 (see Figure 2.1). The barrier is then defined as

    barrier_lips5 = c18 |atan((2.5.y - 2.4.y) / (2.5.x - 2.4.x)) - fap50|                     (7.11)

where fap50 specifies the angle of head roll in radians. Experiments have shown that c18 = 50 is a good compromise between preventing distorted facial expressions and allowing asymmetry in the face. The value for c18 is greater than the values for the constants used in the other barrier functions because angle FAPs are specified with higher accuracy than displacement FAPs in the MPEG-4 standard. Figure 7.17 shows the result when (7.11) is applied when generating the FAPs for the frame shown in Figure 7.15.

Figure 7.15 Original image from frame 1233 in the Erin sequence
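A minimal sketch of (7.11) follows, assuming the head roll angle has already been converted to radians from fap50; the names are placeholders, not the thesis implementation.

```cpp
#include <cmath>

// Penalize any height difference between the inner lip corners (feature points
// 2.4 and 2.5) beyond what the current head roll explains, as in (7.11).
double barrierLips5(double x24, double y24,   // feature point 2.4
                    double x25, double y25,   // feature point 2.5
                    double headRollRad,       // head roll derived from fap50
                    double c18) {
    double cornerAngle = std::atan((y25 - y24) / (x25 - x24));
    return c18 * std::fabs(cornerAngle - headRollRad);
}
```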

Figure 7.16 The corner of the left lip is at a different height from the corner of the right lip in frame 1233 in the Erin sequence

Figure 7.17 Frame 1233 from the Erin sequence after applying the barrier function

7.6 Resulting Optimization Problem

For completeness, the optimization problem including the barrier functions is stated in this section. The optimal set of FAPs is the set that results from minimizing penalty with respect to y1, y2, loc3, fap48, fap49, fap50, fap3, fap4, fap5, fap6, fap7, fap8, fap9, fap10, fap11, fap12, fap13, fap15, fap19, fap20, fap21, fap22, fap23, fap24, fap25, fap26, fap31, fap32, fap33, fap34, fap35, fap36, fap37, fap38, fap39, fap40, fap41, fap42, y17, y18, y19, y20, y21, y22, y23, y24, y25, y26 by solving the following minimization problems:

    penalty = min f_pose(g1(y1, fap49, fap50), g2(y2, fap48, fap50), loc3, fap48, fap49, fap50)
            + min f_jaw(fap3, fap15)
            + min [ f_lips(fap4, fap5, fap6, fap7, fap8, fap9, fap10, fap11, fap12, fap13,
                           g17(fap4, y17), g18(fap5, y18), ..., g26(fap13, y26))
                  + barrier_lips1(fap4, ..., fap13, g17(fap4, y17), ..., g26(fap13, y26))
                  + barrier_lips2(fap4, ..., fap13, g17(fap4, y17), ..., g26(fap13, y26))
                  + barrier_lips3(fap4, ..., fap13, g17(fap4, y17), ..., g26(fap13, y26))
                  + barrier_lips4(fap4, ..., fap13, g17(fap4, y17), ..., g26(fap13, y26))
                  + barrier_lips5(fap4, ..., fap13, g17(fap4, y17), ..., g26(fap13, y26)) ]
            + min f_top_left_eyelid(fap19)
            + min f_top_right_eyelid(fap20)
            + min f_bottom_left_eyelid(fap21)
            + min f_bottom_right_eyelid(fap22)
            + min f_left_eyeball(fap23, fap25)
            + min f_right_eyeball(fap24, fap26)
            + min f_left_eyebrow(fap31, fap33, fap35, fap37)
            + min f_right_eyebrow(fap32, fap34, fap36, fap38)
            + min f_left_cheek(fap39, fap41)
            + min f_right_cheek(fap40, fap42)                                                 (7.12)

where

    g1(y1, fap49, fap50) = y1 + u1 + u3
                         = y1 + (9.3.y - 7.1.y) sin(fap50) + (9.3.z - 7.1.z) (1 - cos(fap49))  (4.18)

    g2(y2, fap48, fap50) = y2 + v1 + v2
                         = y2 + (9.3.y - 7.1.y) (1 - cos(fap50))
                           + sqrt((9.3.y - 7.1.y)² + (9.3.z - 7.1.z)²)
                             · (cos(arctan((9.3.z - 7.1.z) / (9.3.y - 7.1.y)) + fap48)
                                - cos(arctan((9.3.z - 7.1.z) / (9.3.y - 7.1.y))))             (4.19)

    g17(fap4, y17)  = fap4 + y17
    g18(fap5, y18)  = fap5 + y18
    g19(fap6, y19)  = fap6 + y19
    g20(fap7, y20)  = fap7 + y20
    g21(fap8, y21)  = fap8 + y21
    g22(fap9, y22)  = fap9 + y22                                                              (4.28)
    g23(fap10, y23) = fap10 + y23
    g24(fap11, y24) = fap11 + y24
    g25(fap12, y25) = fap12 + y25
    g26(fap13, y26) = fap13 + y26

    barrier_lips1(fap4, ..., fap13, y17, ..., y26) =
        c1 y17 + c2 y18 + c3 y19 + c4 y20 + c5 y21 + c6 y22 + c7 y23 + c8 y24 + c9 y25 + c10 y26   (7.1)

    barrier_lips2 =
        { 0                    if -fap4 ≥ fap5
        { c11 (fap4 + fap5)    if -fap4 < fap5
      + { 0                    if -fap8 ≥ fap10
        { c12 (fap8 + fap10)   if -fap8 < fap10
      + { 0                    if -fap9 ≥ fap11
        { c13 (fap9 + fap11)   if -fap9 < fap11                                               (7.2)

    d1 = |AC × AE| / |AE|                                                                     (7.3)

    sx1(t) = 2.5.x + t (2.4.x - 2.5.x)
    sy1(t) = 2.5.y + t (2.4.y - 2.5.y) + d1 sin(πt)                                           (7.4)
    sz1(t) = 2.5.z + t (2.4.z - 2.5.z)

    t1 = (AB · AE) / |AE|²,   t2 = (AD · AE) / |AE|²                                          (7.5)

    barrier_lips3(fap4, ..., fap13, y17, ..., y26) =
        c14 [(2.7.x - sx1(t1))² + (2.7.y - sy1(t1))² + (2.7.z - sz1(t1))²]
      + c15 [(2.6.x - sx1(t2))² + (2.6.y - sy1(t2))² + (2.6.z - sz1(t2))²]                    (7.6)

    d2 = |AG × AE| / |AE|                                                                     (7.7)

    sx2(t) = 2.5.x + t (2.4.x - 2.5.x)
    sy2(t) = 2.5.y + t (2.4.y - 2.5.y) - d2 sin(πt)                                           (7.8)
    sz2(t) = 2.5.z + t (2.4.z - 2.5.z)

    t3 = (AF · AE) / |AE|²,   t4 = (AH · AE) / |AE|²                                          (7.9)

    barrier_lips4(fap4, ..., fap13, y17, ..., y26) =
        c16 [(2.9.x - sx2(t3))² + (2.9.y - sy2(t3))² + (2.9.z - sz2(t3))²]
      + c17 [(2.8.x - sx2(t4))² + (2.8.y - sy2(t4))² + (2.8.z - sz2(t4))²]                    (7.10)

Table 7.4 Values for constants c1, ..., c18
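To make the structure of the partitioned problem in (7.12) concrete, the sketch below treats each facial region as an independent subproblem that owns its own FAP subset and its own penalty term (match function plus the applicable barriers), and minimizes them one after another. The Subproblem type and the crude fixed-step coordinate search are placeholders; in the thesis the sub-penalties are evaluated by rendering the face and comparing it with the video frame, and the minimization uses the line-search methods of Chapter 5.

```cpp
#include <cstddef>
#include <functional>
#include <vector>

// A subproblem owns a disjoint subset of the FAPs and a penalty term defined
// over that subset (match function plus the applicable barrier functions).
struct Subproblem {
    std::vector<double> params;                                 // FAP subset
    std::function<double(const std::vector<double>&)> penalty;  // renders and compares
};

// Minimize each subproblem independently and return the summed penalty.
// A naive fixed-step coordinate search stands in for the real optimizer.
double minimizePartitionedPenalty(std::vector<Subproblem>& subproblems,
                                  double step, int iterations) {
    double total = 0.0;
    for (auto& sp : subproblems) {
        double best = sp.penalty(sp.params);
        for (int it = 0; it < iterations; ++it) {
            for (std::size_t i = 0; i < sp.params.size(); ++i) {
                const double deltas[2] = {step, -step};
                for (double delta : deltas) {
                    sp.params[i] += delta;                 // trial step
                    double value = sp.penalty(sp.params);
                    if (value < best) best = value;        // keep the improvement
                    else sp.params[i] -= delta;            // undo the trial step
                }
            }
        }
        total += best;
    }
    return total;
}
```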

Chapter 8
FAP Filtering

The FAPs are not found with absolute accuracy. The inaccuracy arises from the limited resolution used during the rendering process, the limited precision used for specifying FAP values, and limitations in the model and the render unit. The inaccuracies are small and are not noticeable when each frame is examined separately. However, when the frames are considered in sequence, as an animation, the inaccuracies result in small variations, or noise, in the head pose from frame to frame. The head appears to be quivering and shaking. Other parts of the face, for example the mouth, have a very small, but noticeable, quivering. This chapter discusses how to remove the noise to make the movements smooth and steady.

8.1 Filtering Head Pose

Figure 8.1 shows the FAPs defining the head pose (yaw, pitch and roll) for frames 1000 to 1199 of the Erin sequence. The frames are 1/30 second apart in time. The head has great mass and moves slowly. Filtering the head pose to remove noise can therefore easily be done without significantly affecting the actual changes in head pose. A 7-point Gaussian filter has been shown to work very well. For the discrete case, a good approximation to the Gaussian function is Pascal's triangle. We approximate the N-point Gaussian filter as the convolution of the function (the FAP values) with the normalized nth row of Pascal's triangle. The convolution, s(x), of two discrete signals q(x) and r(x) is defined as:

    s(x) = q(x) * r(x) = Σ_k q(k) r(x - k)                                                    (8.1)

To apply a 7-point Gaussian filter, we let q(x) be the FAP values and r(0) = 1/64, r(1) = 6/64, r(2) = 15/64, r(3) = 20/64, r(4) = 15/64, r(5) = 6/64, r(6) = 1/64, with r(x) = 0 for x < 0 and x > 6. Figure 8.2 shows the FAPs for head pose for the same sequence as in Figure 8.1 after applying a 7-point Gaussian filter. When watching the animation after the FAPs controlling the head pose have been filtered, the head appears steady and stable and the movements seem natural.
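A small sketch of this filtering step follows, assuming a centered kernel and simple renormalization at the sequence borders (the text above does not spell out the boundary handling); the function names are placeholders.

```cpp
#include <cstddef>
#include <vector>

// Normalized binomial kernel: row (n-1) of Pascal's triangle divided by 2^(n-1).
// For n = 7 this yields {1, 6, 15, 20, 15, 6, 1} / 64, as used for the head pose.
std::vector<double> binomialKernel(std::size_t n) {
    std::vector<double> row(1, 1.0);
    for (std::size_t i = 1; i < n; ++i) {
        std::vector<double> next(i + 1, 1.0);
        for (std::size_t j = 1; j < i; ++j) next[j] = row[j - 1] + row[j];
        row = next;
    }
    double sum = 0.0;
    for (double v : row) sum += v;
    for (double& v : row) v /= sum;
    return row;
}

// Discrete convolution of a per-frame FAP track with the kernel, as in (8.1),
// using a centered kernel and renormalizing where it overhangs the sequence ends.
std::vector<double> smoothFapTrack(const std::vector<double>& fap,
                                   const std::vector<double>& kernel) {
    const int n = static_cast<int>(fap.size());
    const int m = static_cast<int>(kernel.size());
    std::vector<double> out(fap.size(), 0.0);
    for (int x = 0; x < n; ++x) {
        double acc = 0.0, weight = 0.0;
        for (int k = 0; k < m; ++k) {
            int idx = x + k - m / 2;                 // centered tap position
            if (idx < 0 || idx >= n) continue;       // skip taps outside the sequence
            acc += kernel[k] * fap[idx];
            weight += kernel[k];
        }
        out[x] = (weight > 0.0) ? acc / weight : fap[x];
    }
    return out;
}
```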

Figure 8.1 FAP values for head pose before filtering

Figure 8.2 FAP values for head pose after filtering

8.2 Filtering Other FAPs

Unlike the head, which is heavy and moves slowly, as described in the previous section, many other movements within the face are very fast. The eyelids can go from fully opened to fully closed in two frames. Similarly, the eyes can move very rapidly from looking in one direction to looking in a different direction. For this reason, the eyelids and the direction the eyeballs are gazing are not subject to any filtering. Fortunately, the tracking of eyes and eyelids is very accurate and introduces very little noise, making the eyelids and eyeballs look stable and natural even without filtering.

The mouth moves with a speed slower than the eyelids and eyeballs, but faster than the head pose. A small amount of quivering can be seen on the lips when watching an unfiltered animated sequence closely. We apply a 3-point filter to all FAPs related to the mouth. The 3-point filter is a good compromise between removing a small amount of noise and preserving the true motion of the lips in a talking face.

The jaw, cheeks and eyebrows move more rapidly than the head pose, but not as fast as the mouth. We use a 5-point Gaussian filter for the jaw, cheeks and eyebrows. The result looks natural and free from jitter without appearing delayed or mechanical.

Figure 8.3 shows the values for three representative FAPs before filtering for 200 frames. Figure 8.4 shows the FAP values after filtering. Table 8.1 summarizes the filters used to postprocess the FAPs.
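For illustration, one possible way to encode the filter choice as a coarse per-group dispatch; the enum and the grouping are assumptions that mirror the text above, and the exact per-FAP assignment is the one listed in Table 8.1.

```cpp
#include <cstddef>

// Coarse FAP grouping used to pick a smoothing kernel length.
enum class FapGroup { HeadPose, JawCheeksEyebrows, Mouth, EyesAndEyelids };

// Kernel length per group; 0 means the FAP track is passed through unfiltered.
std::size_t filterLength(FapGroup group) {
    switch (group) {
        case FapGroup::HeadPose:          return 7;  // slow, heavy motion
        case FapGroup::JawCheeksEyebrows: return 5;  // intermediate speed
        case FapGroup::Mouth:             return 3;  // fast articulation
        case FapGroup::EyesAndEyelids:    return 0;  // left unfiltered
    }
    return 0;
}
```

A nonzero length would be fed to the binomialKernel and smoothFapTrack sketch above; a length of zero leaves the track untouched.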

Figure 8.3 FAP values for fap3, fap4 and fap34 before filtering

Figure 8.4 FAP values for fap3, fap4 and fap34 after filtering

Table 8.1 Filters applied to FAPs

    FAP name               Pos motion   Type of filter
    open_jaw               down         5-point Gaussian
    lower_t_midlip         down         3-point Gaussian
    raise_b_midlip         up           3-point Gaussian
    stretch_l_cornerlip    left         3-point Gaussian
    stretch_r_cornerlip    right        3-point Gaussian
    lower_t_lip_lm         down         3-point Gaussian
    lower_t_lip_rm         down         3-point Gaussian
    raise_b_lip_lm         up           3-point Gaussian
    raise_b_lip_rm         up           3-point Gaussian
    raise_l_cornerlip      up           3-point Gaussian
    raise_r_cornerlip      up           3-point Gaussian
    shift_jaw              right        5-point Gaussian
    close_t_l_eyelid       down         No filter
    close_t_r_eyelid       down         No filter
    close_b_l_eyelid       up           3-point Gaussian
    close_b_r_eyelid       up           3-point Gaussian
    yaw_l_eyeball          left         No filter
    yaw_r_eyeball          left         No filter
    pitch_l_eyeball        down         No filter
    pitch_r_eyeball        down         No filter
    raise_l_i_eyebrow      up           5-point Gaussian
    raise_r_i_eyebrow      up           5-point Gaussian
    raise_l_m_eyebrow      up           5-point Gaussian
    raise_r_m_eyebrow      up           5-point Gaussian
    raise_l_o_eyebrow      up           5-point Gaussian
    raise_r_o_eyebrow      up           5-point Gaussian
    squeeze_l_eyebrow      right        5-point Gaussian
    squeeze_r_eyebrow      left         5-point Gaussian
    lift_l_cheek           up           5-point Gaussian
    lift_r_cheek           up           5-point Gaussian
    head_pitch             down         7-point Gaussian
    head_yaw               left         7-point Gaussian
    head_roll              right        7-point Gaussian
    lower_t_midlip_o       down         3-point Gaussian
    raise_b_midlip_o       up           3-point Gaussian
    stretch_l_cornerlip_o  left         3-point Gaussian
    stretch_r_cornerlip_o  right        3-point Gaussian
    lower_t_lip_lm_o       down         3-point Gaussian
    lower_t_lip_rm_o       down         3-point Gaussian
    raise_b_lip_lm_o       up           3-point Gaussian
    raise_b_lip_rm_o       up           5-point Gaussian
    raise_l_cornerlip_o    up           3-point Gaussian
    raise_r_cornerlip_o    up           3-point Gaussian

Chapter 9
Improvement of the Face Model

The face model used in this research has some imperfections. The most prominent is the lack of true depth information for the feature points. This chapter describes a method for estimating the missing depth information based on feature point tracking and the change in head pose throughout the entire video sequence.

9.1 Missing Depth Information

The face model is based on a frontal view of the person being animated. In the setup phase, feature points are defined in screen coordinates, i.e., x- and y-coordinates. Since a frontal view contains no depth information, the z-coordinates for the feature points are based on a "default" face, a face that looks satisfactory for most fitted faces. In an exact frontal view, the

erroneous depth values have no effect on the animation, but when the head is turned around either axis, the erroneous depth values become evident. Figure 9.1 shows frame 790 from the Erin sequence, where the head pose is far from neutral. The corresponding animated face is shown in Figure 9.2; the upper teeth appear to have moved vertically. We will now develop a method for estimating the depth value of feature points.

Figure 9.1 Original face from frame 790 in the Erin sequence

9.2 Estimating Depth Information

For the sake of the discussion, we show how to estimate the depth value of feature point 9.10 (bottom, mid point of the upper teeth), but the method applies to other feature points as well. Assume we have a video sequence with N frames showing a face. The method is summarized as follows:

Figure 9.2 Animated face from frame 790 in the Erin sequence

1. Generate 2D tracking information for feature point 9.10 in screen coordinates for the entire video sequence. Let the screen coordinates for feature point 9.10 in the i-th frame of the sequence be (u_i, v_i).

2. Generate head pose and head translation information for the entire video sequence. Denote the head pose for the i-th frame R_i and denote the head translation T_i. Let P be the projection matrix that projects a 3D point in world coordinates to screen coordinates. P is common to all frames. The head pose and translation are given by loc1, loc2, loc3, fap48, fap49 and fap50. The projection matrix, P, depends on the render unit and the screen resolution. The translation and rotation matrices are defined as follows:

    R_i =
        [ cos(fap48_i)  -sin(fap48_i)   0   0 ]
        [ sin(fap48_i)   cos(fap48_i)   0   0 ]
        [ 0              0              1   0 ]
        [ 0              0              0   1 ]
      · [  cos(fap49_i)  0   sin(fap49_i)   0 ]
        [  0             1   0              0 ]
        [ -sin(fap49_i)  0   cos(fap49_i)   0 ]
        [  0             0   0              1 ]
      · [ 1   0              0              0 ]
        [ 0   cos(fap50_i)  -sin(fap50_i)   0 ]
        [ 0   sin(fap50_i)   cos(fap50_i)   0 ]
        [ 0   0              0              1 ]                                               (9.1)

    T_i = [ 1  0  0  loc1_i ]
          [ 0  1  0  loc2_i ]
          [ 0  0  1  loc3_i ]
          [ 0  0  0  1      ]                                                                 (9.2)

3. Solve the following minimization problem

    minimize  h(x, y, z) = Σ_{i=1}^{N} || [u_i  v_i]^t - P T_i R_i [x  y  z  1]^t ||          (9.3)

with respect to (x, y, z), which is the estimate for feature point 9.10. The norm of a vector A = [a_1 ... a_n]^t is defined as

    ||A|| = sqrt(a_1² + ... + a_n²)                                                           (9.4)

The x, y and z that minimize (9.3) give the best estimate for feature point 9.10.

The first step is accomplished by including u_i and v_i as part of an extended set of FAPs. The values of u_i and v_i are then found by minimizing the penalty function for each frame in the video sequence, shifting the upper teeth up or down until they match the original frame. For feature point 9.10, we ignore all frames where the upper teeth are not visible.

The second step is already done as part of the FAP generation. We only need to preserve the values for head pose and head translation for use in step 3.

In the third step, we assume that (x, y, z) is the ideal position of feature point 9.10 in world coordinates. The point is subject to rotation and translation by T_i R_i [x y z 1]^t. The resulting vector is then projected to screen coordinates [u v]^t = P T_i R_i [x y z 1]^t. In (9.3) we minimize the norm, which is simply the Euclidean distance (measured in screen coordinates) between the point (x, y, z) projected onto the screen and the tracked feature point given by (u_i, v_i), summed over all frames in the sequence. By minimizing the Euclidean distance over all the frames in the sequence, we get an estimate for the point (x, y, z). Not only do we get the depth information we originally sought, we also get x and y information. It may be

necessary to change the x and y values as well, if the x and y values were originally determined assuming incorrect depth values. The result using the new values obtained by the described method is shown in Figure 9.3. The location of the upper teeth is important for generating the FAPs for the lips. If the upper teeth are too far down, the optimization algorithm may try to cover them with the lips to decrease the penalty. The result would be an incorrect shape of the lips.

Figure 9.3 Animated face from frame 790 in the Erin sequence with corrected values for the bottom point of the upper teeth
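A minimal sketch of evaluating the reprojection objective (9.3) for a candidate point follows, assuming 4x4 row-major matrices for T_i and R_i and a 2x4 projection matrix P, all gathered during FAP generation; the types and names are placeholders, and the actual minimization over (x, y, z) would reuse the same search machinery as the FAP optimizer.

```cpp
#include <array>
#include <cmath>
#include <vector>

using Vec4 = std::array<double, 4>;
using Mat4 = std::array<std::array<double, 4>, 4>;  // row-major rotation/translation
using Proj = std::array<std::array<double, 4>, 2>;  // 2x4 projection to screen coordinates

static Vec4 mul(const Mat4& m, const Vec4& v) {
    Vec4 r{};
    for (int i = 0; i < 4; ++i)
        for (int j = 0; j < 4; ++j) r[i] += m[i][j] * v[j];
    return r;
}

// Per-frame data preserved during FAP generation: the tracked screen position
// of the feature point and the head rotation/translation of that frame.
struct FrameObservation {
    double u, v;   // tracked screen coordinates of the feature point
    Mat4 R, T;     // head rotation and translation, as in (9.1) and (9.2)
};

// Reprojection objective h(x, y, z) from (9.3): the screen-space distance between
// the projected candidate point and the tracked point, summed over all frames.
double reprojectionError(double x, double y, double z, const Proj& P,
                         const std::vector<FrameObservation>& frames) {
    double total = 0.0;
    for (const auto& f : frames) {
        Vec4 world = mul(f.T, mul(f.R, Vec4{x, y, z, 1.0}));
        double u = 0.0, v = 0.0;
        for (int j = 0; j < 4; ++j) { u += P[0][j] * world[j]; v += P[1][j] * world[j]; }
        total += std::hypot(f.u - u, f.v - v);       // Euclidean distance on screen
    }
    return total;
}
```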

Chapter 10
Results

The implementation of the FAP generation method described in this thesis was developed with a test video sequence of Erin and later tested on video sequences of Brennan and Cori. All tests are performed in full color, with each frame having a resolution of 420 by 480 pixels and 24 bits per pixel. A total of three video sequences, with lengths ranging from 750 to 2250 frames, have been used to test the program. The original videos are captured with a digital video camera at 720 by 480 pixels. A rectangle of 420 by 480 pixels from the center of the original video is used for generating FAPs. FAPs for each frame are found independently of past and future frames. This is by itself quite extraordinary, and shows how robust the FAP generation process is.

The program is tested on a PC with an AMD Athlon CPU running at 1533 MHz on an ECS K7S5A motherboard with an IDE Radeon 32 graphics card. On average, 2200 function

evaluations (one function evaluation corresponds to rendering the animated face once) are required for each frame. The average processing time for each frame is 45 seconds.

The next sections show the results when the FAP optimizer program is used to generate FAPs for the video sequences of Erin, Brennan and Cori. For each video sequence, we show the animated face with a neutral expression, followed by several images showing the face from the original video sequence in the left column and the corresponding animated face in the right column.

10.1 Erin

Erin is the video sequence used during the development and implementation of the FAP generation method proposed in this thesis.

Figure 10.1 Neutral face generated by the render unit for Erin

Figure 10.2 Original face (left column) and the corresponding animated face (right column) for two different frames in the video sequence of Erin

Figure 10.3 Original face (left column) and the corresponding animated face (right column) for two different frames in the video sequence of Erin

10.2 Brennan

In the test sequence with Brennan, Brennan moves his eyebrows a lot while talking. His eyebrows are not very easy to distinguish from the skin and are difficult to track with other methods known from the literature. The method presented in this thesis does a good job of tracking his eyebrows.

Figure 10.4 Neutral face generated by the render unit for Brennan

Figure 10.5 Original face (left column) and the corresponding animated face (right column) for two different frames in the video sequence of Brennan

Figure 10.6 Original face (left column) and the corresponding animated face (right column) for two different frames in the video sequence of Brennan

10.3 Cori

Cori displays some interesting examples of asymmetry, for example in the lips and eyes. She is also able to raise only one of her eyebrows. Many methods for FAP generation are not able to detect asymmetry in the face.

Figure 10.7 Neutral face generated by the render unit for Cori

Figure 10.8 Original face (left column) and the corresponding animated face (right column) for two different frames in the video sequence of Cori

Figure 10.9 Original face (left column) and the corresponding animated face (right column) for two different frames in the video sequence of Cori
