Sample-Based Texture Extraction for Model-Based Coding


DEPARTMENT OF APPLIED PHYSICS AND ELECTRONICS, UMEÅ UNIVERSITY, SWEDEN
DIGITAL MEDIA LAB

Sample-Based Texture Extraction for Model-Based Coding

Zhengrong Yao, Dept. Applied Physics and Electronics, Umeå University, SE-90187 Umeå, Sweden. E-mail: Zhengrong.Yao@tfe.umu.se
Haibo Li, Dept. Applied Physics and Electronics, Umeå University, SE-90187 Umeå, Sweden. E-mail: haibo.li@tfe.umu.se

DML Technical Report: DML-TR-2004:1
ISSN Number: 1652-8441
Report Date: April 2, 2004

Abstract This paper addresses an important issue in model-based coding: how to extract the facial texture. The traditional approach uses a cyber scanner to capture the user's facial texture together with a personal 3D head model, and then uses the extracted texture and model in the coder. This is expensive and complicated. In this paper, we suggest a sample-based method that incorporates both motion extraction and facial texture extraction within one coding loop. This proves to be an effective and cheaper way to develop a model-based coding system.

1 Introduction

Model-based coding (MBC) is a promising video coding technique aimed at very low bit-rate video transmission [3]. Major contributions in this field include [8][2][4]. The idea behind the technique is to parameterize a talking face based on a 3D face model. Parameters describing facial movements and texture information are extracted at the transmitter side, and the extracted parameters are then sent to the receiver to synthesize the talking head. Very high compression is achieved since only high-level, semantic motion parameters are transmitted. The quality of the synthesized video depends mainly on the quality of the texture mapping, which in turn is determined by the extracted model information and facial texture. Texture mapping was first introduced into MBC in the late 1980s [8],[2]; only then did people realize the potential of MBC. The traditional method of personal texture extraction typically includes the following steps [6]: a laser scanner is used to obtain the 3D model shape; a cyber scanner is used to obtain the facial texture, unwrapping the facial image from cylindrical coordinates onto a 2D plane (as shown in Fig. 1), often referred to as a 2.5D texture; and the 3D model is projected onto cylindrical coordinates for fitting.

Figure 1: The facial texture used in MBC is often extracted with the help of a cyber scanner.

In the traditional method, the texture is estimated from a huge amount of data, such as the images captured by the cyber scanner. This method tends to be expensive and complicated because of the costly equipment (range scanner, cyber scanner), the careful image registration, the well-controlled environment, etc. In this paper, we suggest a sample-based method for texture extraction, in which the motion parameters and texture parameters are extracted within one coding loop. This method is easier and cheaper to implement without degrading the quality of the synthesized video.
2 Texture in MBC

The problem of texture extraction can be defined as follows: given a collection of images (from different views with known parameters), compute for each point m on the face model its texture color T(m). The texture value T(m) at each point on the face model is a combination of the scanned images [7]:

    T(m) = Σ_k g_k(m) C_k(x_k, y_k) / Σ_k g_k(m),    (1)

where C_k is the color at each pixel of the k-th image, (x_k, y_k) are the image coordinates of the projection of m onto the k-th image plane, and g_k(m) is the weight of the k-th image, which specifies how to blend the images into the texture. The key to texture extraction is calculating the blending function that specifies the weights of the scanned images; this often involves registering the 3D model with the images and interpolating the texture per pixel. It is an expensive and complicated process, and in many cases human involvement is inevitable. Since the synthesized video quality largely depends on the extracted texture, this is a key step in an MBC system.

In the traditional MBC scheme, the block diagram of the analysis process at the encoder side is shown in Fig. 2. The texture extraction is typically done off-line. When analyzing video for the motion parameters, an analysis-by-synthesis (ABS) method is commonly used. The extracted texture information is involved in the analysis process at the encoder side, as shown in Fig. 2.

Figure 2: The traditional MBC scheme at the encoder side. The texture image is extracted off-line from a scanner image sequence. When the input image is fed to the 3D motion estimator, the texture information is utilized.

This method has proved to be expensive and complicated. Given some sample images, an alternative scheme for texture extraction is shown in Fig. 3. The 3D motion estimation and the texture extraction can be performed jointly within one coding loop. The difference from the traditional way is that the intermediate step of producing a normalized texture is dropped. By normalized we mean that the texture is unchanged when used in the MBC system. This change is mainly motivated by dropping the constraint of using expensive equipment to acquire the texture information; the texture extraction is now based on some sample images. To compare the properties of the two schemes, we assume that both use the same sample images, input image and face model. We refer to the traditional method as the separate method, where texture extraction is separated from its usage, and to the suggested one as the joint method.
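Equation (1) is a per-point weighted average over the source images. A minimal sketch in Python, assuming the per-image colors C_k(x_k, y_k) have already been sampled upstream; the array shapes and names are illustrative, not part of the original system:

```python
import numpy as np

def blend_texture(colors, weights):
    """Blend per-image colors into one texture value per model point,
    following Eq. (1): T(m) = sum_k g_k(m) C_k(x_k, y_k) / sum_k g_k(m).

    colors:  (K, M, 3) array, the color sampled from image k at the
             projection of model point m (sampling assumed done upstream).
    weights: (K, M) array, the blending weights g_k(m).
    """
    num = (weights[..., None] * colors).sum(axis=0)   # sum_k g_k(m) C_k
    den = weights.sum(axis=0)[:, None]                # sum_k g_k(m)
    return num / np.clip(den, 1e-12, None)            # guard zero weight

# Two views of a single model point; the second view gets twice the weight.
colors = np.array([[[1.0, 0.0, 0.0]],
                   [[0.0, 1.0, 0.0]]])   # (K=2, M=1, 3)
weights = np.array([[1.0], [2.0]])       # (K=2, M=1)
T = blend_texture(colors, weights)       # -> [[1/3, 2/3, 0]]
```

In practice the weights would encode visibility and viewing angle of each scanner image, which is exactly the part that requires careful registration.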
3 Motion estimation and texture extraction using ABS

To jointly estimate the 3D motion parameters and the texture parameters, some prior knowledge about the face model has to be acquired. In our view, this can be done off-line: the user's face model is acquired, and the user performs some manual fitting of the face model on sample images to produce template images (with known face pose parameters). This knowledge can then be used in an ABS method to estimate the 3D motion parameters of the face object, given a new input face image. Different ABS methods are available for this task.

Figure 3: The new MBC scheme at the encoder side. The input image is subjected to motion estimation and texture mapping within the same coding loop.

3.1 Motion estimation with analytical ABS

In the ABS method, to estimate the motion parameter p, a template image I_0 is warped according to the guessed parameter p, thus synthesizing an image I_0(W(x;p)). An error function is then defined to evaluate the discrepancy between the warped image I_0(W(x;p)) and the input image I_t(x):

    E(p) = Σ_x [I_t(x) − I_0(W(x;p))]²    (2)

Let

    e = I_t(x) − I_0(W(x;p))    (3)

denote the error image. The goal is to find the motion parameter p that minimizes the error function; hence the name analysis-by-synthesis. The warping function W(x;p) is in fact the texture mapping of the graphics literature. Since the pixel values I(x) are in general non-linear in p, this is a non-linear optimization problem, which is usually solved iteratively. Suppose the parameter estimate at time t is p_t. Instead of solving for p directly, the iterative method solves for an increment Δp, minimizing the following expression with respect to Δp:

    Σ_x [I_t(x) − I_0(W(x; p + Δp))]²    (4)

after which the parameters are updated:

    p_{t+1} = p_t + Δp.    (5)

The key is then how to find Δp in each iteration. A typical choice is the Gauss-Newton gradient descent algorithm. The analytical method is a suitable tool for tracking problems, and different analytical methods have been successfully used for model-based 3D motion tracking in video [7][5].
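As an illustration, the iteration of Eqs. (2)-(5) can be run on a toy one-dimensional warp W(x;p) = x + p (pure translation). This is a sketch of the Gauss-Newton idea only, not the full 3D face warp used in the paper; the signals and names are invented for the example:

```python
import numpy as np

def gauss_newton_translation(I0, It, p0=0.0, iters=20):
    """Analysis-by-synthesis for a 1-D translation: find p minimizing
    E(p) = sum_x [It(x) - I0(W(x;p))]^2 with W(x;p) = x + p (cf. Eq. (2))."""
    x = np.arange(len(I0), dtype=float)
    p = p0
    for _ in range(iters):
        warped = np.interp(x + p, x, I0)   # synthesize I0(W(x;p))
        e = It - warped                    # error image, Eq. (3)
        J = np.gradient(warped)            # dI0(W)/dp, since dW/dp = 1
        dp = (J @ e) / (J @ J + 1e-12)     # Gauss-Newton step for Eq. (4)
        p += dp                            # update, Eq. (5)
        if abs(dp) < 1e-8:
            break
    return p

# A smooth template shifted by 3 samples; the estimator recovers the shift.
x = np.arange(64, dtype=float)
I0 = np.exp(-0.5 * ((x - 32) / 6.0) ** 2)
It = np.exp(-0.5 * ((x - 29) / 6.0) ** 2)   # equals I0(x + 3)
p_hat = gauss_newton_translation(I0, It)    # -> close to 3
```

The 3D case replaces the scalar translation by the full pose parameter vector and the image gradient by the Jacobian ∂W/∂p, but the synthesize-compare-update structure is the same.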

3.2 Motion estimation with sABS

Although the analytical method often has a sound theoretical basis, its implementation has been found weak for difficult motions such as facial expressions [7]. The main reason is that the derivation of the analytical method involves many factors that are hard to model accurately. The key step of each ABS iteration is to calculate the parameter update Δp. It is possible to first generate a Δp, check whether it leads to a better fit, and then decide whether the generated Δp should be accepted; this is a perturbation-based method, whose performance largely depends on the perturbation scheme. Most of the computation is spent in the warping, i.e., texture mapping, process. Modern computer graphics hardware can perform texture mapping at extremely high speed, so the texture mapping within the ABS loop can be carried out in hardware (i.e., on the graphics card).

Figure 4: Motion estimation using ABS with simulated annealing (sABS).

There is no need for an explicit calculation of ∂W/∂p as before. The parameter estimation is carried out in the following order:

    Δp → E(p + Δp) → Δp or 0,    (6)

which is much simpler and more efficient. Depending on whether Δp is accepted, the final parameter update is Δp or 0, as depicted in the figure. The perturbation Δp is produced by a random number generator, and to approach a globally optimal solution, the design of this random number generator is the key. The well-known simulated annealing (SA) is embedded in the search loop for Δp, and the ABS with SA is abbreviated sABS in the rest of this paper. The sABS method is shown in Fig. 4.

3.3 Joint texture extraction

Using sABS, the motion parameters of the moving face can be extracted.
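The perturbation loop of Fig. 4 can be sketched as a generic simulated-annealing search over E(p). The cooling schedule, step sizes and function names below are illustrative assumptions, not the paper's actual settings:

```python
import numpy as np

def sabs_search(energy, p0, sigma=0.5, T0=0.1, t_cool=0.95, s_cool=0.99,
                iters=400, seed=0):
    """Perturbation-based ABS parameter search with a simulated-annealing
    acceptance rule (a sketch of the sABS idea in Sec. 3.2)."""
    rng = np.random.default_rng(seed)
    p = np.asarray(p0, dtype=float)
    E = energy(p)                                   # E(p), cf. Eq. (2)
    T = T0
    for _ in range(iters):
        dp = rng.normal(0.0, sigma, size=p.shape)   # random perturbation dp
        E_new = energy(p + dp)                      # synthesize and evaluate
        # accept dp if it improves the fit, or with probability
        # exp(-dE/T) otherwise; the update is then dp, else 0 (Fig. 4)
        if E_new < E or rng.random() < np.exp(-(E_new - E) / max(T, 1e-12)):
            p, E = p + dp, E_new
        T *= t_cool                                 # cool the temperature
        sigma *= s_cool                             # shrink the step size
    return p

# Minimize a simple quadratic "error function" with optimum at (2, -1).
p_hat = sabs_search(lambda q: float(np.sum((q - np.array([2.0, -1.0])) ** 2)),
                    p0=np.zeros(2))
```

In the actual system, `energy` would render the warped template on the graphics card and compare it against the input frame, so each evaluation is one hardware texture-mapping pass.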
With a small modification, the texture parameters can be extracted at the same time. The overall scheme is shown in Fig. 5. The sample images with known face poses are used in each coding loop for two purposes. First, each sample image is used in the sABS step to estimate the motion parameters (rotation, translation) of the face object; Fig. 5 shows an example using three sample images. Then, a least-squares (LS) solution is used to estimate the texture parameters needed for the synthesis process.

Figure 5: Joint extraction of motion parameters and texture parameters.

The texture T(m) at each point is now a blended version of the sample images (not of cyber-scanner images). For a triangular-meshed face model, such as the Candide model [1] used in this work, the blending function is estimated per triangular patch using an LS solution. Let A = (P_1, P_2, ..., P_N), where P_i is the column vector consisting of the pixel values of texture i; let B = (Q_1, Q_2, ..., Q_N)^T be the column vector of the actual pixel values from the input image; and let W = (w_1, w_2, ..., w_N) be the weights (blending function). We can write:

    AW = B.    (7)

The LS solution gives the blending function as:

    W = (A^T A)^{-1} A^T B.    (8)

The main difference between the joint method and the separate method is the extraction of the texture. In the separate method, the texture is extracted from the sample images only once; the texture is thus a normalized texture. In the joint method, this extraction is embedded in the motion estimation process and has to be done for each input image; the texture can thus be called a sample-based texture. Although more complicated, in theory the joint method should outperform the separate method: in the separate method, the texture is obtained at the cost of normalizing all sample images, which is a lossy process, whereas the joint method involves no normalization and therefore loses less information. Another advantage of the joint method is that it does not rely on expensive equipment such as a cyber scanner, making it a suitable choice for setting up a small-scale MBC system. The sABS method relies heavily on computation speed, which can be foreseen to become a minor issue in the future.
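Equations (7)-(8) are an ordinary least-squares fit per patch. A minimal sketch, using NumPy's `lstsq` rather than forming (A^T A)^{-1} explicitly for numerical safety; the data values are invented for illustration:

```python
import numpy as np

def blend_weights(samples, target):
    """Per-patch blending function from Eqs. (7)-(8): solve A W = B in the
    least-squares sense, W = (A^T A)^{-1} A^T B.

    samples: matrix A whose column i holds the pixel values P_i of sample
             texture i over one triangular patch.
    target:  vector B of the corresponding input-image pixel values Q.
    """
    A = np.asarray(samples, dtype=float)
    B = np.asarray(target, dtype=float)
    # lstsq solves the same normal equations but is better conditioned
    W, *_ = np.linalg.lstsq(A, B, rcond=None)
    return W

# A 4-pixel patch blended from N=2 sample textures with true weights (0.3, 0.7).
A = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0],
              [2.0, 1.0]])
B = A @ np.array([0.3, 0.7])
W = blend_weights(A, B)   # -> approximately [0.3, 0.7]
```

One such solve is performed for every triangular patch of the Candide mesh, giving a spatially varying blending function over the face.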
4 Experiment and conclusion

To validate our analysis, we conducted an experiment using both the separate method and the joint method to extract parameters for an MBC encoder. After the extraction of the motion parameters and texture parameters, the video was reconstructed using both the normalized texture (from the separate method) and the sample-based texture (from the joint method). A video sequence of 50 frames was collected. In addition, three sample images were collected and manually fitted off-line. For the separate method, the normalized texture was calculated from the manually fitted sample images. For the joint method, the texture extraction was based only on these three images. The three sample images are shown in Fig. 6.

Figure 6: The three sample images used for texture mapping.

The results of the comparison are shown in Fig. 7, where the solid line is the PSNR obtained with the sample-based texture and the dash-dotted line is the PSNR obtained with the normalized texture. There is an average difference of 3 dB between the two PSNR curves. The head motion varies considerably over the sequence; several sample head images are attached in the figure to show the head pose. Our analysis and experiment show that the sample-based texture outperforms the normalized texture. Advances in hardware and computing speed can help simplify the parameter extraction via the perturbation-based method. This suggests a cheaper way to set up an MBC system.

References

[1] J. Ahlberg. Model-based coding: extraction, coding, and evaluation of face model parameters. PhD dissertation no. 761, Linköping University, 2002.
[2] K. Aizawa, H. Harashima, and T. Saito. Model-based analysis synthesis coding system for a person's face. Signal Processing: Image Communication, 1(2), Oct. 1989.
[3] R. Forchheimer, O. Fahlander, and T. Kronander. Low bit-rate coding through animation. In Proc. International Picture Coding Symposium PCS'83, pages 113-114, Mar. 1983.
[4] H. Li, A. Lundmark, and R. Forchheimer. Image sequence coding at very low bitrates: A review. IEEE Transactions on Image Processing, 3(5):589-609, 1994.
[5] H. Li. Low bitrate image sequence coding. PhD dissertation 318, Dept. Elec. Eng., Linköping University, 1993.
[6] J. Noh and U. Neumann. A survey of facial modeling and animation techniques, 1998.
[7] F. Pighin, R. Szeliski, and D. Salesin. Modeling and animating realistic faces from images. International Journal of Computer Vision, special issue on video computing, 50(2):143-169, 2002.
[8] W. J. Welsh. Model-based coding of moving images at very low bit rates. In Proc. Int. Picture Coding Symp., 1987.

Figure 7: The PSNR after texture mapping using the normalized texture (dash-dotted line) and the sample-based texture (solid line) for a test video sequence.