Gwenaëlle MARQUANT, Stéphane PATEUX and Claude LABIT IRISA/INRIA - Campus Universitaire de Beaulieu RENNES Cedex - France


Mesh-Based Scalable Video Coding with Rate-Distortion Optimization

Gwenaëlle MARQUANT, Stéphane PATEUX and Claude LABIT
IRISA/INRIA - Campus Universitaire de Beaulieu RENNES Cedex - France

ABSTRACT

In this paper, we present a mesh-based motion estimation scheme for image sequences. Nodal motion vectors are optimized with a multi-resolution differential method. Because our final aim is mesh tracking throughout a video sequence with optimized reconstruction, neither backward tracking nor forward tracking alone is well suited. One motivation of our work is to take advantage of both forward tracking (which enables tracking) and backward tracking (for its efficiency) in a backward in forward method. For the optimization of the nodal motion vectors, we also propose a novel approach combining multi-resolution and several hierarchy levels, which, in addition, makes a scalable representation possible. This is achieved with a progressive representation defined according to a rate-distortion criterion. Results are presented to illustrate the proposed methods.

Keywords: Object-based video coding, Mesh-based motion tracking, Rate/distortion optimization, Mixed hierarchical/multiresolution motion estimation, Scalability.

1. INTRODUCTION

Recent developments in video coding research deal with the use of a time-varying mesh for video representation and an associated mesh-based model for motion description. New functionalities of video applications, such as search and editing, but also transmission and storage, require a flexible framework for video representation. Mesh-based representations have been increasingly employed for these purposes. Various approaches have emerged recently to deal with hierarchical and/or adaptive and/or progressive meshes (1-3). In addition, such mesh-based motion estimation schemes characterize local deformations between two video frames more accurately, that is, without blocking artifacts.
Moreover, transmitted bit-rates have to be reduced as much as possible in order to adapt to the available network bandwidth. For this purpose, most previous works on inter-frame coding rely on adaptive node sampling according to image content (1). Nevertheless, in these approaches, no compromise between distortion and bit-rate is optimized: the coding cost of the representation is often stated but not taken into account as a constraint. Compared to these methods, our paper presents a motion estimation scheme for image sequences using a triangular mesh and nodal motion vector optimization, further improved by a backward in forward method. Our approach considers a coupled hierarchical/multi-resolution representation of 2D meshes. In addition, this representation is adaptive in such a way that its criterion optimizes both the coding cost and the rendering of the tracked image. Even if our major aim is mesh-based motion estimation, we impose several constraints: motion estimation accuracy, compression efficiency, suitability for a high-level content-based representation, quality scalability, and progressive representation.

Further author information: G. Marquant: gwenaelle.marquant@irisa.fr; S. Pateux: stephane.pateux@irisa.fr; C. Labit: claude.labit@irisa.fr

This paper is organized as follows. In section 2, we present the mesh and the constraints imposed on it. Then, in section 3, motion estimation with the backward in forward solution is explained. Section 4 reports the optimization of motion vectors. Section 5 explains how to obtain a scalable mesh representation. Finally, the last sections present experimental results and concluding remarks.

2. MESH MODEL DEFINITION

Improving coding performance for image rendering obviously requires simplifying the data while preserving the best possible subjective quality through accurate motion estimation. For several reasons (reported in (4)), the most suitable mesh seems to be an adaptive hierarchical regular mesh: it combines the easy coding of a regular triangulation with the heterogeneity of irregular mesh nodes.

2.1. Mesh model

The mesh we chose follows from previous works and is motivated by several constraints imposed on us: motion description accuracy, low bit cost, scalability through an embedded representation, and adaptive location of node points that places more motion vectors over areas of complex motion. Following the many discussions in the literature, all these reasons lead us to choose a regular triangular mesh because of its lower coding cost. In practice, the first image is coded using a quincunx triangular mesh on which an affine interpolation is performed. In addition, such a mesh is hierarchical and adaptive. This hierarchy is adaptively refined by regular splitting according to a rate/distortion criterion, while enabling a simple topological embedding of the triangles of the various levels (cf. Fig. 1 and 6).

Figure 1. (a) Initial mesh, (b) first level of balanced mesh before deformation due to motion estimation, (c) final adaptive mesh before deformation due to motion estimation.

2.2. Scalability

We want our method to be scalable because it is very useful to enable access to a given precision level.
Furthermore, this level can be adapted to the image content, to the desired representation, to the admissible level with respect to the available network bandwidth, or to a chosen optimization criterion (rate/distortion for instance). With this scalability aim in view, we are able to increase the representation quality of the motion estimation across mesh levels by locally adapting data density. The sampling rate of node points is adapted to the local motion complexity of the image. In other words, our method makes it possible to control the motion vector density and the mesh quality, and to keep complexity low, through a coarse-to-fine approach.

2.3. Triangulation splitting criterion

In our approach, the hierarchical adaptive mesh is obtained by successive triangle splittings according to a simple criterion. Various solutions have already been studied to increase node density in critical areas: for more details, the reader may refer to (4). However, none of these techniques is guaranteed to yield optimal results in the rate/distortion sense. We therefore decided to split a triangle at level i if several successive splittings up to level i_max lead to a significant fall of the squared error. As a result, this criterion takes into account the magnitudes of the squared errors SE, and it can be expressed as a relative (rate-variation / distortion-variation) criterion for adaptive mesh reconstruction.
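The regular splitting and the squared-error test described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the tuple representation of vertices and the `threshold` parameter are assumptions introduced here, and the squared errors are taken as given rather than computed from images.

```python
def midpoint(p, q):
    """Midpoint of the edge joining vertices p and q, each an (x, y) tuple."""
    return ((p[0] + q[0]) / 2.0, (p[1] + q[1]) / 2.0)


def area(a, b, c):
    """Unsigned area of triangle abc."""
    return abs((b[0] - a[0]) * (c[1] - a[1]) - (c[0] - a[0]) * (b[1] - a[1])) / 2.0


def split_triangle(a, b, c):
    """Regular 1-to-4 split of triangle ABC through its edge midpoints.

    e, f, g are the midpoints of AB, BC and CA; the children are the three
    corner triangles and the central triangle efg.
    """
    e, f, g = midpoint(a, b), midpoint(b, c), midpoint(c, a)
    return [(a, e, g), (e, f, g), (b, e, f), (f, g, c)]


def should_split(se_level_i, se_level_imax, threshold):
    """Hypothetical decision rule: split when refining from level i down to
    level i_max reduces the squared error by more than `threshold`."""
    return (se_level_i - se_level_imax) > threshold


children = split_triangle((0.0, 0.0), (4.0, 0.0), (0.0, 4.0))
```

The four children tile the parent exactly, which is what allows the levels of the hierarchy to be topologically embedded in one another.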

3. THE BACKWARD IN FORWARD SOLUTION

3.1. Classical motion estimation techniques for meshes

Two methods make it possible to predict the deformations of a mesh throughout a video sequence: forward and backward mesh tracking. On the one hand, forward tracking requires the mesh at time t-1 and looks for its matching position in frame t. On the other hand, backward tracking requires the initial position of the mesh at t and looks for its matching position in frame t-1. Choosing between backward and forward node tracking requires weighing the pros and cons of each method. Many mesh-based motion estimation methods have been widely tested: various block matching methods (5), differential methods in forward tracking (1) and in backward tracking (6). Thanks to its easy implementation, forward node tracking is the most widely used method for predicting an image. However, this tracking scheme suffers from a drawback: the estimation optimizes the reconstruction at time t-1 even though we want to reconstruct frame t. Conversely, backward tracking optimizes the reconstruction at t but requires knowing the mesh at time t. Therefore, backward estimation cannot track a mesh while keeping the same structure, unless the whole sequence is available and the last frame can be considered as the one carrying the initial mesh, which is then tracked backward to the first frame. For all these reasons, we propose to improve forward tracking by analyzing at time t-1 (as in the forward scheme) while simulating the analysis at t (backward).

3.2. The backward in forward

In forward tracking, we aim at minimizing:

\[
E_F = \sum_{(x,y)\in\Omega_{t-1}} \bigl(I(x,y,t-1) - I(x+dx,\,y+dy,\,t)\bigr)^2 = \sum_{(x,y)\in\Omega_{t-1}} \Delta_F(x,y,t-1)^2 \tag{1}
\]

where $\Omega$ denotes the image domain of definition or the object mask (cf. 4.2) and $I(x+dx,\,y+dy,\,t)$ denotes the intensity evaluated at the pixel $(x,y)$ displaced by the motion vector $(dx,dy)$.
Alternatively, backward tracking aims at reconstructing the current image t from the previous one t-1, and this tracking method supposes that the mesh at time t is known. In backward tracking, we aim at minimizing:

\[
E_B = \sum_{(x',y')\in\Omega_{t}} \bigl(I(x',y',t) - I(x'-dx,\,y'-dy,\,t-1)\bigr)^2 = \sum_{(x',y')\in\Omega_{t}} \Delta_B(x',y',t)^2 \tag{2}
\]

Equations (1) and (2) can also be written in the continuous case:

\[
E_B = \iint_{S_t} \Delta_B(x',y',t)^2 \, ds_t
\qquad\text{and}\qquad
E_F = \iint_{S_{t-1}} \Delta_F(x,y,t-1)^2 \, ds_{t-1}
\]

where $ds_{t-1}$ and $ds_t$ represent the surface element, that is to say one pixel. Moreover, with $(x',y') = (x+dx,\,y+dy)$, the two residuals coincide up to sign, $\Delta_F(x,y,t-1) = -\Delta_B(x',y',t)$, so their squares are equal. This implies that the integration over $S_t$ can be carried out over $S_{t-1}$ by a change of variables:

\[
\iint_{S_t} \Delta_B(x',y',t)^2 \, ds_t = \iint_{S_{t-1}} \Delta_F(x,y,t-1)^2 \, J(x',y') \, ds_{t-1} \tag{3}
\]

where $J(x',y')$ is the Jacobian determinant of the affine transformation between the two related triangles.
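To make the change of variables concrete, the sketch below computes the Jacobian of the affine map between two triangles as a ratio of areas and uses it to weight a sum of squared forward residuals. It is an illustration under stated assumptions: the function names are hypothetical, the residuals are supplied as a plain list rather than sampled from images, and the absolute value of the area ratio is used, as is usual when changing variables in an integral.

```python
def signed_area(p0, p1, p2):
    """Signed area of the triangle (p0, p1, p2); positive if counter-clockwise."""
    return 0.5 * ((p1[0] - p0[0]) * (p2[1] - p0[1])
                  - (p2[0] - p0[0]) * (p1[1] - p0[1]))


def affine_jacobian(tri_prev, tri_cur):
    """Jacobian magnitude of the affine map sending tri_prev onto tri_cur.

    For an affine map the Jacobian is constant over the triangle and equals
    the ratio of the deformed area to the original area.
    """
    return abs(signed_area(*tri_cur) / signed_area(*tri_prev))


def weighted_squared_error(forward_residuals, jacobian):
    """Sum of squared forward residuals over one triangle at time t-1,
    weighted by the (constant) Jacobian of that triangle."""
    return jacobian * sum(r * r for r in forward_residuals)


tri_prev = ((0.0, 0.0), (1.0, 0.0), (0.0, 1.0))
tri_cur = ((0.0, 0.0), (2.0, 0.0), (0.0, 2.0))
J = affine_jacobian(tri_prev, tri_cur)
error = weighted_squared_error([1.0, 2.0], J)
```

In this toy case the triangle doubles in each direction, so its area is multiplied by four and every squared residual measured at t-1 counts four times in the error at t.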

Equation (3) resembles the computation of the error in a forward mapping, except for the Jacobian, which is easy to compute. Finally, equation (3) can be written in the discrete form:

\[
E_B = \sum_{(x,y)\in\Omega_{t-1}} \Delta_F(x,y,t-1)^2 \, J(x',y')
\]

The Jacobian describes the infinitesimal area ratio of the deformed triangle to the original one at point $(x,y)$. Consequently, it can be shown that $J(x',y')$ measures the ratio between the areas of the post- and pre-deformation triangles. To conclude, nodal tracking algorithms are developed by minimizing the prediction errors measured over the original triangle at time t-1. Wang and Lee (7) used a similar change of variables, with a mapping function into a master element between the two patches to match. However, they concluded that the two Jacobians were equal, and for this reason did not make further use of the area ratio. Furthermore, we propose to integrate at time t-1 rather than over another triangle, which may cause sampling problems.

4. OPTIMIZATION OF MOTION VECTORS

4.1. Affine warping

To get the optimum node positions for the grids of the current and previous frames, we have to perform a spatial transformation between patches of these two grids in order to predict the current frame as well as possible. The matching transformation is applied independently to each patch. Several warpings are commonly used: bilinear between quadrangles, affine between triangles, and so on. However, some assumptions about the mapping transform must be made, such as the fact that the interiors of the patches are mapped to the corresponding interiors in the prediction frame by a geometrical transform. Such assumptions guarantee that the resulting prediction frame, reconstructed by transforming the patches of the previous image into the patches of the current one, is spatially continuous, which is of great importance in the forward matching process.
This transform involves finding the spatially corresponding position of each pixel from one image to the other. The resulting coordinates are generally not integers, and a final bilinear interpolation is used to obtain the pixel value. In our study, we use a triangular mesh, which implies affine warpings between triangular patches.

4.2. Motion estimation method

Scalable transmission and representation of video is of major importance in order to adapt to the available network bandwidth. We therefore propose to use several hierarchical levels for motion estimation. In order to minimize equation (3), we use a Gauss-Newton algorithm. Moreover, a multi-resolution strategy is used. This implementation is optimized over a Burt and Adelson multi-resolution pyramid (Fig. 2) from the coarse to the fine level, according to the hierarchical levels. In this manner, large displacements are taken into account and mesh overlappings are reduced by estimating motion only on triangles with a sufficient area at the processed resolution. Our solution thus yields the optimal motion adapted to the received hierarchical level, and not to the complete hierarchy sent.

Figure 2. Multi-resolution strategy: the pyramid is built by low-pass filtering and downsampling from level 0 up to level L; motion is optimized at level L, then projected and optimized again at each finer level down to level 0.

Fig. 3 shows various ways in which motion estimation can be performed:
- by means of multi-resolution over the image, for a given mesh (6): method (a) in figure 3;
- by means of hierarchical meshes over the image, at full image resolution: method (b) in figure 3;

- by means of mixed hierarchical/multi-resolution solutions over the image: methods (c) and (d) in figure 3.

Nevertheless, solutions (c) and (d) have different functionalities. Solution (c) makes a full-resolution image reconstruction available for each hierarchy level, and consequently adaptive reconstruction is feasible. During the motion estimation process, solution (d) takes better account of the triangle area (due to the hierarchical level) with respect to the resolution level. However, this solution only provides a full-resolution image reconstruction for the last hierarchy level, which does not allow scalable transmission.

Figure 3. Combination of multi-resolution and hierarchy levels. Axes: hierarchy (H0 to H2) and resolution (R0, full resolution, to R2); the paths (a), (b), (c) and (d) run through the combinations H0R0, H1R0, H2R0, H0R1, H1R1 and H0R2.

4.3. Definition of the optimization problem

With each position $(x,y)$ in images $I_1$ and $I_2$ at times $t_1$ and $t_2$ we associate grey level values $I_1(x_1,y_1)$ and $I_2(x_2,y_2)$. Under the basic assumption that the brightness of a particular moving point is constant in time, if $dx$ and $dy$ represent the motion of a point $(x,y)$ between $I_1$ and $I_2$, the reconstructed image $\hat{I}_2$ satisfies:

\[
\forall (x,y), \quad \hat{I}_2(x,y) = I_1(x+dx,\,y+dy) \tag{4}
\]

We introduce a mesh-based motion estimation scheme for image sequences: given the motion of the node points, we are able to reconstruct the image. The major problem consists in estimating the motion vectors between frames as well as possible in order to encode and reconstruct an image given the previous one. Because mesh nodes influence one another, we can express the displacements as a weighted sum over the displacements of the nodes $j$: $dx(x,y) = \sum_j w_j(x,y)\,dx_j$ and $dy(x,y) = \sum_j w_j(x,y)\,dy_j$. The function $w_j$ represents the barycentric coordinate associated with the $j$-th vertex. It is a function of all possible positions $(x,y)$ in the image support.
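The weighted sum above can be illustrated for a single triangle, where the three weights are the barycentric coordinates of the pixel. This is a minimal sketch with hypothetical function names; it handles one triangle only and leaves out the search for the triangle that contains a given pixel.

```python
def barycentric(p, a, b, c):
    """Barycentric coordinates (w_a, w_b, w_c) of point p in triangle abc."""
    det = (b[1] - c[1]) * (a[0] - c[0]) + (c[0] - b[0]) * (a[1] - c[1])
    w_a = ((b[1] - c[1]) * (p[0] - c[0]) + (c[0] - b[0]) * (p[1] - c[1])) / det
    w_b = ((c[1] - a[1]) * (p[0] - c[0]) + (a[0] - c[0]) * (p[1] - c[1])) / det
    return w_a, w_b, 1.0 - w_a - w_b


def interpolate_displacement(p, triangle, nodal_motion):
    """Displacement (dx, dy) at p as the weighted sum of the nodal motions.

    `triangle` holds the three vertices and `nodal_motion` the three
    corresponding (dx_j, dy_j) vectors, in the same order.
    """
    weights = barycentric(p, *triangle)
    dx = sum(w * m[0] for w, m in zip(weights, nodal_motion))
    dy = sum(w * m[1] for w, m in zip(weights, nodal_motion))
    return dx, dy


triangle = ((0.0, 0.0), (4.0, 0.0), (0.0, 4.0))
nodal_motion = ((2.0, 0.0), (0.0, 0.0), (0.0, 4.0))
weights = barycentric((1.0, 1.0), *triangle)
displacement = interpolate_displacement((1.0, 1.0), triangle, nodal_motion)
```

At a vertex the weights reduce to (1, 0, 0), so the interpolated displacement equals the nodal motion there, and along a shared edge only the two edge nodes contribute, which is what keeps the motion field continuous from one triangle to the next.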
It can be pointed out that:

\[
w_j(x,y)
\begin{cases}
= 0 & \text{if } (x,y) \notin \text{hexagon surrounding node } j \\
\neq 0 & \text{otherwise}
\end{cases}
\]

In this manner, since these functions have limited supports, the optimization constraint on the value of the $j$-th node only depends on a limited number of coefficients in the hexagon surrounding this node. If the supports of $w_j$ and $w_i$ do not intersect, then $w_j w_i = 0$. Consequently, the matrix of the system to solve is sparse and the system can be processed quickly with iterative techniques. In order to optimize the problem, we have to minimize the error $E(x,y) = I_2(x,y) - \hat{I}_2(x,y)$ over the whole image, that is

\[
\min \underbrace{\sum_{(x,y)\in I_2} E^2(x,y)}_{\mathcal{E}} \tag{5}
\]

The Gauss-Newton algorithm leads to the following system to solve:

\[
\frac{\partial \mathcal{E}}{\partial dx_j} = 0 \iff \sum_{(x,y)\in I_2} \underbrace{\bigl(I_2(x,y) - \hat{I}_2(x,y)\bigr)}_{\text{variation to linearize}} \nabla_x I_1(x+dx,\,y+dy)\, w_j(x,y) = 0
\]

\[
\sum_{(x,y)\in I_2} \nabla_x I_1(x+dx,\,y+dy)\, w_j(x,y) \sum_l w_l(x,y) \bigl[\nabla_x I_1(x+dx,\,y+dy)\, dx_l + \nabla_y I_1(x+dx,\,y+dy)\, dy_l\bigr]
= \sum_{(x,y)\in I_2} w_j(x,y)\, \nabla_x I_1(x+dx,\,y+dy) \bigl(I_2(x,y) - \hat{I}_2^{k}(x,y)\bigr) \tag{6}
\]

where $\hat{I}_2^{k}$ denotes the reconstruction at iteration $k$. Furthermore, on both sides of these equations, we can add the Jacobian term $J(x,y)$ introduced by the backward in forward method. Such a hierarchical/multi-resolution optimization leads to significant quality improvements: multi-resolution takes large displacements into account, the differential approach outperforms block matching methods, and the Jacobian approach also improves on the usual backward or forward tracking methods. Furthermore, this can be achieved by a fast iterative solution of the large sparse linear system mentioned above (equation 6). It can be pointed out that our error minimization only handles pixels inside the video object mask. This implies that motion estimation is object-based. Additionally, a constraint region (8, 1) is defined for each node to avoid mesh overlappings. Unlike other existing solutions (1, 3), we propose that node locations at coarser levels not be updated, even though a refinement is performed at finer hierarchy levels to increase motion accuracy. Our solution thus yields the optimal motion adapted to the received hierarchical level, and not to the complete hierarchy sent. In this way, successive mesh levels in the hierarchy are not geometrically embedded but topologically embedded. Implementations show that this solution outperforms mono-resolution and/or single-level motion estimation and is suited to progressive/scalable transmission.

5. OPTIMUM RECONSTRUCTION IN THE RATE DISTORTION SENSE

In order to control this quality scalability, we also propose to take the rate-distortion criterion into account (details in (4)).
This is of key importance since transmitted bit-rates have to be reduced as much as possible in order to adapt to the network capabilities. Therefore, this additional representation scheme makes it possible to reconstruct the video sequence as well as possible at any time, for a given rate or distortion.

5.1. Adaptive mesh

Following the many discussions in the literature (4, 3), we have chosen a regular triangular mesh because of its lower coding cost. This mesh is then adaptively refined by regular triangle splitting according to an error criterion, in an embedded pyramid fashion. This criterion for inter-frame tracking is the counterpart of the one we introduced for intra-frame coding (4). This method allows control of the node density in high motion areas, control of the mesh quality, and low complexity. As regards the adaptive concept, we suggest splitting a triangle if its iterative division, which places more motion vectors in this area, leads to a significant fall of its squared error between the considered level and the finest level, that is, to fewer errors in motion estimation. Thus, the adequate criterion is a compromise between this fall and the coding cost of the splitting. Classically, in a rate/distortion approach, a Lagrangian multiplier technique is used. It amounts to minimizing a criterion $R + \lambda D$ for a given $\lambda$, where $R$ is the bit rate and $D$ is the distortion (9). The best suited $\lambda$ is then searched for in order to reach a desired bit rate or quality. Our method is not based on Lagrangian multipliers and avoids the $\lambda$-search step. We propose an iterative method that uses at each step the splitting that maximizes the criterion $|\Delta D| / \Delta R$. This technique can be interpreted as a Lagrangian minimization scheme with a progressive rise of $\lambda$ (in the Lagrangian minimization, we want $\Delta R + \lambda \Delta D < 0$, that is $\lambda |\Delta D| / \Delta R > 1$, hence a sorting according to the criterion $|\Delta D| / \Delta R$). Consequently, each representation level is defined by the list of successive triangles to split. Therefore, each representation level contains triangles of all sizes.
Given this sorting of triangle divisions, the coder is able to adapt to the desired quality or to the available capabilities of the network. For a given representation level, the progressive coding can be improved by interlacing the structure information and the node motion information: first, the node values of the initial grid are transmitted; then, for each division, the values of the newly created nodes are transmitted. Therefore, mesh triangles are sorted in a specific way that makes it possible to reconstruct the image as well as possible at any time, for a given rate or for a desired quality.
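The iterative selection of the splitting with the largest distortion gain per bit can be sketched with a priority queue. This is an illustrative sketch, not the authors' coder: the distortion gains and rate costs are assumed to be precomputed and supplied in dictionaries, the triangle identifiers are hypothetical, and a child only becomes a candidate once its parent has been split.

```python
import heapq


def order_splits(roots, distortion_gain, rate_cost, children):
    """Return the triangle identifiers in the order in which they are split.

    At each step the candidate with the largest ratio
    distortion_gain / rate_cost is chosen, and its children (if any)
    then join the set of candidates.
    """
    heap = []
    counter = 0  # tie-breaker so that the heap never compares identifiers

    def push(tri):
        nonlocal counter
        ratio = distortion_gain[tri] / rate_cost[tri]
        heapq.heappush(heap, (-ratio, counter, tri))
        counter += 1

    for tri in roots:
        push(tri)

    order = []
    while heap:
        _, _, tri = heapq.heappop(heap)
        order.append(tri)
        for child in children.get(tri, []):
            push(child)
    return order


gain = {"A": 100.0, "B": 30.0, "A1": 60.0, "A2": 5.0}
cost = {"A": 10.0, "B": 10.0, "A1": 4.0, "A2": 10.0}
tree = {"A": ["A1", "A2"]}
order = order_splits(["A", "B"], gain, cost, tree)
```

Truncating the resulting list at any position gives a valid representation level, which is how the sorted list can serve a given rate or a desired quality.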

5.2. Adaptive warping

The image is reconstructed once the mesh is positioned. When a triangle lies next to triangles of a finer level (for instance (c) in figure 1, with a regular embedded mesh), the affine interpolation used ensures C0 continuity between triangles by introducing fictitious triangles inside the larger triangle, so that displacements are interpolated continuously with respect to the smaller triangles that face it (cf. figure 4).

Figure 4. (a) Original hierarchical adaptive mesh. (b) Original hierarchical adaptive mesh and the added fictitious triangles (dotted lines): this ensures C0 continuity across triangles of different hierarchy levels.

With regard to new nodes in the following hierarchy levels, a new node in hierarchy h+1 is located in the middle of an arc between two nodes of hierarchy h: its initial motion (in hierarchy h+1) is set equal to the mean of the motions of its two parent nodes in hierarchy h. Obviously, in a rate/distortion context, the transmitted information must be as relevant as possible. That is the reason why an adaptive mesh is gradually transmitted thanks to a sorting of the triangles. This sorting is obtained by comparing the influence of splitting one triangle into four smaller ones with that of the other feasible splittings (cf. subsection 5.1). However, when one triangle has been chosen for division, its six children nodes can move. As a consequence, non-split neighboring triangles are influenced and deformed. Figure 5 shows that the affine mapping for the original triangle ABC must be performed with four smaller triangles simultaneously: Aeg, efg, Bef and fgC.

Figure 5. (on the left) Original triangle ABC. (on the right) Original triangle ABC influenced by smaller neighboring triangles: in order to reconstruct the image, triangle ABC has to be split into 4 smaller fictitious ones: Aeg, efg, Bef and fgC.
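The initialization of new nodes described above (the mean of the two parent motions) can be written in a few lines. The names below are hypothetical, and the labels e, f and g follow Figure 5 under the assumption that e lies on AB, f on BC and g on CA.

```python
def mean_motion(motion_p, motion_q):
    """Mean of two nodal motion vectors, each a (dx, dy) tuple."""
    return ((motion_p[0] + motion_q[0]) / 2.0,
            (motion_p[1] + motion_q[1]) / 2.0)


def init_children_motion(motion_a, motion_b, motion_c):
    """Initial motion of the midpoint nodes created when ABC is split."""
    return {
        "e": mean_motion(motion_a, motion_b),  # midpoint of AB
        "f": mean_motion(motion_b, motion_c),  # midpoint of BC
        "g": mean_motion(motion_c, motion_a),  # midpoint of CA
    }


initial = init_children_motion((2.0, 0.0), (0.0, 0.0), (0.0, 4.0))
```

The mean of the two parent motions is also the value that the affine interpolation of the parent triangle takes at the edge midpoint, so before any refinement the split mesh reproduces exactly the motion field of the coarser level.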
Figure 6 illustrates the embedded point of view: when one triangle is divided into children triangles (which are adjusted to better predict motion in this area), triangles Aeg, efg, Bef and fgC are said to be topologically embedded in ABC even if they are not geometrically embedded.

6. EXPERIMENTAL RESULTS

This section shows the experimental results obtained for all the methods presented above, where an initial uniform triangular mesh is used. We measured the peak signal-to-noise ratio of warped frames with respect to the original frames.

6.1. The backward in forward method

We compare the performance of forward and backward in forward tracking. The visual tracking results are shown in figure 7, which represents the original Lena frame (considered as the frame at time t-1) and an artificial warping of Lena

(considered as the frame at time t) generated with a non-piecewise affine motion field. The tracked meshes are overlaid on the original frame at time t-1 and on the motion-compensated frame at time t. The results show that backward in forward tracking is better than forward tracking when there is complex motion in the video sequence, and at least equal when motion is simple. In the same way, this scheme is improved by making use of multi-resolution, especially to deal with large motion. To evaluate the performance of multi-resolution versus mono-resolution motion estimation, and of forward versus backward in forward tracking, we measured the peak signal-to-noise ratio of warped frames with respect to the original frames. The PSNR results for the artificially warped Lena frame and for frames from the Suzie sequence are summarized in Table 1.

Figure 6. (on the left) Original mesh with hierarchy 0. (in the middle) Adaptive mesh and the influence area when the central triangle has been split. (on the right) Adaptive mesh: original mesh of hierarchy 0 with children triangles of hierarchy 1 for one triangle. Fictitious triangles (dotted lines) ensure C0 continuity across triangles of different hierarchy levels.

                     forward             backward in forward
                     mono      multi     mono      multi
artificial-lena      19.02     25.27     19.14     25.40
Suzie                30.

Table 1. Mesh tracking results for the artificially warped Lena frame and Suzie (frames 51-52). Each entry gives the PSNR (dB) of a rendered video frame, using either mono-resolution or multi-resolution (three levels) motion estimation, for the two different estimation schemes. The entries are the results obtained at the finest full-detail level.

Figure 7. Tracked meshes are superimposed on the original frame at time t-1 and on the motion-compensated frame at time t. On the left: image Lena with an artificial warping; on the right: images Suzie.

In our experiments, we do not impose special conditions on node points in border areas.
In other words, we consider that a node point lying on a corner or an edge of an image does not necessarily have to stay at the same place or to move along the edge in the following frame: objects can appear and disappear in an image (zoom, background movement, and so on). Since the reconstructed image cannot always be predicted near the borders, we build our mesh slightly smaller than the full frame dimension. Moreover, we compute the PSNR on a smaller area (10 pixels smaller) than the full mesh, because the mesh can move away from the image border and uncover areas.
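The PSNR measurement on a reduced area can be sketched as follows. This is a minimal illustration under assumptions: images are plain lists of rows of grey levels, the peak value is taken to be 255, and the margin is applied to the arrays passed in rather than to the mesh support used in the experiments.

```python
import math


def psnr(original, rendered, margin=10, peak=255.0):
    """PSNR in dB between two equally sized grey-level images, ignoring a
    border of `margin` pixels on every side."""
    height, width = len(original), len(original[0])
    squared_error = 0.0
    count = 0
    for y in range(margin, height - margin):
        for x in range(margin, width - margin):
            diff = original[y][x] - rendered[y][x]
            squared_error += diff * diff
            count += 1
    if count == 0:
        raise ValueError("margin leaves no pixel to measure")
    mse = squared_error / count
    if mse == 0.0:
        return float("inf")
    return 10.0 * math.log10(peak * peak / mse)


# Toy 4x4 example: interior pixels differ by 5, border pixels by 100.
reference = [[100] * 4 for _ in range(4)]
warped = [[0] * 4 for _ in range(4)]
for y in (1, 2):
    for x in (1, 2):
        warped[y][x] = 105
value = psnr(reference, warped, margin=1)
```

Because the border is excluded, the large errors on the outer ring of the toy image do not affect the result: only the four interior pixels, each off by 5, enter the mean squared error.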

Table 2. Mesh tracking results for Coast-guard (frames ) and Table-tennis (frames 1-2). Each entry gives the PSNR (dB) of a rendered video frame, using different triangle sizes (172 or 43 pixels wide). Columns: Coast-guard, Table-tennis. Each row corresponds to one of the approaches pointed out in Fig. 3: H0R0 = full resolution, no hierarchy; a = multi-resolution, no hierarchy; b = hierarchical, full image resolution; c = mix a+b. A maximum of 3 hierarchical levels and 3 resolution levels are used.

6.2. Mixed hierarchical/multi-resolution approach

The different mixed hierarchical/multi-resolution approaches and their results are summarized in Fig. 3 and Table 2. First of all, we can notice that multi-resolution improves accuracy in mesh-based motion estimation, and all the more so as the motion range is large (see Table 2). Similarly, making use of several hierarchy levels increases performance, since it allows motion estimation to be refined in high-activity areas. Let us notice that, for all these methods, the number of triangles can be reduced for fairly rigid objects without significant quality loss. Consequently, dealing with large triangles means few node points to encode, which is of major interest given our scalability constraints. Whereas most classical methods would use many more triangles to stay within the object, Table 2 reports good results using only 10 triangles for Coast-guard, results that are not really improved (0.24 dB) by successive hierarchical levels. Furthermore, we notice that the results obtained with triangles of side 172 pixels after 3 successive divisions (thus 43 pixels wide) (rows a, b, c and column 172 in Table 2) are better than those obtained from initial triangles of side 43 pixels (row H0R0, column 43): this is due to a better initialization transmitted to the successive hierarchical levels. Various tests have shown that solution (c) provides the best results compared to (a) and (b).
As for methods (c) and (d), method (c) provides motion estimations and reconstructions at various levels of hierarchy with full image resolution; thus (c) allows progressive data reconstruction, unlike method (d). In the same way, these schemes are slightly improved by making use of backward in forward. Finally, the joint use of multi-resolution and hierarchy appears very efficient and accurate, even if the corresponding performance mainly depends on the internal properties of the object: rigidity and motion amplitude. The motion compensation can then be transmitted at various levels of detail according to the available adaptive representation, thanks to the progressive rate-distortion splittings and sortings.

7. CONCLUSION

All things considered, our motion estimation is obtained from a multi-resolution and hierarchical method that takes the rate-distortion criterion into account. A novel tracking scheme, backward in forward, has been presented to take advantage of both forward and backward tracking. The results obtained show the relevance of the proposed approach. Such an implementation meets the constraints imposed on us. In particular, progressive/scalable transmission is now feasible, adapted to the available network bandwidth.

REFERENCES

1. P. van Beek, A. Tekalp, N. Zhuang, I. Celasun, and M. Xia, Hierarchical 2D mesh representation, tracking and compression for object-based video, IEEE Trans. on Circ. and Syst. for Video Tech. 9, p. 353, March, special issue.
2. C. L. B. Jordan, T. Ebrahimi, and M. Kunt, Progressive mesh-based coding of arbitrary-shaped video objects, in SPIE VCIP, January.
3. P. Lechat, N. Laurent, and H. Sanson, Scalable image coding with fine granularity based on hierarchical mesh, in SPIE VCIP, January.
4. G. Marquant, S. Pateux, and C. Labit, Mesh-based scalable image coding with rate-distortion optimization, in Image and Video Communications and Processing 2000, (San Jose, USA), January.
5. J. Nieweglowski, T. G. Campbell, and P. Haavisto, A novel video coding scheme based on temporal prediction using digital image warping, IEEE Transactions on Consumer Electronics, vol. 39, August 1993.
6. P. Lechat, M. Ropert, and H. Sanson, Hierarchical mesh-based motion estimation using a differential approach and application to video coding, in EUSIPCO 98, vol. 4, September.
7. Y. Wang and O. Lee, Use of two-dimensional deformable mesh structures for video coding, part I - the synthesis problem: mesh-based function approximation and mapping, IEEE Transactions on Circuits and Systems for Video Technology, vol. 6, December.
8. N. Zhuang, P. van Beek, I. Celasun, and A. M. Tekalp, Hierarchical 2D content-based mesh tracking for video object, in SPIE VCIP, January.
9. D. Tzovaras, S. Vachtsevanos, and M. Strintzis, Optimization of wireframe model adaptation and motion estimation in a rate-distortion framework, International Journal of Imaging Systems and Technology 9, 1998.
