3D object articulation and motion estimation in model-based stereoscopic videoconference image sequence analysis and coding


Signal Processing: Image Communication 14 (1999) 817–840

3D object articulation and motion estimation in model-based stereoscopic videoconference image sequence analysis and coding

Dimitrios Tzovaras*, Ioannis Kompatsiaris, Michael G. Strintzis
Information Processing Laboratory, Electrical and Computer Engineering Department, Aristotle University of Thessaloniki, Thessaloniki 54006, Greece

Received 29 November 1996

Abstract

This paper describes a procedure for model-based analysis and coding of both left and right channels of a stereoscopic image sequence. The proposed scheme starts with a hierarchical dynamic programming technique for matching across the epipolar line for efficient disparity/depth estimation. Foreground/background segmentation is initially based on depth estimation and is improved using motion and luminance information. The model is initialised by the adaptation of a wireframe model to the consistent depth information. Robust classification techniques are then used to obtain an articulated description of the foreground of the scene (head, neck, shoulders). The object articulation procedure is based on a novel scheme for the segmentation of the rigid 3D motion fields of the triangle patches of the 3D model object. Spatial neighbourhood constraints are used to improve the reliability of the original triangle motion estimation. The motion estimation and motion field segmentation procedures are repeated iteratively until a satisfactory object articulation emerges. The rigid 3D motion is then re-computed for each sub-object and, finally, a novel technique is used to estimate flexible motion of the nodes of the wireframe from the rigid 3D motion vectors computed for the wireframe triangles containing each specific node. The performance of the resulting analysis and compression method is evaluated experimentally. © 1999 Elsevier Science B.V. All rights reserved.
Keywords: Stereoscopic image sequence analysis; Model-based coding; Object articulation; Non-rigid 3D motion estimation

1. Introduction

The transmission of full-motion video through limited capacity channels is critically dependent on the ability of the compression schemes to reach target bit-rates while still maintaining acceptable visual quality

* Corresponding author. E-mail: tzovaras@dion.ee.auth.gr. This work was supported by the EU CEC Project ACTS PANORAMA (Package for New Autostereoscopic Multiview Systems and Applications, ACTS project 092).

[15]. In order to achieve this, motion estimation and motion compensated prediction are frequently used, so as to reduce temporal redundancy in image sequences [22]. Similarly, in the coding of stereo and multiview images, prediction may be based on disparity compensation [33] or the best of motion and disparity compensation [34]. Stereoscopic video processing has recently been the focus of considerable attention in the literature [3,6,10,13,16,24,26,28,32]. A stereoscopic pair of image sequences, recorded with a difference in the view angle, allows the three-dimensional (3D) perception of the scene by the human observer, by exposing to each eye the respective image sequence. This creates an enhanced 3D feeling and increased 'telepresence' in teleconferencing and several other (medical, entertainment, etc.) applications. In both monoscopic and stereoscopic vision, the ability of model-based techniques to describe a scene in a structural way has opened new areas of applications. Video production, realistic computer graphics, multimedia interfaces and medical visualisation are some of the applications that may benefit by exploiting the potential of model-based schemes. Object-based techniques have been extensively investigated for monoscopic image sequence coding [4,11,12,21]. Several object-oriented coding schemes have also been proposed for stereoscopic image sequence coding [7,24,25,29,31,32]. The advantages of using model-based techniques for stereo image sequence coding were reviewed in [25], where a feature-based 3D motion estimation scheme was presented. In [24], disparity is estimated using a dynamic programming scheme and is subsequently used for object segmentation. The segmentation algorithm is based on region growing, and the criterion used for the definition of each object is based on the homogeneity of the respective disparity fields.
In [10], the objects in the scene are identified using a segmentation method based on the homogeneity of the 2D motion field computed by a block matching procedure. Then the 3D motion of each object is modeled using the approach presented in [1], with depth estimated from disparity. Finally, an interframe coding scheme based on 3D motion compensation is evaluated. A disadvantage of the segmentation technique used in this procedure is its failure to guarantee high performance of the resulting 3D motion compensation method. Alternatively, 3D models of objects may be derived from stereo images. This usually requires estimation of dense disparity fields, postprocessing to remove erroneous estimates and fitting of a parametrised surface model to the calculated depth map [14]. In [17] an algorithm was presented which optimally models the scene using a hierarchically structured wire-frame model derived directly from intensity images. The wire-frame model consists of adjacent triangles that may be split into smaller ones over areas that need to be represented in higher detail. The motion of the model surface, under both rigid and non-rigid body assumptions, is estimated concurrently with the depth parameters. Knowledge-based image sequence coding has also attracted much interest recently, especially for the coding of facial image sequences in videophone applications. One such method, presented in [2], is based on the generation of a generic face model and the use of efficient techniques for rigid and flexible 3D motion estimation. In the present paper, a procedure for model-based analysis and coding of both left and right channels of a stereoscopic image sequence is proposed. The methodology used overcomes a major obstacle in stereoscopic video coding, caused by the difficult problem of determining and handling coherently corresponding objects in the left and right images.
This is achieved in this paper by defining segmentation and object articulation in the 3D space, thus ensuring that all ensuing operations remain coherent for both the left and the right aspects of the scene. Each object is described by a mesh consisting of a set of interconnected triangles. The 3D motion of each triangle is estimated using a robust algorithm for the minimisation of the least median of squares error and by imposing neighbourhood constraints, such as those introduced in [18,19], to guarantee the smoothness of the resulting vector field. A novel iterative object articulation technique for stereoscopic image sequences is then used to segment the 3D vector field and thus to derive a foreground object articulation. Triangle motion estimation and classification are repeated iteratively until satisfactory object articulation is achieved. Rigid 3D motion estimation is performed next for each resulting sub-object, using motion information from both left and right cameras. Finally, a procedure is proposed for the

estimation of the non-rigid motion of each wireframe node based on the 3D motion of the neighbouring wireframe triangles. The paper is organised as follows. In Section 2 the camera geometry of the stereoscopic system is described. Next, in Section 3, an overview of the proposed stereoscopic image sequence analysis system is presented. Section 4 presents the techniques used for disparity/depth estimation, foreground/background segmentation and model initialisation, while Section 5 describes the initial adaptation of the 3D model to the foreground object. The technique used for object articulation is examined in Section 6, while the 3D motion estimation procedure used is presented in Section 7. The rigid 3D motion estimation procedure for each articulated 3D object is discussed in Section 7.1. Finally, in Section 7.2, an approach is considered for non-rigid motion estimation based on the rigid 3D motion vectors of small surface patches, computed during the object articulation procedure. Experimental results given in Section 8 demonstrate the performance of the proposed methods. Conclusions are drawn in Section 9.

2. Camera geometry

The geometry of the stereoscopic camera arrangement used is shown in Fig. 1, where three reference coordinate frames are defined:
World reference frame, attached to the imaged scene.
Camera reference frame, attached to the camera system. Notice that the Z-axis is the optical axis, while the X and Y axes are parallel to the image plane. Here c refers to the respective camera, i.e. c = l, r for the left and right cameras, respectively.
Image reference frame, where the X and Y axes, respectively, define the horizontal and vertical directions on the digital image, where again c = l, r refer to the images produced, respectively, by the left and right cameras.

Fig. 1. Stereoscopic camera geometry.

The camera geometry is described by the following set of equations mapping the 3D world-coordinates (x, y, z) of a generic point P into the 2D coordinates (X_c, Y_c) of its projection on the image planes:

Change of reference frame from world-coordinates to camera-coordinates:

  P_c = [x_c, y_c, z_c]^T = R_c [x, y, z]^T + T_c,   (1)

where c = l, c = r for the left and right cameras, respectively, and R_c and T_c are, respectively, the rotation matrix and the translation vector.

Perspective projection of a scene point to the image plane (the centre of projection is the centre of the lens and the projection plane is the camera CCD sensor):

  [X_c, Y_c]^T = (f_c/z_c) [x_c, y_c]^T.   (2)

Change of coordinate frame from camera-coordinates (X_c, Y_c) to image coordinates. This operation simply consists of a 2D translation and scale change:

  X_im^c = C_x^c + X_c/d_x^c,   Y_im^c = C_y^c + Y_c/d_y^c,   (3)

where d_x^c and d_y^c are the horizontal and vertical size of an image pixel, respectively, and (C_x^c, C_y^c) are the image coordinates of the optical centre OC_c in camera c. As seen from the above description, the camera geometry is completely specified by a small set of parameters estimated during camera calibration.

3. Overview of the stereoscopic image sequence analysis and coding scheme

In the proposed model-based stereoscopic image sequence analysis and coding scheme (see Fig. 2), both left and right channels are coded using 3D rigid and non-rigid motion compensation. The approach taken is to define fully 3D models of objects composed of interconnecting wire-mesh triangles. In this way, complete left-to-right object correspondence is intrinsically established.

Fig. 2. The proposed stereoscopic image sequence coding scheme.
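The world-to-image mapping of Eqs. (1)–(3) above can be sketched in a few lines of Python. This is only an illustration, not the authors' implementation; the calibration values (rotation, focal length, pixel size, optical centre) are invented for the example:

```python
import numpy as np

def world_to_camera(p_world, R_c, T_c):
    """Eq. (1): change of frame from world to camera coordinates."""
    return R_c @ p_world + T_c

def perspective_project(p_cam, f_c):
    """Eq. (2): perspective projection onto the image plane."""
    x, y, z = p_cam
    return np.array([f_c * x / z, f_c * y / z])

def camera_to_image(XY, C, d):
    """Eq. (3): 2D translation and scale change to pixel coordinates."""
    return C + XY / d

# Illustrative calibration values (assumed, not taken from the paper).
R_l = np.eye(3)                    # left camera aligned with the world frame
T_l = np.zeros(3)
f_l = 50.0                         # focal length in mm
C_l = np.array([320.0, 240.0])     # optical centre in pixels
d_l = np.array([0.01, 0.01])       # pixel size in mm

P = np.array([10.0, 5.0, 1000.0])  # a world point in front of the camera
p_cam = world_to_camera(P, R_l, T_l)
p_img = camera_to_image(perspective_project(p_cam, f_l), C_l, d_l)
print(p_img)                       # -> [370. 265.]
```

Each calibrated camera (c = l, r) carries its own (R_c, T_c, f_c, C^c, d^c) set, so the same three functions project a world point into both views.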

The left-to-right (LR) and right-to-left (RL) disparity fields are estimated first, using a hierarchical dynamic programming disparity estimation procedure. The consistency of the computed disparity fields is then checked for the 'points of interest', which are provided by the model initialisation procedure. A reliable disparity estimate is obtained for those points of interest with inconsistent left–right disparities. An initial foreground/background segmentation procedure follows, leading to a 3D wireframe model adapted to the foreground object using the reliable disparity estimates. In order to improve rigid 3D motion estimation, the foreground object is articulated, producing sub-objects defined by the homogeneity of their 3D motion. This rigid 3D motion is estimated using least median of squares minimisation of a cost function taking into account the reliability of the projected rigid 3D motion in both the left and right channels. Neighbourhood constraints are also imposed to improve the reliability of the motion estimation procedure. Following articulation, the rigid 3D motion of each sub-object produced is estimated using the same motion estimation procedure, without this time imposing neighbourhood constraints. Finally, the non-rigid motion of each node of the wireframe is estimated from the rigid 3D motion of the wireframe triangles containing this node as a vertex. A block diagram of the proposed encoder is shown in Fig. 3. Its constituent components are described in detail in the ensuing sections.

4. Depth estimation, scene segmentation and model initialisation

4.1. Disparity/depth estimation

Since the stereo camera configuration is known, the depth estimation problem reduces to that of disparity estimation [3,28,30]. A dynamic programming algorithm, minimising a combined cost function for two corresponding lines of the stereoscopic image pair, is used for disparity estimation.
The basic algorithm adapts the results of [5,24] using blocks rather than pixels. Furthermore, a novel hierarchical version of this algorithm is implemented so as to speed up its execution. The cost function takes into consideration the displaced frame difference (DFD) as well as the smoothness of the resulting vector field, in the following way. Due to the epipolar line constraint [3], the search area for each pixel p_r = (i_r, j_r) of the right image is an interval S(p_r) on the epipolar line in the left image, determined by a minimum and maximum allowed disparity. If p_l = (i_l, j_l) ∈ S(p_r) is the pixel in the left image matching with pixel p_r of the right image and if d(p_r) is the disparity vector corresponding to this match, the following cumulative cost function is minimised with respect to d(p_r) for the path ending at the pixel p_r in each line of the right image:

  C(i_r) = min_{d(p_r)} [C(i_r − 1) + c(p_r, d(p_r))].   (4)

The cost function c(p_r, d(p_r)) is determined by

  c(p_r, d(p_r)) = R(p_r) DFD(p_r, d(p_r)) + SMF(p_r, d(p_r)).   (5)

The first term in Eq. (5) contains the absolute difference of two corresponding image intensity blocks, centered at the working pixels p_r and p_l in the right and left images, respectively,

  DFD(p_r, d(p_r)) = Σ_{(X,Y) ∈ W} |I_r(i_r + X, j_r + Y) − I_l(i_l + X, j_l + Y)|,   (6)

where W is a rectangular window. Multiplication with the reliability function R(p_r) relaxes the DFD weight, keeping only the second term active in homogeneous regions where the matching reliability is small. The

Fig. 3. A block diagram of the proposed encoder.

disparity vector is considered reliable whenever it corresponds to a pixel on an edge or in a highly textured area. For the detection of edges and textured areas, a variant of the technique in [8] was used, based on the observation that highly textured areas exhibit high local intensity variance in all directions, while on edges the intensity variance is higher across the direction of the edge. The second term in Eq. (5) is the smoothing function,

  SMF(d(p_r)) = Σ_{n=1}^{N} |d(p_r) − d_n| R(d_n),   (7)
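The dynamic programming matcher of Eqs. (4)–(7) can be sketched for a single scanline. This is a deliberately simplified, pixel-based version: the reliability weight R(·) is omitted and the smoothness term is a plain penalty λ|d − d′| between successive disparities, so it is a sketch of the idea rather than the paper's block-based algorithm:

```python
import numpy as np

def scanline_disparity(right, left, d_max, lam=0.5):
    """DP along one epipolar line: DFD data term (Eq. (6), window of one
    pixel) plus a smoothness transition cost standing in for SMF."""
    n = len(right)
    big = 1e9
    dfd = np.full((n, d_max + 1), big)
    for i in range(n):
        for d in range(d_max + 1):
            if i + d < n:                    # stay inside the left line
                dfd[i, d] = abs(right[i] - left[i + d])
    C = dfd[0].copy()                        # cumulative cost, Eq. (4)
    back = np.zeros((n, d_max + 1), dtype=int)
    for i in range(1, n):
        prev = C.copy()
        for d in range(d_max + 1):
            trans = prev + lam * np.abs(np.arange(d_max + 1) - d)
            back[i, d] = int(np.argmin(trans))   # best predecessor disparity
            C[d] = dfd[i, d] + trans[back[i, d]]
    # Backtrack the minimum-cost path to recover the disparity profile.
    d = int(np.argmin(C))
    path = [d]
    for i in range(n - 1, 0, -1):
        d = back[i, d]
        path.append(d)
    return path[::-1]
```

On a synthetic pair in which the left line is the right line shifted by three pixels, the recovered path is the constant disparity 3 over the overlapping region.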

where d_n, n = 1, 2, …, N, are the vectors neighbouring d(p_r). Multiplication by the factor R(d_n) aims to attenuate the contribution of unreliable vectors to the smoothing function. Finally, the dynamic programming algorithm selects as the best path up to that stage the one with the minimum cumulative cost (Eq. (4)). A hierarchical version of this approach was utilised in order to speed up the estimation process and to produce a smooth disparity field without discontinuities. In this version, the dynamic programming algorithm is applied at the coarse resolution level and an initial estimate for the disparity vectors is produced. The disparity information is then propagated to the next resolution level, where it is corrected so that the cost function is further minimised. This process is iterated until full resolution is achieved. Along with the dense disparity field, the variance of the disparity estimate for each pixel of the image is also computed, using

  σ²(p_r, d(p_r)) = (1/N_W) Σ_{(k,l) ∈ W} (I_r(i_r + k, j_r + l) − I_l(i_l + k, j_l + l))²,   (8)

where N_W = (2N + 1)(2N + 1) is the dimension of the rectangular window W. Finally, depth is estimated from disparity, using the camera geometry as in [32].

4.2. Foreground/background segmentation

The model is initialised by separating the body in the videoconference scene from the background using an initial foreground/background segmentation procedure. The depth map produced by the method in Section 4.1, applied to the full resolution image, may be used for this purpose. However, to reduce as much as possible the effects of errors in depth estimation, we propose instead the use of a hierarchical foreground/background segmentation, focused on the determination of only the largest disparity vectors. These vectors correspond to foreground objects (objects that lie very close to the camera). This information is propagated to the higher resolution level, where it is corrected in a coarse to fine manner.
Thus, by carefully selecting the search area of the disparity estimator at each resolution, an initial foreground/background segmentation mask is formed. The resulting segmentation map is then post-processed using a motion detection mask and the luminance edge information. The motion detection mask is defined by simple subtraction of consecutive frames of the same channel of the image sequence. Note that in this phase the aim is not to calculate motion accurately, but rather to identify regions with very high or very low motion. The motion detection mask contains important information for both inner and boundary areas of the foreground object, while luminance edge information carries important information about errors that occur mainly on the silhouette (border) of the foreground object. The foreground object boundary is found as the part of the image where both the depth gradient and the luminance gradient are high. Summarising, the following algorithm is used for foreground/background separation, as shown in Fig. 4: The disparity information in level l of the algorithm is segmented using a histogram-based segmentation algorithm, and areas corresponding to large disparity values are identified as objects close to the camera. The segmentation information is propagated to the finer resolution level, where it is corrected appropriately. At the full resolution level, the resulting segmentation mask is post-processed using motion and luminance information as follows: each portion of the scene designated as background by the disparity segmentation procedure is reexamined in view of its motion u and its depth and luminance gradients g (Fig. 4). If all these parameters exceed preselected thresholds, this portion of the scene is confirmed as being part of the foreground. Otherwise, it is relegated to the background.
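The full-resolution post-processing step above can be sketched as follows. This is a rough, single-level illustration under stated assumptions: the histogram-based split is approximated by a simple mean threshold, and the motion/gradient thresholds are invented for the example:

```python
import numpy as np

def fg_bg_mask(disparity, motion, grad, d_thresh=None, m_thresh=1.0, g_thresh=1.0):
    """Pixels with large disparity are marked foreground; background
    pixels whose motion AND gradient both exceed their thresholds are
    promoted to foreground, mirroring the reexamination step."""
    if d_thresh is None:
        # crude stand-in for the histogram-based segmentation threshold
        d_thresh = disparity.mean()
    fg = disparity > d_thresh
    promote = (~fg) & (motion > m_thresh) & (grad > g_thresh)
    return fg | promote
```

A real implementation would run this coarse-to-fine, restricting the disparity search area at each level as the text describes.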

Fig. 4. The proposed foreground/background estimation scheme.

A 3D wireframe is adapted to the foreground object produced by the described procedure. Then, using the reliable depth estimates as described in the following sections, the final 3D model is created.

4.3. Consistency checking and disparity evaluation for the points of interest

A set F of points of interest is first defined, composed of points in the 3D space with left or right image projections located on depth and luminance edges. The latter are extracted using the edge detection algorithm presented in Section 4.1. For each of these points, the disparity estimation algorithm produces left-to-right (LR) and right-to-left (RL) disparity fields. However, the LR and RL disparity fields may be

inconsistent because of occlusions and errors in the disparity estimation procedure. Thus, a consistency checking algorithm is used to indicate the correct matches, followed by an averaging procedure (Kalman estimate) which assigns a depth value to pixels with inconsistent matches. More specifically, the correspondence between pixels p_r = (i_r, j_r) of the right image and p_l = (i_l, j_l) of the left image is considered consistent if

  d(p_r) = −d(p_r + d(p_r)).

If the above relation is not valid, a more reliable depth estimate must be assigned to that pixel. The method in [9] is applied to this effect, using the reliability of the disparity estimates as a weighting function. Specifically, the disparity d_r = d(p_r) and the disparity d_l = d(p_l) satisfying

  p_l = p_r + d(p_r),   (9)

are averaged with respect to their disparity error variances as follows:

  d̂ = (d_r σ_l² − d_l σ_r²)/(σ_r² + σ_l²),   σ̂² = σ_r² σ_l²/(σ_r² + σ_l²),

where σ_r² and σ_l² are, respectively, the variances of the disparity estimates d_r and d_l, computed at the disparity/depth estimation stage using Eq. (8), and σ̂² is the variance of the averaged disparity. If more than one disparity vector d(p_l) satisfies Eq. (9), the one with the minimum estimation variance is selected. The consistency checking algorithm is applied to the set of all points of interest, selected as above so as to have projections on depth and luminance edges, and reliable depth estimates for pixels with either consistent or corrected disparity are obtained, to be used for model initialisation. The result of this procedure is a set F of points of interest (x_i, y_i, z_i) whose projections are located on the foreground depth map and luminance edges of either the left or right camera, where z_i is their estimated depth.

5. Initial 3D model adaptation

For the generation of the 3D model object, depth information must be modeled using a wire mesh.
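The per-point consistency check and variance-weighted fusion can be sketched as below. The consistency tolerance is an assumption (the paper does not state one), and the sign convention follows the consistency relation d(p_r) = −d(p_l):

```python
def fuse_disparities(d_r, var_r, d_l, var_l, tol=0.5):
    """Check d_r == -d_l (the LR/RL consistency relation); if it fails,
    return the variance-weighted (Kalman-like) average of the two
    estimates and the variance of the fused estimate."""
    if abs(d_r + d_l) <= tol:
        return d_r, var_r, True            # consistent match, keep d_r
    d_hat = (d_r * var_l - d_l * var_r) / (var_r + var_l)
    var_hat = var_r * var_l / (var_r + var_l)
    return d_hat, var_hat, False
```

Note that the fused variance is always smaller than either input variance, which is why the corrected estimates are considered reliable enough for model initialisation.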
We shall generate a surface model of the form [17]

  z = s(x, y, P̂),   (10)

where P̂ = {(x_i, y_i, z_i), i = 1, 2, …, N} is a set of 3D 'control' points or 'nodes' that determine the shape of the surface, and (x, y, z) are the coordinates of a 3D point. An initial choice for P̂ is the regular tessellation shown in Fig. 5(a). The consistency checking algorithm, described in the previous section, is applied to all control points to assign corrected depth values to every node of the 3D model. Automatic adaptation of the 3D model (Fig. 5(c) and (d)) to the foreground object is sought by forcing the 3D model to meet the boundary of the foreground/background segmentation mask (Fig. 5(b)). A set of reference image points G = {(x_i, y_i, z_i), i = 1, 2, …, Q} is defined as the aggregate of F and P̂:

  G = F ∪ P̂,   (11)

where F is the set of points of interest defined in the preceding section and P̂ are the nodes of the 3D model with the corrected depth values. Then, s can be modelled by a piecewise linear surface, consisting of adjoint triangular patches, which may be written in the form

  z = z_1 g_1(x, y) + z_2 g_2(x, y) + z_3 g_3(x, y),   (12)

Fig. 5. (a) Initial triangulation of the image plane. (b) Foreground/background segmentation. (c) Part of the initial triangulation corresponding to the foreground object. (d) Expanded wireframe adapted to the foreground object. (e) Barycentric coordinates.

if (x, y, z) is a point on the triangular patch with vertices P_1 = (x_1, y_1, z_1), P_2 = (x_2, y_2, z_2), P_3 = (x_3, y_3, z_3). The functions g_i(x, y) are the barycentric coordinates of (x, y, z) relative to the triangle, given by g_i(x, y) = area(a_i)/area(P_1 P_2 P_3) (Fig. 5(e)). The reconstruction of a surface from consistent sparse depth measurements may be effected by minimising a functional of the form

  E(P̂) = Σ_{i=1}^{Q} (s(x_i, y_i, P̂) − z_i)².   (13)

The value of the sum (13) expresses confidence in the reference points (x_i, y_i, z_i) ∈ G, i = 1, 2, …, Q. Note that no smoothness constraint is imposed on the surface of the 3D model, since the depth estimates for these points are considered very reliable. Replacing Eq. (12) in Eq. (13) yields

  E(P̂) = ‖AP̂ − B‖²,   (14)

where A is a Q×N matrix and B a Q×1 vector given by

  A_ij = g_j(x_i, y_i) if (x_i, y_i) lies inside a triangle having node j as a vertex, and A_ij = 0 otherwise, i = 1, 2, …, Q, j = 1, 2, …, N,
  B_i = z_i, i = 1, 2, …, Q.

The vector P̂ minimising Eq. (14) is

  P̂ = (AᵀA)⁻¹AᵀB,   (15)

which defines the nodes of the wire-mesh surface. Using Eq. (12), the depth z of any point on a patch can be expressed in terms of the depth information of the nodes of the wireframe and the X and Y coordinates of that point. Hence, full depth information will be available if only the depths of the nodes of the wireframe are transmitted.

6. Object articulation

A novel subdivision method, based on the rigid 3D motion parameters of each triangle and the error variance of the rigid 3D motion estimation, is proposed for the articulation of the foreground object (separation of the head and shoulders). The model initialisation procedure described above results in a set of interconnected triangles in the 3D space: T_k, k = 1, 2, …, K, where K is the number of triangles of the 3D model. In the following, S^i will denote an articulation of the 3D model at iteration i of the articulation algorithm, consisting of sub-objects s_k^i, k = 1, 2, …, M_i.
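The least-squares fit of Eqs. (13)–(15) can be sketched for a single triangular patch. This is a minimal illustration, not the full wireframe adaptation: one triangle, barycentric coordinates computed from signed areas, and the normal-equation solution obtained with a least-squares solver:

```python
import numpy as np

def barycentric(p, tri):
    """Barycentric coordinates g_1, g_2, g_3 (Eq. (12)) of 2D point p
    with respect to triangle tri, via ratios of signed areas."""
    (x1, y1), (x2, y2), (x3, y3) = tri
    det = (y2 - y3) * (x1 - x3) + (x3 - x2) * (y1 - y3)
    g1 = ((y2 - y3) * (p[0] - x3) + (x3 - x2) * (p[1] - y3)) / det
    g2 = ((y3 - y1) * (p[0] - x3) + (x1 - x3) * (p[1] - y3)) / det
    return np.array([g1, g2, 1.0 - g1 - g2])

def fit_node_depths(points, depths, tri):
    """Solve min ||A p - B||^2 (Eqs. (14)-(15)) for the node depths of a
    one-triangle patch; A holds the barycentric weights of each point."""
    A = np.array([barycentric(p, tri) for p in points])   # Q x 3 here
    B = np.asarray(depths)
    return np.linalg.lstsq(A, B, rcond=None)[0]
```

Because barycentric interpolation reproduces any plane exactly, fitting depth samples drawn from the plane z = 2 + 3x + 4y recovers the node depths (2, 5, 6) exactly.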
The proposed iterative object articulation procedure is composed of the following steps:

Step 1. Set i = 0. Let the initial segmentation be S⁰ = {s_k⁰, k = 1, 2, …, K}, with s_k⁰ = T_k. Let also the initial neighbourhood of each triangle be empty, i.e. T_k^S = ∅.
Step 2. Apply the 3D rigid motion estimation algorithm to each triangle T_k, taking into account the neighbourhood constraint imposed by the neighbourhood T_k^S. This constraint is described in detail in Section 6.1 below.
Step 3. Set i = i + 1. Execute the object segmentation procedure that subdivides the initial object into M_i sub-objects, i.e. S^i = {s_k^i, k = 1, 2, …, M_i}.
Step 4. Use the segmentation map S^i to define the new neighbourhood T_k^S of each triangle T_k.
Step 5. If S^i = S^{i−1} then stop. Else go to Step 2.

The proposed algorithm can also be explained by the example of Fig. 6(a)–(c). Fig. 6(a) illustrates the initial phase of the algorithm, where each triangle is treated as an object. The estimated rigid 3D motion

Fig. 6. (a) Initial phase of the object articulation algorithm. (b) The output of the rigid 3D motion estimation procedure for each triangle. (c) The output of the object segmentation procedure. (d) Non-rigid 3D motion estimation example. The light grey vector represents the rigid motion of the working node, while the black vectors represent estimates for the motion of the same node using the 3D motion parameters corresponding to each triangle containing the working node.

vectors of each triangle, computed at the second step of the proposed algorithm, are shown in Fig. 6(b), and the output of the object segmentation procedure of Step 3 is shown in Fig. 6(c). Based on this new object segmentation map, the rigid 3D motion estimation and object segmentation procedures are then further refined iteratively. The 3D motion estimation of each triangle and the object segmentation procedure are described in more detail below.

6.1. Rigid 3D motion estimation of small surface patches

The foreground object of a typical videophone scene is composed of more than one sub-object (head, neck, shoulders, etc.), each of which exhibits different rigid 3D motion. Thus, object articulation has to be completed and the rigid motion of each sub-object must be estimated. For rigid 3D motion estimation of each triangle T_k we use least median of squares minimisation. This procedure removes the outliers from the initial data set and finds the estimate that minimises the median of the square error. More specifically, the rigid motion of each triangle T_k, k = 1, 2, …, K, where K is the number of triangles in the foreground object, is modeled using a linear 3D model, with three rotation and three translation parameters [1]:

  [x(t+1)]   [  1   −w_z   w_y ] [x(t)]   [t_x]
  [y(t+1)] = [ w_z    1   −w_x ] [y(t)] + [t_y],   (16)
  [z(t+1)]   [−w_y   w_x    1  ] [z(t)]   [t_z]

where (x(t), y(t), z(t)) is a point on the plane defined by the coordinates of the vertices of triangle T_k. Since the triangle motion is to be used for object articulation, neighbourhood constraints are needed for the estimation of the model parameter vector a = (w_x, w_y, w_z, t_x, t_y, t_z), in order to guarantee a smooth estimated triangle motion vector field that can be successfully segmented. Let N_k be the ensemble of the triangles neighbouring the triangle T_k.
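The linearised motion model of Eq. (16) is simply P(t+1) = (I + [w]×) P(t) + T, which can be sketched as:

```python
import numpy as np

def rigid_motion(points, a):
    """Apply the small-angle rigid motion model of Eq. (16) to an
    (n, 3) array of points; a = (wx, wy, wz, tx, ty, tz)."""
    wx, wy, wz, tx, ty, tz = a
    R = np.array([[1.0, -wz,  wy],
                  [ wz, 1.0, -wx],
                  [-wy,  wx, 1.0]])   # I + [w]x, valid for small angles
    T = np.array([tx, ty, tz])
    return points @ R.T + T
```

With zero rotation the model reduces to a pure translation, and a small w_z rotates points about the z-axis to first order.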
If triangle T_k belongs to the region s_m^i of S^i at iteration i of the object articulation algorithm, we define as neighbourhood T_k^S of the triangle T_k the set of triangles

  T_k^S = {T_l ∈ N_k : T_l ∈ s_m^i}.

For example, in order to define the neighbourhood of triangle A in Fig. 6(c), we first consider all triangles that share at least one common vertex with triangle A (i.e. N_A = {B, C, D, E, H, I, J, K}). From the set N_A, only the triangles belonging to the same object as triangle A are finally defined as the neighbourhood of triangle A (i.e. T_A^S = {B, C, D, E}). Then, for each triangle T_k, the set of points belonging to T_k^S is input to the 3D rigid motion estimation procedure, so as to smooth the motion field produced.

6.2. The 3D motion estimation algorithm

For the estimation of the model parameter vector a = (w_x, w_y, w_z, t_x, t_y, t_z) for each neighbourhood T_k^S at iteration i of the object articulation procedure, the MLMS iterative algorithm [27] was used. The MLMS algorithm is based on median filtering and is very efficient in suppressing noise with a large amount of outliers (i.e. in situations where conventional least-squares techniques usually fail). As noted in the previous sections, the 3D motion of each extended neighbourhood T_k^S of a triangle T_k is modelled in the global coordinate system by

  P(t+1) = R_k P(t) + T_k,   (17)
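The neighbourhood construction just described (the vertex-sharing set N_k restricted to the same sub-object) can be sketched as follows, with triangles represented as vertex-index triples and a per-triangle segment label:

```python
def triangle_neighbourhoods(triangles, labels):
    """For each triangle T_k, return the set T_k^S: triangles sharing at
    least one vertex with T_k (the set N_k), restricted to triangles
    carrying the same sub-object label."""
    neigh = []
    for k, tk in enumerate(triangles):
        vk = set(tk)
        n_k = [l for l, tl in enumerate(triangles)
               if l != k and vk & set(tl)]        # share >= 1 vertex
        neigh.append([l for l in n_k if labels[l] == labels[k]])
    return neigh
```

On a tiny strip of four triangles split into two sub-objects, the last triangle is vertex-disjoint from the rest, so its neighbourhood is empty even before the label restriction.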

where the matrix R_k and the vector T_k are defined from Eq. (16). Since initial motion estimates are available in the left and right camera images, the rigid 3D motion must be projected on the left and right coordinate systems. Using Eqs. (17) and (1),

  P_c(t+1) = R_k^c P_c(t) + T_k^c,   (18)

where

  R_k^c = R_c R_k R_c⁻¹,   (19)

  T_k^c = −R_c R_k R_c⁻¹ T_c + R_c T_k + T_c,   (20)

and R_k^c and T_k^c are the 3D motion rotation and translation matrices corresponding to camera c and triangle k. Using the fact that the matrices R_k^c and T_k^c are of the form

  R_k^c = [  1    −w_z^c   w_y^c ;  w_z^c   1   −w_x^c ;  −w_y^c   w_x^c   1 ],   T_k^c = [t_x^c ; t_y^c ; t_z^c],   (21)

and also using Eqs. (2) and (3), the projected 2D motion vector in camera c, d_c(X, Y) = (d_X^c(X(t), Y(t)), d_Y^c(X(t), Y(t))), is given by

  d_X^c(X(t), Y(t)) = f_c [−w_x^c x_c y_c + w_y^c (x_c² + z_c²) − w_z^c y_c z_c + t_x^c z_c − t_z^c x_c] / [(−w_y^c x_c + w_x^c y_c + z_c + t_z^c) z_c d_x^c],   (22)

  d_Y^c(X(t), Y(t)) = f_c [w_x^c (y_c² + z_c²) − w_y^c x_c y_c − w_z^c x_c z_c − t_y^c z_c + t_z^c y_c] / [(−w_y^c x_c + w_x^c y_c + z_c + t_z^c) z_c d_y^c].   (23)

Using the initially estimated 2D motion vectors corresponding to the left and right cameras and Eqs. (22) and (23), along with Eqs. (19) and (20) evaluated for c = l and c = r, a linear system for the global motion parameter vector a_k for triangle T_k is formed. Note that the parameters of a_k are implicitly contained in Eqs. (22) and (23), since a_k^c = (w_x^c, w_y^c, w_z^c, t_x^c, t_y^c, t_z^c) and a_k are related by Eqs. (19) and (20). This is a system of 2(N_k^l + N_k^r) equations with six unknowns, where N_k^l and N_k^r are the numbers of reference points in the set G of Eq. (11) contained in the plane defined by the coordinates of the vertices of triangle k, in the left and right image planes, respectively. If 2(N_k^l + N_k^r) ≥ 6, the system can be solved using least-squares methods or, alternately, by the robust least median of squares motion estimation algorithm described in detail in [27]. The reference points initially chosen should be enough to guarantee a sufficient number of equations for each triangle. As explained in Section 5, this is ensured by choosing in Eq.
(11) as reference points all triangle vertices plus the points of interest on depth and luminance edges.

6.3. Object segmentation

At each iteration of the object articulation method, the rigidity constraint imposed on each rigid object component is exploited. This constraint requires that the distance between any pair of points of a rigid object component must remain constant at all times and configurations. Thus, the motion of a rigid model object component represented by a mesh of triangles can be completely described using the same six motion parameters. Therefore, to achieve object articulation, neighbouring triangles which exhibit similar 3D motion parameters are clustered into patches. In an ideal case, these patches will represent the complete visible surface of the moving object components of the articulated object.
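Returning briefly to the projection relations: the closed form of Eq. (22) can be verified numerically against a direct computation that moves the 3D point with the linear model of Eq. (16) and projects it before and after with Eqs. (2) and (3). A minimal sketch, with the camera index dropped and f = d_x = 1 assumed for brevity:

```python
def projected_dx(p, a, f=1.0, dx=1.0):
    """Horizontal image displacement, closed form of Eq. (22)."""
    x, y, z = p
    wx, wy, wz, tx, ty, tz = a
    num = -wx * x * y + wy * (x * x + z * z) - wz * y * z + tx * z - tz * x
    den = (-wy * x + wx * y + z + tz) * z * dx
    return f * num / den

def projected_dx_direct(p, a, f=1.0, dx=1.0):
    """Same displacement computed directly: apply the linear motion
    model of Eq. (16), then project before and after (Eqs. (2)-(3))."""
    x, y, z = p
    wx, wy, wz, tx, ty, tz = a
    x1 = x - wz * y + wy * z + tx      # first row of Eq. (16)
    z1 = -wy * x + wx * y + z + tz     # third row of Eq. (16)
    return (f * x1 / z1 - f * x / z) / dx
```

The two agree exactly (to floating-point precision), since Eq. (22) is an algebraic rearrangement of the projected displacement, not an approximation.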

More specifically, the following iterative algorithm is proposed:

Step 1. Set j = 0. Set $M_0 = K$. Set $S_0 = S$.

Step 2. For each patch $s_k$, k = 1, …, $M_j$, execute the following clustering algorithm.

Step 3. For all the patches $s_l$ that belong to the neighbourhood of $s_k$, if

$\left\| \dfrac{\sigma_l a_k + \sigma_k a_l}{\sigma_k + \sigma_l} - a_l \right\| \le th$,

cluster $s_l$ to $s_k$, i.e. set $s_k = s_k \cup s_l$ and $M_j = M_j - 1$. In the above, $a_m$, m = k, l, are the motion parameters and $\sigma_m$, m = k, l, is the variance of the 3D motion estimate, i.e. the displaced frame difference (DFD) of patch $s_m$ computed by compensating the projected 3D motion in the left and right cameras, and th is a threshold. Also,

$\sigma_k = \dfrac{1}{N_k} \sum_{P \in s_k} (I_l(P(t)) - I_l(P(t+1)))^2 + \dfrac{1}{N_k} \sum_{P \in s_k} (I_r(P(t)) - I_r(P(t+1)))^2$,

where $N_k$ is the number of points contained in patch $s_k$, and P(t) and P(t+1) are two corresponding points at time instants t and t+1, respectively.

Step 4. Set j = j + 1 and $M_j = M_{j-1}$. Set $S_j = \{s_k,\ k = 1, \ldots, M_j\}$. If $S_j = S_{j-1}$, stop. Else go to Step 2.

7. 3D motion estimation of each sub-object

7.1. Rigid 3D motion estimation of each sub-object

The object articulation procedure identifies a number of sub-objects of the 3D model object, as areas with homogeneous motion. A sub-object $s_k$ represents a surface patch of the 3D model object consisting of $N_k$ control points and $q_k$ triangles. A sub-object may consist of $q_k = 1$ triangle only. The motion of an arbitrary point P(t) on the sub-object $s_k$ to its new position P(t+1) is described by

$P(t+1) = R_k P(t) + T_k$,  (24)

where k = 1, …, M and M is the number of sub-objects, and where, as before [1],

$R_k = \begin{pmatrix} 1 & -w_z & w_y \\ w_z & 1 & -w_x \\ -w_y & w_x & 1 \end{pmatrix}, \qquad T_k = \begin{pmatrix} t_x \\ t_y \\ t_z \end{pmatrix}$.

For the estimation of the model parameter vector $a = (w_x, w_y, w_z, t_x, t_y, t_z)$ the MLMS iterative algorithm described earlier is used, this time without imposing neighbourhood constraints.

7.2. Non-rigid 3D motion estimation

The rigid motion of the articulated objects cannot compensate errors occurring due to local motion (such as the movement of eyes and lips).
These errors can only be compensated by appropriately deforming the nodes of the wireframe, so that they also follow the local motion. An analysis-by-synthesis approach is proposed for the computation of the non-rigid motion $\tilde{\Delta}_i$ at node i, which minimises the DFD between the image frame at time t+1 and the 3D non-rigid motion compensated estimate of frame t+1 from frame t.
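The candidate selection that this analysis-by-synthesis approach reduces to, namely evaluating the summed left-plus-right DFD for every candidate and keeping the minimiser, can be sketched as follows. This is a toy version operating on already-projected 2D displacements and grayscale frames; the function names and the rounding to integer pixel positions are assumptions:

```python
import numpy as np

def dfd(img_t, img_t1, pts, disp):
    """Mean squared displaced-frame difference over a set of 2D points.

    pts  : (N, 2) integer pixel coordinates (x, y) at time t
    disp : 2-vector, a candidate 2D projection of the node's motion
    """
    moved = pts + np.round(disp).astype(int)
    d = img_t[pts[:, 1], pts[:, 0]].astype(float) - \
        img_t1[moved[:, 1], moved[:, 0]].astype(float)
    return np.mean(d ** 2)

def pick_candidate(img_l_t, img_l_t1, img_r_t, img_r_t1,
                   pts_l, pts_r, candidates):
    """Keep the candidate minimising the summed left + right DFD."""
    costs = [dfd(img_l_t, img_l_t1, pts_l, c) + dfd(img_r_t, img_r_t1, pts_r, c)
             for c in candidates]
    return candidates[int(np.argmin(costs))]
```

With a synthetic frame pair where the content shifts one pixel to the right, the selection correctly prefers the matching displacement over the alternatives.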

More specifically, the 3D motion $r_i$, i = 1, …, N, of the wire-mesh nodes is governed by (24). Alternative estimates of the motion of the same node are provided by applying to (16) the 3D motion parameters originally estimated for a triangle containing node i (see Fig. 6(d)). Since the motion of each triangle reflects both global rigid motion and local deformations, the difference of these two estimates of the motion of each node may be assumed to approximate the non-rigid motion component of the node. If $r_i$ is the rigid 3D motion of node i and $\Delta_i^k$, k = 1, …, $N_i$, are the estimates for the motion of node i produced by rotating and translating this node with the rotation and translation parameters corresponding to each triangle $T_k$ containing node i, we define as candidates for the minimisation of the DFD of the reconstruction error

$\tilde{\Delta}_i^k = \Delta_i^k - r_i, \quad k = 1, \ldots, N_i$,  (25)

where $N_i$ is the number of neighbourhood triangles of node i. The final non-rigid motion vector $\tilde{\Delta}_i$ is chosen to be

$\tilde{\Delta}_i = \arg\min_{\tilde{\Delta}_i^k} \left( DFD_l(\tilde{\Delta}_i^k) + DFD_r(\tilde{\Delta}_i^k) \right)$,

where

$DFD_c(\tilde{\Delta}_i^k) = \dfrac{1}{N_R} \sum_{P(t) \in R} \left( I_c(P(t)) - I_c(P(t) + r_i + \tilde{\Delta}_i^k) \right)^2$.

In the above equation, P(t) are the 3D coordinates of node i at time instance t, and $P(t) + r_i + \tilde{\Delta}_i^k$ are the corresponding corrected coordinates at time instance t+1 corresponding to the non-rigid motion vector $\tilde{\Delta}_i^k$. The intensities $I_c$ at time instances t and t+1 are calculated for cameras c = l, r over a region R defined as the aggregate of the planes of all triangles containing node i, and $N_R$ is the number of points contained in region R.

8. Experimental results

The proposed model-based analysis and coding method was evaluated on the right and left channels of a stereoscopic image sequence. The first frame of both channels is transmitted using intra-frame coding techniques, as in H.263 [20].
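The six-parameter small-angle motion model of Eqs. (21) and (24), and its projection to 2D image displacements, can be illustrated numerically. The sketch below computes the displacement of the perspective projections directly, which agrees with the closed forms of Eqs. (22) and (23); the focal length and point coordinates are arbitrary:

```python
import numpy as np

def rigid_motion(points, a):
    """Apply the linearised rigid motion P(t+1) = R P(t) + T of Eq. (24).

    points : (N, 3) array of 3D points
    a      : (wx, wy, wz, tx, ty, tz), small-angle parameterisation of Eq. (21)
    """
    wx, wy, wz, tx, ty, tz = a
    R = np.array([[1.0, -wz,  wy],
                  [ wz, 1.0, -wx],
                  [-wy,  wx, 1.0]])
    T = np.array([tx, ty, tz])
    return points @ R.T + T

def project_displacement(points, a, f=1.0):
    """2D displacement of the perspective projections X = f*x/z, Y = f*y/z,
    i.e. the quantity given in closed form by Eqs. (22) and (23)."""
    p0, p1 = points, rigid_motion(points, a)
    proj = lambda p: f * p[:, :2] / p[:, 2:3]
    return proj(p1) - proj(p0)
```

For a pure translation $t_x$ the displacement reduces to $f\,t_x/z$, and for general parameters the result matches Eq. (22) evaluated term by term.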
The performance of the proposed methods was investigated in application to the compression of the interlaced stereoscopic videoconference sequence 'Claude'. The hierarchical dynamic programming procedure for matching across the epipolar line, described in Section 4.1, with two levels of hierarchy, was used for LR and RL disparity/depth estimation. The search area for disparity was chosen to be ±62 and ±2 half-pixels for the x and y coordinates, respectively. Fig. 7(b) and (d) show the computed left and right channel depth maps using the hierarchical dynamic programming approach. The depth map has the same resolution as the original image (since it is computed from a dense disparity field). Depth information is quantised to 256 levels. Darker areas represent objects closer to the cameras. The smoothing properties of the dynamic programming method are seen to result in more realistic depth-map estimates. Foreground/background separation is performed next, using the coarse-to-fine technique described in Section 4.2. The motion detection mask along with the luminance edge information are then used to improve the results of the initial segmentation. The resulting foreground/background mask of 'Claude' is shown in Fig. 5(b). This sequence was prepared by THOMPSON BROADCASTING SYSTEMS for use in the DISTIMA RACE project.
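A drastically simplified, hypothetical illustration of dynamic-programming matching along one epipolar scanline follows (integer disparities, absolute-difference data cost, L1 smoothness penalty; the actual matcher of Section 4.1 is hierarchical and works at half-pixel precision over a 2D search range):

```python
import numpy as np

def scanline_disparity(left, right, max_d=4, lam=0.1):
    """Viterbi-style DP over one epipolar scanline.

    Data cost: |left[x] - right[x-d]|; smoothness: lam * |d - d_prev|.
    Returns one integer disparity per pixel of `left`.
    """
    n, D = len(left), max_d + 1
    cost = np.full((n, D), np.inf)   # best cumulative cost ending at (x, d)
    back = np.zeros((n, D), dtype=int)
    for x in range(n):
        for d in range(D):
            if x - d < 0:            # right-image sample would fall off the border
                continue
            data = abs(float(left[x]) - float(right[x - d]))
            if x == 0:
                cost[x, d] = data
                continue
            prev = cost[x - 1] + lam * np.abs(np.arange(D) - d)
            best = int(np.argmin(prev))
            if np.isfinite(prev[best]):
                cost[x, d] = data + prev[best]
                back[x, d] = best
    disp = np.zeros(n, dtype=int)    # backtrack the minimum-cost path
    disp[-1] = int(np.argmin(cost[-1]))
    for x in range(n - 2, -1, -1):
        disp[x] = back[x + 1, disp[x + 1]]
    return disp
```

On a synthetic scanline pair shifted by a constant disparity of 2, the DP recovers that disparity wherever both samples are available, and the smoothness term keeps the path continuous.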

Fig. 7. (a) Original left channel image 'Claude' (frame 2). (b) Corresponding depth map estimated using dynamic programming. (c) Original right channel image 'Claude' (frame 2). (d) Corresponding depth map estimated using dynamic programming.

The LR and RL disparity estimates are then subjected to the consistency checking procedure, and inconsistent matches are corrected by reliably fusing LR and RL information as described in Section 4.3. On the basis of the consistent and the corrected depth information at all reference points, the wireframe model is adapted to the foreground object (Fig. 5(c) and (d)). The rigid 3D motion of each triangle of the foreground object is computed next, using the technique described in Section 6.1. The output of the proposed local 3D motion estimator is a set of 3D motion parameters assigned to each triangle of the wireframe 3D model. In order to show the resulting local 3D motion, we have produced a visualization of the rotation and translation parameters of the homogeneous motion matrix. For the rotation parameters, the direction of the vector assigned to each triangle shows the rotation axis, while the size of the vector as well as the color of the triangle show the magnitude of the angle of the rotation. For the translation parameters, the direction of the vector at each triangle shows the direction of the local 3D motion, while the size of the vector as well as the color of the triangle show the magnitude of the translation.

Fig. 8. (a) Visualization of the rotation parameters of the rigid 3D motion for each triangle of the 3D model of 'Claude'. (b) Visualization of the translation parameters of the rigid 3D motion for each triangle of the 3D model of 'Claude'.

The visualisation of the rotation and the translation parameters of the rigid 3D motion of each wireframe triangle for 'Claude' is shown in Fig. 8(a) and (b), respectively. It demonstrates that the head and the shoulders undergo different motion. The resulting articulation of the foreground object achieved by the methods of Section 6 is shown in Fig. 9(a). The accuracy of this object articulation is remarkable and is due to the fact that the foreground/background segmentation and the object articulation procedures are defined in 3D space in terms of triangle rather than pixel motion. In this way, complete correspondence is achieved between objects in the right and left channel images.

Following object articulation, the algorithm presented in Section 7.1 is used for rigid 3D motion estimation. The computed motion parameter vectors between frames 1 and 2 for the head and shoulders sub-objects are shown in Fig. 9(b). As seen, the 3D motion of the 'shoulders' sub-object is negligible, while the 3D motion of the 'head' has significant rotation and translation parameters (this can also be observed by examining the original frames 1 and 2 of 'Claude'). The performance of the algorithm in terms of PSNR is evaluated in Tables 1 and 2, where the quality of the reconstruction of the whole image, as well as of the 'head' and 'shoulders' sub-objects alone, is presented. Fig. 10(a) and (c) show the reconstructed left and right images, respectively, using rigid 3D motion compensation, while Fig. 10(b) and (d) show the corresponding prediction errors. As seen, rigid 3D motion is not sufficient for very accurate reconstruction of the 'head' sub-object, and thus non-rigid 3D motion must be used to improve the performance of the algorithm. The analysis-by-synthesis technique presented in Section 7.2 is then used for non-rigid 3D motion estimation. The quality of the reconstruction in terms of PSNR is reported in Tables 1 and 2, where an improvement of about 1 dB is seen to be achieved by non-rigid 3D motion compensation. Fig. 11(a) and (c) show details of the reconstruction error using rigid 3D motion compensation of the left and right images, respectively, while Fig. 11(b) and (d) show the corresponding prediction errors using non-rigid 3D motion compensation.
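The PSNR figures reported in Tables 1 and 2 and in Figs. 12 and 13 follow the standard definition. A small sketch, in which the optional boolean mask restricting the computation to a sub-object region (e.g. the 'head') is an assumption about how the per-sub-object numbers were obtained:

```python
import numpy as np

def psnr(orig, recon, mask=None, peak=255.0):
    """PSNR in dB between an original frame and its reconstruction.

    mask : optional boolean array selecting a sub-object region
    peak : peak signal value for 8-bit imagery
    """
    diff = orig.astype(float) - recon.astype(float)
    if mask is not None:
        diff = diff[mask]          # evaluate only inside the sub-object
    mse = np.mean(diff ** 2)
    return float('inf') if mse == 0 else 10.0 * np.log10(peak ** 2 / mse)
```

Masking out the erroneous region of a reconstruction raises the measured PSNR, which is exactly the mechanism behind reporting 'whole image', 'head' and 'shoulders' figures separately.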

Fig. 9. (a) Articulation of the foreground object. (b) The 3D motion parameter vectors corresponding to the head and shoulder sub-objects.

Table 1. PSNR (dB) of the reconstruction of the left channel of frame 2 of 'Claude' using rigid and non-rigid 3D motion compensation (rows: whole image, head, shoulders; columns: rigid, non-rigid).

Table 2. PSNR (dB) of the reconstruction of the right channel of frame 2 of 'Claude' using rigid and non-rigid 3D motion compensation (rows: whole image, head, shoulders; columns: rigid, non-rigid).

The proposed algorithm was also tested for the coding of a sequence of frames at 10 frames/s. The model adaptation, depth estimation and object articulation procedures were applied only at the beginning of each group of frames. Each group of frames consists of 10 frames. The first frame of each group of frames was transmitted using intra-frame coding techniques. In the intermediate frames, the model and articulation information are self-adapted using the rigid and non-rigid 3D motion information. The only parameters that need to be transmitted are the 6 parameters of the rigid 3D motion and the 3D non-rigid motion vector for each node of the wireframe. The methodology developed in this paper allows both left and right images to be reconstructed using the same 3D rigid motion vectors, thus achieving considerable bit-rate savings.

Fig. 10. (a) Reconstructed left channel of 'Claude' using rigid 3D motion compensation. (b) The corresponding prediction error. (c) Reconstructed right channel of 'Claude' using rigid 3D motion compensation. (d) The corresponding prediction error.

The coding algorithm requires a bit-rate of 24.4 kbps and produces better image quality than a comparably simple block matching motion estimation algorithm [23], as shown in Figs. 12 and 13. The simple block matching approach is identical to that used in H.263 and consists of absolute displaced frame difference minimisation, by searching exhaustively within a search area of −15, …, 15 half-pixels in the previous frame, centred at the position of the examined block. In both coders, only the first frame of each group of frames was transmitted using intra-frame coding. It was also assumed that each frame was predicted using the reconstructed previous frame, and that the prediction error was not transmitted. The bit-rate required by the block matching scheme with a 16×16 block size was 24.5 kbps.

Fig. 11. (a) Detail of the reconstruction error of the left channel of 'Claude' using rigid 3D motion compensation. (b) The corresponding prediction error using non-rigid 3D motion compensation. (c) Detail of the reconstruction error of the right channel of 'Claude' using rigid 3D motion compensation. (d) The corresponding prediction error using non-rigid 3D motion compensation.

Fig. 12. PSNR of each frame of the left channel for the proposed algorithm, compared with the block matching scheme with a block size of 16×16 pixels.

Fig. 13. PSNR of each frame of the right channel for the proposed algorithm, compared with the block matching scheme with a block size of 16×16 pixels.

9. Conclusions

In this paper we addressed the problem of rigid and non-rigid 3D motion estimation for model-based stereoscopic videoconference image sequence analysis and coding. On the basis of foreground/background segmentation using motion, depth and luminance information, the model was initialised by automatically adapting a wireframe model to the consistent depth information. Object articulation was then performed based on the rigid 3D motion of small surface patches. Spatial constraints were imposed to increase the reliability of the obtained 3D motion estimates for each triangle patch. A novel iterative classification technique was then used to obtain an articulated description of the scene (head, neck, shoulders). Finally, flexible motion of the nodes of the wireframe was estimated using a novel technique based on the rigid 3D motions of the triangles containing each specific node.

The results of the algorithm can be used in a number of applications. For video production and computer graphics applications, the 3D motion of a specific scene could be used to produce a scene with similar motion but different texture, as when producing a video with a model mimicking the motion of an actor.
The method also has useful applications in image analysis, since it provides an analytic representation of the motion of the object (at either the triangle or the wireframe-node level) that can be used for the segmentation or articulation of the object into uniformly moving rigid components. Finally, the method was experimentally shown to be efficient for very low bit-rate coding of stereoscopic image sequences.

References

[1] G. Adiv, Determining three-dimensional motion and structure from optical flow generated by several moving objects, IEEE Trans. Pattern Anal. Mach. Intell. 7 (July 1985) 384–401.
[2] K. Aizawa, H. Harashima, T. Saito, Model-based analysis synthesis image coding (MBASIC) system for a person's face, Signal Processing: Image Communication 1 (October 1989) 139–152.
[3] S. Barnard, W. Tompson, Disparity analysis of images, IEEE Trans. Pattern Anal. Mach. Intell. 2 (July 1980) 333–340.
[4] G. Bozdagi, A.M. Tekalp, L. Onural, 3-D motion estimation and wireframe adaptation including photometric effects for model-based coding of facial image sequences, IEEE Trans. Circuits Systems Video Technol. (June 1994) 246–256.
[5] I.J. Cox, S. Hingorani, B.M. Maggs, S.B. Rao, Stereo without regularization, tech. rep., NEC Research Institute, Princeton, USA, October.
[6] I. Cox, S. Hingorani, S. Rao, B. Maggs, A maximum likelihood stereo algorithm, Comput. Vision Graphics Image Process. (1995), to appear.
[7] J.L. Dugelay, D. Pele, Motion and disparity analysis of a stereoscopic sequence. Application to 3DTV coding, EUSIPCO '92, October 1992, pp. 1295–1298.
[8] O. Egger, W. Li, M. Kunt, High compression image coding using an adaptive morphological subband decomposition, Proc. IEEE 83 (February 1995) 272–287.
[9] L. Falkenhagen, 3D object-based depth estimation from stereoscopic image sequences, in: Proc. Internat. Workshop on Stereoscopic and 3D Imaging '95, Santorini, Greece, September 1995, pp. 81–86.
[10] N. Grammalidis, S. Malassiotis, D. Tzovaras, M.G. Strintzis, Stereo image sequence coding based on three-dimensional motion estimation and compensation, Signal Processing: Image Communication 7 (August 1995) 129–145.
[11] M. Hötter, Object-oriented analysis–synthesis coding based on moving two-dimensional objects, Signal Processing: Image Communication 2 (December 1990) 409–428.
[12] M. Hötter, Optimization and efficiency of an object-oriented analysis–synthesis coder, Signal Processing: Image Communication 4 (April 1994) 181–194.
[13] E. Izquierdo, M. Ernst, Motion/disparity analysis for 3D-video-conference applications, in: M.G.S. et al. (Eds.), Proc. Internat. Workshop Stereoscopic and 3D Imaging, Santorini, Greece, September 1995, pp. 180–186.
[14] R. Koch, Dynamic 3D scene analysis through synthesis feedback control, IEEE Trans. Pattern Anal. Mach. Intell. 15 (June 1993) 556–568.
[15] H. Li, A. Lundmark, R. Forchheimer, Image sequence coding at very low bitrates: a review, IEEE Trans. Image Process. 3 (September 1995) 589–609.
[16] J. Liu, R. Skerjanc, Stereo and motion correspondence in a sequence of stereo images, Signal Processing: Image Communication 5 (October 1993) 305–318.
[17] S. Malassiotis, M.G. Strintzis, Optimal 3D mesh object modeling for depth estimation from stereo images, in: Proc. 4th European Workshop on 3D Television, Rome, October.
[18] G. Martinez, Shape estimation of moving articulated 3D objects for object-based analysis-synthesis coding (OBASC), in: Internat. Workshop on Coding Techniques for Very Low Bit-rate Video, Tokyo, Japan, November.
[19] G. Martínez, Object articulation for model-based facial image coding, Signal Processing: Image Communication (September 1996).
[20] MPEG-2, Generic coding of moving pictures and associated audio information, tech. rep., ISO/IEC 13818.
[21] H.G. Mussman, M. Hötter, J. Ostermann, Object-oriented analysis–synthesis coding of moving images, Signal Processing: Image Communication 1 (October 1989) 117–138.
[22] H.G. Musmann, P. Pirsch, H.J. Grallert, Advances in picture coding, Proc. IEEE 73 (April 1985) 523–548.
[23] A.N. Netravali, B.G. Haskell, Digital Pictures: Representation and Compression, Plenum Press, New York and London.
[24] S. Panis, M. Ziegler, Object based coding using motion stereo information, in: Proc. Picture Coding Symposium (PCS '94), Sacramento, California, September 1994, pp. 308–312.
[25] D.V. Papadimitriou, Stereo in model-based image coding, in: Internat. Workshop on Coding Techniques for Very Low Bit-rate Video (VLBV 94), Colchester, April 1994.
[26] L. Robert, R. Deriche, Dense depth map reconstruction using a multiscale regularization approach with discontinuities preserving, in: M.G.S. et al. (Eds.), Proc. Internat. Workshop Stereoscopic and 3D Imaging, Santorini, Greece, September 1995, pp. 32–39.
[27] S.S. Sinha, B.G. Schunck, A two-stage algorithm for discontinuity-preserving surface reconstruction, IEEE Trans. Pattern Anal. Mach. Intell. 14 (January 1992).
[28] A. Tamtaoui, C. Labit, Constrained disparity motion estimators for 3DTV image sequence coding, Signal Processing: Image Communication 4 (November 1991) 45–54.
[29] A. Tamtaoui, C. Labit, Symmetrical stereo matching for 3DTV sequence coding, in: Picture Coding Symp. PCS '93, March.
[30] D. Tzovaras, N. Grammalidis, M.G. Strintzis, Depth map coding for stereo and multiview image sequence transmission, in: Internat. Workshop on Stereoscopic and 3D Imaging (IWS3DI'95), Santorini, Greece, September 1995, pp. 75–80.


A The left scanline The right scanline Dense Disparity Estimation via Global and Local Matching Chun-Jen Tsai and Aggelos K. Katsaggelos Electrical and Computer Engineering Northwestern University Evanston, IL 60208-3118, USA E-mail: tsai@ece.nwu.edu,

More information

Video Alignment. Final Report. Spring 2005 Prof. Brian Evans Multidimensional Digital Signal Processing Project The University of Texas at Austin

Video Alignment. Final Report. Spring 2005 Prof. Brian Evans Multidimensional Digital Signal Processing Project The University of Texas at Austin Final Report Spring 2005 Prof. Brian Evans Multidimensional Digital Signal Processing Project The University of Texas at Austin Omer Shakil Abstract This report describes a method to align two videos.

More information

Optimized Progressive Coding of Stereo Images Using Discrete Wavelet Transform

Optimized Progressive Coding of Stereo Images Using Discrete Wavelet Transform Optimized Progressive Coding of Stereo Images Using Discrete Wavelet Transform Torsten Palfner, Alexander Mali and Erika Müller Institute of Telecommunications and Information Technology, University of

More information

Stereo Wrap + Motion. Computer Vision I. CSE252A Lecture 17

Stereo Wrap + Motion. Computer Vision I. CSE252A Lecture 17 Stereo Wrap + Motion CSE252A Lecture 17 Some Issues Ambiguity Window size Window shape Lighting Half occluded regions Problem of Occlusion Stereo Constraints CONSTRAINT BRIEF DESCRIPTION 1-D Epipolar Search

More information

Motion Estimation for Video Coding Standards

Motion Estimation for Video Coding Standards Motion Estimation for Video Coding Standards Prof. Ja-Ling Wu Department of Computer Science and Information Engineering National Taiwan University Introduction of Motion Estimation The goal of video compression

More information

Facial Expression Analysis for Model-Based Coding of Video Sequences

Facial Expression Analysis for Model-Based Coding of Video Sequences Picture Coding Symposium, pp. 33-38, Berlin, September 1997. Facial Expression Analysis for Model-Based Coding of Video Sequences Peter Eisert and Bernd Girod Telecommunications Institute, University of

More information

INTERNATIONAL ORGANIZATION FOR STANDARDIZATION ORGANISATION INTERNATIONALE DE NORMALISATION ISO-IEC JTC1/SC29/WG11

INTERNATIONAL ORGANIZATION FOR STANDARDIZATION ORGANISATION INTERNATIONALE DE NORMALISATION ISO-IEC JTC1/SC29/WG11 INTERNATIONAL ORGANIZATION FOR STANDARDIZATION ORGANISATION INTERNATIONALE DE NORMALISATION ISO-IEC JTC1/SC29/WG11 CODING OF MOVING PICTRES AND ASSOCIATED ADIO ISO-IEC/JTC1/SC29/WG11 MPEG 95/ July 1995

More information

View Synthesis for Multiview Video Compression

View Synthesis for Multiview Video Compression View Synthesis for Multiview Video Compression Emin Martinian, Alexander Behrens, Jun Xin, and Anthony Vetro email:{martinian,jxin,avetro}@merl.com, behrens@tnt.uni-hannover.de Mitsubishi Electric Research

More information

Video Alignment. Literature Survey. Spring 2005 Prof. Brian Evans Multidimensional Digital Signal Processing Project The University of Texas at Austin

Video Alignment. Literature Survey. Spring 2005 Prof. Brian Evans Multidimensional Digital Signal Processing Project The University of Texas at Austin Literature Survey Spring 2005 Prof. Brian Evans Multidimensional Digital Signal Processing Project The University of Texas at Austin Omer Shakil Abstract This literature survey compares various methods

More information

THE GENERATION of a stereoscopic image sequence

THE GENERATION of a stereoscopic image sequence IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, VOL. 15, NO. 8, AUGUST 2005 1065 Stereoscopic Video Generation Based on Efficient Layered Structure and Motion Estimation From a Monoscopic

More information

Automatic Reconstruction of 3D Objects Using a Mobile Monoscopic Camera

Automatic Reconstruction of 3D Objects Using a Mobile Monoscopic Camera Automatic Reconstruction of 3D Objects Using a Mobile Monoscopic Camera Wolfgang Niem, Jochen Wingbermühle Universität Hannover Institut für Theoretische Nachrichtentechnik und Informationsverarbeitung

More information

Colour Segmentation-based Computation of Dense Optical Flow with Application to Video Object Segmentation

Colour Segmentation-based Computation of Dense Optical Flow with Application to Video Object Segmentation ÖGAI Journal 24/1 11 Colour Segmentation-based Computation of Dense Optical Flow with Application to Video Object Segmentation Michael Bleyer, Margrit Gelautz, Christoph Rhemann Vienna University of Technology

More information

Department of Electrical Engineering, Keio University Hiyoshi Kouhoku-ku Yokohama 223, Japan

Department of Electrical Engineering, Keio University Hiyoshi Kouhoku-ku Yokohama 223, Japan Shape Modeling from Multiple View Images Using GAs Satoshi KIRIHARA and Hideo SAITO Department of Electrical Engineering, Keio University 3-14-1 Hiyoshi Kouhoku-ku Yokohama 223, Japan TEL +81-45-563-1141

More information

Local qualitative shape from stereo. without detailed correspondence. Extended Abstract. Shimon Edelman. Internet:

Local qualitative shape from stereo. without detailed correspondence. Extended Abstract. Shimon Edelman. Internet: Local qualitative shape from stereo without detailed correspondence Extended Abstract Shimon Edelman Center for Biological Information Processing MIT E25-201, Cambridge MA 02139 Internet: edelman@ai.mit.edu

More information

REDUCTION OF CODING ARTIFACTS IN LOW-BIT-RATE VIDEO CODING. Robert L. Stevenson. usually degrade edge information in the original image.

REDUCTION OF CODING ARTIFACTS IN LOW-BIT-RATE VIDEO CODING. Robert L. Stevenson. usually degrade edge information in the original image. REDUCTION OF CODING ARTIFACTS IN LOW-BIT-RATE VIDEO CODING Robert L. Stevenson Laboratory for Image and Signal Processing Department of Electrical Engineering University of Notre Dame Notre Dame, IN 46556

More information

RENDERING AND ANALYSIS OF FACES USING MULTIPLE IMAGES WITH 3D GEOMETRY. Peter Eisert and Jürgen Rurainsky

RENDERING AND ANALYSIS OF FACES USING MULTIPLE IMAGES WITH 3D GEOMETRY. Peter Eisert and Jürgen Rurainsky RENDERING AND ANALYSIS OF FACES USING MULTIPLE IMAGES WITH 3D GEOMETRY Peter Eisert and Jürgen Rurainsky Fraunhofer Institute for Telecommunications, Heinrich-Hertz-Institute Image Processing Department

More information

STEREOSCOPIC IMAGE PROCESSING

STEREOSCOPIC IMAGE PROCESSING STEREOSCOPIC IMAGE PROCESSING Reginald L. Lagendijk, Ruggero E.H. Franich 1 and Emile A. Hendriks 2 Delft University of Technology Department of Electrical Engineering 4 Mekelweg, 2628 CD Delft, The Netherlands

More information

Model-Aided Coding: A New Approach to Incorporate Facial Animation into Motion-Compensated Video Coding

Model-Aided Coding: A New Approach to Incorporate Facial Animation into Motion-Compensated Video Coding 344 IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, VOL. 10, NO. 3, APRIL 2000 Model-Aided Coding: A New Approach to Incorporate Facial Animation into Motion-Compensated Video Coding Peter

More information

CHAPTER 3 DISPARITY AND DEPTH MAP COMPUTATION

CHAPTER 3 DISPARITY AND DEPTH MAP COMPUTATION CHAPTER 3 DISPARITY AND DEPTH MAP COMPUTATION In this chapter we will discuss the process of disparity computation. It plays an important role in our caricature system because all 3D coordinates of nodes

More information

Rectification and Distortion Correction

Rectification and Distortion Correction Rectification and Distortion Correction Hagen Spies March 12, 2003 Computer Vision Laboratory Department of Electrical Engineering Linköping University, Sweden Contents Distortion Correction Rectification

More information

COMPARATIVE STUDY OF DIFFERENT APPROACHES FOR EFFICIENT RECTIFICATION UNDER GENERAL MOTION

COMPARATIVE STUDY OF DIFFERENT APPROACHES FOR EFFICIENT RECTIFICATION UNDER GENERAL MOTION COMPARATIVE STUDY OF DIFFERENT APPROACHES FOR EFFICIENT RECTIFICATION UNDER GENERAL MOTION Mr.V.SRINIVASA RAO 1 Prof.A.SATYA KALYAN 2 DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING PRASAD V POTLURI SIDDHARTHA

More information

Learning and Inferring Depth from Monocular Images. Jiyan Pan April 1, 2009

Learning and Inferring Depth from Monocular Images. Jiyan Pan April 1, 2009 Learning and Inferring Depth from Monocular Images Jiyan Pan April 1, 2009 Traditional ways of inferring depth Binocular disparity Structure from motion Defocus Given a single monocular image, how to infer

More information

Detecting Planar Homographies in an Image Pair. submission 335. all matches. identication as a rst step in an image analysis

Detecting Planar Homographies in an Image Pair. submission 335. all matches. identication as a rst step in an image analysis Detecting Planar Homographies in an Image Pair submission 335 Abstract This paper proposes an algorithm that detects the occurrence of planar homographies in an uncalibrated image pair. It then shows that

More information

Figure 1: Representation of moving images using layers Once a set of ane models has been found, similar models are grouped based in a mean-square dist

Figure 1: Representation of moving images using layers Once a set of ane models has been found, similar models are grouped based in a mean-square dist ON THE USE OF LAYERS FOR VIDEO CODING AND OBJECT MANIPULATION Luis Torres, David Garca and Anna Mates Dept. of Signal Theory and Communications Universitat Politecnica de Catalunya Gran Capita s/n, D5

More information

Phase2. Phase 1. Video Sequence. Frame Intensities. 1 Bi-ME Bi-ME Bi-ME. Motion Vectors. temporal training. Snake Images. Boundary Smoothing

Phase2. Phase 1. Video Sequence. Frame Intensities. 1 Bi-ME Bi-ME Bi-ME. Motion Vectors. temporal training. Snake Images. Boundary Smoothing CIRCULAR VITERBI BASED ADAPTIVE SYSTEM FOR AUTOMATIC VIDEO OBJECT SEGMENTATION I-Jong Lin, S.Y. Kung ijonglin@ee.princeton.edu Princeton University Abstract - Many future video standards such as MPEG-4

More information

A Low Bit-Rate Video Codec Based on Two-Dimensional Mesh Motion Compensation with Adaptive Interpolation

A Low Bit-Rate Video Codec Based on Two-Dimensional Mesh Motion Compensation with Adaptive Interpolation IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, VOL. 11, NO. 1, JANUARY 2001 111 A Low Bit-Rate Video Codec Based on Two-Dimensional Mesh Motion Compensation with Adaptive Interpolation

More information

Prof. Fanny Ficuciello Robotics for Bioengineering Visual Servoing

Prof. Fanny Ficuciello Robotics for Bioengineering Visual Servoing Visual servoing vision allows a robotic system to obtain geometrical and qualitative information on the surrounding environment high level control motion planning (look-and-move visual grasping) low level

More information

View Synthesis for Multiview Video Compression

View Synthesis for Multiview Video Compression MITSUBISHI ELECTRIC RESEARCH LABORATORIES http://www.merl.com View Synthesis for Multiview Video Compression Emin Martinian, Alexander Behrens, Jun Xin, and Anthony Vetro TR2006-035 April 2006 Abstract

More information

Accurate and Dense Wide-Baseline Stereo Matching Using SW-POC

Accurate and Dense Wide-Baseline Stereo Matching Using SW-POC Accurate and Dense Wide-Baseline Stereo Matching Using SW-POC Shuji Sakai, Koichi Ito, Takafumi Aoki Graduate School of Information Sciences, Tohoku University, Sendai, 980 8579, Japan Email: sakai@aoki.ecei.tohoku.ac.jp

More information

CHAPTER 5 MOTION DETECTION AND ANALYSIS

CHAPTER 5 MOTION DETECTION AND ANALYSIS CHAPTER 5 MOTION DETECTION AND ANALYSIS 5.1. Introduction: Motion processing is gaining an intense attention from the researchers with the progress in motion studies and processing competence. A series

More information

Region Segmentation for Facial Image Compression

Region Segmentation for Facial Image Compression Region Segmentation for Facial Image Compression Alexander Tropf and Douglas Chai Visual Information Processing Research Group School of Engineering and Mathematics, Edith Cowan University Perth, Australia

More information

Extensions of H.264/AVC for Multiview Video Compression

Extensions of H.264/AVC for Multiview Video Compression MITSUBISHI ELECTRIC RESEARCH LABORATORIES http://www.merl.com Extensions of H.264/AVC for Multiview Video Compression Emin Martinian, Alexander Behrens, Jun Xin, Anthony Vetro, Huifang Sun TR2006-048 June

More information

The Video Z-buffer: A Concept for Facilitating Monoscopic Image Compression by exploiting the 3-D Stereoscopic Depth map

The Video Z-buffer: A Concept for Facilitating Monoscopic Image Compression by exploiting the 3-D Stereoscopic Depth map The Video Z-buffer: A Concept for Facilitating Monoscopic Image Compression by exploiting the 3-D Stereoscopic Depth map Sriram Sethuraman 1 and M. W. Siegel 2 1 David Sarnoff Research Center, Princeton,

More information

PRELIMINARY RESULTS ON REAL-TIME 3D FEATURE-BASED TRACKER 1. We present some preliminary results on a system for tracking 3D motion using

PRELIMINARY RESULTS ON REAL-TIME 3D FEATURE-BASED TRACKER 1. We present some preliminary results on a system for tracking 3D motion using PRELIMINARY RESULTS ON REAL-TIME 3D FEATURE-BASED TRACKER 1 Tak-keung CHENG derek@cs.mu.oz.au Leslie KITCHEN ljk@cs.mu.oz.au Computer Vision and Pattern Recognition Laboratory, Department of Computer Science,

More information

Stereo Vision. MAN-522 Computer Vision

Stereo Vision. MAN-522 Computer Vision Stereo Vision MAN-522 Computer Vision What is the goal of stereo vision? The recovery of the 3D structure of a scene using two or more images of the 3D scene, each acquired from a different viewpoint in

More information

Very Low Bit Rate Color Video

Very Low Bit Rate Color Video 1 Very Low Bit Rate Color Video Coding Using Adaptive Subband Vector Quantization with Dynamic Bit Allocation Stathis P. Voukelatos and John J. Soraghan This work was supported by the GEC-Marconi Hirst

More information

An Embedded Wavelet Video Coder. Using Three-Dimensional Set. Partitioning in Hierarchical Trees. Beong-Jo Kim and William A.

An Embedded Wavelet Video Coder. Using Three-Dimensional Set. Partitioning in Hierarchical Trees. Beong-Jo Kim and William A. An Embedded Wavelet Video Coder Using Three-Dimensional Set Partitioning in Hierarchical Trees (SPIHT) Beong-Jo Kim and William A. Pearlman Department of Electrical, Computer, and Systems Engineering Rensselaer

More information

Stereo imaging ideal geometry

Stereo imaging ideal geometry Stereo imaging ideal geometry (X,Y,Z) Z f (x L,y L ) f (x R,y R ) Optical axes are parallel Optical axes separated by baseline, b. Line connecting lens centers is perpendicular to the optical axis, and

More information

An Embedded Wavelet Video. Set Partitioning in Hierarchical. Beong-Jo Kim and William A. Pearlman

An Embedded Wavelet Video. Set Partitioning in Hierarchical. Beong-Jo Kim and William A. Pearlman An Embedded Wavelet Video Coder Using Three-Dimensional Set Partitioning in Hierarchical Trees (SPIHT) 1 Beong-Jo Kim and William A. Pearlman Department of Electrical, Computer, and Systems Engineering

More information

Efficient Stereo Image Rectification Method Using Horizontal Baseline

Efficient Stereo Image Rectification Method Using Horizontal Baseline Efficient Stereo Image Rectification Method Using Horizontal Baseline Yun-Suk Kang and Yo-Sung Ho School of Information and Communicatitions Gwangju Institute of Science and Technology (GIST) 261 Cheomdan-gwagiro,

More information

RESTORATION OF DEGRADED DOCUMENTS USING IMAGE BINARIZATION TECHNIQUE

RESTORATION OF DEGRADED DOCUMENTS USING IMAGE BINARIZATION TECHNIQUE RESTORATION OF DEGRADED DOCUMENTS USING IMAGE BINARIZATION TECHNIQUE K. Kaviya Selvi 1 and R. S. Sabeenian 2 1 Department of Electronics and Communication Engineering, Communication Systems, Sona College

More information

A Real Time System for Detecting and Tracking People. Ismail Haritaoglu, David Harwood and Larry S. Davis. University of Maryland

A Real Time System for Detecting and Tracking People. Ismail Haritaoglu, David Harwood and Larry S. Davis. University of Maryland W 4 : Who? When? Where? What? A Real Time System for Detecting and Tracking People Ismail Haritaoglu, David Harwood and Larry S. Davis Computer Vision Laboratory University of Maryland College Park, MD

More information

Visual Hulls from Single Uncalibrated Snapshots Using Two Planar Mirrors

Visual Hulls from Single Uncalibrated Snapshots Using Two Planar Mirrors Visual Hulls from Single Uncalibrated Snapshots Using Two Planar Mirrors Keith Forbes 1 Anthon Voigt 2 Ndimi Bodika 2 1 Digital Image Processing Group 2 Automation and Informatics Group Department of Electrical

More information

Scene Segmentation by Color and Depth Information and its Applications

Scene Segmentation by Color and Depth Information and its Applications Scene Segmentation by Color and Depth Information and its Applications Carlo Dal Mutto Pietro Zanuttigh Guido M. Cortelazzo Department of Information Engineering University of Padova Via Gradenigo 6/B,

More information

Occlusion Detection of Real Objects using Contour Based Stereo Matching

Occlusion Detection of Real Objects using Contour Based Stereo Matching Occlusion Detection of Real Objects using Contour Based Stereo Matching Kenichi Hayashi, Hirokazu Kato, Shogo Nishida Graduate School of Engineering Science, Osaka University,1-3 Machikaneyama-cho, Toyonaka,

More information

University of Bristol - Explore Bristol Research. Peer reviewed version. Link to published version (if available): /WIVC.1996.

University of Bristol - Explore Bristol Research. Peer reviewed version. Link to published version (if available): /WIVC.1996. Czerepinski, P. J., & Bull, D. R. (1996). Coderoriented matching criteria for motion estimation. In Proc. 1st Intl workshop on Wireless Image and Video Communications (pp. 38 42). Institute of Electrical

More information

Image Segmentation Techniques for Object-Based Coding

Image Segmentation Techniques for Object-Based Coding Image Techniques for Object-Based Coding Junaid Ahmed, Joseph Bosworth, and Scott T. Acton The Oklahoma Imaging Laboratory School of Electrical and Computer Engineering Oklahoma State University {ajunaid,bosworj,sacton}@okstate.edu

More information

VIDEO COMPRESSION STANDARDS

VIDEO COMPRESSION STANDARDS VIDEO COMPRESSION STANDARDS Family of standards: the evolution of the coding model state of the art (and implementation technology support): H.261: videoconference x64 (1988) MPEG-1: CD storage (up to

More information

Calibrating a Structured Light System Dr Alan M. McIvor Robert J. Valkenburg Machine Vision Team, Industrial Research Limited P.O. Box 2225, Auckland

Calibrating a Structured Light System Dr Alan M. McIvor Robert J. Valkenburg Machine Vision Team, Industrial Research Limited P.O. Box 2225, Auckland Calibrating a Structured Light System Dr Alan M. McIvor Robert J. Valkenburg Machine Vision Team, Industrial Research Limited P.O. Box 2225, Auckland New Zealand Tel: +64 9 3034116, Fax: +64 9 302 8106

More information

Chapter 3 Image Registration. Chapter 3 Image Registration

Chapter 3 Image Registration. Chapter 3 Image Registration Chapter 3 Image Registration Distributed Algorithms for Introduction (1) Definition: Image Registration Input: 2 images of the same scene but taken from different perspectives Goal: Identify transformation

More information

Geometric transform motion compensation for low bit. rate video coding. Sergio M. M. de Faria

Geometric transform motion compensation for low bit. rate video coding. Sergio M. M. de Faria Geometric transform motion compensation for low bit rate video coding Sergio M. M. de Faria Instituto de Telecomunicac~oes / Instituto Politecnico de Leiria Pinhal de Marrocos, Polo II-FCTUC 3000 Coimbra,

More information

3D Sensing and Reconstruction Readings: Ch 12: , Ch 13: ,

3D Sensing and Reconstruction Readings: Ch 12: , Ch 13: , 3D Sensing and Reconstruction Readings: Ch 12: 12.5-6, Ch 13: 13.1-3, 13.9.4 Perspective Geometry Camera Model Stereo Triangulation 3D Reconstruction by Space Carving 3D Shape from X means getting 3D coordinates

More information

Realtime View Adaptation of Video Objects in 3-Dimensional Virtual Environments

Realtime View Adaptation of Video Objects in 3-Dimensional Virtual Environments Contact Details of Presenting Author Edward Cooke (cooke@hhi.de) Tel: +49-30-31002 613 Fax: +49-30-3927200 Summation Abstract o Examination of the representation of time-critical, arbitrary-shaped, video

More information

MOTION. Feature Matching/Tracking. Control Signal Generation REFERENCE IMAGE

MOTION. Feature Matching/Tracking. Control Signal Generation REFERENCE IMAGE Head-Eye Coordination: A Closed-Form Solution M. Xie School of Mechanical & Production Engineering Nanyang Technological University, Singapore 639798 Email: mmxie@ntuix.ntu.ac.sg ABSTRACT In this paper,

More information

3. International Conference on Face and Gesture Recognition, April 14-16, 1998, Nara, Japan 1. A Real Time System for Detecting and Tracking People

3. International Conference on Face and Gesture Recognition, April 14-16, 1998, Nara, Japan 1. A Real Time System for Detecting and Tracking People 3. International Conference on Face and Gesture Recognition, April 14-16, 1998, Nara, Japan 1 W 4 : Who? When? Where? What? A Real Time System for Detecting and Tracking People Ismail Haritaoglu, David

More information

Dimensional Imaging IWSNHC3DI'99, Santorini, Greece, September SYNTHETIC HYBRID OR NATURAL FIT?

Dimensional Imaging IWSNHC3DI'99, Santorini, Greece, September SYNTHETIC HYBRID OR NATURAL FIT? International Workshop on Synthetic Natural Hybrid Coding and Three Dimensional Imaging IWSNHC3DI'99, Santorini, Greece, September 1999. 3-D IMAGING AND COMPRESSION { SYNTHETIC HYBRID OR NATURAL FIT? Bernd

More information

A Hierarchical Statistical Framework for the Segmentation of Deformable Objects in Image Sequences Charles Kervrann and Fabrice Heitz IRISA / INRIA -

A Hierarchical Statistical Framework for the Segmentation of Deformable Objects in Image Sequences Charles Kervrann and Fabrice Heitz IRISA / INRIA - A hierarchical statistical framework for the segmentation of deformable objects in image sequences Charles Kervrann and Fabrice Heitz IRISA/INRIA, Campus Universitaire de Beaulieu, 35042 Rennes Cedex,

More information

LOCALIZATION OF FACIAL REGIONS AND FEATURES IN COLOR IMAGES. Karin Sobottka Ioannis Pitas

LOCALIZATION OF FACIAL REGIONS AND FEATURES IN COLOR IMAGES. Karin Sobottka Ioannis Pitas LOCALIZATION OF FACIAL REGIONS AND FEATURES IN COLOR IMAGES Karin Sobottka Ioannis Pitas Department of Informatics, University of Thessaloniki 540 06, Greece e-mail:fsobottka, pitasg@zeus.csd.auth.gr Index

More information

EE795: Computer Vision and Intelligent Systems

EE795: Computer Vision and Intelligent Systems EE795: Computer Vision and Intelligent Systems Spring 2012 TTh 17:30-18:45 FDH 204 Lecture 14 130307 http://www.ee.unlv.edu/~b1morris/ecg795/ 2 Outline Review Stereo Dense Motion Estimation Translational

More information

Motion-Compensated Subband Coding. Patrick Waldemar, Michael Rauth and Tor A. Ramstad

Motion-Compensated Subband Coding. Patrick Waldemar, Michael Rauth and Tor A. Ramstad Video Compression by Three-dimensional Motion-Compensated Subband Coding Patrick Waldemar, Michael Rauth and Tor A. Ramstad Department of telecommunications, The Norwegian Institute of Technology, N-7034

More information

Platelet-based coding of depth maps for the transmission of multiview images

Platelet-based coding of depth maps for the transmission of multiview images Platelet-based coding of depth maps for the transmission of multiview images Yannick Morvan a, Peter H. N. de With a,b and Dirk Farin a a Eindhoven University of Technology, P.O. Box 513, The Netherlands;

More information

Multiple View Geometry

Multiple View Geometry Multiple View Geometry CS 6320, Spring 2013 Guest Lecture Marcel Prastawa adapted from Pollefeys, Shah, and Zisserman Single view computer vision Projective actions of cameras Camera callibration Photometric

More information

MOTION ESTIMATION IN MPEG-2 VIDEO ENCODING USING A PARALLEL BLOCK MATCHING ALGORITHM. Daniel Grosu, Honorius G^almeanu

MOTION ESTIMATION IN MPEG-2 VIDEO ENCODING USING A PARALLEL BLOCK MATCHING ALGORITHM. Daniel Grosu, Honorius G^almeanu MOTION ESTIMATION IN MPEG-2 VIDEO ENCODING USING A PARALLEL BLOCK MATCHING ALGORITHM Daniel Grosu, Honorius G^almeanu Multimedia Group - Department of Electronics and Computers Transilvania University

More information

Fast Lighting Independent Background Subtraction

Fast Lighting Independent Background Subtraction Fast Lighting Independent Background Subtraction Yuri Ivanov Aaron Bobick John Liu [yivanov bobick johnliu]@media.mit.edu MIT Media Laboratory February 2, 2001 Abstract This paper describes a new method

More information