Head Detection and Tracking by 2-D and 3-D Ellipsoid Fitting.
Nikos Grammalidis and Michael G. Strintzis
Department of Electrical Engineering, University of Thessaloniki, Thessaloniki, Greece

Abstract

A novel procedure for segmenting a set of scattered 3-D data obtained from a head-and-shoulders multiview sequence is presented. The procedure consists of two steps. In the first step, two ellipses corresponding to the head and the body of the person are identified, based on ellipse fitting of the outline of the person in each image. The fitting is based on a fast direct least squares method using a constraint that forces a general conic to be an ellipse. In order to achieve head/body segmentation, a K-means algorithm is used to minimise the fitting error between the points and the two ellipses. In the second step, a 3-D ellipsoid model corresponding to the head of the person is identified using an extension of the above method. Robustness and outlier removal can be achieved if the 3-D ellipsoid estimation technique is used in conjunction with the Median of Least Squares (MedLS) technique, which minimises the median of the errors corresponding to each 3-D point. An interesting application of the proposed method is the combination of the 3-D ellipsoid model with a generic face model which is adapted to the face images: the face model provides information for the high-detail front part of the head, while the 3-D ellipsoid is used for the back of the head, which is usually not visible.

1. Introduction

Estimation of the position, shape and motion of the head from image sequences has been the focus of a significant amount of recent research [11, 8, 1]. In [11], a face segmentation and identification algorithm utilising the elliptical structure of the human head is presented. An ellipse is fitted to a properly preprocessed edge map, in order to mark the boundary between the head and the background regions.
A similar approach is used in [2], where a 2-D ellipse is fitted to binary edge data and postprocessing techniques are used to eliminate detection errors. However, such approaches work well only when the background is homogeneous and lightly to moderately cluttered. In [8], a technique is proposed for facial detection and coding. The detection of the face, as well as of the facial features (eyes, nose, mouth, etc.), is based on a parametric image model obtained using a Karhunen-Loève decomposition. The detected face image is then normalised by preprocessing, aiming to align the detected facial features with those of a standard model and also to compensate for lighting and contrast variations. The normalised image is then coded by projecting it (by inner product) onto a set of eigenfaces [13, 14], i.e. a subset of the eigenvectors obtained using a large database of normalised face images. If only a small number of primary eigenfaces are used, this technique results in successful face detection and coding at extremely low bitrates. Furthermore, the resulting face representations may additionally be used in face recognition applications. In [1], a 3-D ellipsoidal model is used for robust tracking of the rigid motion of the human head from a video sequence. This approach is based on the interpretation of the optical flow in terms of the 3-D motion of the model. The method is seen to be robust to large head movements and to provide better results than those obtained using a simpler planar head model. Initialisation is based on predicting the shape of the ellipsoid from the positions of the face features (eyes, nose, mouth), which were previously estimated as in [8]. In [17], a hybrid system using visual and sound cues for face localisation is presented. The system exploits mainly luminance and motion information for locating and tracking the head edges in videoconferencing image sequences.
In the present paper we shall assume a typical videoconferencing scene, with one person in front of the cameras. The initial set of 3-D points may be produced using depth estimation from stereoscopic or multiple views, provided that the camera geometry and calibration parameters are known. Furthermore, we assume that the outline of the person is available using some preprocessing technique. In the test
sequences used for our experimental results, this outline is easily available since the background is homogeneous. A K-means algorithm, which is able to identify ellipses in the image, was then used for simultaneous ellipse fitting and segmentation of the outline. Specifically, a least squares fit of the data to a conic is performed, using an additional constraint forcing the conic to be an ellipse. The algorithm is applied to estimate two ellipses, corresponding to the head and the body of the person respectively. The basic ellipse fitting technique was also extended to the three-dimensional case. This 3-D ellipsoid fitting technique was applied to estimate a 3-D ellipsoid for the head of a person. Specifically, the 3-D points that are projected inside each of the estimated ellipses in the previous step are used to estimate the corresponding 3-D ellipsoid model. Because of its high computational efficiency, this technique can be used for fast tracking of the head and body position and orientation. Since the segmentation of the nodes is performed only at the first time instant, only the 3-D ellipsoid parameters have to be estimated from the available data points at subsequent instants. Finally, an interesting application of the proposed method is the combination of the 3-D ellipsoid model with a generic face model which is adapted to the face images: the face model provides information for the high-detail front part of the head, while the 3-D ellipsoid is used for the back of the head, which is usually not visible.

2. Estimating the initial set of 3-D points

In order to estimate the initial 3-D point set, a multiocular system with N fully calibrated views is assumed. Each camera c_k, k = 1, ..., N, is modelled using a pinhole camera model based on perspective projection, and we assume that accurate calibration information is available. A number of methods have been developed for depth estimation in stereoscopic [16, 15] and multiview [6, 4] image sequences.
In this paper, we use the depth estimation technique for trinocular image sequences developed in the ACTS 092 PANORAMA project. More specifically, depth is estimated from a fully calibrated trinocular image set using the depth estimation algorithm in [4]. Furthermore, a reliability value is also estimated for each depth estimate. A set of 3-D points with highly reliable depth estimates is then identified by selecting the 3-D points whose reliability exceeds a certain threshold.

3. Approximation of a 3-D data set using a 3-D Ellipsoid Model

In [5], an efficient new method was presented for fitting ellipses to scattered 2-D data. The method is based on fitting the general equation of a conic to the given data, subject to a constraint which forces the conic to be an ellipse. In this paper, we develop a segmentation and estimation scheme using this technique within a K-means algorithm. Furthermore, we extend the technique to 3-D data, so that it can be used for approximating a general set of 3-D points by a 3-D ellipsoid. The general equation of a conic in N-D space is

F(A, b, c; x) = x^T A x + b^T x + c = 0    (1)

where A is a symmetric matrix. In order for this general conic to be an ellipsoid in N dimensions, the matrix A must be either positive or negative definite. As a result, the problem of fitting an ellipsoid to N data points x_i can be solved by minimising the sum of squares of the algebraic distances

d(A, b, c) = Σ_{i=1}^{N} F(A, b, c; x_i)^2    (2)

subject to the constraint that A is either positive or negative definite. In general, this constrained problem is very difficult to solve, since the Kuhn-Tucker conditions [9] do not guarantee a solution [5]. For the 2-D case, a direct solution to the problem was presented in [5]. The equation of the 2-D conic is

F(a, x) = a . x = ax^2 + bxy + cy^2 + dx + ey + f = 0    (3)

where a = [a b c d e f]^T, x = [x^2 xy y^2 x y 1]^T, and the constraint leading to an ellipse is 4ac - b^2 > 0.
Since a is defined only up to a scale factor, the equality constraint 4ac - b^2 = 1 may be imposed instead. This may be written as

a^T C a = 1    (4)

where C is the 6x6 constraint matrix whose only nonzero entries are C_13 = C_31 = 2 and C_22 = -1, so that a^T C a = 4ac - b^2. Using the above constraint, the following constrained fitting problem is formulated [5]:

Problem 1: Minimise E = Σ_{i=1}^{N} F^2(a, x_i) = ||Da||^2 subject to the constraint a^T C a = 1, where D = [x_1 x_2 ... x_N]^T.

This problem can be solved directly using a Lagrange multiplier, providing the following solution: a is the generalised eigenvector of

D^T D a = λ C a    (5)

corresponding to the smallest positive eigenvalue.
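As a concrete illustration, Problem 1 can be solved in a few lines of NumPy. The sketch below (the function names are ours, not from the paper) follows the formulation of [5], obtaining the generalised eigenvector of eq. (5) from the eigendecomposition of S^-1 C, where S = D^T D:

```python
import numpy as np

def fit_ellipse_direct(x, y):
    """Direct least-squares ellipse fit: solve D^T D a = lambda C a (eq. (5))."""
    # Design matrix D: each row is [x^2, xy, y^2, x, y, 1] (eq. (3)).
    D = np.column_stack([x * x, x * y, y * y, x, y, np.ones_like(x)])
    S = D.T @ D                      # scatter matrix
    # Constraint matrix C of eq. (4): a^T C a = 4ac - b^2.
    C = np.zeros((6, 6))
    C[0, 2] = C[2, 0] = 2.0
    C[1, 1] = -1.0
    # S a = lambda C a  <=>  (S^-1 C) a = (1/lambda) a; the ellipse solution
    # corresponds to the single positive eigenvalue.
    w, V = np.linalg.eig(np.linalg.inv(S) @ C)
    a = V[:, np.argmax(w.real)].real
    # Normalise so that 4ac - b^2 = 1.
    return a / np.sqrt(a @ C @ a)

def ellipse_center(a):
    """Centre of the conic: solution of 2 A2 u + [d, e]^T = 0."""
    A2 = np.array([[a[0], a[1] / 2], [a[1] / 2, a[2]]])
    return np.linalg.solve(-2 * A2, a[3:5])

# Usage: recover an ellipse centred at (1, 2) from slightly noisy samples.
t = np.linspace(0, 2 * np.pi, 200)
rng = np.random.default_rng(0)
x = 1 + 3 * np.cos(t) + 0.01 * rng.standard_normal(t.size)
y = 2 + 1 * np.sin(t) + 0.01 * rng.standard_normal(t.size)
a = fit_ellipse_direct(x, y)
cx, cy = ellipse_center(a)           # close to (1, 2)
```

Note that the eigenvector is only determined up to sign and scale; the final normalisation fixes the scale via the constraint, while the sign is irrelevant for the conic it represents.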
In [5], it is proved that exactly one eigenvalue of eq. (5) is positive; thus a unique eigenvector a_s is always found using the above method, which is non-iterative and therefore extremely efficient. Even though this approach is not easily extended to the general N-D problem, a generalisation suitable for the 3-D case will now be formulated. The equation of a 3-D conic is

F(a, x) = a . x = ax^2 + bxy + cy^2 + dxz + eyz + fz^2 + gx + hy + kz + l = 0    (6)

where a, x are 10-dimensional vectors. Eq. (6) can be written in the form of eq. (1), where

A = [ a    b/2  d/2 ]   =  [ U    v ]
    [ b/2  c    e/2 ]      [ v^T  f ]
    [ d/2  e/2  f   ]

with U = [ a b/2 ; b/2 c ] and v = [d/2 e/2]^T. The following Lemma provides the constraint forcing a general 3-D conic to be an ellipsoid.

Lemma 1: A is positive or negative definite, provided that
i. det(U) > 0, which is equivalent to 4ac - b^2 > 0
ii. (a + c) det(A) > 0

Proof: Let x = [x y z]^T, u = [x y]^T and u_c = -U^{-1} v z. Then

x^T A x = u^T U u + 2 u^T v z + f z^2
        = (u - u_c)^T U (u - u_c) + z^2 (f - v^T U^{-1} v)
        = (u - u_c)^T U (u - u_c) + (z^2 / det(U)) (acf - fb^2/4 - cd^2/4 - ae^2/4 + bde/4)
        = (u - u_c)^T U (u - u_c) + z^2 det(A) / det(U).    (7)

Assuming that U is positive definite, (u - u_c)^T U (u - u_c) > 0 when x ≠ 0. Furthermore, det(U) > 0 and tr(U) = a + c > 0; thus if det(A) > 0, eq. (7) yields x^T A x > 0 for x ≠ 0, so A is positive definite. Similarly, if U is negative definite, (u - u_c)^T U (u - u_c) < 0, det(U) > 0 and tr(U) = a + c < 0. Therefore, if det(A) < 0, eq. (7) yields x^T A x < 0 for x ≠ 0, so A is negative definite, q.e.d.

The first condition of Lemma 1 is very similar to the constraint 4ac - b^2 > 0 used in the 2-D case. Therefore, we can set up the problem of minimising the error E as before, using the constraint a^T C a = 1, where C is now the 10x10 matrix whose only nonzero entries are

C_13 = C_31 = 2, C_22 = -1.    (8)

The solution is again obtained by solving Problem 1 using eq. (5) with the new C of eq. (8).
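Lemma 1 gives a cheap numerical test for whether an estimated conic is an ellipsoid. The sketch below (NumPy assumed; names are ours) builds A from the quadratic coefficients a, b, c, d, e, f of eq. (6) (the linear terms g, h, k and the constant l do not affect the definiteness of A) and checks the two conditions:

```python
import numpy as np

def conic_matrix(a, b, c, d, e, f):
    """Symmetric matrix A of eq. (1) for the quadratic part of eq. (6)."""
    return np.array([[a, b / 2, d / 2],
                     [b / 2, c, e / 2],
                     [d / 2, e / 2, f]])

def is_ellipsoid(coeffs):
    """Lemma 1: the 3-D conic is an ellipsoid iff det(U) > 0
    and (a + c) det(A) > 0, where U is the upper-left 2x2 block of A."""
    a, b, c, d, e, f = coeffs
    A = conic_matrix(a, b, c, d, e, f)
    det_U = a * c - b * b / 4        # equals (4ac - b^2) / 4
    return det_U > 0 and (a + c) * np.linalg.det(A) > 0

# Sphere x^2 + y^2 + z^2 = 1: A = I (positive definite) -> ellipsoid.
sphere = (1, 0, 1, 0, 0, 1)
# Same sphere with flipped sign: A = -I (negative definite) -> still an ellipsoid.
neg_sphere = (-1, 0, -1, 0, 0, -1)
# Hyperboloid x^2 + y^2 - z^2 = 1: A = diag(1, 1, -1), indefinite -> rejected.
hyperboloid = (1, 0, 1, 0, 0, -1)
```

The Lemma's verdict can be cross-checked directly against the eigenvalues of A (all positive or all negative for an ellipsoid), which is what the definiteness conditions encode without an explicit eigendecomposition.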
However, in this case, after obtaining the solution, we have to check whether the second condition of Lemma 1 is satisfied. If we select the sign of the solution eigenvector so that a + c > 0, then the second condition simplifies to det(A) > 0.

4. Ellipse estimation and segmentation using a K-means algorithm

A significant advantage of the method described in the previous Section is its low computational requirements. This feature may be exploited by designing an algorithm for simultaneous ellipse estimation and segmentation of a set of 2-D points into ellipses. A simple algorithm that can be used for this purpose is a modified K-means algorithm, described below. In this version, we consider an algorithm using K = 2, which is suitable for identifying the head and the body of a person from the person's outline in one of the available views:

i. Initialisation: Divide the available points into K = 2 sets, based on subsidiary information (e.g. the relative position of head and body).
ii. Fit an ellipse to each set, using the procedure described in the previous Section.
iii. Reassign each point x_i to the ellipse for which the distance F^2(a, x_i) of eq. (3) between the point and the ellipse is minimised. Points with distances larger than a given threshold are not assigned to any set.
iv. Terminate the estimation procedure if the change in the estimated ellipse parameters is below a threshold; otherwise return to step ii.
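Steps i-iv can be sketched as follows (NumPy assumed; the helper names and the synthetic outline are ours, and the distance threshold of step iii is omitted for brevity):

```python
import numpy as np

def fit_ellipse(pts):
    """Direct least-squares ellipse fit of Section 3; pts has shape (n, 2)."""
    x, y = pts[:, 0], pts[:, 1]
    D = np.column_stack([x * x, x * y, y * y, x, y, np.ones_like(x)])
    C = np.zeros((6, 6))
    C[0, 2] = C[2, 0] = 2.0
    C[1, 1] = -1.0
    w, V = np.linalg.eig(np.linalg.inv(D.T @ D) @ C)
    a = V[:, np.argmax(w.real)].real
    return a / np.sqrt(a @ C @ a)        # normalise so 4ac - b^2 = 1

def dist2(a, pts):
    """Algebraic distance F^2(a, x_i) of eq. (3) for every point."""
    x, y = pts[:, 0], pts[:, 1]
    D = np.column_stack([x * x, x * y, y * y, x, y, np.ones_like(x)])
    return (D @ a) ** 2

def kmeans_two_ellipses(pts, n_iter=20):
    # Step i: initialise by height (head above the mean, body below).
    labels = (pts[:, 1] > pts[:, 1].mean()).astype(int)
    for _ in range(n_iter):
        # Step ii: fit one ellipse per set.
        ellipses = [fit_ellipse(pts[labels == k]) for k in (0, 1)]
        # Step iii: reassign each point to the closer ellipse.
        new = np.column_stack([dist2(a, pts) for a in ellipses]).argmin(axis=1)
        # Step iv: stop once the assignment no longer changes.
        if np.array_equal(new, labels):
            break
        labels = new
    return labels, ellipses

# Usage: a synthetic "body" ellipse below a smaller "head" ellipse.
t = np.linspace(0, 2 * np.pi, 80)
rng = np.random.default_rng(1)
body = np.column_stack([4 * np.cos(t), 3 + 3 * np.sin(t)])
head = np.column_stack([2 * np.cos(t), 10 + 2.5 * np.sin(t)])
pts = np.vstack([body, head]) + 0.02 * rng.standard_normal((160, 2))
labels, _ = kmeans_two_ellipses(pts)
```

Here convergence is tested on the assignment itself rather than on the ellipse parameters, which is an equivalent stopping rule for this demonstration.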
5. Robustness and removal of outliers using the Median of Least Squares

When the initial 3-D point estimates are noisy, or contain outliers (e.g. points from the neck or the body of the person), the above method may produce erroneous estimates of the 3-D ellipsoid parameters. Robustness can be achieved by using the MedLS technique, which minimises the median of the errors corresponding to each 3-D point. Furthermore, this scheme allows the integration of the tests for the validity of the conditions forcing the estimated conic forms to be 3-D ellipsoids. More precisely, the procedure consists of the following steps:

1. Set E_min to a large number. Randomly select a set of N 3-D points from the initial 3-D point set.
2. Estimate the ellipsoid parameters A, b, c from these points. If the resulting conic is not an ellipsoid, discard the solution and return to Step 1.
3. Compute the median of the least squares fitting errors (E_MedLS) over all points in the initial 3-D point set, using the estimated ellipsoid parameters. If E_MedLS < E_min, set E_min = E_MedLS and retain the corresponding parameters as the optimal parameters.
4. Iterate Steps 1-3 until E_min is below a threshold or a maximum number of iterations is reached.

6. Combining 3-D Face Models with the 3-D Ellipsoid

Another important application of the proposed methods is to combine a generic face model with a 3-D ellipsoid model estimated using the method described in Section 3. Various methods for adapting generic 3-D face models to face images have been proposed [12, 7, 3, 10]. In this paper, we have used the face adaptation method proposed in [10], where a generic 3D face model is transformed so that it adapts to the characteristics of an actual face observed in its front and profile image views.
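Returning to the robust estimation of Section 5, the MedLS loop can be sketched as follows. For brevity this illustration is in 2-D, with the direct ellipse fit of Section 3 standing in for the 3-D ellipsoid estimator, but the control flow (random sampling, conic validity check, median-of-errors scoring) is the same; all names are ours:

```python
import numpy as np

def fit_ellipse(pts):
    x, y = pts[:, 0], pts[:, 1]
    D = np.column_stack([x * x, x * y, y * y, x, y, np.ones_like(x)])
    C = np.zeros((6, 6))
    C[0, 2] = C[2, 0] = 2.0
    C[1, 1] = -1.0
    w, V = np.linalg.eig(np.linalg.inv(D.T @ D) @ C)
    a = V[:, np.argmax(w.real)].real
    if a @ C @ a <= 0:                   # Step 2: reject non-ellipse conics
        raise ValueError("conic is not an ellipse")
    return a / np.sqrt(a @ C @ a)

def dist2(a, pts):
    x, y = pts[:, 0], pts[:, 1]
    D = np.column_stack([x * x, x * y, y * y, x, y, np.ones_like(x)])
    return (D @ a) ** 2

def center(a):
    A2 = np.array([[a[0], a[1] / 2], [a[1] / 2, a[2]]])
    return np.linalg.solve(-2 * A2, a[3:5])

def medls_fit(pts, n_sample=10, n_trials=100, seed=0):
    rng = np.random.default_rng(seed)
    best_a, e_min = None, np.inf         # Step 1: E_min <- large number
    for _ in range(n_trials):            # Step 4: iterate
        idx = rng.choice(len(pts), n_sample, replace=False)
        try:
            a = fit_ellipse(pts[idx])    # Step 2: fit the random subset
        except (ValueError, np.linalg.LinAlgError):
            continue
        e_med = np.median(dist2(a, pts)) # Step 3: median error over all points
        if e_med < e_min:
            e_min, best_a = e_med, a
    return best_a, e_min

# Usage: 100 inliers on an ellipse centred at (1, 2), plus 25 gross outliers.
t = np.linspace(0, 2 * np.pi, 100)
rng = np.random.default_rng(2)
inliers = np.column_stack([1 + 3 * np.cos(t), 2 + np.sin(t)])
inliers += 0.01 * rng.standard_normal((100, 2))
outliers = rng.uniform([-6, -4], [8, 8], (25, 2))
a, _ = medls_fit(np.vstack([inliers, outliers]))
cx, cy = center(a)                       # close to (1, 2) despite the outliers
```

Because the median is insensitive to up to half the residuals, contaminated random samples score a large median error and are automatically discarded in favour of all-inlier samples.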
Assuming that a set of feature points has been detected on both views, the adaptation procedure starts with a rigid transformation of the model, aiming to minimise the distances of the 3D model feature nodes from the calculated 3D coordinates of the 2D feature points. Then, a non-rigid transformation ensures that the feature nodes are displaced optimally close to their exact calculated positions, dragging their neighbours in a way that does not deform the facial model unnaturally. In theory, any 3D model of the human face may be used. In practice, however, the closer the characteristics of the model are to those of the image (e.g. a model of a female face for a female face), the fewer feature points will be needed for accurate adaptation. The estimated 3-D ellipsoid is then used to approximate the back of the head, while the adapted generic 3-D wireframe provides information for the high-detail front part of the head. In order to identify the part of the ellipsoid that corresponds to the back of the head, a check is performed as to whether the line that connects each node with the ellipsoid centre intersects any triangle of the front (face) wireframe. If so, the corresponding node is removed from the ellipsoid wireframe, along with every ellipsoid-wireframe triangle containing it. At this stage, two wireframe models corresponding to the front and the back part of the head have been generated. Since these parts are not connected, an algorithm was developed to connect them automatically. This is achieved by adding triangles between the lists of nodes that define the largest wireframe boundary in each wireframe model. The largest wireframe boundary is simply the set of nodes corresponding to the largest path that can be defined using boundary wireframe edges (edges belonging to only one triangle).
7. Experimental Results

Results were obtained using the 3-view sequence Ludo¹. The set of initial 3-D points is illustrated in Figure 1 (a triangulation was performed for display purposes). The approximation of the outline of the person with two ellipses corresponding to the head and the body area is shown in Figure 2. In this case a very simple initialisation was used: points lying higher (lower) than the mean height of the outline are attributed to the head (body) region. Convergence is achieved within a small number of iterations, typically 3, as shown in Figure 3, each requiring less than 0.5 sec on an SGI R4400 workstation. Then, the nodes of the 3-D wireframe of the head that are projected inside the head outline are used to estimate a 3-D ellipsoid model for the head. This model is illustrated in Figure 4. Table 1 shows the improvement in the performance of the proposed 3-D estimation technique when the MedLS technique is used to achieve robustness to noise. The comparison criterion used was the value of a fitting ratio r, defined as follows:

r = C_intersection / C_union    (9)

where C_intersection, C_union are the numbers of pixels belonging to the intersection and the union, respectively, of the head area (defined manually after sketching the head contour) and the estimated ellipse obtained by projecting the 3-D ellipsoid on the image plane. Larger values of the ratio r ∈ (0, 1] indicate a better approximation of the head area by the ellipsoid. All methods were applied to 10 consecutive frames from the Ludo sequence (left view), and the average fitting ratio was calculated and is presented in Table 1. Two views of the head models produced when combining a generic face model with the estimated 3-D ellipsoid model, as described in Section 6, are presented in Figures 5(a-b) (front and rear view, respectively).

Figure 1. The initial 3-D wireframe.
Figure 2. The outline of the person (left view) and the two estimated ellipses using the K-means algorithm.
Figure 3. Convergence of the mean error E.
Figure 4. Estimated 3-D ellipsoid model superimposed on the head of Ludo (left view).

8. Conclusions

An efficient technique to segment the head and body parts obtained from a head-and-shoulders multiview sequence and to estimate a 3-D ellipsoid model corresponding to the head was presented. A coarse approximation of an initial wireframe requiring minimal bitrate was thus obtained. Robustness and outlier removal were achieved by augmenting the above basic methods with the Median of Least Squares (MedLS) technique, which minimises the median of the errors corresponding to each 3-D point. Also, an application of the proposed method for combining the 3-D ellipsoid model with a generic face model adapted to the face images was examined.

9. Acknowledgement

This work was supported by the EU project ACTS VIDAS, the GSRT PENED project and the GSRT AMEA Project: LIP-PHONE - A Videophone for the Deaf.

Table 1. Improvement of the fitting ratio r when using the MedLS robust estimation technique. Methods compared: one-step 3-D parameter estimation; MedLS.

¹ This sequence was provided by THOMPSON BROADCAST SYSTEMS for the ACTS 092 project PANORAMA.
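The fitting ratio r of eq. (9) is an intersection-over-union count on pixel masks. A minimal sketch (NumPy assumed; the masks here are synthetic stand-ins for the manually sketched head area and the projected ellipsoid):

```python
import numpy as np

def ellipse_mask(shape, cx, cy, rx, ry):
    """Binary mask of an axis-aligned ellipse on a pixel grid (illustrative)."""
    yy, xx = np.mgrid[:shape[0], :shape[1]]
    return ((xx - cx) / rx) ** 2 + ((yy - cy) / ry) ** 2 <= 1.0

def fitting_ratio(head_mask, est_mask):
    """Eq. (9): r = C_intersection / C_union over the two pixel sets."""
    inter = np.logical_and(head_mask, est_mask).sum()
    union = np.logical_or(head_mask, est_mask).sum()
    return inter / union

# Usage: the "manual" head area vs. a slightly shifted projected ellipse.
head = ellipse_mask((120, 100), 50, 60, 20, 26)
est = ellipse_mask((120, 100), 52, 62, 21, 25)
r = fitting_ratio(head, est)    # r in (0, 1]; r = 1 means perfect overlap
```

Identical masks give r = 1 exactly, and any mismatch in position or shape pushes r below 1, which is why the ratio is a convenient single-number comparison criterion in Table 1.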
References

[1] S. Basu, I. Essa, and A. Pentland. Motion Regularization for Model-based Head Tracking. In Proceedings, International Conference on Pattern Recognition, Vienna, Austria.
[2] A. Eleftheriadis and A. Jacquin. Automatic face location detection for model-assisted rate control in H.261 compatible coding of video. Signal Processing: Image Communication, vol. 7.
[3] M. Escher and N. Magnenat-Thalmann. Automatic 3D cloning and real-time animation of a human face. In Proceedings Computer Animation (CA97), Geneva, Switzerland.
[4] L. Falkenhagen. Block-Based Depth Estimation from Image Triples with Unrestricted Camera Setup. In IEEE Workshop on Multimedia Image Processing, Princeton, NJ.
[5] A. W. Fitzgibbon, M. Pilu, and R. Fisher. Direct Least Squares Fitting of Ellipses. In Proc. International Conference on Pattern Recognition, Vienna, Austria.
[6] N. Grammalidis and M. G. Strintzis. Disparity and Occlusion Estimation in Multiocular Systems and their Coding for the Communication of Multiview Image Sequences. IEEE Trans. Circuits and Systems for Video Technology, 8(3).
[7] N. Magnenat-Thalmann, P. Kalra, and M. Escher. Face to virtual face. Proc. IEEE, 86(5).
[8] B. Moghaddam and A. Pentland. An Automatic System for Model-Based Coding of Faces. In IEEE Data Compression Conference, Snowbird, Utah.
[9] S. Rao. Optimization: Theory and Applications. Wiley Eastern.
[10] N. Sarris and M. G. Strintzis. Three dimensional facial model adaptation. Submitted to ICIP 00, Vancouver, Canada.
[11] S. Sirohey. Human Face Segmentation and Identification. Master's thesis, CV Laboratory, University of Maryland, College Park, MD.
[12] D. Terzopoulos and K. Waters. Physically-based facial modelling, analysis, and animation. The Journal of Visualization and Computer Animation, 1:73-80.
[13] M. Turk and A. Pentland. Eigenfaces for recognition. Journal of Cognitive Neuroscience, 3(1):71-86.
[14] M. A. Turk and A. P. Pentland. Face recognition using eigenfaces. In Proceedings IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Hawaii.
[15] D. Tzovaras, N. Grammalidis, and M. G. Strintzis. Object-Based Coding of Stereo Image Sequences Using Joint 3-D Motion/Disparity Compensation. IEEE Trans. on Circuits and Systems for Video Technology, 7(2).
[16] D. Tzovaras, N. Grammalidis, and M. G. Strintzis. Disparity Field and Depth Map Coding for Multiview 3D Image Generation. Signal Processing: Image Communication, 11(3).
[17] C. Wang and M. Brandstein. A hybrid real-time face tracking system. In Proc. IEEE Int. Conf. Acoust., Speech and Signal Processing, Seattle, WA.

Figure 5. The combination of the generic face model and the 3-D ellipsoid model as viewed from two different viewpoints: (a) front view, (b) rear view.
VIDEO OBJECT SEGMENTATION BY EXTENDED RECURSIVE-SHORTEST-SPANNING-TREE METHOD Ertem Tuncel and Levent Onural Electrical and Electronics Engineering Department, Bilkent University, TR-06533, Ankara, Turkey
More informationPRELIMINARY RESULTS ON REAL-TIME 3D FEATURE-BASED TRACKER 1. We present some preliminary results on a system for tracking 3D motion using
PRELIMINARY RESULTS ON REAL-TIME 3D FEATURE-BASED TRACKER 1 Tak-keung CHENG derek@cs.mu.oz.au Leslie KITCHEN ljk@cs.mu.oz.au Computer Vision and Pattern Recognition Laboratory, Department of Computer Science,
More informationOptimized Progressive Coding of Stereo Images Using Discrete Wavelet Transform
Optimized Progressive Coding of Stereo Images Using Discrete Wavelet Transform Torsten Palfner, Alexander Mali and Erika Müller Institute of Telecommunications and Information Technology, University of
More informationTHE GENERATION of a stereoscopic image sequence
IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, VOL. 15, NO. 8, AUGUST 2005 1065 Stereoscopic Video Generation Based on Efficient Layered Structure and Motion Estimation From a Monoscopic
More informationContents. 1 Introduction Background Organization Features... 7
Contents 1 Introduction... 1 1.1 Background.... 1 1.2 Organization... 2 1.3 Features... 7 Part I Fundamental Algorithms for Computer Vision 2 Ellipse Fitting... 11 2.1 Representation of Ellipses.... 11
More informationLocal Image Registration: An Adaptive Filtering Framework
Local Image Registration: An Adaptive Filtering Framework Gulcin Caner a,a.murattekalp a,b, Gaurav Sharma a and Wendi Heinzelman a a Electrical and Computer Engineering Dept.,University of Rochester, Rochester,
More informationMETRIC PLANE RECTIFICATION USING SYMMETRIC VANISHING POINTS
METRIC PLANE RECTIFICATION USING SYMMETRIC VANISHING POINTS M. Lefler, H. Hel-Or Dept. of CS, University of Haifa, Israel Y. Hel-Or School of CS, IDC, Herzliya, Israel ABSTRACT Video analysis often requires
More informationCS201 Computer Vision Camera Geometry
CS201 Computer Vision Camera Geometry John Magee 25 November, 2014 Slides Courtesy of: Diane H. Theriault (deht@bu.edu) Question of the Day: How can we represent the relationships between cameras and the
More information3D Model Acquisition by Tracking 2D Wireframes
3D Model Acquisition by Tracking 2D Wireframes M. Brown, T. Drummond and R. Cipolla {96mab twd20 cipolla}@eng.cam.ac.uk Department of Engineering University of Cambridge Cambridge CB2 1PZ, UK Abstract
More informationECE 470: Homework 5. Due Tuesday, October 27 in Seth Hutchinson. Luke A. Wendt
ECE 47: Homework 5 Due Tuesday, October 7 in class @:3pm Seth Hutchinson Luke A Wendt ECE 47 : Homework 5 Consider a camera with focal length λ = Suppose the optical axis of the camera is aligned with
More informationStereo Vision. MAN-522 Computer Vision
Stereo Vision MAN-522 Computer Vision What is the goal of stereo vision? The recovery of the 3D structure of a scene using two or more images of the 3D scene, each acquired from a different viewpoint in
More informationAn Adaptive Eigenshape Model
An Adaptive Eigenshape Model Adam Baumberg and David Hogg School of Computer Studies University of Leeds, Leeds LS2 9JT, U.K. amb@scs.leeds.ac.uk Abstract There has been a great deal of recent interest
More informationIRIS SEGMENTATION OF NON-IDEAL IMAGES
IRIS SEGMENTATION OF NON-IDEAL IMAGES William S. Weld St. Lawrence University Computer Science Department Canton, NY 13617 Xiaojun Qi, Ph.D Utah State University Computer Science Department Logan, UT 84322
More informationDepth. Common Classification Tasks. Example: AlexNet. Another Example: Inception. Another Example: Inception. Depth
Common Classification Tasks Recognition of individual objects/faces Analyze object-specific features (e.g., key points) Train with images from different viewing angles Recognition of object classes Analyze
More informationMoving Object Segmentation Method Based on Motion Information Classification by X-means and Spatial Region Segmentation
IJCSNS International Journal of Computer Science and Network Security, VOL.13 No.11, November 2013 1 Moving Object Segmentation Method Based on Motion Information Classification by X-means and Spatial
More informationMultiview Reconstruction
Multiview Reconstruction Why More Than 2 Views? Baseline Too short low accuracy Too long matching becomes hard Why More Than 2 Views? Ambiguity with 2 views Camera 1 Camera 2 Camera 3 Trinocular Stereo
More informationConversion of 2D Image into 3D and Face Recognition Based Attendance System
Conversion of 2D Image into 3D and Face Recognition Based Attendance System Warsha Kandlikar, Toradmal Savita Laxman, Deshmukh Sonali Jagannath Scientist C, Electronics Design and Technology, NIELIT Aurangabad,
More informationCS6670: Computer Vision
CS6670: Computer Vision Noah Snavely Lecture 7: Image Alignment and Panoramas What s inside your fridge? http://www.cs.washington.edu/education/courses/cse590ss/01wi/ Projection matrix intrinsics projection
More information2D to pseudo-3d conversion of "head and shoulder" images using feature based parametric disparity maps
University of Wollongong Research Online Faculty of Informatics - Papers (Archive) Faculty of Engineering and Information Sciences 2001 2D to pseudo-3d conversion of "head and shoulder" images using feature
More informationNonrigid Surface Modelling. and Fast Recovery. Department of Computer Science and Engineering. Committee: Prof. Leo J. Jia and Prof. K. H.
Nonrigid Surface Modelling and Fast Recovery Zhu Jianke Supervisor: Prof. Michael R. Lyu Committee: Prof. Leo J. Jia and Prof. K. H. Wong Department of Computer Science and Engineering May 11, 2007 1 2
More informationTwo-view geometry Computer Vision Spring 2018, Lecture 10
Two-view geometry http://www.cs.cmu.edu/~16385/ 16-385 Computer Vision Spring 2018, Lecture 10 Course announcements Homework 2 is due on February 23 rd. - Any questions about the homework? - How many of
More informationAbstract We present a system which automatically generates a 3D face model from a single frontal image of a face. Our system consists of two component
A Fully Automatic System To Model Faces From a Single Image Zicheng Liu Microsoft Research August 2003 Technical Report MSR-TR-2003-55 Microsoft Research Microsoft Corporation One Microsoft Way Redmond,
More informationFace Tracking. Synonyms. Definition. Main Body Text. Amit K. Roy-Chowdhury and Yilei Xu. Facial Motion Estimation
Face Tracking Amit K. Roy-Chowdhury and Yilei Xu Department of Electrical Engineering, University of California, Riverside, CA 92521, USA {amitrc,yxu}@ee.ucr.edu Synonyms Facial Motion Estimation Definition
More informationInternational Journal of Advanced Research in Computer Science and Software Engineering
Volume 2, Issue 8, August 2012 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com A Study on Block
More informationMotion Tracking and Event Understanding in Video Sequences
Motion Tracking and Event Understanding in Video Sequences Isaac Cohen Elaine Kang, Jinman Kang Institute for Robotics and Intelligent Systems University of Southern California Los Angeles, CA Objectives!
More informationDense 3D Reconstruction. Christiano Gava
Dense 3D Reconstruction Christiano Gava christiano.gava@dfki.de Outline Previous lecture: structure and motion II Structure and motion loop Triangulation Today: dense 3D reconstruction The matching problem
More informationPerception and Action using Multilinear Forms
Perception and Action using Multilinear Forms Anders Heyden, Gunnar Sparr, Kalle Åström Dept of Mathematics, Lund University Box 118, S-221 00 Lund, Sweden email: {heyden,gunnar,kalle}@maths.lth.se Abstract
More informationCHAPTER 3 DISPARITY AND DEPTH MAP COMPUTATION
CHAPTER 3 DISPARITY AND DEPTH MAP COMPUTATION In this chapter we will discuss the process of disparity computation. It plays an important role in our caricature system because all 3D coordinates of nodes
More informationComputer Vision Lecture 17
Computer Vision Lecture 17 Epipolar Geometry & Stereo Basics 13.01.2015 Bastian Leibe RWTH Aachen http://www.vision.rwth-aachen.de leibe@vision.rwth-aachen.de Announcements Seminar in the summer semester
More informationLinear Discriminant Analysis in Ottoman Alphabet Character Recognition
Linear Discriminant Analysis in Ottoman Alphabet Character Recognition ZEYNEB KURT, H. IREM TURKMEN, M. ELIF KARSLIGIL Department of Computer Engineering, Yildiz Technical University, 34349 Besiktas /
More informationOptimal Grouping of Line Segments into Convex Sets 1
Optimal Grouping of Line Segments into Convex Sets 1 B. Parvin and S. Viswanathan Imaging and Distributed Computing Group Information and Computing Sciences Division Lawrence Berkeley National Laboratory,
More informationAN ADAPTIVE MESH METHOD FOR OBJECT TRACKING
AN ADAPTIVE MESH METHOD FOR OBJECT TRACKING Mahdi Koohi 1 and Abbas Shakery 2 1 Department of Computer Engineering, Islamic Azad University, Shahr-e-Qods Branch,Tehran,Iran m.kohy@yahoo.com 2 Department
More informationComputer Vision Lecture 17
Announcements Computer Vision Lecture 17 Epipolar Geometry & Stereo Basics Seminar in the summer semester Current Topics in Computer Vision and Machine Learning Block seminar, presentations in 1 st week
More informationA Robust Two Feature Points Based Depth Estimation Method 1)
Vol.31, No.5 ACTA AUTOMATICA SINICA September, 2005 A Robust Two Feature Points Based Depth Estimation Method 1) ZHONG Zhi-Guang YI Jian-Qiang ZHAO Dong-Bin (Laboratory of Complex Systems and Intelligence
More informationSupplementary Material : Partial Sum Minimization of Singular Values in RPCA for Low-Level Vision
Supplementary Material : Partial Sum Minimization of Singular Values in RPCA for Low-Level Vision Due to space limitation in the main paper, we present additional experimental results in this supplementary
More informationFacial Deformations for MPEG-4
Facial Deformations for MPEG-4 Marc Escher, Igor Pandzic, Nadia Magnenat Thalmann MIRALab - CUI University of Geneva 24 rue du Général-Dufour CH1211 Geneva 4, Switzerland {Marc.Escher, Igor.Pandzic, Nadia.Thalmann}@cui.unige.ch
More informationarxiv: v1 [cs.cv] 28 Sep 2018
Camera Pose Estimation from Sequence of Calibrated Images arxiv:1809.11066v1 [cs.cv] 28 Sep 2018 Jacek Komorowski 1 and Przemyslaw Rokita 2 1 Maria Curie-Sklodowska University, Institute of Computer Science,
More information3D Geometry and Camera Calibration
3D Geometry and Camera Calibration 3D Coordinate Systems Right-handed vs. left-handed x x y z z y 2D Coordinate Systems 3D Geometry Basics y axis up vs. y axis down Origin at center vs. corner Will often
More informationTwo-View Geometry (Course 23, Lecture D)
Two-View Geometry (Course 23, Lecture D) Jana Kosecka Department of Computer Science George Mason University http://www.cs.gmu.edu/~kosecka General Formulation Given two views of the scene recover the
More informationGender Classification Technique Based on Facial Features using Neural Network
Gender Classification Technique Based on Facial Features using Neural Network Anushri Jaswante Dr. Asif Ullah Khan Dr. Bhupesh Gour Computer Science & Engineering, Rajiv Gandhi Proudyogiki Vishwavidyalaya,
More informationRecovering light directions and camera poses from a single sphere.
Title Recovering light directions and camera poses from a single sphere Author(s) Wong, KYK; Schnieders, D; Li, S Citation The 10th European Conference on Computer Vision (ECCV 2008), Marseille, France,
More informationCS 565 Computer Vision. Nazar Khan PUCIT Lectures 15 and 16: Optic Flow
CS 565 Computer Vision Nazar Khan PUCIT Lectures 15 and 16: Optic Flow Introduction Basic Problem given: image sequence f(x, y, z), where (x, y) specifies the location and z denotes time wanted: displacement
More informationNoise Reduction in Image Sequences using an Effective Fuzzy Algorithm
Noise Reduction in Image Sequences using an Effective Fuzzy Algorithm Mahmoud Saeid Khadijeh Saeid Mahmoud Khaleghi Abstract In this paper, we propose a novel spatiotemporal fuzzy based algorithm for noise
More informationVideo Alignment. Literature Survey. Spring 2005 Prof. Brian Evans Multidimensional Digital Signal Processing Project The University of Texas at Austin
Literature Survey Spring 2005 Prof. Brian Evans Multidimensional Digital Signal Processing Project The University of Texas at Austin Omer Shakil Abstract This literature survey compares various methods
More informationDense 3D Reconstruction. Christiano Gava
Dense 3D Reconstruction Christiano Gava christiano.gava@dfki.de Outline Previous lecture: structure and motion II Structure and motion loop Triangulation Wide baseline matching (SIFT) Today: dense 3D reconstruction
More informationAgenda. Rotations. Camera calibration. Homography. Ransac
Agenda Rotations Camera calibration Homography Ransac Geometric Transformations y x Transformation Matrix # DoF Preserves Icon translation rigid (Euclidean) similarity affine projective h I t h R t h sr
More informationCamera Calibration. COS 429 Princeton University
Camera Calibration COS 429 Princeton University Point Correspondences What can you figure out from point correspondences? Noah Snavely Point Correspondences X 1 X 4 X 3 X 2 X 5 X 6 X 7 p 1,1 p 1,2 p 1,3
More informationDetecting Ellipses via Bounding Boxes
Detecting Ellipses via Bounding Boxes CHUN-MING CHANG * Department of Information and Design, Asia University, Taiwan ABSTRACT A novel algorithm for ellipse detection based on bounding boxes is proposed
More informationMouse Pointer Tracking with Eyes
Mouse Pointer Tracking with Eyes H. Mhamdi, N. Hamrouni, A. Temimi, and M. Bouhlel Abstract In this article, we expose our research work in Human-machine Interaction. The research consists in manipulating
More informationMultiple View Geometry in Computer Vision
Multiple View Geometry in Computer Vision Prasanna Sahoo Department of Mathematics University of Louisville 1 Structure Computation Lecture 18 March 22, 2005 2 3D Reconstruction The goal of 3D reconstruction
More informationApplications Video Surveillance (On-line or off-line)
Face Face Recognition: Dimensionality Reduction Biometrics CSE 190-a Lecture 12 CSE190a Fall 06 CSE190a Fall 06 Face Recognition Face is the most common biometric used by humans Applications range from
More informationA Survey of Light Source Detection Methods
A Survey of Light Source Detection Methods Nathan Funk University of Alberta Mini-Project for CMPUT 603 November 30, 2003 Abstract This paper provides an overview of the most prominent techniques for light
More informationDepth Estimation for View Synthesis in Multiview Video Coding
MITSUBISHI ELECTRIC RESEARCH LABORATORIES http://www.merl.com Depth Estimation for View Synthesis in Multiview Video Coding Serdar Ince, Emin Martinian, Sehoon Yea, Anthony Vetro TR2007-025 June 2007 Abstract
More informationEfficient Block Matching Algorithm for Motion Estimation
Efficient Block Matching Algorithm for Motion Estimation Zong Chen International Science Inde Computer and Information Engineering waset.org/publication/1581 Abstract Motion estimation is a key problem
More informationSegmentation and Tracking of Partial Planar Templates
Segmentation and Tracking of Partial Planar Templates Abdelsalam Masoud William Hoff Colorado School of Mines Colorado School of Mines Golden, CO 800 Golden, CO 800 amasoud@mines.edu whoff@mines.edu Abstract
More informationA Method of Automated Landmark Generation for Automated 3D PDM Construction
A Method of Automated Landmark Generation for Automated 3D PDM Construction A. D. Brett and C. J. Taylor Department of Medical Biophysics University of Manchester Manchester M13 9PT, Uk adb@sv1.smb.man.ac.uk
More informationFace detection and recognition. Many slides adapted from K. Grauman and D. Lowe
Face detection and recognition Many slides adapted from K. Grauman and D. Lowe Face detection and recognition Detection Recognition Sally History Early face recognition systems: based on features and distances
More informationFinding a Best Fit Plane to Non-coplanar Point-cloud Data Using Non Linear and Linear Equations
AIJSTPME (013) 6(): 17-3 Finding a Best Fit Plane to Non-coplanar Point-cloud Data Using Non Linear and Linear Equations Mulay A. Production Engineering Department, College of Engineering, Pune, India
More information55:148 Digital Image Processing Chapter 11 3D Vision, Geometry
55:148 Digital Image Processing Chapter 11 3D Vision, Geometry Topics: Basics of projective geometry Points and hyperplanes in projective space Homography Estimating homography from point correspondence
More information