2D to pseudo-3d conversion of "head and shoulder" images using feature based parametric disparity maps


University of Wollongong Research Online
Faculty of Informatics - Papers (Archive), Faculty of Engineering and Information Sciences, 2001

2D to pseudo-3D conversion of "head and shoulder" images using feature based parametric disparity maps

Chaminda Weerasinghe, Motorola Australian Research Centre
Philip Ogunbona, University of Wollongong, philipo@uow.edu.au
Wanqing Li, University of Wollongong, wanqing@uow.edu.au

Publication Details: Weerasinghe, C., Ogunbona, P. & Li, W. (2001). 2D to pseudo-3D conversion of "head and shoulder" images using feature based parametric disparity maps. IEEE International Conference on Image Processing (pp. 963-966).

Research Online is the open access institutional repository for the University of Wollongong. For further information contact the UOW Library: research-pubs@uow.edu.au

Abstract

This paper presents a method of converting a 2D still photo containing the head and shoulders of a human (e.g. a passport photo) to pseudo-3D, so that depth can be perceived via stereopsis. This technology has the potential to be included in self-serve photo booths and also as an added accessory (i.e. a software package) for digital still cameras and scanners. The basis of the algorithm is to exploit the ability of the human visual system to combine monoscopic and stereoscopic cues for depth perception. Common facial features are extracted from the 2D photograph in order to create a parametric depth map that conforms to the available monoscopic depth cues. The original 2D photograph and the created depth map are used to generate left and right views for stereoscopic viewing. The algorithm is implemented in software, and promising results are obtained.

Keywords: conversion, feature, parametric, 3d, images, pseudo, 2d, disparity, shoulder, head, maps

Disciplines: Physical Sciences and Mathematics

This conference paper is available at Research Online: http://ro.uow.edu.au/infopapers/2140

2D TO PSEUDO-3D CONVERSION OF HEAD AND SHOULDER IMAGES USING FEATURE BASED PARAMETRIC DISPARITY MAPS

Chaminda Weerasinghe, Philip Ogunbona, Wanqing Li
Motorola Australian Research Centre

ABSTRACT

This paper presents a method of converting a 2D still photo containing the head and shoulders of a human (e.g. a passport photo) to pseudo-3D, so that depth can be perceived via stereopsis. This technology has the potential to be included in self-serve photo booths and also as an added accessory (i.e. a software package) for digital still cameras and scanners. The basis of the algorithm is to exploit the ability of the human visual system to combine monoscopic and stereoscopic cues for depth perception. Common facial features are extracted from the 2D photograph in order to create a parametric depth map that conforms to the available monoscopic depth cues. The original 2D photograph and the created depth map are used to generate left and right views for stereoscopic viewing. The algorithm is implemented in software, and promising results are obtained.

1. INTRODUCTION

Human vision is stereoscopic by nature. However, the majority of composed visual information, such as image and video content, is presently conveyed in the two-dimensional (2D) or monoscopic format. Monoscopic images lack realism due to the absence of depth cues. Artificially inserting depth cues in the form of shading and perspective views is common practice, especially in the three-dimensional (3D) graphics arena. However, decades of research indicate that stereopsis is a more effective cue for depth perception. Generating left and right views for stereoscopic viewing, based on an artificially created disparity map, is generally known as pseudo-3D.

There are well-established methods for virtual view face image synthesis using 3D wire-frame [1] or 3D spring-based face models [2]. However, these models require a generic 3D face model to be deformed to match a set of known facial landmarks. Complex deformation of a 3D model is computationally intensive. Since the goal of pseudo-3D is limited to generating only two complementary synthesized views, use of a wire-frame with complex computations is unwarranted. A simple image-based rendering technique using a low-complexity disparity map is adequate to generate the required views. It has been shown in the literature that a low-accuracy disparity map can generate a visually acceptable complementary image of a given view, even though it is difficult to generate an accurate disparity map from two given views [3]. The basis for this observation is the ability of the human visual system to combine monoscopic and stereoscopic cues for depth perception. Therefore, depth does not have to be completely and accurately represented in the disparity map for visual purposes. However, it is also important not to produce conflicting depths from monoscopic cues [4] and stereoscopic cues (i.e. disparity).

In this paper, a method of converting a 2D still photo containing the head and shoulders of a human (e.g. a passport photo) to pseudo-3D is presented. The algorithm is based on a facial-feature-based, low-complexity parametric disparity map. This method exploits the ability of human vision to combine monoscopic and stereoscopic cues for depth perception.

2. PROPOSED METHOD

Figure 1 illustrates the block diagram of the proposed method, which consists of the following steps:
1. Locating the head and shoulders boundary of the subject to remove the existing background.
2. Parametric representation of the head contour using multiple ellipsoids.
3. Locating the main facial features (e.g. eyes, nose).
4. Parametric representation of eye and nose shapes using simple geometry.
5. Facial feature based parametric depth map creation.
6. Estimation of the left and right views, given the original photo of the subject, the parameterized depth map and any background image of choice.
7. Composition of the left and right views for stereoscopic viewing via an anaglyph.

In order to simplify background removal and facial feature extraction, the following assumptions are imposed:
1. The original (supplied) photo contains a human head and shoulders.

Fig. 1: Block diagram of the proposed 2D to pseudo-3D conversion system for head and shoulder photos: supplied image → image background removal → facial parameter estimation (face contour, eyes and nose) → left and right view generation using a given background image → composed output image.

Fig. 2: Starting positions of the search for snake control point initialization.

2. The head orientation is such that the subject is looking directly towards the camera.
3. The photo is taken against an uncluttered background.

It is implicitly known from the scope of this paper that only a single still photo per subject is available, and that no camera parameters or range data are available.

3. HEAD AND SHOULDER MASK

Under the assumption of an uncluttered background, the edge image displays relatively low edge energy within the background. Therefore, a snake algorithm [5], which is attracted to edge energy, is chosen for outlining the object mask of the head and shoulder images. The initial control points of the snake are assigned automatically. Each control point is moved inwards from the two vertical edges of the frame (Fig. 2) until an image edge is encountered. The separation of the initial points (δ) can be fixed as a percentage of the image frame height, and is defined a priori. For the test images used in this work, δ is defined as given in Equation 1:

δ = 1.5% × frame height    (1)

Iterative minimization of the snake energy moves the control points towards the nearest edge contours. The search is performed via a greedy algorithm [5]. Since the converged contours for this particular application are similar in shape, the snake parameters (i.e. α, β, γ, fdb) [5] can be fixed for the entire system. For the experiments in this work, α = 0.5, β = 0.85, γ = 0.9 and fdb = 0.5 are used. Improvements to the robustness of the algorithm are possible by using knowledge of the approximate shape of the final contour, which can be incorporated into the snake model.
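The automatic control-point initialization described above can be sketched as follows. This is a minimal re-creation under stated assumptions: the edge-magnitude image is assumed precomputed, the edge threshold is an assumed parameter (the paper does not give one), and the function name is illustrative rather than taken from the paper.

```python
import numpy as np

def init_snake_points(edge_mag, threshold=0.1, spacing_frac=0.015):
    """Place initial snake control points by scanning inwards from the
    left and right image borders until an image edge is met (cf. Fig. 2).
    Rows are sampled every delta pixels, with delta = 1.5% of the frame
    height as in Equation 1."""
    h, w = edge_mag.shape
    delta = max(1, int(spacing_frac * h))  # Eq. 1: separation of initial points
    points = []
    for y in range(0, h, delta):
        # scan from the left border towards the centre
        for x in range(w // 2):
            if edge_mag[y, x] > threshold:
                points.append((y, x))
                break
        # scan from the right border towards the centre
        for x in range(w - 1, w // 2, -1):
            if edge_mag[y, x] > threshold:
                points.append((y, x))
                break
    return points

# toy edge image: bright vertical "head" boundaries at columns 30 and 70
edges = np.zeros((100, 100))
edges[:, 30] = 1.0
edges[:, 70] = 1.0
pts = init_snake_points(edges)
```

The returned points would then seed the greedy energy minimization of [5]; that iterative stage is omitted here for brevity.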
It is also possible to implement the head and shoulder mask extraction stage interactively, with an appropriate graphical user interface that allows the user to adjust the final snake contour for improved accuracy. Converged snake control points are then connected with lines using linear interpolation. Smoothness of the resulting contour can be ensured by using more snake control points (i.e. a small value for δ); however, this imposes a computation time penalty. The final object mask is created by assigning the pixel value I_obj (= 255) to the pixels enclosed by the snake contour and the bottom edge of the image frame, and assigning I_bg (= 0) to the remaining pixels. Although the head and shoulder mask does not have to match the shape of the head and shoulders exactly, general shape conformity is assumed, since the parameter estimation for the depth map is based on this mask. The following section describes the method of constructing the parametric depth map and the parameter estimation based on extracted facial features.
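Rasterizing the converged contour into the binary object mask can be sketched as below. This assumes the interpolated contour has already been resampled into per-row left/right boundary columns, which is one convenient representation of a contour closed by the bottom edge of the frame; it is not necessarily the paper's internal data structure.

```python
import numpy as np

def contour_to_mask(h, w, left_x, right_x, top_row, I_obj=255, I_bg=0):
    """Create the binary object mask from a converged snake contour.
    left_x/right_x give, for each row from top_row down to the bottom
    edge, the interpolated contour columns.  Pixels between the two
    contour branches are set to I_obj (255), all others to I_bg (0)."""
    mask = np.full((h, w), I_bg, dtype=np.uint8)
    for i, y in enumerate(range(top_row, h)):
        # region enclosed by the contour and the bottom edge of the frame
        mask[y, left_x[i]:right_x[i] + 1] = I_obj
    return mask

# toy contour: a 40-pixel-wide "head and shoulders" from row 20 downwards
h, w = 100, 100
rows = h - 20
mask = contour_to_mask(h, w, [30] * rows, [69] * rows, top_row=20)
```

The mask area A produced this way feeds directly into the depth-map parameter estimation of Section 4.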

4. PARAMETRIC DEPTH MAP MODEL

Fig. 3: Parameters of the facial feature based depth map model.

The following parameters are used for the feature based parametric depth map creation: maximum disparity of the ellipsoids for the face (d_h), face ellipse center (C_f), ellipse 1 / ellipse 2 minor axis half-length (a), ellipse 1 major axis half-length (b_1), ellipse 2 major axis half-length (b_2), height to the upper neck portion (k_r), maximum disparity of the ellipsoids for both eyes (d_e), left eye ellipse center (C_l), right eye ellipse center (C_r), left/right eye minor axis half-length (a_e), left/right eye major axis half-length (b_e), maximum disparity of the three-sided pyramid for the nose (d_n), nose length (h_n) and nose width (w_n). Altogether, 14 parameters are assigned for the depth map.

4.1. Head and shoulders parameter estimation

The maximum depth of the head is determined from the proportion of the area of the mask (A) to the total image frame area (H × W). Therefore, d_h is given by Equation 2:

d_h = (r × A)/(H × W) × W = rA/H    (2)

where r = 0.03 is a scaling factor.

The head contour is represented using two ellipsoids. The ellipse parameters are automatically estimated using the horizontal projection width information of the head and shoulder mask. Let P_i represent the projection width at the i-th row; a smooth contour of the projection (see Fig. 4) is then obtained using Equation 3, where q = 2 is used in this work.

Fig. 4: Smooth projection contour of the head and shoulder mask.

Initially, the approximate neck position is located using the projection width and its first derivative. The area above this point is searched for the maximum width of the head (W_h), and C_f is computed from the mid-point of W_h. If the width of the neck is W_neck, the required parameters are given by the following equations:

a = 0.5 × W_h    (4)

b_1 = Y_Cf − Y_Wh    (5)

4.2. Eyes parameter estimation

Using the head parameters determined in Section 4.1 and the face template proposed by Hara et al. [6], the initial ellipses for the eye positions are located. An iterative algorithm, which maximizes the edge energy within the ellipse, computes the best-fitting ellipse position and size (C_l, C_r, a_e, b_e). The disparity of the eyes (d_e) is 25% of d_h.

4.3. Nose parameter estimation

The location of the nose is determined using the distance between the centers of the eyes, the face template, and the nose contour extraction algorithm proposed by Hara et al. [6]. w_n and h_n are directly computed using the contour span and the distance from C_f. The disparity of the nose (d_n) is 25%

of d_h. In an interactive package, these parameters can easily be computed if the user selects the facial features interactively using a graphical user interface.

5. RESULTS AND DISCUSSION

When the subject has long hair, the algorithm failed to find the location of the neck. In this case, an additional mask was created by applying the snake energy minimization with a higher feedback parameter (fdb = 1.1) [5] until a strong edge is encountered. The following left and right views are generated from each given head and shoulder photo, captured by a digital camera, using the proposed 2D to pseudo-3D conversion method.

Fig. 4: Case 1 - A simple head and shoulder image: (a) Original photo; (b) Disparity map overlaid on object mask; (c) Generated left view and (d) Generated right view.

Fig. 5: Case 2 - Head and shoulder image with subject wearing spectacles: (a) Original photo; (b) Disparity map overlaid on object mask; (c) Generated left view and (d) Generated right view.

Fig. 6: Case 3 - Head and shoulder image with subject having long hair: (a) Original photo; (b) Depth map on object mask 1; (c) Depth map on object mask 2; (d) Generated left view and (e) Generated right view.

6. CONCLUSIONS

The acceptable quality of the results obtained using a simple depth map indicates that complex 3D face models are not required for the conversion of 2D still photos into pseudo-3D images. The proposed algorithm can easily be implemented in software as an interactive or non-interactive package. Use of an interactive graphical user interface would enable the constraints on the image background to be relaxed.

7. REFERENCES

[1] M.W. Lee and S. Ranganath, "3D deformable face model for pose determination and face synthesis," Proc. Int. Conf. on Image Analysis and Processing, pp. 260-265, 1999.
[2] G.C. Feng et al., "Virtual view face image synthesis using 3D spring-based face model from a single image," Proc. Int. Conf. on Automatic Face and Gesture Recognition, pp. 530-535, 2000.
[3] G. Gimel'farb, "Experiments in probabilistic regularization of symmetric dynamic programming stereo," APRS/IEEE Workshop on Stereo Image and Video Processing, pp. 29-32, 2000.
[4] S.B. Xu, "Qualitative depth from monoscopic cues," Int. Conf. on Image Processing and its Applications, pp. 437-440, 1992.
[5] L. Ji and H. Yan, "An intelligent and attractable snake model for contour extraction," Proc. ICASSP 99, pp. 3309-3312, 1999.
[6] F. Hara, K. Tanaka, H. Kobayashi and A. Tange, "Automatic feature extraction of facial organs and contour," IEEE Int. Workshop on Robot and Human Communication, pp. 386-391, 1997.
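As an illustration of the final two pipeline steps (left/right view estimation from the depth map, and anaglyph composition), the sketch below implements a generic pixel-shift rendering on a single-channel image. It is an assumed simplification, not the authors' exact warping procedure: each foreground pixel is shifted horizontally by half its disparity in each direction, holes are left filled by the chosen background, and the anaglyph takes its red channel from the left view and green/blue from the right.

```python
import numpy as np

def render_stereo_pair(image, disparity, background):
    """Generate left and right views by shifting each foreground pixel
    by +/- disparity/2 columns; pixels with zero disparity are treated
    as background and come from a user-chosen backdrop (a simplified
    image-based rendering sketch)."""
    h, w = disparity.shape
    left = background.copy()
    right = background.copy()
    for y in range(h):
        for x in range(w):
            d = int(disparity[y, x])
            if d > 0:  # foreground pixel: split the disparity between views
                xl, xr = x + d // 2, x - d // 2
                if 0 <= xl < w:
                    left[y, xl] = image[y, x]
                if 0 <= xr < w:
                    right[y, xr] = image[y, x]
    return left, right

def compose_anaglyph(left, right):
    """Red channel from the left view, green and blue from the right,
    for viewing with red-cyan glasses."""
    return np.stack([left, right, right], axis=-1)

# toy example: a flat 4-pixel-disparity patch on a black background
img = np.full((8, 8), 200, dtype=np.uint8)
disp = np.zeros((8, 8))
disp[2:6, 2:6] = 4
bg = np.zeros((8, 8), dtype=np.uint8)
L, R = render_stereo_pair(img, disp, bg)
ana = compose_anaglyph(L, R)
```

With the parametric depth map of Section 4 supplying the disparity array, the same shift-and-compose step yields the stereo pairs shown in Figs. 4-6.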