Lip Classification Using Fourier Descriptors

Similar documents
A New Technique of Extraction of Edge Detection Using Digital Image Processing

An Adaptive Threshold LBP Algorithm for Face Recognition

Pupil Localization Algorithm based on Hough Transform and Harris Corner Detection

Research on QR Code Image Pre-processing Algorithm under Complex Background

2 Proposed Methodology

AN APPROACH OF SEMIAUTOMATED ROAD EXTRACTION FROM AERIAL IMAGE BASED ON TEMPLATE MATCHING AND NEURAL NETWORK

Journal of Chemical and Pharmaceutical Research, 2015, 7(3): Research Article

Research on the Wood Cell Contour Extraction Method Based on Image Texture and Gray-scale Information.

Prediction of traffic flow based on the EMD and wavelet neural network Teng Feng 1,a,Xiaohong Wang 1,b,Yunlai He 1,c

Digital Image Processing Fundamentals

Traffic Signs Recognition using HP and HOG Descriptors Combined to MLP and SVM Classifiers

Journal of Chemical and Pharmaceutical Research, 2015, 7(3): Research Article

International Journal of Emerging Technologies in Computational and Applied Sciences (IJETCAS)

Analysis of Image and Video Using Color, Texture and Shape Features for Object Identification

Digital Image Processing

Application of partial differential equations in image processing. Xiaoke Cui 1, a *

Subpixel Corner Detection Using Spatial Moment 1)

Algorithm Optimization for the Edge Extraction of Thangka Images

242 KHEDR & AWAD, Mat. Sci. Res. India, Vol. 8(2), (2011), y 2

Facial Feature Extraction Based On FPD and GLCM Algorithms

Adaptive Zoom Distance Measuring System of Camera Based on the Ranging of Binocular Vision

Forest Fire Smoke Recognition Based on Gray Bit Plane Technology

Temperature Calculation of Pellet Rotary Kiln Based on Texture

A Method of Face Detection Based On Improved YCBCR Skin Color Model Fan Jihui1, a, Wang Hongxing2, b

Image retrieval based on region shape similarity

CLASSIFICATION OF BOUNDARY AND REGION SHAPES USING HU-MOMENT INVARIANTS

Research on the New Image De-Noising Methodology Based on Neural Network and HMM-Hidden Markov Models

Application of Geometry Rectification to Deformed Characters Recognition Liqun Wang1, a * and Honghui Fan2

A paralleled algorithm based on multimedia retrieval

Recognition of Human Body Movements Trajectory Based on the Three-dimensional Depth Data

Numerical Recognition in the Verification Process of Mechanical and Electronic Coal Mine Anemometer

AN ACCELERATED K-MEANS CLUSTERING ALGORITHM FOR IMAGE SEGMENTATION

Defect Inspection of Liquid-Crystal-Display (LCD) Panels in Repetitive Pattern Images Using 2D Fourier Image Reconstruction

Fast Face Detection Assisted with Skin Color Detection

5th International Conference on Information Engineering for Mechanics and Materials (ICIMM 2015)

ROBUST IMAGE-BASED CRACK DETECTION IN CONCRETE STRUCTURE USING MULTI-SCALE ENHANCEMENT AND VISUAL FEATURES

FUZZY C-MEANS ALGORITHM BASED ON PRETREATMENT OF SIMILARITY RELATIONTP

Available online at AASRI Procedia 1 (2012 ) AASRI Conference on Computational Intelligence and Bioinformatics

Topic 6 Representation and Description

Image Processing, Analysis and Machine Vision

A Method of weld Edge Extraction in the X-ray Linear Diode Arrays. Real-time imaging

CHAPTER 6 DETECTION OF MASS USING NOVEL SEGMENTATION, GLCM AND NEURAL NETWORKS

Efficient Object Tracking Using K means and Radial Basis Function

Real-time Aerial Targets Detection Algorithm Based Background Subtraction

Genetic Fourier Descriptor for the Detection of Rotational Symmetry

The Prediction of Real estate Price Index based on Improved Neural Network Algorithm

Modeling Body Motion Posture Recognition Using 2D-Skeleton Angle Feature

An Adaptive Histogram Equalization Algorithm on the Image Gray Level Mapping *

An algorithm of lips secondary positioning and feature extraction based on YCbCr color space SHEN Xian-geng 1, WU Wei 2

The Population Density of Early Warning System Based On Video Image

Two Algorithms of Image Segmentation and Measurement Method of Particle s Parameters

CoE4TN4 Image Processing

A Novel Image Super-resolution Reconstruction Algorithm based on Modified Sparse Representation

Study on fabric density identification based on binary feature matrix

FAST RANDOMIZED ALGORITHM FOR CIRCLE DETECTION BY EFFICIENT SAMPLING

Improvement of SURF Feature Image Registration Algorithm Based on Cluster Analysis

Three-dimensional reconstruction of binocular stereo vision based on improved SURF algorithm and KD-Tree matching algorithm

A Kind of Fast Image Edge Detection Algorithm Based on Dynamic Threshold Value

Extracting Characters From Books Based On The OCR Technology

Inversion of Array Induction Logs and Its Application

A Novel Multi-Frame Color Images Super-Resolution Framework based on Deep Convolutional Neural Network. Zhe Li, Shu Li, Jianmin Wang and Hongyang Wang

EE640 FINAL PROJECT HEADS OR TAILS

North Asian International Research Journal of Sciences, Engineering & I.T.

Use of Shape Deformation to Seamlessly Stitch Historical Document Images

Background Motion Video Tracking of the Memory Watershed Disc Gradient Expansion Template

FACET SHIFT ALGORITHM BASED ON MINIMAL DISTANCE IN SIMPLIFICATION OF BUILDINGS WITH PARALLEL STRUCTURE

Study on Digitized Measuring Technique of Thrust Line for Rocket Nozzle

International Journal of Electrical, Electronics ISSN No. (Online): and Computer Engineering 3(2): 85-90(2014)

Keywords: Thresholding, Morphological operations, Image filtering, Adaptive histogram equalization, Ceramic tile.

result, it is very important to design a simulation system for dynamic laser scanning

COMPUTER AND ROBOT VISION

Cluster Validity Classification Approaches Based on Geometric Probability and Application in the Classification of Remotely Sensed Images

Research on Evaluation Method of Product Style Semantics Based on Neural Network

QUANTITY INTELLIGENT RECKONING FOR PACKAGED GRANARY GRAIN BASED ON IMAGE PROCESSING

A Fast Personal Palm print Authentication based on 3D-Multi Wavelet Transformation

Stripe Noise Removal from Remote Sensing Images Based on Stationary Wavelet Transform

ADVANCED IMAGE PROCESSING METHODS FOR ULTRASONIC NDE RESEARCH C. H. Chen, University of Massachusetts Dartmouth, N.

An Inspection Method of Rice Milling Degree Based on Machine Vision and Gray-Gradient Co-occurrence Matrix

Leaf Image Recognition Based on Wavelet and Fractal Dimension

Probabilistic Facial Feature Extraction Using Joint Distribution of Location and Texture Information

Open Access Research on the Prediction Model of Material Cost Based on Data Mining

A Real-time Detection for Traffic Surveillance Video Shaking

Algorithm research of 3D point cloud registration based on iterative closest point 1

Research on the Shape of Wheat Kernels Based on Fourier Describer

The design of medical image transfer function using multi-feature fusion and improved k-means clustering

Video annotation based on adaptive annular spatial partition scheme

High Resolution Remote Sensing Image Classification based on SVM and FCM Qin LI a, Wenxing BAO b, Xing LI c, Bin LI d

Texture Sensitive Image Inpainting after Object Morphing

Image Segmentation Based on Watershed and Edge Detection Techniques

Test Analysis of Serial Communication Extension in Mobile Nodes of Participatory Sensing System Xinqiang Tang 1, Huichun Peng 2

Traffic Flow Prediction Based on the location of Big Data. Xijun Zhang, Zhanting Yuan

Fundamentals of Digital Image Processing

Infrared Image Stitching Based on Immune Memory Clonal Selection Algorithm

The Elimination Eyelash Iris Recognition Based on Local Median Frequency Gabor Filters

Study on Gear Chamfering Method based on Vision Measurement

A Fast Speckle Reduction Algorithm based on GPU for Synthetic Aperture Sonar

Applying Catastrophe Theory to Image Segmentation

Segmentation of Mushroom and Cap Width Measurement Using Modified K-Means Clustering Algorithm

Texture Segmentation by Windowed Projection

Meshing of flow and heat transfer problems

Transcription:

Lip Classification Using Fourier Descriptors Zeliang Zhang, Shaocheng Song*, Ming Ma, Qian Xu Beihua University, Institute of Information Technology and Media, Jilin, Jilin, 132013, China Abstract The ification of human lip shapes is very important for lip reading recognition. It provides a definite base in order to be sure of each status during lip reading recognition. This paper presents a method to ify lip shapes based on Fourier Descriptor. AdaBoost ification is used to localize the lip region. Firstly, lip contour is extracted using Canny s edge detection. Then, in order to find the best features of lip s shape, we use Fourier Descriptor to extract important features. Finally, we ify input lip features into one of the lip categories. Experimental results show that the system accuracy rate is 85%. Therefore, this system, which using Fourier Descriptor features, presents good results and advantages for lip ification. Keywords - Fourier Edge Descriptor; lip detection; lip ification; canny operator I. INTRODUCTION The ification of the lip shapes is very important for lip reading recognition. It provides a definite base in order to be sure of each status during lip reading recognition. At present, there are not many domestic and foreign studies on lip ification. In 1968, Fisher proposed the concept of Viseme to find the minimum division unit of voice in the visual sense. Some scholars put forward that image sequence may be ified directly. For instance, Professor Yao Hongxun of Harbin Institute of Technology proposed a clustering method to ify image sequence [1]; other scholars finished lip ification by using feature vector sequence to match hidden Marov model states directly [2]. This paper presents a method to ify lip shapes based on Fourier Descriptor and realizes a lip ification system, as shown in Fig. 1. II. FOURIER LIP CLASSIFICATION PRETREATMENT Before lip ification, we need to do some pretreatments, which mainly include face detection and lip localization. Face detection has obtained rather good development in recent years. Face detection accuracy is mainly influenced by position, structure, expression and influence of shelter and direction. Detection methods include position-based method, feature invariance method, template matching method and statistical model method [3]. Since face detection is mainly the inspection using geometry texture information of the face, which does not tae into account of image color difference, we must mae some process to the input color image to avoid misjudgment of the face detection. We adopted Gaussian filter and histogram equalization to smooth the images, which could not only mae the image loo suppler, but also remove noise in the image [4]. Figure 1. System flow chart When we input the images, single color image is converted to gray image via color space conversion and noise in the image is removed by Gaussian filter [5]. Gaussian smooth mathematical equation is as Equation (1): 2 2 1 x y G( x, y) exp[ ] 2 2 2 2 Gaussian filter is a smoothing filter and smoothing procedure is controlled by Gaussian distribution standard. The greater of the value, the higher of the degree of smoothing. When the mean value of Gaussian distribution equals 0 and is 1, we can get the following Gaussian distribution chart as Fig. 2. (1) DOI 10.5013/ IJSSST.a.17.25.33 33.1 ISSN: 1473-804x online, 1473-8031 print

the frame selected region after face detection, as in reference [7]. III. FOURIER EDGE DESCRIPTOR Fourier edge descriptor is a method for describing the edge. A series of Fourier coefficients is used to represent the shape feature of closed curve [8], it is only suitable for single closed curve, not composite closed curve, as shown in Fig. 4. Figure 2. Gaussian distribution chart The use of histogram equalization maes the overall image distribution relatively uniform, not too bright or too dar. The aim of histogram equalization is to use probability to mae the image pixel values uniformly distributed in a certain range, histogram equalization is as Equation (2): S n j 0 n (2) Wherein, S is the gray value after equalization, n total is the total pixel value, n j is the current total amount of pixels, 0,1,2,..., L 1, L is the total amount of gray images, usually 256. Fig. 3 is the setch map of lip image after histogram equalization. As can be seen from the figure, the overall contrast of the image increased a lot and the whole image is also brighter. total Histogram equalization Figure 4. Single closed curve and composite closed curve. Imaginary axis y y 0 y 1 x 1 x 0 Real number axis Figure 5. A digital boundary and its complex sequence indicates. Fig. 5 shows the coordinates of the N points on the xy plane, It begins at any point ( x, y 0 0 ) of the coordinates, and while scanning the border, moves along the curve in a certain direction, for example in a clocwise direction, passes the coordinates ( x 0, y0 ), ( x1, y1), ( x2, y2 ),..., ( x n 1, yn 1), then moves bac to the original position. This movement is cyclical. x( x和 y( y can be used to express these coordinates, thus these edges can be represented as coordinates sequence s(=[x(, y(],=0,1,2,,n-1, and each coordinate point can be treated as a complex number, as in Equation (3). s( x( y(, 0,1,2,..., N 1 (3) Figure 3. Histogram equalization After the above pretreatment, we can detect the face. Detection method is based on AdaBoost algorithm and the Haar-Lie rectangle feature [6]. Meanwhile this method can be applied to lip detection. The only difference is that the image used in lip detection is the lower part of the image of In Equation (3), the X sequence is the real part of the complex coordinates and Y sequence is the imaginary part of the complex coordinates. The discrete Fourier transformation of s ( can be expressed as Equation (4). 1 a( u) N N 1 0 s( exp j 2 u / N, u 0,1,2,..., N 1; 1 (4) DOI 10.5013/ IJSSST.a.17.25.33 33.2 ISSN: 1473-804x online, 1473-8031 print

The complex coefficient a (u) get from Equation (4) is nown as Fourier Edge descriptor. The formula for the inverse Fourier transforms a (u) into s( is as follows: N 1 ( ) s u 0 a( u)exp j 2 u / N, 0,1,2,..., N 1; 1 Assume that when we mae inverse transformation, without using the entire coefficients, we merely use the first M coefficients, that is to say, in Equation (5), when u M 1, and mae a ( u) 0, the result of the formula is as Equation (6): M 1 2 / ˆ( ) u N s a( u)exp, 0,1,2,..., N 1; 1 (6) u0 Although we only use the first M coefficients to reconstruct each component of s ˆ(, K is still from 0 to N- 1. Therefore, when we redraw coordinate points, the same coordinate points exist. After the Fourier conversion, highfrequency components represent the details of the object, while the low-frequency components represent the overall shape of the object, so when we use less M coefficients to reconstruct the components of s ˆ(, the edge loses more details. Tae a square as an example. We use different M coefficients to reconstruct the square, when the value of M is small (M<8), the reconstructed shape is similar to a circle, and when the value of M is large (M>48), the reconstructed shape is close to a square. By using Fourier descriptor we can use only a small amount of coefficients to get a general edge shape, but if angles or more accurate edge shape is needed, more higher order terms must be used. IV. THE RREALIZATION OF LIP CLASSIFICATION Fig. 6 is a more refined flow chart of lip ification system. The input is a rectangular area contains lips and the output is the ified result. (5) The lips we get from figure 1 can not be used as features, because of the limited position and size information the lips contain. We need more information, especially the shape of lips. We can get the edge information after the processes of gray-scale equalization, canny operator and binaryzation. The edge information is quite messy, morphology is used to remove the noise and segment independent image or combine adjacent elements in the image. After that the contour is extracted. A too thic contour line is not needed, which will disturb the obtainment of the coordinate point. Therefore in order to ensure maximum pixel connectivity between eight directions is not more than three, refining is needed. Then obtain the leftmost point position on the image, starting from this position, with eight connected way, get all coordinate points clocwise, as shown in Fig. 7. a. Original image b. Red point as the hitting-point c. Detection between directions, the figures is the order of detection d. The point becomes red after be detected e. Detection between eight directions from the detected points f. Continue to turn red the detected points Figure 7. Obtaining lip contour coordinate points After collecting the ordered coordinate points, the Fourier transformation starts. The numerical obtained from the transformation are called Fourier Descriptors, which are used as features of lip ification. Figure 6. Lip ification flow chart. DOI 10.5013/ IJSSST.a.17.25.33 33.3 ISSN: 1473-804x online, 1473-8031 print

testing results into 7 es: the first (normal), the second (closed), the third (slightly open), the fourth (open not grinning), the fifth (half open with lower ), the sixth (open with lower ) and the seventh (pout). After the lip contour is extracted from the image, the next phase is the training movement by Fourier Descriptor [11]. In the artificial neural networ, the input and output are nown. The main tas is training the weight value. When the weight value is determined, lip ifier is completed and can be applied to other unified images. We use the amount 10, 20, 30 Fourier descriptors for ification and comparison [12]. Table 1 is ification result of training samples taing 10, 20, 30 Fourier descriptors. Table 2, 3 and 4 are the ification results of training samples taing 10, 20, 30 Fourier descriptors by using different ifiers [13]. Figure 8. Lip categories. The ifier used is artificial neural networ (ANN) [9]. ANN is one of the best ifier at present. It is slow in training phase, because it uses gradient descent method to adjust the weights between the neurons to minimize the error, but it is rapid in the phase of ification or recognition. In the experiment, the artificial neural networ used belongs to the bac propagation neural networ, which consists of 2 hidden layers and each layer has 100 nodes. The input layer nodes are decided by the lip feature dimension, and output layer consists of 7 nodes. V. EXPERIMENT RESULTS AND ANALYSIS In this experiment, 200 images are chosen, including 160 AVSP images [10] and 40 captured images by the author of this paper. The lips on these images are divided into seven categories: normal lip, closed lip, slightly open lip, open not grinning lip, half open with lower lip, open with lower lip and pout lip. In training phase the 160 AVSP images are used. The changes of the lips are extracted and trained to be the ifier. In the ification phase, 40 captured images are used. We implemented lip ification tests on the 40 images by ifier, and divided the final TABLE 1. THE RESULTS (TRAINING PHASE) TO A FOURIER DESCRIPTORS 10,20,30 CLASSIFICATION A normal 38 0 0 0 0 0 0 B closed 0 19 0 0 0 0 0 C slightly open 0 0 19 0 0 0 0 D open not grinning 0 0 0 19 0 0 0 lower 0 0 0 0 19 0 0 0 0 0 0 0 27 0 G pout 0 0 0 0 0 0 19 TABLE 2. THE TEST RESULTS (CLASSIFICATION PHASE) TO A FOURIER DESCRIPTORS 10 CLASSIFICATION A normal 8 0 0 0 0 0 0 lower 2 0 0 0 0 3 1 In Table 2, we can see that the misjudgment rate reached 50% in the open lower category. Through careful analysis we find the main reason. In the morphological process, due to the edges on both sides are thin, when the edge is removed, only the upper and lower lips are left, therefore, this category may be mistaen for normal lip and pout lip. In Table 3, we can also see that the open with lower category exists ification error. After the analysis, we find that this is because the error of morphology. However, Fourier descriptors increase from 10 to 20, which maes the parameters of neurons in the networ be corrected to some extent, so the misjudgment is less than that in table 2. DOI 10.5013/ IJSSST.a.17.25.33 33.4 ISSN: 1473-804x online, 1473-8031 print

TABLE 3. THE TEST RESULTS (CLASSIFICATION PHASE) TO A FOURIER DESCRIPTORS 20 CLASSIFICATION A normal 8 0 0 0 0 0 0 lower 0 0 0 0 0 4 2 TABLE 4. THE TEST RESULTS (CLASSIFICATION PHASE) TO A FOURIER DESCRIPTORS 30 CLASSIFICATION A normal 7 0 0 0 1 0 0 lower 4 0 0 0 1 1 0 In Table 4, there is one normal lip category be misidentified to the half open with lower category. It is not a serious problem; the most serious problem is that more open with lower teeth lip category be misidentified to be normal lip category. After careful analysis, we can find that this is the sample problem. Chins in some samples were cut off, thus in the morphological process, the lower lips disappeared. As a result these samples can only be ified to normal lip category. The experimental results showed that the accuracy rate of 10, 20, 30 Fourier descriptors for ification were: 92.5%, 95% and 85%. Open with lower teeth lip category exposed a lot of problems. The reason is that the images underwent the same treatment procedure, which ignored the fact that some images required different treatments. Thus some necessary information of the open with lower teeth lip category was removed before recognition. In the study, AVSP image trained ifier is used to the images captured, the accuracy rate is not high, but the accuracy rate of 95% is fairly good, which proves that the application of Fourier descriptor to lip ification is feasible. VI. CONCLUSIONS This paper presents a lip ification method based on Fourier descriptor. The input images undergo face detection and lip localization, then the spatial domain is converted into frequency domain by using Fourier descriptor, finally lip ifier is established by using artificial neural networ. The accuracy rate of more than 30 Fourier descriptor is generally low, therefore it is not discussed here. When Fourier descriptor is 20, the accuracy rate is better, but the data amount of training and ification still need to be increased. For future studies, put multiple lip ification results together by combining time sequence and form meaningful lip reading action will be needed. ACKNOWLEDGMENTS This wor was financially supported by Jilin Scientific and Technological Development Program (20140101185JC, 20140101206JC-16), Jilin City Scientific and Technological Development Program (201464048), Scientific and Technological Research Program of Jilin Educational Committee during the 12th Five-Year Plan (2015-149, 2015-137) and National Social Science Foundation (15BTQ02). The authors wish to than my colleagues, they are Sun Yan, Bi Xinwen. REFERENCES [1] Chai Xiujuan, Yao Hongxun, Gao Wen, Basic Lip Classification in Lip-reading Recognition, Computer Science, vol. 29, pp. 130-133, 2002. [2] Zeliang Zhang, Wenliang Qu, Review of the Lip-reading Recognition, ICSESS 2014, pp. 593-596, 2014. [3] Tomoai Yoshinaga, Satoshi Tamura, Audio-Visual Speech Recognition Using Lip Information Extracted from Side-Face Images, EURASIP Journal on Audio, Speech, and Music Processing, 2007. [4] Zhang Zeliang, Li Xiongfei, Yang Chengjia, An effective parameter estimation algorithm of the visual language features, International Journal of Digital Content Technology and its Applications, vol. 6, pp. 69-76, 2012. [5] Xu GuoQing, Mu ZhiChun, Fourier shape descriptor based on multi-level triangular area functions, Journal of University of Science and Technology Beijing, vol. 35, pp. 1201-1206, 2013. [6] Bear, Helen L., et al, Which phoneme-to-viseme maps best improve visual-only computer lip-reading, Advances in Visual Computing: Proceedings, International Symposium, vol. 8888, pp. 230-239, 2014. [7] F. J. Huang, T. Chen, Real-Time Lip-Synch Face Animation driven by human voice, IEEE Second Worshop on Multimedia Signal Processing, pp. 352-357, 1998. [8] Zeliang Zhang, Fumei Liu, Review of the Visual Feature Extraction Research, ICSESS 2014, pp. 449-452, 2014. [9] Kavousi-Fard, Abdollah, A new fuzzy-based feature selection and hybrid TLA-ANN modelling for short-term load forecasting, Journal of Experimental and Theoretical Artificial Intelligence, vol. 25, pp. 543-557, 2013. [10] Zhang Zeliang, Li Xiongfei, Yang Chengjia, Visual Speech Recognition based on Improved type of Hidden Marov Model, Journal of Convergence Information Technology, vol. 7, pp. 119-126, 2012. [11] Keetels, M, M. Pecoraro, J. Vroomen, Recalibration of auditory phonemes by lipread speech is ear-specific, Cognition, vol. 141, pp. 121-126, 2015. [12] Strand, J., Cooperman, A., Rowe, Individual differences in susceptibility to the mcgur effect: lins with lipreading and detecting audiovisual incongruity, Journal of Speech Language & Hearing Research, vol. 57, pp. 2322-2331, 2014. [13] MZ Ibrahim,DJ Mulvaney, Geometrical-based lip-reading using template probabilistic multi-dimension dynamic time warping, Journal of Visual Communication & Image Representation, vol. 30, pp. 219-233, 2015. DOI 10.5013/ IJSSST.a.17.25.33 33.5 ISSN: 1473-804x online, 1473-8031 print