Lipreading using Profile Lips Rebuilt by 3D Data from the Kinect


Journal of Computational Information Systems 11: 7 (2015) Available at

Lipreading using Profile Lips Rebuilt by 3D Data from the Kinect

Jianrong WANG 1, Yongchun GAO 1, Ju ZHANG 1, Jianguo WEI 2, Jianwu DANG 1
1 School of Computer Science and Technology, Tianjin University, Tianjin, China
2 School of Computer Software, Tianjin University, Tianjin, China

Abstract

Lipreading plays an important role in helping hearing-impaired people understand fluent speech. Most studies of lipreading assume frontal images of the speaker's face, which are easily affected by variations in each speaker's lip size and in illumination. This paper concentrates on the contribution of 3D data captured by the Kinect to the robustness of a lipreading system. To supplement the information of the frontal lip, left profile lips and right profile lips are rebuilt from the 3D coordinates. In the experiments, the feature that integrates the profile lips yields superior performance over the visual-only feature, with a relative increase of 7% in recognition rate.

Keywords: Lipreading; 3D Data; Kinect; Profile Lip

1 Introduction

Lipreading improves the robustness of speech recognition either by establishing and analyzing parameters of mouth movement, or by using the image sequence directly for classification and identification. Lipreading systems that use image sequences of the speaker's lips have recently attracted significant interest, and a great deal of progress has been achieved [1, 2, 11]. Research on lipreading emphasizes lip detection and feature extraction. Feature extraction is a crucial part of a lipreading system, and various visual features have been proposed in the literature. In general, they can be categorized into three kinds: 1. pixel based, where the entire image containing the speaker's lip is considered informative; 2. lip contour based, in which a lip contour model is used as the visual feature; and 3. a combination of 1 and 2. Among these approaches, the pixel-based one is considered the most effective [12, 13, 16]. However, differences may arise when collecting the data, such as different lip sizes across speakers, local and global changes in illumination, and variations in head pose; in addition, poor mouth ROI localization may occur during lip detection. These differences can significantly degrade the performance of a lipreading system.

Supported in part by the National Natural Science Foundation (General Program No. , National Key Basic Research Program No. 2013CB and Key Projects No. ). Corresponding author. Email: jianguo.fr@gmail.com (Jianguo WEI)

Copyright 2015 Binary Information Press. DOI: /jcis13691. April 1, 2015

To alleviate the problems above, a few datasets and experimental results have been published that utilize some form of 3D information from the speaker's face. For example, [10] developed a lip tracking system that allows the speaker's head to move in 3D and rotate up to 30 degrees away from the camera. In [19], three-dimensional characteristics were used for word recognition, and the results indicated that the recognition rate for three-dimensional characteristics was higher than that for two-dimensional ones. The in-car Spanish database AV@CAR was captured from six different angles in order to reconstruct a 3D textured mesh of the speaker's face [14]. Recently, the MS Kinect has become available; its sensor is supported by an SDK that provides tools for real-time face tracking and predefines the face with 121 3D coordinate points. As a result, some researchers have concentrated on multi-modal AVSR systems: the University of Texas recorded its own BAVCD database [4] and built a multi-modal AVSR system investigating the use of facial depth information [5, 6], and a Turkish university employed angles computed from the 3D coordinate points as the feature, with a KNN classifier used to classify the words [22]. However, most research on lipreading has been confined to the frontal face, whether using visual features or 3D data; in the real world, though, it is hard for anyone to keep a frontal view all along. Consequently, some work has focused on non-frontal video data for AVASR [23, 24], and the results demonstrated that useful speech information can be gained from non-frontal visual features; the profile lip feature even yielded superior results in [8]. The main purpose of this paper is to build a new lipreading system that integrates the speech information extracted from the visual data with profile lips rebuilt from the 3D data. This constitutes the first attempt at a lipreading system using this novel feature. In this paper, a Chinese audio-visual corpus with 3D data was collected, and a projection technique using 3D coordinates to locate the lip is introduced. Considering that changes in the speaker's head pose may leave different information in the two profile lips, the main contribution of this paper is to rebuild both sides of the profile lips from the 3D coordinates captured by the Kinect, to supplement the information of the frontal lip. The remainder of this paper is organized as follows: Section 2 introduces the new lipreading system, which includes locating the lip by 3D projection, rebuilding the profile lips, and integrating them with the visual information. The results are presented in Section 3. Finally, Section 4 concludes this work.

2 The Lipreading System

This part consists of lip location by 3D projection, rebuilding of the profile lips, feature extraction applied to the visual data as well as the 3D data, and model training and testing on the feature that integrates the visual feature with the profile lip feature. These are discussed in more detail in the following subsections. The lipreading system is depicted in Fig. 1.

2.1 Lip location by 3D projection

Before feature extraction, the primary task is lip location. This paper adopts 3D projection rather than traditional methods based on image processing [13, 20]. The 3D projection uses the 3D data captured by the Kinect together with the imaging principle, shown in Fig. 2, to estimate the coordinate of the center pixel of the lip; then, taking this center as the midpoint, it extends outward to

obtain the lip portion: a 32*32-pixel area of the lip.

Fig. 1: Overview of the lipreading system integrating the visual data with 3D data

Fig. 2: The schematic of the Kinect imaging principle

The schematic of the Kinect imaging principle defines three coordinate systems: x1o1y1 is the camera coordinate system, x2o2y2 is the imaging-plane coordinate system, and uv is the image (pixel) coordinate system. The center of the camera coordinate system lies on the same straight line as that of the imaging-plane coordinate system. Assuming the distance between them is m1, the coordinate of o2 is (0, 0, m1). Let the horizontal view angle be α and the vertical view angle be β. If m1 is known, the real length and width of the imaging plane are, respectively,

  Length: L = 2 m1 tan(α/2)    (1)
  Width:  S = 2 m1 tan(β/2)    (2)

Suppose there is a point in space with coordinate (x1, y1, z1), and its image point on the imaging plane is a, with coordinate (x2, y2, z2), where z2 = m1. Since the point, its image, and the camera center are collinear, similar triangles give

  x1/x2 = y1/y2 = z1/z2    (3)
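As a sanity check on this geometry, the projection can be sketched in code. This is a minimal illustration rather than the authors' implementation; the 57/43-degree view angles and the 640*480 resolution are the Kinect values used later in this section, and shifting the optical center to the image center (with a vertical flip) is an assumption of this sketch.

```python
import math

def project_to_pixel(point, m1=0.7, alpha=math.radians(57),
                     beta=math.radians(43), res=(640, 480)):
    """Map a 3D point (x1, y1, z1) in camera coordinates to a pixel.

    Follows Eqs. (1)-(3): the imaging plane sits at distance m1 from the
    camera center, and similar triangles scale the point onto it.
    """
    x1, y1, z1 = point
    # Eq. (1)-(2): physical size of the imaging plane.
    L = 2 * m1 * math.tan(alpha / 2)
    S = 2 * m1 * math.tan(beta / 2)
    # Eq. (3): similar triangles, with z2 = m1.
    x2 = x1 * m1 / z1
    y2 = y1 * m1 / z1
    # Pixels are proportional to the imaging plane; moving the optical
    # center to the image center is an assumption of this sketch.
    m, n = res
    u = m / 2 + x2 * m / L
    v = n / 2 - y2 * n / S
    return u, v
```

A point on the optical axis, e.g. project_to_pixel((0.0, 0.0, 1.0)), maps to the image center (320.0, 240.0), and the result depends on m1 only through the plane-size terms, which cancel for on-axis points.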

In addition, assume the pixel coordinate of a in the imaging plane, with o2 as the center, is (x3, y3, z3). If the resolution of the image is m*n, the formulas below follow from the fact that the pixel grid is proportional to the imaging plane:

  x2/x3 = L/m    (4)
  y2/y3 = S/n    (5)

Since the horizontal view angle of the Kinect is 57 degrees and the vertical view angle is 43 degrees, together with the principle of coordinate transformation in the Kinect SDK, the real pixel coordinate of a in the picture is (320 + x3, y3). Combining these formulas shows that m1, the distance between the imaging plane and the camera, is the only unknown. This paper takes m1 as 0.7 m, and then transforms the color images into grayscale. The lips obtained from the above steps are shown in Fig. 3.

Fig. 3: The gray image of the lip area for all the speakers

2.2 Profile lips rebuilt from 3D data

The Kinect is supported by the Face Tracking SDK, which predefines 121 face points with 3D coordinates; 18 of them represent the lip, and each lip point is assigned an integer ID value. Take the rebuilding of the right profile lip as an example. To locate the lip region and identify the graphics border, the first step is to generate the grid map of the right lip. To avoid interference from the left lip, only the 11 3D coordinate points corresponding to the right lip are chosen. Fig. 4(a) is the right lip contour plotted from these 11 3D points, which only gives information about the two-dimensional plane of the lip. The lip contour is then interpolated to a grid map according to the correspondence between the z-axis and the x- and y-axes, as exhibited in Fig. 4(b). The second step is filling the grid map with color. Fig.
4(c) shows clearly that the color shading corresponds to the z-axis: the color deepens as the distance gets closer. The final step is projection and rotation: what has been obtained so far is only the right lip from the front view. Projection by changing the viewpoint is necessary to generate the profile lip, i.e., the right profile lip as seen from the speaker's right side. Because this view of the profile lip faces downward, it must be rotated 90 degrees to obtain the right profile lip in the normal orientation. Finally, the rebuilt profile lip is saved as a 60*60-pixel picture in BMP format. Fig. 4 provides a flowchart of these steps. Since the right lip and the left lip of each speaker contain different information, this paper rebuilds the right profile lips and the left profile lips following the same procedure as in the flowchart. Fig. 5 presents the right profile lip and the left profile lip rebuilt from the 3D data of the same frame.

Fig. 4: The flowchart of rebuilding the profile lip from 3D data

Fig. 5: The left profile lip (a) and the right profile lip (b) rebuilt from the 3D data of the same frame

2.3 Feature extraction

After obtaining the gray-level image of the ROI and the profile lips rebuilt in the previous processing, the next step is to transform this image information into feature vectors that capture the speech information. This paper applies the same feature-extraction method to the ROI images and to the profile lip images; the process is illustrated in Fig. 6.

Fig. 6: The block diagram of feature extraction

Motivated by previous work [13, 17], this paper chooses the DCT transform; variants include the one-dimensional DCT, the two-dimensional DCT, and the block-based DCT. This paper applies the two-dimensional DCT, as the variants work similarly for the lipreading task [19], and uses the Zig-Zag method to unroll the DCT matrix into a 1*1024 row vector. Before applying the DCT to the profile lip rebuilt from the 3D data, however, the image needs to be compressed to 32*32 pixels, because the DCT allows fast implementations when the dimensions are powers of 2 [13, 18]. To avoid the curse of dimensionality, this paper applies PCA to reduce the dimensionality, in view of its excellent ability for information compression. This combination is assumed to take the advantages of these

two transforms: the DCT is preferable for differentiating frequencies, while PCA is beneficial for selecting the most important components [21]. This combination outperforms the traditional Zig-Zag method [7]. This paper projects the features down to 52 dimensions. Normalization is necessary to improve the robustness of the features. This paper applies feature mean normalization (FMN) by simply subtracting the feature mean computed over the entire utterance of length T:

  x̂_i = x_i − (1/T) Σ_{j=1}^{T} x_j,  i = 1, 2, ..., T    (6)

where i is the time frame, T is the total number of frames in one word, and x_i is the visual feature vector of frame i. To capture the lip movement, take J as the window length and H as the hop, and concatenate the J frames of features within a window, similar to the windowing of audio signal processing, to obtain the dynamic lip information:

  C_t = [x_{t−⌊J/2⌋}^T, ..., x_t^T, ..., x_{t+⌊J/2⌋}^T]^T    (7)

where x_t is the feature vector of frame t. This paper takes J = 3, H = 1.

Fig. 7: The schematic of windowing for the visual feature

3 Experiments and Results

This paper performs speech recognition on isolated Chinese words. The baseline experiments are implemented with a single feature (i.e., the visual-only feature and each side of the profile lips). Visual-only lipreading is then compared with lipreading fused with the 3D data, and lipreading with a single profile lip is compared with lipreading using both profile lips. The HTK toolkit is used for both training and testing [25], implementing three-state phoneme HMMs with a mixture of two Gaussians per state. As windowing of the integrated feature increases the dimensionality, which easily causes the curse of dimensionality, PCA is applied to reduce the dimensionality of the integrated feature accordingly.
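The feature pipeline of Section 2.3 (2D DCT of the 32*32 lip image, Zig-Zag scan, the mean normalization of Eq. (6), and the J-frame windowing of Eq. (7)) can be sketched as below. This is a schematic re-implementation, not the authors' code, and it omits the PCA projection to 52 dimensions.

```python
import numpy as np
from scipy.fftpack import dct

def dct2(img):
    """Two-dimensional DCT of a 32*32 grayscale lip image."""
    return dct(dct(img, axis=0, norm="ortho"), axis=1, norm="ortho")

def zigzag(mat):
    """Zig-zag scan a square matrix into a row vector (1*1024 for 32*32)."""
    n = mat.shape[0]
    order = sorted(((i, j) for i in range(n) for j in range(n)),
                   key=lambda ij: (ij[0] + ij[1],                  # diagonal index
                                   ij[0] if (ij[0] + ij[1]) % 2    # odd diag: downward
                                   else ij[1]))                    # even diag: upward
    return np.array([mat[i, j] for i, j in order])

def fmn(feats):
    """Feature mean normalization, Eq. (6): subtract the utterance mean."""
    return feats - feats.mean(axis=0, keepdims=True)

def window(feats, J=3, H=1):
    """Eq. (7): concatenate J neighboring frames (hop H) for lip dynamics."""
    half = J // 2
    padded = np.pad(feats, ((half, half), (0, 0)), mode="edge")  # replicate edges
    return np.stack([padded[t:t + J].ravel()
                     for t in range(0, len(feats), H)])
```

For a T*52 feature matrix, window(fmn(feats)) yields a T*156 matrix with J = 3 and H = 1, which the experiments then reduce again by PCA.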
When integrating the right profile lip with the left profile lip (called RL in this paper), each has dimension 39. When integrating the visual feature with the ones rebuilt from the 3D data, the visual feature dimension is 52 and each 3D feature dimension is 26. The dimensions of all integrated features are reduced to 78 after PCA.

3.1 Database

To allow experiments on a Chinese audio-visual corpus with 3D data, a suitable database was collected in the recording studio of the computer science and technology department at Tianjin University,

which provides clean acoustics and controlled illumination. Each speaker sat 0.9 m from the camera, against a solid blue background. The corpus consists of audio and full-face frontal video with 3D data from 10 speakers, an equal number of men and women. 40 Chinese words were compiled to guarantee phoneme balance, and each word is pronounced 10 times by each speaker. The capture device is the Microsoft Kinect. The Kinect uses 4 microphones to capture the audio and a color camera to capture the color video images; the 3D data is collected by a laser projector and an IR camera. This enables the Kinect to capture the audio, visual, and 3D data at the same time. The audio is two-track, 16-bit, 44.1 kHz PCM; the color video is 640*480 pixels, 24-bit RGB at 30 fps. For each frame of the color image, the Kinect yields 3D data in the format shown in Fig. 8, providing 121 3D points that describe the face contour: the first and second columns are the timestamp of the 3D data, the third column is the ID number of each of the 121 points, and the next three columns are, respectively, the x, y and z coordinates of the point.

Fig. 8: The format of the 3D coordinate data captured by the Kinect

3.2 Experimental results

The experimental results are given in the following tables: Table 1 lists the baseline results, and Table 2 reports the integrated-feature experiments, including the left profile lip fused with the right profile lip, and the visual feature integrated respectively with the left profile lip, the right profile lip, and RL.
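A per-frame 3D file in the Fig. 8 layout described in Section 3.1 can be read with a sketch like the following; whitespace-separated columns and the function name are assumptions of this illustration, not part of the corpus specification.

```python
def load_kinect_frame(lines):
    """Parse one frame of Kinect 3D face data in the Fig. 8 layout:
    two timestamp columns, a point ID, then the x, y, z coordinates,
    one row for each of the 121 face points."""
    points = {}
    for line in lines:
        cols = line.split()
        point_id = int(cols[2])                            # third column: ID
        points[point_id] = tuple(float(c) for c in cols[3:6])  # x, y, z
    return points
```

The 18 lip entries can then be picked out of the returned dict by their SDK-defined ID values (the IDs themselves are not listed in the paper).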
Table 1: Word recognition accuracy based on visual-only and profile lip features rebuilt from 3D data

  Feature          Recognition accuracy
  Visual-only
  Left profile
  Right profile

Table 2: Word recognition accuracy with integrated features

  Integrated feature                           Before PCA    After PCA
  Left profile lip + right profile lip (RL)
  Visual + left profile lip
  Visual + right profile lip
  Visual + RL

Checking the database, it is easy to find that some speakers' heads turn slightly to the left for some words, owing to the fact that it is hard to keep a frontal view continuously. This leads to the result

that the rebuilt right profile lip outperforms the left profile lip, as shown in Table 1. This is consistent with the results in [3, 9, 15] that recognition accuracy degrades as the speaker's head pose deviates from the frontal pose. Table 2 presents the effect of integrating the 3D data with the visual data, as well as the benefit of feature transformation by PCA. It is obvious that the integrated features do improve recognition accuracy compared with the single features; furthermore, PCA significantly improves the lipreading accuracy. The feature fusing the left profile lip with the right profile lip outperforms those using a single profile lip, reflecting that the information contained in one profile lip alone is limited and cannot fully represent the side-lip information of the speaker. It can also be noted that the features integrating the visual data with the 3D data obtain better performance than visual-only, which demonstrates that the 3D data contributes substantially to the robustness of lipreading.

4 Conclusion

This paper explores a new lipreading system that adopts 3D data captured by the Kinect, integrating the profile lips rebuilt from the 3D data with the visual feature to improve on traditional lipreading. In addition, this paper employs 3D projection to locate the lip, and applies the same feature-extraction method to the visual data and the 3D data. The results reveal that the 3D data does improve the robustness of visual-only lipreading. The results also indicate that the right profile lips rebuilt from the 3D data outperform the left ones, consistent with the conclusion in [3, 9, 15] that performance degrades as the head pose deviates from the frontal view.
However, most work has neglected the contribution of 3D data to profile-lip lipreading, and there is no database to support such work, so the future work of this paper is to build a database containing 3D data as well as audio and visual data, in order to explore whether the 3D data provides sufficient information to improve the robustness of multi-pose lipreading.

Acknowledgement

The research was supported in part by the National Natural Science Foundation (General Program No. , National Key Basic Research Program No. 2013CB and Key Projects No. ).

References

[1] C. Bregler and Y. Konig. "Eigenlips" for robust speech recognition. In Proc. ICASSP-94, vol. 2, pp. II-669. IEEE, 1994.
[2] G. I. Chiou and J.-N. Hwang. Lipreading from color video. IEEE Transactions on Image Processing, 6 (8), 1997.

[3] V. Estellers and J.-P. Thiran. Multipose audio-visual speech recognition. In Proc. EUSIPCO, number EPFL-CONF- .
[4] G. Galatas, G. Potamianos, D. I. Kosmopoulos, C. McMurrough, and F. Makedon. Bilingual corpus for AVASR using multiple sensors and depth information. In Proc. AVSP.
[5] G. Galatas, G. Potamianos, and F. Makedon. Audio-visual speech recognition incorporating facial depth information captured by the Kinect. In Proc. 20th European Signal Processing Conference (EUSIPCO 2012). IEEE.
[6] G. Galatas, G. Potamianos, and F. Makedon. Audio-visual speech recognition using depth information from the Kinect in noisy video conditions. In Proc. 5th International Conference on PErvasive Technologies Related to Assistive Environments, p. 2. ACM.
[7] X. Hong, H. Yao, Y. Wan, and R. Chen. A PCA based visual DCT feature extraction method for lipreading. In Proc. Intelligent Information Hiding and Multimedia Signal Processing (IIH-MSP 06). IEEE.
[8] K. Kumar, T. Chen, and R. M. Stern. Profile view lip reading. In Proc. ICASSP, vol. 4, pp. IV-429. IEEE.
[9] K. Kumatani and R. Stiefelhagen. State synchronous modeling on phone boundary for audio visual speech recognition and application to multi-view face images. In Proc. ICASSP, vol. 4, pp. IV-417. IEEE.
[10] G. Loy, E.-J. Holden, and R. Owens. 3D head tracker for an automatic lipreading system. In Proc. Australian Conf. on Robotics and Automation (ACRA 2000).
[11] J. Luettin, N. A. Thacker, and S. W. Beet. Speechreading using shape and intensity information. In Proc. ICSLP 96, vol. 1. IEEE.
[12] I. Matthews, T. F. Cootes, J. A. Bangham, S. Cox, and R. Harvey.
Extraction of visual features for lipreading. IEEE Transactions on Pattern Analysis and Machine Intelligence, 24 (2).
[13] C. Neti, G. Potamianos, J. Luettin, I. Matthews, H. Glotin, D. Vergyri, J. Sison, A. Mashari, and J. Zhou. Audio-visual speech recognition. In Final Workshop 2000 Report, vol. 764.
[14] A. Ortega, F. Sukno, E. Lleida, A. F. Frangi, A. Miguel, L. Buera, and E. Zacur. AV@CAR: A Spanish multichannel multimodal corpus for in-vehicle automatic audio-visual speech recognition. In Proc. LREC.
[15] A. Pass, J. Zhang, and D. Stewart. An investigation into features for multi-view lipreading. In Proc. IEEE International Conference on Image Processing (ICIP). IEEE.
[16] G. Potamianos, H. P. Graf, and E. Cosatto. An image transform approach for HMM based automatic lipreading. In Proc. ICIP 98. IEEE.
[17] G. Potamianos, C. Neti, G. Iyengar, A. W. Senior, and A. Verma. A cascade visual front end for speaker independent automatic speechreading. International Journal of Speech Technology, 4 (3-4).
[18] W. H. Press, S. A. Teukolsky, W. T. Vetterling, and B. P. Flannery. Numerical Recipes in C: The Art of Scientific Computing.
[19] K. Uda, N. Tagawa, A. Minagawa, and T. Moriya. Effectiveness evaluation of word characteristics obtained from 3D image information for lipreading. In Proc. 11th International Conference on Image Analysis and Processing. IEEE, 2001.

[20] S. Werda, W. Mahdi, and A. B. Hamadou. Lip localization and viseme classification for visual speech recognition. arXiv preprint.
[21] Q. Yang and X. Chen. An improved grid search algorithm and its application in PCA and SVM based face recognition. Journal of Computational Information Systems, 10 (3).
[22] A. Yargic and M. Dogan. A lip reading application on MS Kinect camera. In Proc. IEEE International Symposium on Innovations in Intelligent Systems and Applications (INISTA 2013). IEEE.
[23] T. Yoshinaga, S. Tamura, K. Iwano, and S. Furui. Audio-visual speech recognition using lip movement extracted from side-face images. In Proc. AVSP 2003.
[24] T. Yoshinaga, S. Tamura, K. Iwano, and S. Furui. Audio-visual speech recognition using new lip features extracted from side-face images. In COST278 and ISCA Tutorial and Research Workshop (ITRW) on Robustness Issues in Conversational Interaction.
[25] S. Young, G. Evermann, M. Gales, T. Hain, D. Kershaw, X. Liu, G. Moore, J. Odell, D. Ollason, D. Povey, et al. The HTK Book (for HTK version 3.4). Cambridge University Engineering Department, 2006.


More information

Video annotation based on adaptive annular spatial partition scheme

Video annotation based on adaptive annular spatial partition scheme Video annotation based on adaptive annular spatial partition scheme Guiguang Ding a), Lu Zhang, and Xiaoxu Li Key Laboratory for Information System Security, Ministry of Education, Tsinghua National Laboratory

More information

Depth. Common Classification Tasks. Example: AlexNet. Another Example: Inception. Another Example: Inception. Depth

Depth. Common Classification Tasks. Example: AlexNet. Another Example: Inception. Another Example: Inception. Depth Common Classification Tasks Recognition of individual objects/faces Analyze object-specific features (e.g., key points) Train with images from different viewing angles Recognition of object classes Analyze

More information

A GENERIC FACE REPRESENTATION APPROACH FOR LOCAL APPEARANCE BASED FACE VERIFICATION

A GENERIC FACE REPRESENTATION APPROACH FOR LOCAL APPEARANCE BASED FACE VERIFICATION A GENERIC FACE REPRESENTATION APPROACH FOR LOCAL APPEARANCE BASED FACE VERIFICATION Hazim Kemal Ekenel, Rainer Stiefelhagen Interactive Systems Labs, Universität Karlsruhe (TH) 76131 Karlsruhe, Germany

More information

Xing Fan, Carlos Busso and John H.L. Hansen

Xing Fan, Carlos Busso and John H.L. Hansen Xing Fan, Carlos Busso and John H.L. Hansen Center for Robust Speech Systems (CRSS) Erik Jonsson School of Engineering & Computer Science Department of Electrical Engineering University of Texas at Dallas

More information

Improving Latent Fingerprint Matching Performance by Orientation Field Estimation using Localized Dictionaries

Improving Latent Fingerprint Matching Performance by Orientation Field Estimation using Localized Dictionaries Available Online at www.ijcsmc.com International Journal of Computer Science and Mobile Computing A Monthly Journal of Computer Science and Information Technology IJCSMC, Vol. 3, Issue. 11, November 2014,

More information

An Adaptive Threshold LBP Algorithm for Face Recognition

An Adaptive Threshold LBP Algorithm for Face Recognition An Adaptive Threshold LBP Algorithm for Face Recognition Xiaoping Jiang 1, Chuyu Guo 1,*, Hua Zhang 1, and Chenghua Li 1 1 College of Electronics and Information Engineering, Hubei Key Laboratory of Intelligent

More information

Combining Dynamic Texture and Structural Features for Speaker Identification

Combining Dynamic Texture and Structural Features for Speaker Identification Combining Dynamic Texture and Structural Features for Speaker Identification Guoying Zhao Machine Vision Group Infotech Oulu and Department of Electrical and Information Engineering P. O. Box 4500 FI-90014

More information

Visual Front-End Wars: Viola-Jones Face Detector vs Fourier Lucas-Kanade

Visual Front-End Wars: Viola-Jones Face Detector vs Fourier Lucas-Kanade ISCA Archive http://www.isca-speech.org/archive Auditory-Visual Speech Processing (AVSP) 2013 Annecy, France August 29 - September 1, 2013 Visual Front-End Wars: Viola-Jones Face Detector vs Fourier Lucas-Kanade

More information

JPEG compression of monochrome 2D-barcode images using DCT coefficient distributions

JPEG compression of monochrome 2D-barcode images using DCT coefficient distributions Edith Cowan University Research Online ECU Publications Pre. JPEG compression of monochrome D-barcode images using DCT coefficient distributions Keng Teong Tan Hong Kong Baptist University Douglas Chai

More information

LOCAL APPEARANCE BASED FACE RECOGNITION USING DISCRETE COSINE TRANSFORM

LOCAL APPEARANCE BASED FACE RECOGNITION USING DISCRETE COSINE TRANSFORM LOCAL APPEARANCE BASED FACE RECOGNITION USING DISCRETE COSINE TRANSFORM Hazim Kemal Ekenel, Rainer Stiefelhagen Interactive Systems Labs, University of Karlsruhe Am Fasanengarten 5, 76131, Karlsruhe, Germany

More information

Nearest Clustering Algorithm for Satellite Image Classification in Remote Sensing Applications

Nearest Clustering Algorithm for Satellite Image Classification in Remote Sensing Applications Nearest Clustering Algorithm for Satellite Image Classification in Remote Sensing Applications Anil K Goswami 1, Swati Sharma 2, Praveen Kumar 3 1 DRDO, New Delhi, India 2 PDM College of Engineering for

More information

Multifactor Fusion for Audio-Visual Speaker Recognition

Multifactor Fusion for Audio-Visual Speaker Recognition Proceedings of the 7th WSEAS International Conference on Signal, Speech and Image Processing, Beijing, China, September 15-17, 2007 70 Multifactor Fusion for Audio-Visual Speaker Recognition GIRIJA CHETTY

More information

Human Motion Detection and Tracking for Video Surveillance

Human Motion Detection and Tracking for Video Surveillance Human Motion Detection and Tracking for Video Surveillance Prithviraj Banerjee and Somnath Sengupta Department of Electronics and Electrical Communication Engineering Indian Institute of Technology, Kharagpur,

More information

Component-based Face Recognition with 3D Morphable Models

Component-based Face Recognition with 3D Morphable Models Component-based Face Recognition with 3D Morphable Models Jennifer Huang 1, Bernd Heisele 1,2, and Volker Blanz 3 1 Center for Biological and Computational Learning, M.I.T., Cambridge, MA, USA 2 Honda

More information

Face Recognition At-a-Distance Based on Sparse-Stereo Reconstruction

Face Recognition At-a-Distance Based on Sparse-Stereo Reconstruction Face Recognition At-a-Distance Based on Sparse-Stereo Reconstruction Ham Rara, Shireen Elhabian, Asem Ali University of Louisville Louisville, KY {hmrara01,syelha01,amali003}@louisville.edu Mike Miller,

More information

Research on Emotion Recognition for Facial Expression Images Based on Hidden Markov Model

Research on Emotion Recognition for Facial Expression Images Based on Hidden Markov Model e-issn: 2349-9745 p-issn: 2393-8161 Scientific Journal Impact Factor (SJIF): 1.711 International Journal of Modern Trends in Engineering and Research www.ijmter.com Research on Emotion Recognition for

More information

Multi-View Image Coding in 3-D Space Based on 3-D Reconstruction

Multi-View Image Coding in 3-D Space Based on 3-D Reconstruction Multi-View Image Coding in 3-D Space Based on 3-D Reconstruction Yongying Gao and Hayder Radha Department of Electrical and Computer Engineering, Michigan State University, East Lansing, MI 48823 email:

More information

An ICA based Approach for Complex Color Scene Text Binarization

An ICA based Approach for Complex Color Scene Text Binarization An ICA based Approach for Complex Color Scene Text Binarization Siddharth Kherada IIIT-Hyderabad, India siddharth.kherada@research.iiit.ac.in Anoop M. Namboodiri IIIT-Hyderabad, India anoop@iiit.ac.in

More information

Face Quality Assessment System in Video Sequences

Face Quality Assessment System in Video Sequences Face Quality Assessment System in Video Sequences Kamal Nasrollahi, Thomas B. Moeslund Laboratory of Computer Vision and Media Technology, Aalborg University Niels Jernes Vej 14, 9220 Aalborg Øst, Denmark

More information

Human pose estimation using Active Shape Models

Human pose estimation using Active Shape Models Human pose estimation using Active Shape Models Changhyuk Jang and Keechul Jung Abstract Human pose estimation can be executed using Active Shape Models. The existing techniques for applying to human-body

More information

On Modeling Variations for Face Authentication

On Modeling Variations for Face Authentication On Modeling Variations for Face Authentication Xiaoming Liu Tsuhan Chen B.V.K. Vijaya Kumar Department of Electrical and Computer Engineering, Carnegie Mellon University, Pittsburgh, PA 15213 xiaoming@andrew.cmu.edu

More information

Hand gesture recognition with Leap Motion and Kinect devices

Hand gesture recognition with Leap Motion and Kinect devices Hand gesture recognition with Leap Motion and devices Giulio Marin, Fabio Dominio and Pietro Zanuttigh Department of Information Engineering University of Padova, Italy Abstract The recent introduction

More information

Iris Recognition for Eyelash Detection Using Gabor Filter

Iris Recognition for Eyelash Detection Using Gabor Filter Iris Recognition for Eyelash Detection Using Gabor Filter Rupesh Mude 1, Meenakshi R Patel 2 Computer Science and Engineering Rungta College of Engineering and Technology, Bhilai Abstract :- Iris recognition

More information

Head Frontal-View Identification Using Extended LLE

Head Frontal-View Identification Using Extended LLE Head Frontal-View Identification Using Extended LLE Chao Wang Center for Spoken Language Understanding, Oregon Health and Science University Abstract Automatic head frontal-view identification is challenging

More information

3D LIP TRACKING AND CO-INERTIA ANALYSIS FOR IMPROVED ROBUSTNESS OF AUDIO-VIDEO AUTOMATIC SPEECH RECOGNITION

3D LIP TRACKING AND CO-INERTIA ANALYSIS FOR IMPROVED ROBUSTNESS OF AUDIO-VIDEO AUTOMATIC SPEECH RECOGNITION 3D LIP TRACKING AND CO-INERTIA ANALYSIS FOR IMPROVED ROBUSTNESS OF AUDIO-VIDEO AUTOMATIC SPEECH RECOGNITION Roland Goecke 1,2 1 Autonomous System and Sensing Technologies, National ICT Australia, Canberra,

More information

Combining Audio and Video for Detection of Spontaneous Emotions

Combining Audio and Video for Detection of Spontaneous Emotions Combining Audio and Video for Detection of Spontaneous Emotions Rok Gajšek, Vitomir Štruc, Simon Dobrišek, Janez Žibert, France Mihelič, and Nikola Pavešić Faculty of Electrical Engineering, University

More information

arxiv: v1 [cs.cv] 3 Oct 2017

arxiv: v1 [cs.cv] 3 Oct 2017 Which phoneme-to-viseme maps best improve visual-only computer lip-reading? Helen L. Bear, Richard W. Harvey, Barry-John Theobald and Yuxuan Lan School of Computing Sciences, University of East Anglia,

More information

Mouth Region Localization Method Based on Gaussian Mixture Model

Mouth Region Localization Method Based on Gaussian Mixture Model Mouth Region Localization Method Based on Gaussian Mixture Model Kenichi Kumatani and Rainer Stiefelhagen Universitaet Karlsruhe (TH), Interactive Systems Labs, Am Fasanengarten 5, 76131 Karlsruhe, Germany

More information

Robust Steganography Using Texture Synthesis

Robust Steganography Using Texture Synthesis Robust Steganography Using Texture Synthesis Zhenxing Qian 1, Hang Zhou 2, Weiming Zhang 2, Xinpeng Zhang 1 1. School of Communication and Information Engineering, Shanghai University, Shanghai, 200444,

More information

Generic Face Alignment Using an Improved Active Shape Model

Generic Face Alignment Using an Improved Active Shape Model Generic Face Alignment Using an Improved Active Shape Model Liting Wang, Xiaoqing Ding, Chi Fang Electronic Engineering Department, Tsinghua University, Beijing, China {wanglt, dxq, fangchi} @ocrserv.ee.tsinghua.edu.cn

More information

IMPROVED FACE RECOGNITION USING ICP TECHNIQUES INCAMERA SURVEILLANCE SYSTEMS. Kirthiga, M.E-Communication system, PREC, Thanjavur

IMPROVED FACE RECOGNITION USING ICP TECHNIQUES INCAMERA SURVEILLANCE SYSTEMS. Kirthiga, M.E-Communication system, PREC, Thanjavur IMPROVED FACE RECOGNITION USING ICP TECHNIQUES INCAMERA SURVEILLANCE SYSTEMS Kirthiga, M.E-Communication system, PREC, Thanjavur R.Kannan,Assistant professor,prec Abstract: Face Recognition is important

More information

A reversible data hiding based on adaptive prediction technique and histogram shifting

A reversible data hiding based on adaptive prediction technique and histogram shifting A reversible data hiding based on adaptive prediction technique and histogram shifting Rui Liu, Rongrong Ni, Yao Zhao Institute of Information Science Beijing Jiaotong University E-mail: rrni@bjtu.edu.cn

More information

Face Alignment Under Various Poses and Expressions

Face Alignment Under Various Poses and Expressions Face Alignment Under Various Poses and Expressions Shengjun Xin and Haizhou Ai Computer Science and Technology Department, Tsinghua University, Beijing 100084, China ahz@mail.tsinghua.edu.cn Abstract.

More information

Human Detection and Tracking for Video Surveillance: A Cognitive Science Approach

Human Detection and Tracking for Video Surveillance: A Cognitive Science Approach Human Detection and Tracking for Video Surveillance: A Cognitive Science Approach Vandit Gajjar gajjar.vandit.381@ldce.ac.in Ayesha Gurnani gurnani.ayesha.52@ldce.ac.in Yash Khandhediya khandhediya.yash.364@ldce.ac.in

More information

A Study on Similarity Computations in Template Matching Technique for Identity Verification

A Study on Similarity Computations in Template Matching Technique for Identity Verification A Study on Similarity Computations in Template Matching Technique for Identity Verification Lam, S. K., Yeong, C. Y., Yew, C. T., Chai, W. S., Suandi, S. A. Intelligent Biometric Group, School of Electrical

More information

Towards Lipreading Sentences with Active Appearance Models

Towards Lipreading Sentences with Active Appearance Models Towards Lipreading Sentences with Active Appearance Models George Sterpu, Naomi Harte Sigmedia, ADAPT Centre, School of Engineering, Trinity College Dublin, Ireland sterpug@tcd.ie, nharte@tcd.ie Abstract

More information

Audio-visual speech recognition using deep bottleneck features and high-performance lipreading

Audio-visual speech recognition using deep bottleneck features and high-performance lipreading Proceedings of APSIPA Annual Summit and Conference 215 16-19 December 215 Audio-visual speech recognition using deep bottleneck features and high-performance lipreading Satoshi TAMURA, Hiroshi NINOMIYA,

More information

Performance analysis of robust road sign identification

Performance analysis of robust road sign identification IOP Conference Series: Materials Science and Engineering OPEN ACCESS Performance analysis of robust road sign identification To cite this article: Nursabillilah M Ali et al 2013 IOP Conf. Ser.: Mater.

More information

Automatic Shadow Removal by Illuminance in HSV Color Space

Automatic Shadow Removal by Illuminance in HSV Color Space Computer Science and Information Technology 3(3): 70-75, 2015 DOI: 10.13189/csit.2015.030303 http://www.hrpub.org Automatic Shadow Removal by Illuminance in HSV Color Space Wenbo Huang 1, KyoungYeon Kim

More information

SOME stereo image-matching methods require a user-selected

SOME stereo image-matching methods require a user-selected IEEE GEOSCIENCE AND REMOTE SENSING LETTERS, VOL. 3, NO. 2, APRIL 2006 207 Seed Point Selection Method for Triangle Constrained Image Matching Propagation Qing Zhu, Bo Wu, and Zhi-Xiang Xu Abstract In order

More information

Multidirectional 2DPCA Based Face Recognition System

Multidirectional 2DPCA Based Face Recognition System Multidirectional 2DPCA Based Face Recognition System Shilpi Soni 1, Raj Kumar Sahu 2 1 M.E. Scholar, Department of E&Tc Engg, CSIT, Durg 2 Associate Professor, Department of E&Tc Engg, CSIT, Durg Email:

More information

Audio Visual Isolated Oriya Digit Recognition Using HMM and DWT

Audio Visual Isolated Oriya Digit Recognition Using HMM and DWT Conference on Advances in Communication and Control Systems 2013 (CAC2S 2013) Audio Visual Isolated Oriya Digit Recognition Using HMM and DWT Astik Biswas Department of Electrical Engineering, NIT Rourkela,Orrisa

More information

1. INTRODUCTION ABSTRACT

1. INTRODUCTION ABSTRACT Weighted Fusion of Depth and Inertial Data to Improve View Invariance for Human Action Recognition Chen Chen a, Huiyan Hao a,b, Roozbeh Jafari c, Nasser Kehtarnavaz a a Center for Research in Computer

More information

Image Processing Pipeline for Facial Expression Recognition under Variable Lighting

Image Processing Pipeline for Facial Expression Recognition under Variable Lighting Image Processing Pipeline for Facial Expression Recognition under Variable Lighting Ralph Ma, Amr Mohamed ralphma@stanford.edu, amr1@stanford.edu Abstract Much research has been done in the field of automated

More information

Algorithm research of 3D point cloud registration based on iterative closest point 1

Algorithm research of 3D point cloud registration based on iterative closest point 1 Acta Technica 62, No. 3B/2017, 189 196 c 2017 Institute of Thermomechanics CAS, v.v.i. Algorithm research of 3D point cloud registration based on iterative closest point 1 Qian Gao 2, Yujian Wang 2,3,

More information

The Novel Approach for 3D Face Recognition Using Simple Preprocessing Method

The Novel Approach for 3D Face Recognition Using Simple Preprocessing Method The Novel Approach for 3D Face Recognition Using Simple Preprocessing Method Parvin Aminnejad 1, Ahmad Ayatollahi 2, Siamak Aminnejad 3, Reihaneh Asghari Abstract In this work, we presented a novel approach

More information

Image Inpainting by Hyperbolic Selection of Pixels for Two Dimensional Bicubic Interpolations

Image Inpainting by Hyperbolic Selection of Pixels for Two Dimensional Bicubic Interpolations Image Inpainting by Hyperbolic Selection of Pixels for Two Dimensional Bicubic Interpolations Mehran Motmaen motmaen73@gmail.com Majid Mohrekesh mmohrekesh@yahoo.com Mojtaba Akbari mojtaba.akbari@ec.iut.ac.ir

More information

Deduction and Logic Implementation of the Fractal Scan Algorithm

Deduction and Logic Implementation of the Fractal Scan Algorithm Deduction and Logic Implementation of the Fractal Scan Algorithm Zhangjin Chen, Feng Ran, Zheming Jin Microelectronic R&D center, Shanghai University Shanghai, China and Meihua Xu School of Mechatronical

More information

Scene Text Detection Using Machine Learning Classifiers

Scene Text Detection Using Machine Learning Classifiers 601 Scene Text Detection Using Machine Learning Classifiers Nafla C.N. 1, Sneha K. 2, Divya K.P. 3 1 (Department of CSE, RCET, Akkikkvu, Thrissur) 2 (Department of CSE, RCET, Akkikkvu, Thrissur) 3 (Department

More information

Recognition of Gurmukhi Text from Sign Board Images Captured from Mobile Camera

Recognition of Gurmukhi Text from Sign Board Images Captured from Mobile Camera International Journal of Information & Computation Technology. ISSN 0974-2239 Volume 4, Number 17 (2014), pp. 1839-1845 International Research Publications House http://www. irphouse.com Recognition of

More information

SCALE BASED FEATURES FOR AUDIOVISUAL SPEECH RECOGNITION

SCALE BASED FEATURES FOR AUDIOVISUAL SPEECH RECOGNITION IEE Colloquium on Integrated Audio-Visual Processing for Recognition, Synthesis and Communication, pp 8/1 8/7, 1996 1 SCALE BASED FEATURES FOR AUDIOVISUAL SPEECH RECOGNITION I A Matthews, J A Bangham and

More information

Audio-Visual Speech Processing System for Polish with Dynamic Bayesian Network Models

Audio-Visual Speech Processing System for Polish with Dynamic Bayesian Network Models Proceedings of the orld Congress on Electrical Engineering and Computer Systems and Science (EECSS 2015) Barcelona, Spain, July 13-14, 2015 Paper No. 343 Audio-Visual Speech Processing System for Polish

More information

AUDIOVISUAL SPEECH RECOGNITION USING MULTISCALE NONLINEAR IMAGE DECOMPOSITION

AUDIOVISUAL SPEECH RECOGNITION USING MULTISCALE NONLINEAR IMAGE DECOMPOSITION AUDIOVISUAL SPEECH RECOGNITION USING MULTISCALE NONLINEAR IMAGE DECOMPOSITION Iain Matthews, J. Andrew Bangham and Stephen Cox School of Information Systems, University of East Anglia, Norwich, NR4 7TJ,

More information

An Approach for Real Time Moving Object Extraction based on Edge Region Determination

An Approach for Real Time Moving Object Extraction based on Edge Region Determination An Approach for Real Time Moving Object Extraction based on Edge Region Determination Sabrina Hoque Tuli Department of Computer Science and Engineering, Chittagong University of Engineering and Technology,

More information

Tri-modal Human Body Segmentation

Tri-modal Human Body Segmentation Tri-modal Human Body Segmentation Master of Science Thesis Cristina Palmero Cantariño Advisor: Sergio Escalera Guerrero February 6, 2014 Outline 1 Introduction 2 Tri-modal dataset 3 Proposed baseline 4

More information

A NEW ROBUST IMAGE WATERMARKING SCHEME BASED ON DWT WITH SVD

A NEW ROBUST IMAGE WATERMARKING SCHEME BASED ON DWT WITH SVD A NEW ROBUST IMAGE WATERMARKING SCHEME BASED ON WITH S.Shanmugaprabha PG Scholar, Dept of Computer Science & Engineering VMKV Engineering College, Salem India N.Malmurugan Director Sri Ranganathar Institute

More information

Measurement of pinna flare angle and its effect on individualized head-related transfer functions

Measurement of pinna flare angle and its effect on individualized head-related transfer functions PROCEEDINGS of the 22 nd International Congress on Acoustics Free-Field Virtual Psychoacoustics and Hearing Impairment: Paper ICA2016-53 Measurement of pinna flare angle and its effect on individualized

More information

EUSIPCO A SPACE-VARIANT CUBIC-SPLINE INTERPOLATION

EUSIPCO A SPACE-VARIANT CUBIC-SPLINE INTERPOLATION EUSIPCO 213 1569744341 A SPACE-VARIAN CUBIC-SPLINE INERPOLAION Jianxing Jiang, Shaohua Hong, Lin Wang Department of Communication Engineering, Xiamen University, Xiamen, Fujian, 3615, P.R. China. ABSRAC

More information

3-D MRI Brain Scan Classification Using A Point Series Based Representation

3-D MRI Brain Scan Classification Using A Point Series Based Representation 3-D MRI Brain Scan Classification Using A Point Series Based Representation Akadej Udomchaiporn 1, Frans Coenen 1, Marta García-Fiñana 2, and Vanessa Sluming 3 1 Department of Computer Science, University

More information

FACE ANALYSIS AND SYNTHESIS FOR INTERACTIVE ENTERTAINMENT

FACE ANALYSIS AND SYNTHESIS FOR INTERACTIVE ENTERTAINMENT FACE ANALYSIS AND SYNTHESIS FOR INTERACTIVE ENTERTAINMENT Shoichiro IWASAWA*I, Tatsuo YOTSUKURA*2, Shigeo MORISHIMA*2 */ Telecommunication Advancement Organization *2Facu!ty of Engineering, Seikei University

More information

Video Inter-frame Forgery Identification Based on Optical Flow Consistency

Video Inter-frame Forgery Identification Based on Optical Flow Consistency Sensors & Transducers 24 by IFSA Publishing, S. L. http://www.sensorsportal.com Video Inter-frame Forgery Identification Based on Optical Flow Consistency Qi Wang, Zhaohong Li, Zhenzhen Zhang, Qinglong

More information

Adaptive Skin Color Classifier for Face Outline Models

Adaptive Skin Color Classifier for Face Outline Models Adaptive Skin Color Classifier for Face Outline Models M. Wimmer, B. Radig, M. Beetz Informatik IX, Technische Universität München, Germany Boltzmannstr. 3, 87548 Garching, Germany [wimmerm, radig, beetz]@informatik.tu-muenchen.de

More information

A QR code identification technology in package auto-sorting system

A QR code identification technology in package auto-sorting system Modern Physics Letters B Vol. 31, Nos. 19 21 (2017) 1740035 (5 pages) c World Scientific Publishing Company DOI: 10.1142/S0217984917400358 A QR code identification technology in package auto-sorting system

More information

REAL-TIME FACE SWAPPING IN VIDEO SEQUENCES: MAGIC MIRROR

REAL-TIME FACE SWAPPING IN VIDEO SEQUENCES: MAGIC MIRROR REAL-TIME FACE SWAPPING IN VIDEO SEQUENCES: MAGIC MIRROR Nuri Murat Arar1, Fatma Gu ney1, Nasuh Kaan Bekmezci1, Hua Gao2 and Hazım Kemal Ekenel1,2,3 1 Department of Computer Engineering, Bogazici University,

More information

Intensity-Depth Face Alignment Using Cascade Shape Regression

Intensity-Depth Face Alignment Using Cascade Shape Regression Intensity-Depth Face Alignment Using Cascade Shape Regression Yang Cao 1 and Bao-Liang Lu 1,2 1 Center for Brain-like Computing and Machine Intelligence Department of Computer Science and Engineering Shanghai

More information

Articulatory Features for Robust Visual Speech Recognition

Articulatory Features for Robust Visual Speech Recognition Articulatory Features for Robust Visual Speech Recognition Kate Saenko, Trevor Darrell, and James Glass MIT Computer Science and Artificial Intelligence Laboratory 32 Vassar Street Cambridge, Massachusetts,

More information

Robust biometric image watermarking for fingerprint and face template protection

Robust biometric image watermarking for fingerprint and face template protection Robust biometric image watermarking for fingerprint and face template protection Mayank Vatsa 1, Richa Singh 1, Afzel Noore 1a),MaxM.Houck 2, and Keith Morris 2 1 West Virginia University, Morgantown,

More information

IRIS SEGMENTATION OF NON-IDEAL IMAGES

IRIS SEGMENTATION OF NON-IDEAL IMAGES IRIS SEGMENTATION OF NON-IDEAL IMAGES William S. Weld St. Lawrence University Computer Science Department Canton, NY 13617 Xiaojun Qi, Ph.D Utah State University Computer Science Department Logan, UT 84322

More information

A Fast Personal Palm print Authentication based on 3D-Multi Wavelet Transformation

A Fast Personal Palm print Authentication based on 3D-Multi Wavelet Transformation A Fast Personal Palm print Authentication based on 3D-Multi Wavelet Transformation * A. H. M. Al-Helali, * W. A. Mahmmoud, and * H. A. Ali * Al- Isra Private University Email: adnan_hadi@yahoo.com Abstract:

More information

Moving Object Detection and Tracking for Video Survelliance

Moving Object Detection and Tracking for Video Survelliance Moving Object Detection and Tracking for Video Survelliance Ms Jyoti J. Jadhav 1 E&TC Department, Dr.D.Y.Patil College of Engineering, Pune University, Ambi-Pune E-mail- Jyotijadhav48@gmail.com, Contact

More information

Stacked Denoising Autoencoders for Face Pose Normalization

Stacked Denoising Autoencoders for Face Pose Normalization Stacked Denoising Autoencoders for Face Pose Normalization Yoonseop Kang 1, Kang-Tae Lee 2,JihyunEun 2, Sung Eun Park 2 and Seungjin Choi 1 1 Department of Computer Science and Engineering Pohang University

More information

Speaker Localisation Using Audio-Visual Synchrony: An Empirical Study

Speaker Localisation Using Audio-Visual Synchrony: An Empirical Study Speaker Localisation Using Audio-Visual Synchrony: An Empirical Study H.J. Nock, G. Iyengar, and C. Neti IBM TJ Watson Research Center, PO Box 218, Yorktown Heights, NY 10598. USA. Abstract. This paper

More information

Conversion of 2D Image into 3D and Face Recognition Based Attendance System

Conversion of 2D Image into 3D and Face Recognition Based Attendance System Conversion of 2D Image into 3D and Face Recognition Based Attendance System Warsha Kandlikar, Toradmal Savita Laxman, Deshmukh Sonali Jagannath Scientist C, Electronics Design and Technology, NIELIT Aurangabad,

More information