SPEECH FEATURE EXTRACTION USING WEIGHTED HIGHER-ORDER LOCAL AUTO-CORRELATION


Far East Journal of Electronics and Communications, Volume 3, Number 2, 2009. Published Online: September 14, 2009, by Pushpa Publishing House.

YASUO ARIKI, TETSUYA TAKIGUCHI, TAKASHI MUROI and RYOICHI TAKASHIMA
Department of Computer and System Engineering, Kobe University, Kobe, Japan

Abstract

In this paper, we propose a new feature extraction method based on higher-order local auto-correlation (HLAC) and Fisher weight maps (FWM). Widely used MFCC features lack temporal dynamics. To solve this problem, 35 types of local auto-correlation features are computed within two-dimensional local regions. These local features are accumulated over more global regions by placing high scores on the discriminative areas, where the typical features among all phonemes are well expressed. This score map is called a Fisher weight map. We verified the effectiveness of HLAC and FWM through speaker-independent phoneme recognition. The proposed method showed an 82.1% recognition rate, 8.9 points higher than the result obtained with MFCC. Furthermore, by combining the proposed feature with MFCC, ΔMFCC and ΔΔMFCC, the recognition rate improved to 86.6%. Experimental results also demonstrate the robustness of the proposed method in noisy environments in comparison with MFCC-based features.

Keywords and phrases: phoneme recognition, feature extraction, Fisher weight map, local auto-correlation.

Received June 16, 2009

I. Introduction

In speech recognition, the MFCC (Mel-Frequency Cepstrum Coefficient) is widely used; it is a cepstrum conversion of a sub-band mel-frequency spectrum within a short time window. Because it is based on a short-time spectrum, MFCC lacks temporal dynamic features, which degrades the recognition rate. To overcome this shortcoming, the regression coefficients of MFCC (delta and delta-delta MFCC) are usually utilized, but they are an indirect expression of temporal frequency changes such as formant transitions or plosives. A more direct expression of the temporal frequency changes would be a geometrical feature in a two-dimensional local area [4, 7, 8, 9, 11]; for example, within a three-frame by three-frequency-band area in the temporal-frequency domain. Figure 1 shows the time wave and spectrogram of the word "democrats". In the lower frequency bands, several formant transitions are observed, and in a high-frequency band, the plosive is observed. To locate such two-dimensional geometrical features, auto-correlation within a local area is effective because it enhances the geometrical features. Originally, this type of feature extraction was proposed in the field of facial emotion recognition [10], where 35 types of local auto-correlation features were computed within a two-dimensional local area at each pixel of an image and accumulated within discriminative areas where the typical features among all emotions were well expressed. The map showing the discriminative areas is called a Fisher weight map. Some other data-driven techniques have also been described (e.g., [2], [6]). In this paper, we propose a method to find the geometrical discriminative features and discriminative areas of phonemes in the temporal-frequency domain of speech signals by using Fisher weight maps [1].
In [1], a simple experiment on speaker-dependent vowel recognition showed, by investigating the resultant Fisher weight maps, that formant features are discriminative. In this paper, the effectiveness of the proposed discriminative feature is verified through speaker-independent phoneme recognition experiments. To evaluate the noise robustness of the proposed feature extraction method, white Gaussian noise and real noises were added to the clean speech data at SNRs of 20, 10 and 0 dB, because the higher-order local auto-correlation used in the proposed method is expected to be robust in noisy environments. In Sections II and III, auto-correlation coefficients based on the local features

and the Fisher weight maps are described. Section IV describes the extraction flow of the geometrical discriminative features for phoneme recognition. In Section V, phoneme recognition experiments are presented.

Figure 1. Example of a spectrogram of a speech signal.

II. Local Features and Weighted Higher-order Local Auto-correlations

A. Local features

Two-dimensional geometrical and local features are observed on the time-frequency matrix shown on the left in Figure 2. On the right-hand side, 3 × 3 local patterns are shown that capture the local features. The upper pattern is for continuation in the time direction, the middle for continuation in the frequency direction and the lower for transition. The flag 1 indicates that the spectrum at that position enters the computation. A local feature for the k-th local pattern at position r is defined as follows:

h_r^(k) = I(r) + I(r + a_1^(k)) + ... + I(r + a_N^(k)),   (1)

where I(r) is the log-power spectrum at position r on a time-frequency matrix composed of time t and frequency f, and r + a_i^(k) indicates the other positions within the k-th local pattern. If I(r) is the linear-power spectrum, h^(k) is calculated by multiplication, rather than addition, in Eq. (1).
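As an illustration only, Eq. (1) can be sketched in a few lines of code. The function name, the [frequency, time] indexing convention and the offset representation are our own choices, not the paper's.

```python
import numpy as np

def local_feature(I, r, offsets):
    """Local feature h_r^(k) of Eq. (1) for one displacement set.

    I: 2-D log-power spectrum indexed [frequency, time];
    r: reference position (f, t);
    offsets: displacements a_1, ..., a_N of the k-th local pattern.
    With a log-power spectrum, the products of the linear-power
    formulation turn into sums.
    """
    f, t = r
    value = I[f, t]
    for df, dt in offsets:            # add the spectrum at each flagged cell
        value += I[f + df, t + dt]
    return value
```

For a linear-power spectrum, the addition would become multiplication, as the text notes.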

We can calculate local features using any local-pattern size and any order N in the time-frequency domain, but a large local-pattern size results in very high computational complexity. Therefore, a 3 × 3 local pattern is often used [10]. By limiting local patterns to a three-frame by three-band area at reference position r, setting the order N to 2, and omitting patterns that are equivalent under translation, the number of displacement sets (a_1, ..., a_N) becomes 35. Namely, 35 types of local patterns are obtained at each position r on the time-frequency matrix, as shown in Figure 3, following [10]. For example, in the case of the top pattern in Figure 2 and pattern No. 16 in Figure 3 (continuation in the time direction), the local feature is calculated by

h^(16)(t, f) = S(t, f) + S(t − 1, f) + S(t + 1, f),   (2)

where S(t, f) is the log-power spectrum at time t and frequency f.

Figure 2. Example of local features.

B. Weighted higher-order local auto-correlations

The higher-order local auto-correlation x_k for the k-th local pattern is obtained by summing the local features shown in Eq. (1) over the time-frequency matrix. It is formalized as follows:

x_k = Σ_r h_r^(k).   (3)
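A vectorized sketch of Eqs. (2)-(3) for pattern No. 16 follows; the loop-free formulation is our own, assuming the spectrogram is stored as a [frequency, time] array and only interior positions contribute.

```python
import numpy as np

def hlac_time_continuation(S):
    """HLAC coefficient x_16 for the time-continuation pattern.

    Eq. (2) at every interior position: S(t, f) + S(t-1, f) + S(t+1, f),
    then Eq. (3) sums these local features over the matrix.
    S is indexed [frequency, time].
    """
    h16 = S[1:-1, 1:-1] + S[1:-1, :-2] + S[1:-1, 2:]  # Eq. (2) per position
    return h16.sum()                                   # Eq. (3)
```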

Figure 3. The 35 types of local patterns.

In order to express the higher-order local auto-correlation in matrix form, all the local features shown in Eq. (1) for the k-th local pattern are collected from the time-frequency matrix and arranged as the following vector:

h^(k) = [h_{2,2}^(k) ... h_{2,T−1}^(k) ... h_{F−1,T−1}^(k)]^t.   (4)

Here the dimension of the vector is M = (T − 2)[time] × (F − 2)[frequency]. The higher-order local auto-correlation x_k for the k-th local pattern is expressed as follows using the M-dimensional vector h^(k):

x_k = (h^(k))^t 1.   (5)

Here, 1 is the M-dimensional vector in which each element is set to 1, and t represents the transpose. A local feature matrix is obtained by placing the M-dimensional vectors h^(k) in the horizontal direction one by one for all of the 35 local patterns:

H = [h^(1) ... h^(K)].   (6)

The higher-order local auto-correlation vector x is obtained by packing the x_k and is expressed as follows:

x = [x_1 ... x_K]^t = H^t 1.   (7)

Figure 4 shows an example of computing the local feature matrix H. Here, moving the 35 local patterns over the windowed time-frequency matrix, the local features are computed and packed into the local feature matrix H. The higher-order local auto-correlation vector x represents the existence of the local patterns over the whole time-frequency matrix; therefore, it is not a discriminative vector. In order to give the higher-order local auto-correlation vector x discriminative ability, local features of the same local pattern are summed over the windowed time-frequency matrix with high weights on the local features where class differences appear clearly. This is done by replacing the vector 1, consisting of M ones, with a weighting vector w. The weighted higher-order local auto-correlation vector x is then obtained as follows:

x = H^t w.   (8)

Here w is called a Fisher weight map because it is computed based on linear discriminant analysis.

Figure 4. Local feature matrix.
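Eqs. (4)-(8) can be sketched as follows; the helper name and the displacement-set encoding are illustrative choices, not from the paper, and only two toy patterns stand in for the full set of 35.

```python
import numpy as np

def local_feature_matrix(S, patterns):
    """Local feature matrix H of Eq. (6).

    S: windowed time-frequency matrix indexed [frequency, time];
    patterns: list of displacement sets (a_1, ..., a_N), one per pattern.
    Each column stacks the local features of one pattern over the
    M = (T - 2)(F - 2) interior positions (Eq. (4)).
    """
    F, T = S.shape
    cols = []
    for offsets in patterns:
        h = S[1:-1, 1:-1].copy()                    # I(r) at interior positions
        for df, dt in offsets:
            h = h + S[1 + df:F - 1 + df, 1 + dt:T - 1 + dt]
        cols.append(h.ravel())
    return np.stack(cols, axis=1)                   # M x K

# Two toy patterns: time continuation and frequency continuation.
S = np.random.rand(6, 8)
H = local_feature_matrix(S, [[(0, -1), (0, 1)], [(-1, 0), (1, 0)]])
x_plain = H.T @ np.ones(H.shape[0])                 # Eq. (7), not discriminative
w = np.random.rand(H.shape[0])                      # a (random) weight map
x_weighted = H.T @ w                                # Eq. (8)
```

With a trained Fisher weight map in place of the random `w`, `x_weighted` would be the proposed discriminative feature for this window.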

III. Fisher Weight Map

In order to find the Fisher weight map, Fisher's discriminative criterion is utilized [10]. Let P be the number of training data. The local feature matrices for the training data are denoted as {H_j ∈ R^(M×K)}, j = 1, ..., P. The corresponding weighted higher-order local auto-correlation vectors, the within-class covariance matrix and the between-class covariance matrix are denoted as {x_j}, j = 1, ..., P, Σ̃_W and Σ̃_B, respectively. In this paper, each phoneme class is used for training the Fisher weight map. The Fisher discriminative criterion J(w) is then expressed as follows using these denotations:

J(w) = tr Σ̃_B / tr Σ̃_W = (w^t Σ_B w) / (w^t Σ_W w),   (9)

where Σ_W and Σ_B are the within-class covariance matrix and the between-class covariance matrix of the local feature matrices (training data). The Fisher weight map is obtained as the eigenvectors w of the following generalized eigenvalue problem, derived by maximizing the Fisher discriminative criterion under the constraint w^t Σ_W w = 1:

Σ_B w = λ Σ_W w.   (10)

Since the Fisher weight map is composed of several eigenvectors, the number of eigenvectors is optimized in the phoneme recognition process. However, if the number of eigenvectors is set to 20, the weighted higher-order local auto-correlation vector x shown in Eq. (8) becomes a 700 (= 35 × 20)-dimensional vector. With such a high dimension, the HMMs used in phoneme recognition cannot be estimated accurately and stably. To solve this problem, PCA (Principal Component Analysis) is used in this paper to reduce the dimension effectively.

IV. Extraction Flow of Geometrical Discriminative Features

Figure 5 shows the extraction flow of the geometrical discriminative features and phoneme recognition. First, speech waveforms are converted into the time-frequency domain by a short-time Fourier transformation. At this point, a time sequence of short-time spectra (frames) is obtained. Then a moving window with

several consecutive frames is placed on the time sequence of short-time log spectra, forming a windowed time-frequency matrix. Local features of the 35 types are computed at each position (time, frequency) within this window, forming a local feature matrix H of size (number of positions) × (35 types of local features). Finally, a Fisher weight map w is produced by applying linear discriminant analysis (LDA) to the local feature matrices H. Geometrical discriminative features are obtained as higher-order local auto-correlations by summing the local features weighted by the Fisher weight map for each type of local feature, forming a 35-dimensional vector x for each window. By moving this window, a sequence of 35-dimensional vectors is obtained as the geometrical discriminative feature. In phoneme recognition, phoneme HMMs are trained first. Then the test speech data are converted into a 35-dimensional vector sequence of geometrical discriminative features, and phoneme likelihoods are computed using the trained phoneme HMMs.

V. Phoneme Recognition Experiments

The new feature extraction technique was evaluated on speaker-independent phoneme recognition tasks in both clean and noisy environments.

Figure 5. New feature extraction flowchart.
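The Fisher weight map estimation of Section III, Eq. (10), can be sketched with plain NumPy. Computing the class scatter from generic M-dimensional sample vectors is a simplifying assumption here, not the paper's exact statistic, and the regularization term is our addition to keep the problem well-posed on small toy data.

```python
import numpy as np

def fisher_weight_maps(X, labels, n_maps=20, reg=1e-6):
    """Leading eigenvectors of Sigma_B w = lambda Sigma_W w (Eq. (10)).

    X: (P, M) array of M-dimensional training samples;
    labels: (P,) class label per sample (here, phoneme classes);
    returns an (M, n_maps) matrix whose columns are the weight maps.
    """
    mean = X.mean(axis=0)
    M = X.shape[1]
    Sw = np.zeros((M, M))
    Sb = np.zeros((M, M))
    for c in np.unique(labels):
        Xc = X[labels == c]
        mc = Xc.mean(axis=0)
        Sw += (Xc - mc).T @ (Xc - mc)                    # within-class scatter
        Sb += len(Xc) * np.outer(mc - mean, mc - mean)   # between-class scatter
    Sw += reg * np.eye(M)                                # keep Sw invertible
    # Generalized problem solved via Sw^{-1} Sb (eigenvectors may have
    # tiny imaginary parts from round-off, hence .real).
    vals, vecs = np.linalg.eig(np.linalg.solve(Sw, Sb))
    order = np.argsort(-vals.real)                       # largest lambda first
    return vecs.real[:, order[:n_maps]]
```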

Figure 6. Phoneme recognition rate as a function of the number of eigenvectors (white noise, an SNR of 10 dB).

A. Experimental conditions

We carried out speaker-independent Japanese 25-phoneme recognition under clean and noisy conditions. The speech material was continuous speech data spoken by six male and four female speakers, manually segmented into phoneme sections. For speaker-independent phoneme recognition, 2,578 data (about 100 hand-segmented data for each phoneme) were collected and used for training (the Fisher weight map and the phoneme HMMs). The training data were also used to calculate the eigenvectors for PCA. Another 2,578 phoneme data from an individual speaker were used for testing, and the phoneme recognition rate was computed by averaging the results over the ten speakers. To evaluate the noise robustness of the proposed feature extraction method, white Gaussian noise was added to the clean speech data at SNRs of 20, 10 and 0 dB, and real noise data from the CENSREC-1-C database [5] were also used. The speech waveform was transformed into a time-frequency matrix using a short-time Fourier transformation with a 25 ms frame width and a 10 ms frame shift. The frequency axis was then converted to the mel scale using a mel filter bank (64 dimensions: F in Eq. (4)). A window with a 5-frame width (T in Eq. (4)) and a 1-frame shift was moved over the time-frequency matrix to generate the windowed time-frequency matrices. The number of dimensions D of the weighted higher-order local auto-correlation vector x after PCA reduction was also experimentally optimized.

B. Optimal eigenvector number

Figure 6 shows the variation of system performance as a function of the number

of eigenvectors in Eq. (10), where the blue line represents the recognition rate on the clean-speech test set and the red line represents the recognition rate on the noisy-speech test set (white noise, an SNR of 10 dB). The experimental results show no improvement with a larger number of eigenvectors (more than 20), where the recognition rates for clean speech and noisy speech (white noise) are 75.2% and 64.5%, respectively. Therefore, in the following experiments, the number of eigenvectors in the Fisher weight map is set to 20.

Figure 7. Dimension reduction using PCA.

The total number of dimensions of the proposed speech feature (with 20 eigenvectors) comes to 20 × 35 (local patterns) = 700, and PCA (Principal Component Analysis) can be used to reduce the total number of feature dimensions effectively. Figure 7 shows the phoneme recognition results as a function of the number of feature dimensions using PCA. The experimental results show that the recognition rate improves to 83.7% for clean speech and 72.2% for noisy speech (white noise, an SNR of 10 dB) when the total number of feature dimensions is reduced to 50 using PCA.

C. Comparison of each single feature in clean and noisy (white noise) environments

Figure 8 shows the results of speaker-independent phoneme recognition using the proposed feature FWM (Fisher Weight Map), compared with the recognition results using MFCC. The recognition rate using the proposed feature FWM with 20 eigenvectors (700 dimensions) in the Fisher weight map is 75.19%, 70.82%, 62.33% and 42.83% for clean speech and at SNRs of 20 dB, 10 dB and 0 dB, respectively, better than with MFCC.
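The 700-to-50 PCA reduction above can be sketched with a plain eigendecomposition of the training covariance. This is an illustrative implementation, not the authors' code, and any dimensions work in place of 700 and 50.

```python
import numpy as np

def pca_reduce(X, d=50):
    """Project weighted-HLAC vectors onto the top-d principal axes.

    X: (P, M) training matrix (M = 700 in the paper). Returns the
    projected data, the basis and the mean, so that a test vector x
    can later be reduced with (x - mean) @ basis.
    """
    mean = X.mean(axis=0)
    Xc = X - mean
    cov = Xc.T @ Xc / len(X)
    vals, vecs = np.linalg.eigh(cov)     # eigenvalues in ascending order
    basis = vecs[:, ::-1][:, :d]         # top-d principal components
    return Xc @ basis, basis, mean
```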

Figure 8. Results of speaker-independent phoneme recognition using each single feature.

The highest phoneme recognition rate is obtained by using the weighted higher-order local auto-correlation vector reduced with PCA (FWM PCA (50 dim)). Compared with MFCC and ΔMFCC, the recognition rate is improved by the proposed method at all noise levels, owing to the accumulation of the direct expression of temporal features over the ten speakers. The recognition rate for clean speech is improved by 8.9 and 7.0 points compared with MFCC and ΔMFCC, respectively. As the noise level increases, the difference in recognition rate between FWM and MFCC reaches its maximum, 20.5 points, at an SNR of 10 dB. When PCA was not applied, the dimension was so high (700) that the recognition rate was almost the same as that of MFCC.

Figure 9. Results of speaker-independent phoneme recognition using feature integration.

D. Recognition results with feature integration in clean and noisy (white noise) environments

Since FWM showed the highest phoneme recognition rate among the single features at every white-Gaussian-noise level, it was combined with MFCC (with power), ΔMFCC (with Δ power) and ΔΔMFCC (with ΔΔ power) in phoneme recognition. The results are shown in Figure 9. FWM improved the recognition rate by 3.0 points and 4.0 points when combined with MFCC and ΔMFCC, respectively, compared with the original speaker-independent FWM (82.1% in Figure 8). When the four features FWM, MFCC, ΔMFCC and ΔΔMFCC were combined together, the

recognition rate reached the highest score, 86.6%, which was 0.5 points higher than the result of MFCC + ΔMFCC + ΔΔMFCC. This indicates that FWM carries information that improves on the recognition rate obtained by the MFCC, ΔMFCC and ΔΔMFCC combination. Furthermore, for noisy speech at an SNR of 0 dB, the recognition rate of FWM + MFCC + ΔMFCC + ΔΔMFCC was 6.4 points higher than that of MFCC + ΔMFCC + ΔΔMFCC. This shows the robustness of FWM in noisy environments in comparison with MFCC.

Figure 10. Recognition rates in real noisy environments using each single feature (%).

E. Recognition results in real noisy environments

To evaluate our method with real noise data, two kinds of noise data (restaurant and street) were used from the CENSREC-1-C database [5], recorded in low-SNR (5 dB ≤ SNR < 10 dB) and high-SNR (10 dB ≤ SNR) environments. Figure 10 shows the recognition rates for each single feature. The proposed feature (FWM), whose number of dimensions is reduced to 50 with PCA, improves the recognition rate to 65.4% and 39.2% from 46.5% and 27.1% at the high SNR and the low SNR in the restaurant environment, respectively, compared with the recognition results using MFCC (with power). Likewise, the proposed feature (FWM) improves the recognition rate to 60.1% and 48.9% from 39.9% and 31.3% at the high SNR and the low SNR in the street environment, respectively, compared with the recognition result using MFCC (with power).

Figure 11 shows the recognition rates for feature integration. The integrated feature FWM + MFCC improves the recognition rate by 2.7 points and 3.1 points at the high and low SNR, respectively, in the restaurant environment, compared with MFCC + ΔMFCC. A similar result is obtained for the street environment, as shown in Figure 11. The experimental results thus show that FWM performs better than ΔMFCC as dynamic information in feature integration. Moreover, the integration FWM + MFCC + ΔMFCC + ΔΔMFCC improves the recognition rate by 3.8 points and 2.4 points at the high SNR and low SNR in the restaurant environment, respectively, and by 5.8 points and 5.5 points at the high SNR and low SNR in the street environment, respectively.

Figure 11. Recognition rates in real noisy environments using feature integration (%).

VI. Conclusion

We described a new feature extraction method based on higher-order local auto-correlation and Fisher weight maps (FWM), which enhances geometrical features in a two-dimensional local area. For speaker-independent phoneme recognition, the proposed FWM showed an 82.1% recognition rate, 8.9 and 7.0 points higher than the results obtained using MFCC and ΔMFCC, respectively, for clean speech. Furthermore, by combining FWM with MFCC, ΔMFCC and ΔΔMFCC, the recognition rate improved to 86.6%. For noisy speech in the restaurant and street

environments, the proposed FWM showed 65.4% and 60.1% recognition rates, 18.9 and 20.2 points higher than the results obtained using MFCC, respectively. By further combining with MFCC, ΔMFCC and ΔΔMFCC, the recognition rates improved to 70.7% and 64.0%, respectively. Experimental comparisons of the proposed feature extraction method FWM and MFCC in noisy environments have demonstrated that FWM is more noise robust and offers better encoding of temporal dynamics than MFCC. In future research, we will extend the method to an HMM expression and apply it to continuous phoneme recognition. Remaining problems include the lack of normalization such as CMN and the composition of HMMs with noise components. We will investigate these problems theoretically, as studied in [3].

References

[1] Y. Ariki, S. Kato and T. Takiguchi, Phoneme recognition based on Fisher weight map to higher-order local auto-correlation, Proceedings of the 9th International Conference on Spoken Language Processing (Interspeech - ICSLP), September 2006.

[2] I. M.-Chagnolleau, G. Durou and F. Bimbot, Application of time-frequency principal component analysis to text-independent speaker identification, IEEE Trans. on Speech and Audio Processing 10(6) (2002).

[3] M. P. Cooke, P. D. Green, L. B. Josifovski and A. Vizinho, Robust automatic speech recognition with missing and uncertain acoustic data, Speech Commun. 24 (2001).

[4] M. Heckmann, X. Domont, F. Joublin and C. Goerick, A closer look on hierarchical spectro-temporal features (HIST), Proceedings of Interspeech 2008, September 2008.

[5] N. Kitaoka, K. Yamamoto, T. Kusamizu, S. Nakagawa, T. Yamada, S. Tsuge, C. Miyajima, T. Nishiura, M. Nakayama, Y. Denda, M. Fujimoto, T. Takiguchi, S. Tamura, S. Kuroiwa, K. Takeda and S.
Nakamura, Development of VAD evaluation framework CENSREC-1-C and investigation of relationship between VAD and speech recognition performance, Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, December 2007.

[6] N. Malayath, H. Hermansky, S. Kajarekar and B. Yegnanarayana, Data-driven temporal filters and alternatives to GMM in speaker verification, Digital Signal Process. 10 (2000).

[7] B. T. Meyer and B. Kollmeier, Optimization and evaluation of Gabor feature sets for ASR, Proceedings of Interspeech 2008, September 2008.

[8] T. Nitta, Feature extraction for speech recognition based on orthogonal acoustic feature planes and LDA, Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, May 1999.

[9] K. Schutte and J. Glass, Speech recognition with localized time-frequency pattern detectors, Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, December 2007.

[10] Y. Shinohara and N. Otsu, Facial expression recognition using Fisher weight maps, Proceedings of the 6th IEEE International Conference on Automatic Face and Gesture Recognition, May 2004.

[11] S. Y. Zhao and N. Morgan, Multi-stream spectro-temporal features for robust speech recognition, Proceedings of Interspeech 2008, September 2008.


More information

Combining Audio and Video for Detection of Spontaneous Emotions

Combining Audio and Video for Detection of Spontaneous Emotions Combining Audio and Video for Detection of Spontaneous Emotions Rok Gajšek, Vitomir Štruc, Simon Dobrišek, Janez Žibert, France Mihelič, and Nikola Pavešić Faculty of Electrical Engineering, University

More information

Classification of Hyperspectral Breast Images for Cancer Detection. Sander Parawira December 4, 2009

Classification of Hyperspectral Breast Images for Cancer Detection. Sander Parawira December 4, 2009 1 Introduction Classification of Hyperspectral Breast Images for Cancer Detection Sander Parawira December 4, 2009 parawira@stanford.edu In 2009 approximately one out of eight women has breast cancer.

More information

IMPROVED SPEAKER RECOGNITION USING DCT COEFFICIENTS AS FEATURES. Mitchell McLaren, Yun Lei

IMPROVED SPEAKER RECOGNITION USING DCT COEFFICIENTS AS FEATURES. Mitchell McLaren, Yun Lei IMPROVED SPEAKER RECOGNITION USING DCT COEFFICIENTS AS FEATURES Mitchell McLaren, Yun Lei Speech Technology and Research Laboratory, SRI International, California, USA {mitch,yunlei}@speech.sri.com ABSTRACT

More information

Modeling the Spectral Envelope of Musical Instruments

Modeling the Spectral Envelope of Musical Instruments Modeling the Spectral Envelope of Musical Instruments Juan José Burred burred@nue.tu-berlin.de IRCAM Équipe Analyse/Synthèse Axel Röbel / Xavier Rodet Technical University of Berlin Communication Systems

More information

Research on Emotion Recognition for Facial Expression Images Based on Hidden Markov Model

Research on Emotion Recognition for Facial Expression Images Based on Hidden Markov Model e-issn: 2349-9745 p-issn: 2393-8161 Scientific Journal Impact Factor (SJIF): 1.711 International Journal of Modern Trends in Engineering and Research www.ijmter.com Research on Emotion Recognition for

More information

LOCAL APPEARANCE BASED FACE RECOGNITION USING DISCRETE COSINE TRANSFORM

LOCAL APPEARANCE BASED FACE RECOGNITION USING DISCRETE COSINE TRANSFORM LOCAL APPEARANCE BASED FACE RECOGNITION USING DISCRETE COSINE TRANSFORM Hazim Kemal Ekenel, Rainer Stiefelhagen Interactive Systems Labs, University of Karlsruhe Am Fasanengarten 5, 76131, Karlsruhe, Germany

More information

DESIGN OF SPATIAL FILTER FOR WATER LEVEL DETECTION

DESIGN OF SPATIAL FILTER FOR WATER LEVEL DETECTION Far East Journal of Electronics and Communications Volume 3, Number 3, 2009, Pages 203-216 This paper is available online at http://www.pphmj.com 2009 Pushpa Publishing House DESIGN OF SPATIAL FILTER FOR

More information

Multifactor Fusion for Audio-Visual Speaker Recognition

Multifactor Fusion for Audio-Visual Speaker Recognition Proceedings of the 7th WSEAS International Conference on Signal, Speech and Image Processing, Beijing, China, September 15-17, 2007 70 Multifactor Fusion for Audio-Visual Speaker Recognition GIRIJA CHETTY

More information

PDF hosted at the Radboud Repository of the Radboud University Nijmegen

PDF hosted at the Radboud Repository of the Radboud University Nijmegen PDF hosted at the Radboud Repository of the Radboud University Nijmegen The following full text is a publisher's version. For additional information about this publication click this link. http://hdl.handle.net/2066/94752

More information

Face detection and recognition. Detection Recognition Sally

Face detection and recognition. Detection Recognition Sally Face detection and recognition Detection Recognition Sally Face detection & recognition Viola & Jones detector Available in open CV Face recognition Eigenfaces for face recognition Metric learning identification

More information

Face Detection and Recognition in an Image Sequence using Eigenedginess

Face Detection and Recognition in an Image Sequence using Eigenedginess Face Detection and Recognition in an Image Sequence using Eigenedginess B S Venkatesh, S Palanivel and B Yegnanarayana Department of Computer Science and Engineering. Indian Institute of Technology, Madras

More information

Production of Video Images by Computer Controlled Cameras and Its Application to TV Conference System

Production of Video Images by Computer Controlled Cameras and Its Application to TV Conference System Proc. of IEEE Conference on Computer Vision and Pattern Recognition, vol.2, II-131 II-137, Dec. 2001. Production of Video Images by Computer Controlled Cameras and Its Application to TV Conference System

More information

Acoustic to Articulatory Mapping using Memory Based Regression and Trajectory Smoothing

Acoustic to Articulatory Mapping using Memory Based Regression and Trajectory Smoothing Acoustic to Articulatory Mapping using Memory Based Regression and Trajectory Smoothing Samer Al Moubayed Center for Speech Technology, Department of Speech, Music, and Hearing, KTH, Sweden. sameram@kth.se

More information

Data preprocessing Functional Programming and Intelligent Algorithms

Data preprocessing Functional Programming and Intelligent Algorithms Data preprocessing Functional Programming and Intelligent Algorithms Que Tran Høgskolen i Ålesund 20th March 2017 1 Why data preprocessing? Real-world data tend to be dirty incomplete: lacking attribute

More information

FACE RECOGNITION USING INDEPENDENT COMPONENT

FACE RECOGNITION USING INDEPENDENT COMPONENT Chapter 5 FACE RECOGNITION USING INDEPENDENT COMPONENT ANALYSIS OF GABORJET (GABORJET-ICA) 5.1 INTRODUCTION PCA is probably the most widely used subspace projection technique for face recognition. A major

More information

Multidirectional 2DPCA Based Face Recognition System

Multidirectional 2DPCA Based Face Recognition System Multidirectional 2DPCA Based Face Recognition System Shilpi Soni 1, Raj Kumar Sahu 2 1 M.E. Scholar, Department of E&Tc Engg, CSIT, Durg 2 Associate Professor, Department of E&Tc Engg, CSIT, Durg Email:

More information

Mobile Face Recognization

Mobile Face Recognization Mobile Face Recognization CS4670 Final Project Cooper Bills and Jason Yosinski {csb88,jy495}@cornell.edu December 12, 2010 Abstract We created a mobile based system for detecting faces within a picture

More information

Face detection and recognition. Many slides adapted from K. Grauman and D. Lowe

Face detection and recognition. Many slides adapted from K. Grauman and D. Lowe Face detection and recognition Many slides adapted from K. Grauman and D. Lowe Face detection and recognition Detection Recognition Sally History Early face recognition systems: based on features and distances

More information

Video Editing Based on Situation Awareness from Voice Information and Face Emotion

Video Editing Based on Situation Awareness from Voice Information and Face Emotion 18 Video Editing Based on Situation Awareness from Voice Information and Face Emotion Tetsuya Takiguchi, Jun Adachi and Yasuo Ariki Kobe University Japan 1. Introduction Video camera systems are becoming

More information

Gaussian Mixture Model Coupled with Independent Component Analysis for Palmprint Verification

Gaussian Mixture Model Coupled with Independent Component Analysis for Palmprint Verification Gaussian Mixture Model Coupled with Independent Component Analysis for Palmprint Verification Raghavendra.R, Bernadette Dorizzi, Ashok Rao, Hemantha Kumar G Abstract In this paper we present a new scheme

More information

Upper-limit Evaluation of Robot Audition based on ICA-BSS in Multi-source, Barge-in and Highly Reverberant Conditions

Upper-limit Evaluation of Robot Audition based on ICA-BSS in Multi-source, Barge-in and Highly Reverberant Conditions 21 IEEE International Conference on Robotics and Automation Anchorage Convention District May 3-8, 21, Anchorage, Alaska, USA Upper-limit Evaluation of Robot Audition based on ICA-BSS in Multi-source,

More information

Dimension Reduction CS534

Dimension Reduction CS534 Dimension Reduction CS534 Why dimension reduction? High dimensionality large number of features E.g., documents represented by thousands of words, millions of bigrams Images represented by thousands of

More information

A NEW ROBUST IMAGE WATERMARKING SCHEME BASED ON DWT WITH SVD

A NEW ROBUST IMAGE WATERMARKING SCHEME BASED ON DWT WITH SVD A NEW ROBUST IMAGE WATERMARKING SCHEME BASED ON WITH S.Shanmugaprabha PG Scholar, Dept of Computer Science & Engineering VMKV Engineering College, Salem India N.Malmurugan Director Sri Ranganathar Institute

More information

On Pre-Image Iterations for Speech Enhancement

On Pre-Image Iterations for Speech Enhancement Leitner and Pernkopf RESEARCH On Pre-Image Iterations for Speech Enhancement Christina Leitner 1* and Franz Pernkopf 2 * Correspondence: christina.leitner@joanneum.at 1 JOANNEUM RESEARCH Forschungsgesellschaft

More information

A Gaussian Mixture Model Spectral Representation for Speech Recognition

A Gaussian Mixture Model Spectral Representation for Speech Recognition A Gaussian Mixture Model Spectral Representation for Speech Recognition Matthew Nicholas Stuttle Hughes Hall and Cambridge University Engineering Department PSfrag replacements July 2003 Dissertation submitted

More information

Image Processing. Image Features

Image Processing. Image Features Image Processing Image Features Preliminaries 2 What are Image Features? Anything. What they are used for? Some statements about image fragments (patches) recognition Search for similar patches matching

More information

Face Recognition Using Long Haar-like Filters

Face Recognition Using Long Haar-like Filters Face Recognition Using Long Haar-like Filters Y. Higashijima 1, S. Takano 1, and K. Niijima 1 1 Department of Informatics, Kyushu University, Japan. Email: {y-higasi, takano, niijima}@i.kyushu-u.ac.jp

More information

Discriminate Analysis

Discriminate Analysis Discriminate Analysis Outline Introduction Linear Discriminant Analysis Examples 1 Introduction What is Discriminant Analysis? Statistical technique to classify objects into mutually exclusive and exhaustive

More information

Comparing MFCC and MPEG-7 Audio Features for Feature Extraction, Maximum Likelihood HMM and Entropic Prior HMM for Sports Audio Classification

Comparing MFCC and MPEG-7 Audio Features for Feature Extraction, Maximum Likelihood HMM and Entropic Prior HMM for Sports Audio Classification MITSUBISHI ELECTRIC RESEARCH LABORATORIES http://www.merl.com Comparing MFCC and MPEG-7 Audio Features for Feature Extraction, Maximum Likelihood HMM and Entropic Prior HMM for Sports Audio Classification

More information

FOUR WEIGHTINGS AND A FUSION: A CEPSTRAL-SVM SYSTEM FOR SPEAKER RECOGNITION. Sachin S. Kajarekar

FOUR WEIGHTINGS AND A FUSION: A CEPSTRAL-SVM SYSTEM FOR SPEAKER RECOGNITION. Sachin S. Kajarekar FOUR WEIGHTINGS AND A FUSION: A CEPSTRAL-SVM SYSTEM FOR SPEAKER RECOGNITION Sachin S. Kajarekar Speech Technology and Research Laboratory SRI International, Menlo Park, CA, USA sachin@speech.sri.com ABSTRACT

More information

CHROMA AND MFCC BASED PATTERN RECOGNITION IN AUDIO FILES UTILIZING HIDDEN MARKOV MODELS AND DYNAMIC PROGRAMMING. Alexander Wankhammer Peter Sciri

CHROMA AND MFCC BASED PATTERN RECOGNITION IN AUDIO FILES UTILIZING HIDDEN MARKOV MODELS AND DYNAMIC PROGRAMMING. Alexander Wankhammer Peter Sciri 1 CHROMA AND MFCC BASED PATTERN RECOGNITION IN AUDIO FILES UTILIZING HIDDEN MARKOV MODELS AND DYNAMIC PROGRAMMING Alexander Wankhammer Peter Sciri introduction./the idea > overview What is musical structure?

More information

Application of Principal Components Analysis and Gaussian Mixture Models to Printer Identification

Application of Principal Components Analysis and Gaussian Mixture Models to Printer Identification Application of Principal Components Analysis and Gaussian Mixture Models to Printer Identification Gazi. Ali, Pei-Ju Chiang Aravind K. Mikkilineni, George T. Chiu Edward J. Delp, and Jan P. Allebach School

More information

THE PERFORMANCE of automatic speech recognition

THE PERFORMANCE of automatic speech recognition IEEE TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING, VOL. 14, NO. 6, NOVEMBER 2006 2109 Subband Likelihood-Maximizing Beamforming for Speech Recognition in Reverberant Environments Michael L. Seltzer,

More information

FACE RECOGNITION BASED ON GENDER USING A MODIFIED METHOD OF 2D-LINEAR DISCRIMINANT ANALYSIS

FACE RECOGNITION BASED ON GENDER USING A MODIFIED METHOD OF 2D-LINEAR DISCRIMINANT ANALYSIS FACE RECOGNITION BASED ON GENDER USING A MODIFIED METHOD OF 2D-LINEAR DISCRIMINANT ANALYSIS 1 Fitri Damayanti, 2 Wahyudi Setiawan, 3 Sri Herawati, 4 Aeri Rachmad 1,2,3,4 Faculty of Engineering, University

More information

Perspectives on Multimedia Quality Prediction Methodologies for Advanced Mobile and IP-based Telephony

Perspectives on Multimedia Quality Prediction Methodologies for Advanced Mobile and IP-based Telephony Perspectives on Multimedia Quality Prediction Methodologies for Advanced Mobile and IP-based Telephony Nobuhiko Kitawaki University of Tsukuba 1-1-1, Tennoudai, Tsukuba-shi, 305-8573 Japan. E-mail: kitawaki@cs.tsukuba.ac.jp

More information

A long, deep and wide artificial neural net for robust speech recognition in unknown noise

A long, deep and wide artificial neural net for robust speech recognition in unknown noise A long, deep and wide artificial neural net for robust speech recognition in unknown noise Feipeng Li, Phani S. Nidadavolu, and Hynek Hermansky Center for Language and Speech Processing Johns Hopkins University,

More information

LOCAL FEATURE EXTRACTION METHODS FOR FACIAL EXPRESSION RECOGNITION

LOCAL FEATURE EXTRACTION METHODS FOR FACIAL EXPRESSION RECOGNITION 17th European Signal Processing Conference (EUSIPCO 2009) Glasgow, Scotland, August 24-28, 2009 LOCAL FEATURE EXTRACTION METHODS FOR FACIAL EXPRESSION RECOGNITION Seyed Mehdi Lajevardi, Zahir M. Hussain

More information

A Visualization Tool to Improve the Performance of a Classifier Based on Hidden Markov Models

A Visualization Tool to Improve the Performance of a Classifier Based on Hidden Markov Models A Visualization Tool to Improve the Performance of a Classifier Based on Hidden Markov Models Gleidson Pegoretti da Silva, Masaki Nakagawa Department of Computer and Information Sciences Tokyo University

More information

Face Hallucination Based on Eigentransformation Learning

Face Hallucination Based on Eigentransformation Learning Advanced Science and Technology etters, pp.32-37 http://dx.doi.org/10.14257/astl.2016. Face allucination Based on Eigentransformation earning Guohua Zou School of software, East China University of Technology,

More information

Image and speech recognition Exercises

Image and speech recognition Exercises Image and speech recognition Exercises Włodzimierz Kasprzak v.3, 5 Project is co-financed by European Union within European Social Fund Exercises. Content. Pattern analysis (3). Pattern transformation

More information

Manifold Constrained Deep Neural Networks for ASR

Manifold Constrained Deep Neural Networks for ASR 1 Manifold Constrained Deep Neural Networks for ASR Department of Electrical and Computer Engineering, McGill University Richard Rose and Vikrant Tomar Motivation Speech features can be characterized as

More information

EM Algorithm with Split and Merge in Trajectory Clustering for Automatic Speech Recognition

EM Algorithm with Split and Merge in Trajectory Clustering for Automatic Speech Recognition EM Algorithm with Split and Merge in Trajectory Clustering for Automatic Speech Recognition Yan Han and Lou Boves Department of Language and Speech, Radboud University Nijmegen, The Netherlands {Y.Han,

More information

Optimizing feature representation for speaker diarization using PCA and LDA

Optimizing feature representation for speaker diarization using PCA and LDA Optimizing feature representation for speaker diarization using PCA and LDA itsikv@netvision.net.il Jean-Francois Bonastre jean-francois.bonastre@univ-avignon.fr Outline Speaker Diarization what is it?

More information

A text-independent speaker verification model: A comparative analysis

A text-independent speaker verification model: A comparative analysis A text-independent speaker verification model: A comparative analysis Rishi Charan, Manisha.A, Karthik.R, Raesh Kumar M, Senior IEEE Member School of Electronic Engineering VIT University Tamil Nadu, India

More information

Real Time Speaker Recognition System using MFCC and Vector Quantization Technique

Real Time Speaker Recognition System using MFCC and Vector Quantization Technique Real Time Speaker Recognition System using MFCC and Vector Quantization Technique Roma Bharti Mtech, Manav rachna international university Faridabad ABSTRACT This paper represents a very strong mathematical

More information

A Study on Different Challenges in Facial Recognition Methods

A Study on Different Challenges in Facial Recognition Methods Available Online at www.ijcsmc.com International Journal of Computer Science and Mobile Computing A Monthly Journal of Computer Science and Information Technology IJCSMC, Vol. 4, Issue. 6, June 2015, pg.521

More information

2014, IJARCSSE All Rights Reserved Page 461

2014, IJARCSSE All Rights Reserved Page 461 Volume 4, Issue 1, January 2014 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com Real Time Speech

More information

MULTIVARIATE TEXTURE DISCRIMINATION USING A PRINCIPAL GEODESIC CLASSIFIER

MULTIVARIATE TEXTURE DISCRIMINATION USING A PRINCIPAL GEODESIC CLASSIFIER MULTIVARIATE TEXTURE DISCRIMINATION USING A PRINCIPAL GEODESIC CLASSIFIER A.Shabbir 1, 2 and G.Verdoolaege 1, 3 1 Department of Applied Physics, Ghent University, B-9000 Ghent, Belgium 2 Max Planck Institute

More information

Analyzing Vocal Patterns to Determine Emotion Maisy Wieman, Andy Sun

Analyzing Vocal Patterns to Determine Emotion Maisy Wieman, Andy Sun Analyzing Vocal Patterns to Determine Emotion Maisy Wieman, Andy Sun 1. Introduction The human voice is very versatile and carries a multitude of emotions. Emotion in speech carries extra insight about

More information

A MAXIMUM NOISE FRACTION TRANSFORM BASED ON A SENSOR NOISE MODEL FOR HYPERSPECTRAL DATA. Naoto Yokoya 1 and Akira Iwasaki 2

A MAXIMUM NOISE FRACTION TRANSFORM BASED ON A SENSOR NOISE MODEL FOR HYPERSPECTRAL DATA. Naoto Yokoya 1 and Akira Iwasaki 2 A MAXIMUM NOISE FRACTION TRANSFORM BASED ON A SENSOR NOISE MODEL FOR HYPERSPECTRAL DATA Naoto Yokoya 1 and Akira Iwasaki 1 Graduate Student, Department of Aeronautics and Astronautics, The University of

More information

[N569] Wavelet speech enhancement based on voiced/unvoiced decision

[N569] Wavelet speech enhancement based on voiced/unvoiced decision The 32nd International Congress and Exposition on Noise Control Engineering Jeju International Convention Center, Seogwipo, Korea, August 25-28, 2003 [N569] Wavelet speech enhancement based on voiced/unvoiced

More information

Maximum Likelihood Beamforming for Robust Automatic Speech Recognition

Maximum Likelihood Beamforming for Robust Automatic Speech Recognition Maximum Likelihood Beamforming for Robust Automatic Speech Recognition Barbara Rauch barbara@lsv.uni-saarland.de IGK Colloquium, Saarbrücken, 16 February 2006 Agenda Background: Standard ASR Robust ASR

More information

Biometrics Technology: Multi-modal (Part 2)

Biometrics Technology: Multi-modal (Part 2) Biometrics Technology: Multi-modal (Part 2) References: At the Level: [M7] U. Dieckmann, P. Plankensteiner and T. Wagner, "SESAM: A biometric person identification system using sensor fusion ", Pattern

More information

/HFWXUH1RWHVLQ&RPSXWHU6FLHQFH$XGLR DQG9LGHREDVHG%LRPHWULF3HUVRQ$XWKHQWLFDWLRQ Springer LNCS, Bigün, et. al., Eds., 1997.

/HFWXUH1RWHVLQ&RPSXWHU6FLHQFH$XGLR DQG9LGHREDVHG%LRPHWULF3HUVRQ$XWKHQWLFDWLRQ Springer LNCS, Bigün, et. al., Eds., 1997. /HFWXUHRWHVLQ&RPSXWHU6FLHQFH$XGLR DQG9LGHREDVHG%LRPHWULF3HUVRQ$XWKHQWLFDWLRQ Springer LNCS, Bigün, et. al., Eds., 997. 6XEEDQG$SSURDFK)RU$XWRPDWLF6SHDNHU5HFRJQLWLRQ 2SWLPDO'LYLVLRQ2I7KH)UHTXHQF\'RPDLQ

More information

GPU Accelerated Model Combination for Robust Speech Recognition and Keyword Search

GPU Accelerated Model Combination for Robust Speech Recognition and Keyword Search GPU Accelerated Model Combination for Robust Speech Recognition and Keyword Search Wonkyum Lee Jungsuk Kim Ian Lane Electrical and Computer Engineering Carnegie Mellon University March 26, 2014 @GTC2014

More information

CHAPTER 5 GLOBAL AND LOCAL FEATURES FOR FACE RECOGNITION

CHAPTER 5 GLOBAL AND LOCAL FEATURES FOR FACE RECOGNITION 122 CHAPTER 5 GLOBAL AND LOCAL FEATURES FOR FACE RECOGNITION 5.1 INTRODUCTION Face recognition, means checking for the presence of a face from a database that contains many faces and could be performed

More information

Face Recognition for Mobile Devices

Face Recognition for Mobile Devices Face Recognition for Mobile Devices Aditya Pabbaraju (adisrinu@umich.edu), Srujankumar Puchakayala (psrujan@umich.edu) INTRODUCTION Face recognition is an application used for identifying a person from

More information

RLAT Rapid Language Adaptation Toolkit

RLAT Rapid Language Adaptation Toolkit RLAT Rapid Language Adaptation Toolkit Tim Schlippe May 15, 2012 RLAT Rapid Language Adaptation Toolkit - 2 RLAT Rapid Language Adaptation Toolkit RLAT Rapid Language Adaptation Toolkit - 3 Outline Introduction

More information

Face recognition using Singular Value Decomposition and Hidden Markov Models

Face recognition using Singular Value Decomposition and Hidden Markov Models Face recognition using Singular Value Decomposition and Hidden Markov Models PETYA DINKOVA 1, PETIA GEORGIEVA 2, MARIOFANNA MILANOVA 3 1 Technical University of Sofia, Bulgaria 2 DETI, University of Aveiro,

More information

Voice Conversion Using Dynamic Kernel. Partial Least Squares Regression

Voice Conversion Using Dynamic Kernel. Partial Least Squares Regression Voice Conversion Using Dynamic Kernel 1 Partial Least Squares Regression Elina Helander, Hanna Silén, Tuomas Virtanen, Member, IEEE, and Moncef Gabbouj, Fellow, IEEE Abstract A drawback of many voice conversion

More information

Local Filter Selection Boosts Performance of Automatic Speechreading

Local Filter Selection Boosts Performance of Automatic Speechreading 52 5th Joint Symposium on Neural Computation Proceedings, UCSD (1998) Local Filter Selection Boosts Performance of Automatic Speechreading Michael S. Gray, Terrence J. Sejnowski Computational Neurobiology

More information

Illumination-Robust Face Recognition based on Gabor Feature Face Intrinsic Identity PCA Model

Illumination-Robust Face Recognition based on Gabor Feature Face Intrinsic Identity PCA Model Illumination-Robust Face Recognition based on Gabor Feature Face Intrinsic Identity PCA Model TAE IN SEOL*, SUN-TAE CHUNG*, SUNHO KI**, SEONGWON CHO**, YUN-KWANG HONG*** *School of Electronic Engineering

More information

GYROPHONE RECOGNIZING SPEECH FROM GYROSCOPE SIGNALS. Yan Michalevsky (1), Gabi Nakibly (2) and Dan Boneh (1)

GYROPHONE RECOGNIZING SPEECH FROM GYROSCOPE SIGNALS. Yan Michalevsky (1), Gabi Nakibly (2) and Dan Boneh (1) GYROPHONE RECOGNIZING SPEECH FROM GYROSCOPE SIGNALS Yan Michalevsky (1), Gabi Nakibly (2) and Dan Boneh (1) (1) Stanford University (2) National Research and Simulation Center, Rafael Ltd. 0 MICROPHONE

More information

Voice Conversion for Non-Parallel Datasets Using Dynamic Kernel Partial Least Squares Regression

Voice Conversion for Non-Parallel Datasets Using Dynamic Kernel Partial Least Squares Regression INTERSPEECH 2013 Voice Conversion for Non-Parallel Datasets Using Dynamic Kernel Partial Least Squares Regression Hanna Silén, Jani Nurminen, Elina Helander, Moncef Gabbouj Department of Signal Processing,

More information

Environment Independent Speech Recognition System using MFCC (Mel-frequency cepstral coefficient)

Environment Independent Speech Recognition System using MFCC (Mel-frequency cepstral coefficient) Environment Independent Speech Recognition System using MFCC (Mel-frequency cepstral coefficient) Kamalpreet kaur #1, Jatinder Kaur *2 #1, *2 Department of Electronics and Communication Engineering, CGCTC,

More information

Mengjiao Zhao, Wei-Ping Zhu

Mengjiao Zhao, Wei-Ping Zhu ADAPTIVE WAVELET PACKET THRESHOLDING WITH ITERATIVE KALMAN FILTER FOR SPEECH ENHANCEMENT Mengjiao Zhao, Wei-Ping Zhu Department of Electrical and Computer Engineering Concordia University, Montreal, Quebec,

More information

Separating Speech From Noise Challenge

Separating Speech From Noise Challenge Separating Speech From Noise Challenge We have used the data from the PASCAL CHiME challenge with the goal of training a Support Vector Machine (SVM) to estimate a noise mask that labels time-frames/frequency-bins

More information

Confidence Measures: how much we can trust our speech recognizers

Confidence Measures: how much we can trust our speech recognizers Confidence Measures: how much we can trust our speech recognizers Prof. Hui Jiang Department of Computer Science York University, Toronto, Ontario, Canada Email: hj@cs.yorku.ca Outline Speech recognition

More information

Enhancing Face Recognition from Video Sequences using Robust Statistics

Enhancing Face Recognition from Video Sequences using Robust Statistics Enhancing Face Recognition from Video Sequences using Robust Statistics Sid-Ahmed Berrani Christophe Garcia France Telecom R&D TECH/IRIS France Telecom R&D TECH/IRIS 4, rue du Clos Courtel BP 91226 4,

More information

Linear Discriminant Analysis in Ottoman Alphabet Character Recognition

Linear Discriminant Analysis in Ottoman Alphabet Character Recognition Linear Discriminant Analysis in Ottoman Alphabet Character Recognition ZEYNEB KURT, H. IREM TURKMEN, M. ELIF KARSLIGIL Department of Computer Engineering, Yildiz Technical University, 34349 Besiktas /

More information

Diagonal Principal Component Analysis for Face Recognition

Diagonal Principal Component Analysis for Face Recognition Diagonal Principal Component nalysis for Face Recognition Daoqiang Zhang,2, Zhi-Hua Zhou * and Songcan Chen 2 National Laboratory for Novel Software echnology Nanjing University, Nanjing 20093, China 2

More information

Robust Face Recognition via Sparse Representation Authors: John Wright, Allen Y. Yang, Arvind Ganesh, S. Shankar Sastry, and Yi Ma

Robust Face Recognition via Sparse Representation Authors: John Wright, Allen Y. Yang, Arvind Ganesh, S. Shankar Sastry, and Yi Ma Robust Face Recognition via Sparse Representation Authors: John Wright, Allen Y. Yang, Arvind Ganesh, S. Shankar Sastry, and Yi Ma Presented by Hu Han Jan. 30 2014 For CSE 902 by Prof. Anil K. Jain: Selected

More information