SPEECH FEATURE EXTRACTION USING WEIGHTED HIGHER-ORDER LOCAL AUTO-CORRELATION
Far East Journal of Electronics and Communications, Volume 3, Number 2, 2009. Published online: September 14, 2009. This paper is available online at Pushpa Publishing House.

YASUO ARIKI, TETSUYA TAKIGUCHI, TAKASHI MUROI and RYOICHI TAKASHIMA
Department of Computer and System Engineering, Kobe University, Kobe, Japan

Abstract

In this paper, we propose a new feature extraction method based on higher-order local auto-correlation (HLAC) and Fisher weight maps (FWM). Widely used MFCC features lack temporal dynamics. To solve this problem, 35 types of local auto-correlation features are computed within two-dimensional local regions. These local features are accumulated over more global regions by placing high scores on the discriminative areas, where the typical features among all phonemes are well expressed. This score map is called a Fisher weight map. We verified the effectiveness of the HLAC and FWM through speaker-independent phoneme recognition. The proposed method achieved an 82.1% recognition rate, 8.9 points higher than that of MFCC. Furthermore, by combining the proposed feature with MFCC, ΔMFCC and ΔΔMFCC, the recognition rate improved to 86.6%. Experimental results also demonstrate the robustness of the proposed method in noisy environments in comparison with MFCC-based features.

Keywords and phrases: phoneme recognition, feature extraction, Fisher weight map, local auto-correlation.

Received June 16, 2009
I. Introduction

In speech recognition, MFCC (Mel-Frequency Cepstrum Coefficients), a cepstral representation of the short-time mel-frequency sub-band spectrum, is widely used. Because it is derived from a short-time spectrum, MFCC lacks temporal dynamic features, which degrades the recognition rate. To overcome this shortcoming, the regression coefficients of MFCC (delta and delta-delta MFCC) are usually utilized, but they are an indirect expression of temporal frequency changes such as formant transitions or plosives. A more direct expression of the temporal frequency changes would be a geometrical feature in a two-dimensional local area [4, 7, 8, 9, 11]; for example, within a three-frame by three-frequency-band area in the time-frequency domain. Figure 1 shows the time wave and spectrogram of the word "democrats". In the lower frequency bands, several formant transitions are observed, and in a high-frequency band, a plosive is observed. In order to locate such two-dimensional geometrical features, auto-correlation within a local area is effective because it enhances the geometrical features.

Originally, this type of feature extraction was proposed in the field of facial emotion recognition [10]. The authors computed 35 types of local auto-correlation features within a two-dimensional local area at each pixel of an image, and accumulated them within discriminative areas where the typical features among all emotions were well expressed. The map showing the discriminative areas is called a Fisher weight map. Some other data-driven techniques have also been described (e.g., [2], [6]). In this paper, we propose a method to find the geometrical discriminative features and discriminative areas of phonemes in the temporal-frequency domain of speech signals by using Fisher weight maps [1].
In [1], a simple experiment on speaker-dependent vowel recognition showed, by investigating the resultant Fisher weight maps, that formant features are discriminative. In this paper, the effectiveness of the proposed discriminative feature is verified through speaker-independent phoneme recognition experiments. To evaluate the noise robustness of the proposed feature extraction method, white Gaussian noise and real noises were added to the clean speech data at SNRs of 20, 10, and 0 dB, because the higher-order local auto-correlation used in the proposed method is expected to be robust in noisy environments. In Sections II and III, auto-correlation coefficients based on the local features
and the Fisher weight maps are described. Section IV describes the extraction flow of the geometrical discriminative features for phoneme recognition. In Section V, phoneme recognition experiments are presented.

Figure 1. Example of a spectrogram of a speech signal.

II. Local Features and Weighted Higher-order Local Auto-correlations

A. Local features

Two-dimensional geometrical and local features are observed on the time-frequency matrix shown on the left in Figure 2. On the right-hand side, 3 × 3 local patterns are shown that capture the local features. The upper pattern is for continuation in the time direction, the middle for continuation in the frequency direction, and the lower for a transition. The flag 1 indicates multiplication of the spectrum at that position. A local feature for the k-th local pattern at position r is defined as follows:

    h_r^(k) = I(r) + I(r + a_1^(k)) + ... + I(r + a_N^(k)),    (1)

where I(r) is the log-power spectrum at position r on a time-frequency matrix composed of time t and frequency f, and r + a_i^(k) indicates the other positions within the k-th local pattern. If I(r) is the linear-power spectrum, h^(k) is calculated by multiplication, rather than addition, in Eq. (1).
We can calculate local features using any local-pattern size and any order N in the time-frequency domain, but a large local-pattern size results in very high computational complexity. Therefore, a 3 × 3 local pattern is often used [10]. By limiting local patterns to a three-frame by three-band area at reference position r, setting the order N to 2, and omitting patterns equivalent under translation, the number of displacement sets (a_1, ..., a_N) becomes 35. Namely, 35 types of local patterns are obtained at each position r on the time-frequency matrix, as shown in Figure 3, following [10]. For example, for the top pattern in Figure 2, which is pattern No. 16 in Figure 3 (continuation in the time direction), the local feature is calculated by

    h^(16)(t, f) = S(t, f) + S(t - 1, f) + S(t + 1, f),    (2)

where S(t, f) is the log-power spectrum at time t and frequency f.

Figure 2. Example of local features.

B. Weighted higher-order local auto-correlations

The higher-order local auto-correlation x_k for the k-th local pattern is obtained by summing the local features of Eq. (1) over the time-frequency matrix. It is formalized as follows:

    x_k = Σ_r h_r^(k).    (3)
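As a concrete illustration of Eqs. (1)-(3), the computation for one pattern can be sketched in a few lines of numpy. The toy spectrogram and the offset list below are invented for illustration only; a real implementation would loop over all 35 patterns of Figure 3.

```python
import numpy as np

def hlac_feature(S, offsets):
    """Higher-order local auto-correlation for one local pattern
    (Eqs. (1) and (3)): the local feature h_r is the sum of the
    log-power spectra at the reference position r and at the
    pattern's displacements a_i; x_k sums h_r over every position r
    where the whole 3x3 pattern fits.

    S       : 2-D array (time x frequency) of log-power spectra
    offsets : list of (dt, df) displacements a_i within the 3x3 area
    """
    T, F = S.shape
    x_k = 0.0
    for t in range(1, T - 1):
        for f in range(1, F - 1):
            h_r = S[t, f]                      # I(r)
            for dt, df in offsets:             # + I(r + a_i)
                h_r += S[t + dt, f + df]
            x_k += h_r                         # Eq. (3): sum over r
    return x_k

# Pattern No. 16 (continuation in the time direction, Eq. (2)):
# h(t, f) = S(t, f) + S(t - 1, f) + S(t + 1, f).
S = np.ones((5, 4))                 # toy "log-power spectrogram"
x16 = hlac_feature(S, [(-1, 0), (1, 0)])   # 6 positions x 3 terms = 18
```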
Figure 3. 35 types of local patterns.

In order to express the higher-order local auto-correlation in matrix form, all the local features of Eq. (1) for the k-th local pattern are collected over the time-frequency matrix and arranged as the following vector:

    h^(k) = [h_{2,2}^(k)  ...  h_{2,T-1}^(k)  ...  h_{F-1,T-1}^(k)]^t.    (4)

Here the dimension of the vector is M = (T - 2)[time] × (F - 2)[frequency]. The higher-order local auto-correlation x_k for the k-th local pattern is expressed as follows using the M-dimensional vector h^(k):

    x_k = h^(k)t 1.    (5)

Here, 1 is the M-dimensional vector in which each element is set to 1, and t denotes the transpose. A local feature matrix is obtained by placing the M-dimensional vectors h^(k) side by side in the horizontal direction for all of the 35 local patterns:

    H = [h^(1) ... h^(K)].    (6)
The higher-order local auto-correlation vector x is obtained by stacking the x_k and is expressed as follows:

    x = [x_1 ... x_K]^t = H^t 1.    (7)

Figure 4 shows an example of computing the local feature matrix H. Here, moving the 35 local patterns over the windowed time-frequency matrix, the local features are computed and packed into the local feature matrix H. The higher-order local auto-correlation vector x expresses the existence of the local patterns over the whole time-frequency matrix; therefore, it is not by itself a discriminative vector. In order to give the vector x discriminative ability, local features of the same local pattern are summed over the windowed time-frequency matrix while placing high weight on the local features where class differences appear clearly. This is done by replacing the vector 1, consisting of M ones, with a weighting vector w. The weighted higher-order local auto-correlation vector x is then obtained as follows:

    x = H^t w.    (8)

Here w is called a Fisher weight map because it is computed based on linear discriminant analysis.

Figure 4. Local feature matrix.
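The matrix formulation of Eqs. (4)-(8) can be sketched as follows. The two patterns and the uniform weight vector w below are stand-ins for illustration; the actual Fisher weight map comes from the discriminant analysis of the next section.

```python
import numpy as np

def local_feature_matrix(S, patterns):
    """Local feature matrix H of Eq. (6): one column h^(k) per local
    pattern (Eq. (4)), one row per valid position r, so H has
    M = (T - 2)(F - 2) rows and K columns.

    S        : 2-D (time x frequency) log-power spectrogram
    patterns : list of K offset lists, each a list of (dt, df) pairs
    """
    T, F = S.shape
    H = np.empty(((T - 2) * (F - 2), len(patterns)))
    for k, offsets in enumerate(patterns):
        H[:, k] = [S[t, f] + sum(S[t + dt, f + df] for dt, df in offsets)
                   for t in range(1, T - 1) for f in range(1, F - 1)]
    return H

# Two of the 35 patterns, for illustration only.
patterns = [[(-1, 0), (1, 0)],      # continuation in time
            [(0, -1), (0, 1)]]      # continuation in frequency
S = np.random.default_rng(0).normal(size=(6, 5))
H = local_feature_matrix(S, patterns)

x_plain = H.T @ np.ones(H.shape[0])     # Eq. (7): x = H^t 1
w = np.full(H.shape[0], 0.5)            # stand-in for a Fisher weight map
x_weighted = H.T @ w                    # Eq. (8): x = H^t w
```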
III. Fisher Weight Map

To find the Fisher weight map, Fisher's discriminative criterion is utilized [10]. Let P be the number of training data. The local feature matrices for the training data are denoted as {H_j ∈ R^{M×K}}_{j=1}^P. The corresponding weighted higher-order local auto-correlation vectors, the within-class covariance matrix and the between-class covariance matrix are denoted as {x_j}_{j=1}^P, Σ̃_W and Σ̃_B, respectively. In this paper, each phoneme class is used for training the Fisher weight map. The Fisher discriminative criterion J(w) is then expressed as follows:

    J(w) = tr Σ̃_B / tr Σ̃_W = (w^t Σ_B w) / (w^t Σ_W w),    (9)

where Σ_W and Σ_B are the within-class and between-class covariance matrices of the local feature matrices (training data). The Fisher weight map is obtained as the eigenvectors w of the following generalized eigenvalue problem, derived by maximizing the Fisher discriminative criterion under the constraint w^t Σ_W w = 1:

    Σ_B w = λ Σ_W w.    (10)

Since the Fisher weight map is composed of several eigenvectors, the number of eigenvectors is optimized in the phoneme recognition process. However, if the number of eigenvectors is set to 20, the weighted higher-order local auto-correlation vector x of Eq. (8) becomes a 700-dimensional (35 × 20) vector. With such a large dimension, the HMMs used in phoneme recognition cannot be estimated accurately and stably. To solve this problem, PCA (Principal Component Analysis) is used in this paper to reduce the dimension effectively.

IV. Extraction Flow of Geometrical Discriminative Features

Figure 5 shows the extraction flow of geometrical discriminative features and phoneme recognition. First, speech waveforms are converted into the time-frequency domain by short-time Fourier transformation, yielding a time sequence of short-time spectra (frames). Then a moving window with
several consecutive frames is put on the time sequence of short-time log spectra, forming a windowed time-frequency matrix. Local features of the 35 types are computed at each position (time, frequency) within this window, forming a local feature matrix H of size (number of positions) × (35 types of local features). Finally, a Fisher weight map w is produced by applying linear discriminant analysis (LDA) to the local feature matrices H. Geometrical discriminative features are obtained as higher-order local auto-correlations by summing up the local features weighted by the Fisher weight map for each type of local feature, forming a 35-dimensional vector x for each window. By moving this window, a sequence of 35-dimensional vectors is obtained as the geometrical discriminative feature.

In phoneme recognition, phoneme HMMs are trained first. Then the test speech data is converted into a 35-dimensional vector sequence of geometrical discriminative features, and phoneme likelihoods are computed using the trained phoneme HMMs.

V. Phoneme Recognition Experiments

The new feature extraction technique was evaluated on speaker-independent phoneme recognition tasks in both clean and noisy environments.

Figure 5. New feature extraction flowchart.
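The extraction flow just described can be condensed into a hedged numpy sketch. All sizes are toy values, only 3 of the 35 patterns are used, and random arrays stand in for the log mel spectrogram and the trained weight maps.

```python
import numpy as np

def extract_fwm_features(logmel, W, patterns, win=5):
    """Sketch of the extraction flow of Figure 5: slide a win-frame
    window (1-frame shift) over the log mel spectrogram, build the
    local feature matrix H inside each window, and weight it with the
    Fisher weight maps W as in Eq. (8), giving one vector per window.

    logmel   : (num_frames, num_bands) log mel spectrogram
    W        : (M, n_maps) weight maps, M = (win - 2) * (num_bands - 2)
    patterns : list of K offset lists [(dt, df), ...]
    """
    feats = []
    for start in range(logmel.shape[0] - win + 1):
        S = logmel[start:start + win]            # windowed matrix
        T, F = S.shape
        H = np.empty(((T - 2) * (F - 2), len(patterns)))
        for k, offs in enumerate(patterns):
            H[:, k] = [S[t, f] + sum(S[t + dt, f + df] for dt, df in offs)
                       for t in range(1, T - 1) for f in range(1, F - 1)]
        feats.append((H.T @ W).ravel())          # K x n_maps values
    return np.asarray(feats)

rng = np.random.default_rng(0)
logmel = rng.normal(size=(20, 8))                # 20 frames, 8 toy bands
patterns = [[(-1, 0), (1, 0)],                   # 3 of the 35 patterns
            [(0, -1), (0, 1)],
            [(-1, -1), (1, 1)]]
W = rng.normal(size=((5 - 2) * (8 - 2), 2))      # 2 toy weight maps
X = extract_fwm_features(logmel, W, patterns)    # one row per window
```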
Figure 6. Phoneme recognition rate as a function of the number of eigenvectors (white noise, an SNR of 10 dB).

A. Experimental conditions

We carried out speaker-independent Japanese 25-phoneme recognition in clean and noisy environments. The speech material was continuous speech data spoken by six male and four female speakers, manually segmented into phoneme sections. For speaker-independent phoneme recognition, 2,578 data (about 100 hand-segmented data for each phoneme) were collected and used for training (the Fisher weight map and the phoneme HMMs). The training data were also used to calculate the eigenvectors for PCA. Another 2,578 phoneme data from an individual speaker were used for testing. The phoneme recognition rate was computed by averaging the results over the ten speakers.

To evaluate the noise robustness of the proposed feature extraction method, white Gaussian noise was added to the clean speech data at SNRs of 20, 10, and 0 dB, and real noise data from the CENSREC-1-C database [5] were also used. The speech waveform was transformed into a time-frequency matrix using short-time Fourier transformation with a 25 ms frame width and a 10 ms frame shift. The frequency axis was then converted to the mel scale using a mel filter bank (64 dimensions: F in Eq. (4)). A window with a 5-frame width (T in Eq. (4)) and a 1-frame shift was moved over the time-frequency matrix to generate the windowed time-frequency matrices. The number of dimensions D of the weighted higher-order local auto-correlation vector x after PCA reduction was also optimized experimentally.

B. Optimal eigenvector number

Figure 6 shows the variation of system performance as a function of the number
of eigenvectors in Eq. (10), where the blue line represents the recognition rate on the clean-speech test set and the red line the recognition rate on the noisy-speech test set (white noise, an SNR of 10 dB). The results show no improvement with a larger number of eigenvectors (more than 20), where the recognition rates for clean and noisy speech (white noise) are 75.2% and 64.5%, respectively. In the following experiments, the number of eigenvectors in the Fisher weight map is therefore set to 20.

Figure 7. Dimension reduction using PCA.

The total number of dimensions of the proposed speech feature (with the number of eigenvectors W = 20) comes to 20 × 35 (local patterns) = 700, and PCA (Principal Component Analysis) can be used effectively to reduce the total number of feature dimensions. Figure 7 shows the phoneme recognition results as a function of the number of feature dimensions after PCA. The results show that the recognition rate improves to 83.7% for clean speech and 72.2% for noisy speech (white noise, an SNR of 10 dB) when the total number of feature dimensions is reduced to 50 with PCA.

C. Comparison of each single feature in clean and noisy (white noise) environments

Figure 8 shows the results of speaker-independent phoneme recognition using the proposed feature FWM (Fisher Weight Map), compared with the recognition results using MFCC. The recognition rates using the proposed feature FWM with W = 20 eigenvectors (700 dimensions) in the Fisher weight map are 75.19%, 70.82%, 62.33%, and 42.83% for clean speech and at SNRs of 20 dB, 10 dB, and 0 dB, respectively, better than with MFCC.
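Selecting the leading eigenvectors of Eq. (10) is a standard generalized symmetric eigenproblem; a minimal sketch, assuming scipy and using tiny made-up covariance matrices just to exercise the call:

```python
import numpy as np
from scipy.linalg import eigh

def fisher_weight_maps(Sigma_B, Sigma_W, n_maps=20):
    """Eigenvectors of the generalized problem of Eq. (10),
    Sigma_B w = lambda Sigma_W w, sorted by decreasing eigenvalue.
    scipy's symmetric solver also enforces w^t Sigma_W w = 1, the
    constraint used when maximizing J(w).
    """
    lam, W = eigh(Sigma_B, Sigma_W)        # ascending eigenvalues
    order = np.argsort(lam)[::-1][:n_maps]
    return lam[order], W[:, order]

# Tiny 4x4 covariances, invented for illustration only.
rng = np.random.default_rng(1)
A = rng.normal(size=(4, 4)); Sigma_B = A @ A.T
B = rng.normal(size=(4, 4)); Sigma_W = B @ B.T + 4 * np.eye(4)
lam, W = fisher_weight_maps(Sigma_B, Sigma_W, n_maps=2)
```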
Figure 8. Results of speaker-independent phoneme recognition using each single feature.

The highest phoneme recognition rate is obtained by using the weighted higher-order local auto-correlation vector reduced with PCA (FWM PCA (50 dim)). Compared with MFCC and ΔMFCC, the recognition rate is improved by the proposed method at all noise levels, owing to the accumulated direct expression of temporal features over the ten speakers. For clean speech, the recognition rate is improved by 8.9 and 7.0 points over MFCC and ΔMFCC, respectively. As the noise level increases, the difference in recognition rate between FWM and MFCC reaches its maximum of 20.5 points at an SNR of 10 dB. When PCA was not applied, the dimension was so high (700) that the recognition rate was almost the same as that of MFCC.
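The PCA reduction to 50 dimensions used above can be sketched as follows; random data stands in for the actual 700-dimensional FWM features.

```python
import numpy as np

def pca_reduce(X, n_dims=50):
    """Reduce the 700-dimensional weighted HLAC vectors with PCA:
    center the data and project it onto the principal axes with the
    largest variance, obtained here from an SVD of the data matrix.

    X : (num_samples, num_features) training matrix
    """
    mean = X.mean(axis=0)
    Xc = X - mean
    # Rows of Vt are the orthonormal principal axes, ordered by
    # decreasing singular value (i.e., decreasing variance).
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    basis = Vt[:n_dims].T               # (num_features, n_dims)
    return Xc @ basis, mean, basis

# Toy stand-in for the 700-dimensional FWM features.
X = np.random.default_rng(0).normal(size=(200, 700))
Z, mean, basis = pca_reduce(X, n_dims=50)
```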
Figure 9. Results of speaker-independent phoneme recognition using feature integration.

D. Recognition results with feature integration in clean and noisy (white noise) environments

Since FWM showed the highest phoneme recognition rate as a single feature at every white-Gaussian-noise level, it was combined with MFCC (with power), ΔMFCC (with Δ power) and ΔΔMFCC (with ΔΔ power) in phoneme recognition. The results are shown in Figure 9. FWM improved the recognition rate by 3.0 points and 4.0 points when combined with MFCC and ΔMFCC, respectively, compared with the original speaker-independent FWM (82.1% in Figure 8). When the four features FWM, MFCC, ΔMFCC and ΔΔMFCC were combined together, the
recognition rate reached its highest score, 86.6%, which was 0.5 points higher than the result of MFCC + ΔMFCC + ΔΔMFCC. This indicates that FWM carries information that improves on the recognition rate obtained by the MFCC, ΔMFCC and ΔΔMFCC combination. Furthermore, for noisy speech at an SNR of 0 dB, the recognition rate of FWM + MFCC + ΔMFCC + ΔΔMFCC was 6.4 points higher than that of MFCC + ΔMFCC + ΔΔMFCC. This shows the robustness of FWM in noisy environments in comparison with MFCC.

Figure 10. Recognition rates in real noisy environments using each single feature (%).

E. Recognition results in real noisy environments

To evaluate our method with real noise data, two kinds of noise data (restaurant and street) were used from the CENSREC-1-C database [5], recorded in low-SNR (−5 dB ≤ SNR ≤ 10 dB) and high-SNR (10 dB ≤ SNR) environments. Figure 10 shows the recognition rates for each single feature. The proposed feature (FWM), for which the number of dimensions is reduced to 50 with PCA, improves the recognition rate from 46.5% and 27.1% to 65.4% and 39.2% at the high SNR and the low SNR in the restaurant environment, respectively, compared with the recognition results using MFCC (with power). The proposed feature (FWM) likewise improves the recognition rate from 39.9% and 31.3% to 60.1% and 48.9% at the high SNR and the low SNR in the street environment, respectively.
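The noise conditions used throughout these experiments amount to rescaling a noise signal to a target SNR before adding it to the clean waveform; a minimal sketch:

```python
import numpy as np

def add_noise_at_snr(clean, noise, snr_db):
    """Mix a noise signal into clean speech at a target SNR:
    SNR(dB) = 10 log10(P_clean / P_noise), so the noise is rescaled
    so that its power equals P_clean / 10^(snr_db / 10).
    """
    p_clean = np.mean(clean ** 2)
    p_noise = np.mean(noise ** 2)
    gain = np.sqrt(p_clean / (10 ** (snr_db / 10)) / p_noise)
    return clean + gain * noise

rng = np.random.default_rng(0)
clean = np.sin(2 * np.pi * 440 * np.arange(16000) / 16000)
noise = rng.standard_normal(16000)        # white Gaussian noise
noisy = add_noise_at_snr(clean, noise, snr_db=10)
```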
Figure 11 shows the recognition rates for feature integration. The integrated feature FWM + MFCC improves the recognition rate by 2.7 points and 3.1 points at the high and low SNR, respectively, in the restaurant environment, compared with MFCC + ΔMFCC. A similar result is obtained for the street environment, as shown in Figure 11. The experiment results thus show that FWM performs better than ΔMFCC as dynamic information in feature integration. The integration of FWM + MFCC + ΔMFCC + ΔΔMFCC also improved the recognition rate by 3.8 points and 2.4 points at the high SNR and low SNR in the restaurant environment, respectively, and by 5.8 points and 5.5 points at the high SNR and low SNR in the street environment, respectively.

Figure 11. Recognition rates in real noisy environments using feature integration (%).

VI. Conclusion

We described a new feature extraction method based on higher-order local auto-correlation and Fisher weight maps (FWM), which enhances geometrical features in a two-dimensional local area. For speaker-independent phoneme recognition, the proposed FWM showed an 82.1% recognition rate, 8.9 and 7.0 points higher than the results obtained using MFCC and ΔMFCC for clean speech. Furthermore, by combining FWM with MFCC, ΔMFCC and ΔΔMFCC, the recognition rate improved to 86.6%. For noisy speech in a restaurant and street
environment, the proposed FWM showed 65.4% and 60.1% recognition rates, 18.9 and 20.2 points higher than the results obtained using MFCC, respectively. By further combining with MFCC, ΔMFCC, and ΔΔMFCC, the recognition rates improved to 70.7% and 64.0%, respectively. Experimental comparisons of the proposed feature extraction method FWM and MFCC in noisy environments demonstrate that FWM is more noise robust and offers better encoding of temporal dynamics than MFCC.

In future research, we will extend the method to an HMM expression and apply it to continuous phoneme recognition. Remaining problems with the method are the lack of normalization such as CMN and the composition of HMMs with noise components. We will investigate these problems theoretically, as studied in [3].

References

[1] Y. Ariki, S. Kato and T. Takiguchi, Phoneme recognition based on Fisher weight map to higher-order local auto-correlation, Proceedings of the 9th International Conference on Spoken Language Processing (Interspeech - ICSLP), September 2006.
[2] I. M.-Chagnolleau, G. Durou and F. Bimbot, Application of time-frequency principal component analysis to text-independent speaker identification, IEEE Trans. on Speech and Audio Processing 10(6) (2002).
[3] M. P. Cooke, P. D. Green, L. B. Josifovski and A. Vizinho, Robust automatic speech recognition with missing and uncertain acoustic data, Speech Commun. 24 (2001).
[4] M. Heckmann, X. Domont, F. Joublin and C. Goerick, A closer look on hierarchical spectro-temporal features (HIST), Proceedings of Interspeech 2008, September 2008.
[5] N. Kitaoka, K. Yamamoto, T. Kusamizu, S. Nakagawa, T. Yamada, S. Tsuge, C. Miyajima, T. Nishiura, M. Nakayama, Y. Denda, M. Fujimoto, T. Takiguchi, S. Tamura, S. Kuroiwa, K. Takeda and S.
Nakamura, Development of VAD evaluation framework CENSREC-1-C and investigation of relationship between VAD and speech recognition performance, Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, December 2007.
[6] N. Malayath, H. Hermansky, S. Kajarekar and B. Yegnanarayana, Data-driven temporal filters and alternatives to GMM in speaker verification, Digital Signal Process. 10 (2000).
[7] B. T. Meyer and B. Kollmeier, Optimization and evaluation of Gabor feature sets for ASR, Proceedings of Interspeech 2008, September 2008.
[8] T. Nitta, Feature extraction for speech recognition based on orthogonal acoustic feature planes and LDA, Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, May 1999.
[9] K. Schutte and J. Glass, Speech recognition with localized time-frequency pattern detectors, Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, December 2007.
[10] Y. Shinohara and N. Otsu, Facial expression recognition using Fisher weight maps, Proceedings of the 6th IEEE International Conference on Automatic Face and Gesture Recognition, May 2004.
[11] S. Y. Zhao and N. Morgan, Multi-stream spectro-temporal features for robust speech recognition, Proceedings of Interspeech 2008, September 2008.
More informationPDF hosted at the Radboud Repository of the Radboud University Nijmegen
PDF hosted at the Radboud Repository of the Radboud University Nijmegen The following full text is a publisher's version. For additional information about this publication click this link. http://hdl.handle.net/2066/94752
More informationFace detection and recognition. Detection Recognition Sally
Face detection and recognition Detection Recognition Sally Face detection & recognition Viola & Jones detector Available in open CV Face recognition Eigenfaces for face recognition Metric learning identification
More informationFace Detection and Recognition in an Image Sequence using Eigenedginess
Face Detection and Recognition in an Image Sequence using Eigenedginess B S Venkatesh, S Palanivel and B Yegnanarayana Department of Computer Science and Engineering. Indian Institute of Technology, Madras
More informationProduction of Video Images by Computer Controlled Cameras and Its Application to TV Conference System
Proc. of IEEE Conference on Computer Vision and Pattern Recognition, vol.2, II-131 II-137, Dec. 2001. Production of Video Images by Computer Controlled Cameras and Its Application to TV Conference System
More informationAcoustic to Articulatory Mapping using Memory Based Regression and Trajectory Smoothing
Acoustic to Articulatory Mapping using Memory Based Regression and Trajectory Smoothing Samer Al Moubayed Center for Speech Technology, Department of Speech, Music, and Hearing, KTH, Sweden. sameram@kth.se
More informationData preprocessing Functional Programming and Intelligent Algorithms
Data preprocessing Functional Programming and Intelligent Algorithms Que Tran Høgskolen i Ålesund 20th March 2017 1 Why data preprocessing? Real-world data tend to be dirty incomplete: lacking attribute
More informationFACE RECOGNITION USING INDEPENDENT COMPONENT
Chapter 5 FACE RECOGNITION USING INDEPENDENT COMPONENT ANALYSIS OF GABORJET (GABORJET-ICA) 5.1 INTRODUCTION PCA is probably the most widely used subspace projection technique for face recognition. A major
More informationMultidirectional 2DPCA Based Face Recognition System
Multidirectional 2DPCA Based Face Recognition System Shilpi Soni 1, Raj Kumar Sahu 2 1 M.E. Scholar, Department of E&Tc Engg, CSIT, Durg 2 Associate Professor, Department of E&Tc Engg, CSIT, Durg Email:
More informationMobile Face Recognization
Mobile Face Recognization CS4670 Final Project Cooper Bills and Jason Yosinski {csb88,jy495}@cornell.edu December 12, 2010 Abstract We created a mobile based system for detecting faces within a picture
More informationFace detection and recognition. Many slides adapted from K. Grauman and D. Lowe
Face detection and recognition Many slides adapted from K. Grauman and D. Lowe Face detection and recognition Detection Recognition Sally History Early face recognition systems: based on features and distances
More informationVideo Editing Based on Situation Awareness from Voice Information and Face Emotion
18 Video Editing Based on Situation Awareness from Voice Information and Face Emotion Tetsuya Takiguchi, Jun Adachi and Yasuo Ariki Kobe University Japan 1. Introduction Video camera systems are becoming
More informationGaussian Mixture Model Coupled with Independent Component Analysis for Palmprint Verification
Gaussian Mixture Model Coupled with Independent Component Analysis for Palmprint Verification Raghavendra.R, Bernadette Dorizzi, Ashok Rao, Hemantha Kumar G Abstract In this paper we present a new scheme
More informationUpper-limit Evaluation of Robot Audition based on ICA-BSS in Multi-source, Barge-in and Highly Reverberant Conditions
21 IEEE International Conference on Robotics and Automation Anchorage Convention District May 3-8, 21, Anchorage, Alaska, USA Upper-limit Evaluation of Robot Audition based on ICA-BSS in Multi-source,
More informationDimension Reduction CS534
Dimension Reduction CS534 Why dimension reduction? High dimensionality large number of features E.g., documents represented by thousands of words, millions of bigrams Images represented by thousands of
More informationA NEW ROBUST IMAGE WATERMARKING SCHEME BASED ON DWT WITH SVD
A NEW ROBUST IMAGE WATERMARKING SCHEME BASED ON WITH S.Shanmugaprabha PG Scholar, Dept of Computer Science & Engineering VMKV Engineering College, Salem India N.Malmurugan Director Sri Ranganathar Institute
More informationOn Pre-Image Iterations for Speech Enhancement
Leitner and Pernkopf RESEARCH On Pre-Image Iterations for Speech Enhancement Christina Leitner 1* and Franz Pernkopf 2 * Correspondence: christina.leitner@joanneum.at 1 JOANNEUM RESEARCH Forschungsgesellschaft
More informationA Gaussian Mixture Model Spectral Representation for Speech Recognition
A Gaussian Mixture Model Spectral Representation for Speech Recognition Matthew Nicholas Stuttle Hughes Hall and Cambridge University Engineering Department PSfrag replacements July 2003 Dissertation submitted
More informationImage Processing. Image Features
Image Processing Image Features Preliminaries 2 What are Image Features? Anything. What they are used for? Some statements about image fragments (patches) recognition Search for similar patches matching
More informationFace Recognition Using Long Haar-like Filters
Face Recognition Using Long Haar-like Filters Y. Higashijima 1, S. Takano 1, and K. Niijima 1 1 Department of Informatics, Kyushu University, Japan. Email: {y-higasi, takano, niijima}@i.kyushu-u.ac.jp
More informationDiscriminate Analysis
Discriminate Analysis Outline Introduction Linear Discriminant Analysis Examples 1 Introduction What is Discriminant Analysis? Statistical technique to classify objects into mutually exclusive and exhaustive
More informationComparing MFCC and MPEG-7 Audio Features for Feature Extraction, Maximum Likelihood HMM and Entropic Prior HMM for Sports Audio Classification
MITSUBISHI ELECTRIC RESEARCH LABORATORIES http://www.merl.com Comparing MFCC and MPEG-7 Audio Features for Feature Extraction, Maximum Likelihood HMM and Entropic Prior HMM for Sports Audio Classification
More informationFOUR WEIGHTINGS AND A FUSION: A CEPSTRAL-SVM SYSTEM FOR SPEAKER RECOGNITION. Sachin S. Kajarekar
FOUR WEIGHTINGS AND A FUSION: A CEPSTRAL-SVM SYSTEM FOR SPEAKER RECOGNITION Sachin S. Kajarekar Speech Technology and Research Laboratory SRI International, Menlo Park, CA, USA sachin@speech.sri.com ABSTRACT
More informationCHROMA AND MFCC BASED PATTERN RECOGNITION IN AUDIO FILES UTILIZING HIDDEN MARKOV MODELS AND DYNAMIC PROGRAMMING. Alexander Wankhammer Peter Sciri
1 CHROMA AND MFCC BASED PATTERN RECOGNITION IN AUDIO FILES UTILIZING HIDDEN MARKOV MODELS AND DYNAMIC PROGRAMMING Alexander Wankhammer Peter Sciri introduction./the idea > overview What is musical structure?
More informationApplication of Principal Components Analysis and Gaussian Mixture Models to Printer Identification
Application of Principal Components Analysis and Gaussian Mixture Models to Printer Identification Gazi. Ali, Pei-Ju Chiang Aravind K. Mikkilineni, George T. Chiu Edward J. Delp, and Jan P. Allebach School
More informationTHE PERFORMANCE of automatic speech recognition
IEEE TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING, VOL. 14, NO. 6, NOVEMBER 2006 2109 Subband Likelihood-Maximizing Beamforming for Speech Recognition in Reverberant Environments Michael L. Seltzer,
More informationFACE RECOGNITION BASED ON GENDER USING A MODIFIED METHOD OF 2D-LINEAR DISCRIMINANT ANALYSIS
FACE RECOGNITION BASED ON GENDER USING A MODIFIED METHOD OF 2D-LINEAR DISCRIMINANT ANALYSIS 1 Fitri Damayanti, 2 Wahyudi Setiawan, 3 Sri Herawati, 4 Aeri Rachmad 1,2,3,4 Faculty of Engineering, University
More informationPerspectives on Multimedia Quality Prediction Methodologies for Advanced Mobile and IP-based Telephony
Perspectives on Multimedia Quality Prediction Methodologies for Advanced Mobile and IP-based Telephony Nobuhiko Kitawaki University of Tsukuba 1-1-1, Tennoudai, Tsukuba-shi, 305-8573 Japan. E-mail: kitawaki@cs.tsukuba.ac.jp
More informationA long, deep and wide artificial neural net for robust speech recognition in unknown noise
A long, deep and wide artificial neural net for robust speech recognition in unknown noise Feipeng Li, Phani S. Nidadavolu, and Hynek Hermansky Center for Language and Speech Processing Johns Hopkins University,
More informationLOCAL FEATURE EXTRACTION METHODS FOR FACIAL EXPRESSION RECOGNITION
17th European Signal Processing Conference (EUSIPCO 2009) Glasgow, Scotland, August 24-28, 2009 LOCAL FEATURE EXTRACTION METHODS FOR FACIAL EXPRESSION RECOGNITION Seyed Mehdi Lajevardi, Zahir M. Hussain
More informationA Visualization Tool to Improve the Performance of a Classifier Based on Hidden Markov Models
A Visualization Tool to Improve the Performance of a Classifier Based on Hidden Markov Models Gleidson Pegoretti da Silva, Masaki Nakagawa Department of Computer and Information Sciences Tokyo University
More informationFace Hallucination Based on Eigentransformation Learning
Advanced Science and Technology etters, pp.32-37 http://dx.doi.org/10.14257/astl.2016. Face allucination Based on Eigentransformation earning Guohua Zou School of software, East China University of Technology,
More informationImage and speech recognition Exercises
Image and speech recognition Exercises Włodzimierz Kasprzak v.3, 5 Project is co-financed by European Union within European Social Fund Exercises. Content. Pattern analysis (3). Pattern transformation
More informationManifold Constrained Deep Neural Networks for ASR
1 Manifold Constrained Deep Neural Networks for ASR Department of Electrical and Computer Engineering, McGill University Richard Rose and Vikrant Tomar Motivation Speech features can be characterized as
More informationEM Algorithm with Split and Merge in Trajectory Clustering for Automatic Speech Recognition
EM Algorithm with Split and Merge in Trajectory Clustering for Automatic Speech Recognition Yan Han and Lou Boves Department of Language and Speech, Radboud University Nijmegen, The Netherlands {Y.Han,
More informationOptimizing feature representation for speaker diarization using PCA and LDA
Optimizing feature representation for speaker diarization using PCA and LDA itsikv@netvision.net.il Jean-Francois Bonastre jean-francois.bonastre@univ-avignon.fr Outline Speaker Diarization what is it?
More informationA text-independent speaker verification model: A comparative analysis
A text-independent speaker verification model: A comparative analysis Rishi Charan, Manisha.A, Karthik.R, Raesh Kumar M, Senior IEEE Member School of Electronic Engineering VIT University Tamil Nadu, India
More informationReal Time Speaker Recognition System using MFCC and Vector Quantization Technique
Real Time Speaker Recognition System using MFCC and Vector Quantization Technique Roma Bharti Mtech, Manav rachna international university Faridabad ABSTRACT This paper represents a very strong mathematical
More informationA Study on Different Challenges in Facial Recognition Methods
Available Online at www.ijcsmc.com International Journal of Computer Science and Mobile Computing A Monthly Journal of Computer Science and Information Technology IJCSMC, Vol. 4, Issue. 6, June 2015, pg.521
More information2014, IJARCSSE All Rights Reserved Page 461
Volume 4, Issue 1, January 2014 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com Real Time Speech
More informationMULTIVARIATE TEXTURE DISCRIMINATION USING A PRINCIPAL GEODESIC CLASSIFIER
MULTIVARIATE TEXTURE DISCRIMINATION USING A PRINCIPAL GEODESIC CLASSIFIER A.Shabbir 1, 2 and G.Verdoolaege 1, 3 1 Department of Applied Physics, Ghent University, B-9000 Ghent, Belgium 2 Max Planck Institute
More informationAnalyzing Vocal Patterns to Determine Emotion Maisy Wieman, Andy Sun
Analyzing Vocal Patterns to Determine Emotion Maisy Wieman, Andy Sun 1. Introduction The human voice is very versatile and carries a multitude of emotions. Emotion in speech carries extra insight about
More informationA MAXIMUM NOISE FRACTION TRANSFORM BASED ON A SENSOR NOISE MODEL FOR HYPERSPECTRAL DATA. Naoto Yokoya 1 and Akira Iwasaki 2
A MAXIMUM NOISE FRACTION TRANSFORM BASED ON A SENSOR NOISE MODEL FOR HYPERSPECTRAL DATA Naoto Yokoya 1 and Akira Iwasaki 1 Graduate Student, Department of Aeronautics and Astronautics, The University of
More information[N569] Wavelet speech enhancement based on voiced/unvoiced decision
The 32nd International Congress and Exposition on Noise Control Engineering Jeju International Convention Center, Seogwipo, Korea, August 25-28, 2003 [N569] Wavelet speech enhancement based on voiced/unvoiced
More informationMaximum Likelihood Beamforming for Robust Automatic Speech Recognition
Maximum Likelihood Beamforming for Robust Automatic Speech Recognition Barbara Rauch barbara@lsv.uni-saarland.de IGK Colloquium, Saarbrücken, 16 February 2006 Agenda Background: Standard ASR Robust ASR
More informationBiometrics Technology: Multi-modal (Part 2)
Biometrics Technology: Multi-modal (Part 2) References: At the Level: [M7] U. Dieckmann, P. Plankensteiner and T. Wagner, "SESAM: A biometric person identification system using sensor fusion ", Pattern
More information/HFWXUH1RWHVLQ&RPSXWHU6FLHQFH$XGLR DQG9LGHREDVHG%LRPHWULF3HUVRQ$XWKHQWLFDWLRQ Springer LNCS, Bigün, et. al., Eds., 1997.
/HFWXUHRWHVLQ&RPSXWHU6FLHQFH$XGLR DQG9LGHREDVHG%LRPHWULF3HUVRQ$XWKHQWLFDWLRQ Springer LNCS, Bigün, et. al., Eds., 997. 6XEEDQG$SSURDFK)RU$XWRPDWLF6SHDNHU5HFRJQLWLRQ 2SWLPDO'LYLVLRQ2I7KH)UHTXHQF\'RPDLQ
More informationGPU Accelerated Model Combination for Robust Speech Recognition and Keyword Search
GPU Accelerated Model Combination for Robust Speech Recognition and Keyword Search Wonkyum Lee Jungsuk Kim Ian Lane Electrical and Computer Engineering Carnegie Mellon University March 26, 2014 @GTC2014
More informationCHAPTER 5 GLOBAL AND LOCAL FEATURES FOR FACE RECOGNITION
122 CHAPTER 5 GLOBAL AND LOCAL FEATURES FOR FACE RECOGNITION 5.1 INTRODUCTION Face recognition, means checking for the presence of a face from a database that contains many faces and could be performed
More informationFace Recognition for Mobile Devices
Face Recognition for Mobile Devices Aditya Pabbaraju (adisrinu@umich.edu), Srujankumar Puchakayala (psrujan@umich.edu) INTRODUCTION Face recognition is an application used for identifying a person from
More informationRLAT Rapid Language Adaptation Toolkit
RLAT Rapid Language Adaptation Toolkit Tim Schlippe May 15, 2012 RLAT Rapid Language Adaptation Toolkit - 2 RLAT Rapid Language Adaptation Toolkit RLAT Rapid Language Adaptation Toolkit - 3 Outline Introduction
More informationFace recognition using Singular Value Decomposition and Hidden Markov Models
Face recognition using Singular Value Decomposition and Hidden Markov Models PETYA DINKOVA 1, PETIA GEORGIEVA 2, MARIOFANNA MILANOVA 3 1 Technical University of Sofia, Bulgaria 2 DETI, University of Aveiro,
More informationVoice Conversion Using Dynamic Kernel. Partial Least Squares Regression
Voice Conversion Using Dynamic Kernel 1 Partial Least Squares Regression Elina Helander, Hanna Silén, Tuomas Virtanen, Member, IEEE, and Moncef Gabbouj, Fellow, IEEE Abstract A drawback of many voice conversion
More informationLocal Filter Selection Boosts Performance of Automatic Speechreading
52 5th Joint Symposium on Neural Computation Proceedings, UCSD (1998) Local Filter Selection Boosts Performance of Automatic Speechreading Michael S. Gray, Terrence J. Sejnowski Computational Neurobiology
More informationIllumination-Robust Face Recognition based on Gabor Feature Face Intrinsic Identity PCA Model
Illumination-Robust Face Recognition based on Gabor Feature Face Intrinsic Identity PCA Model TAE IN SEOL*, SUN-TAE CHUNG*, SUNHO KI**, SEONGWON CHO**, YUN-KWANG HONG*** *School of Electronic Engineering
More informationGYROPHONE RECOGNIZING SPEECH FROM GYROSCOPE SIGNALS. Yan Michalevsky (1), Gabi Nakibly (2) and Dan Boneh (1)
GYROPHONE RECOGNIZING SPEECH FROM GYROSCOPE SIGNALS Yan Michalevsky (1), Gabi Nakibly (2) and Dan Boneh (1) (1) Stanford University (2) National Research and Simulation Center, Rafael Ltd. 0 MICROPHONE
More informationVoice Conversion for Non-Parallel Datasets Using Dynamic Kernel Partial Least Squares Regression
INTERSPEECH 2013 Voice Conversion for Non-Parallel Datasets Using Dynamic Kernel Partial Least Squares Regression Hanna Silén, Jani Nurminen, Elina Helander, Moncef Gabbouj Department of Signal Processing,
More informationEnvironment Independent Speech Recognition System using MFCC (Mel-frequency cepstral coefficient)
Environment Independent Speech Recognition System using MFCC (Mel-frequency cepstral coefficient) Kamalpreet kaur #1, Jatinder Kaur *2 #1, *2 Department of Electronics and Communication Engineering, CGCTC,
More informationMengjiao Zhao, Wei-Ping Zhu
ADAPTIVE WAVELET PACKET THRESHOLDING WITH ITERATIVE KALMAN FILTER FOR SPEECH ENHANCEMENT Mengjiao Zhao, Wei-Ping Zhu Department of Electrical and Computer Engineering Concordia University, Montreal, Quebec,
More informationSeparating Speech From Noise Challenge
Separating Speech From Noise Challenge We have used the data from the PASCAL CHiME challenge with the goal of training a Support Vector Machine (SVM) to estimate a noise mask that labels time-frames/frequency-bins
More informationConfidence Measures: how much we can trust our speech recognizers
Confidence Measures: how much we can trust our speech recognizers Prof. Hui Jiang Department of Computer Science York University, Toronto, Ontario, Canada Email: hj@cs.yorku.ca Outline Speech recognition
More informationEnhancing Face Recognition from Video Sequences using Robust Statistics
Enhancing Face Recognition from Video Sequences using Robust Statistics Sid-Ahmed Berrani Christophe Garcia France Telecom R&D TECH/IRIS France Telecom R&D TECH/IRIS 4, rue du Clos Courtel BP 91226 4,
More informationLinear Discriminant Analysis in Ottoman Alphabet Character Recognition
Linear Discriminant Analysis in Ottoman Alphabet Character Recognition ZEYNEB KURT, H. IREM TURKMEN, M. ELIF KARSLIGIL Department of Computer Engineering, Yildiz Technical University, 34349 Besiktas /
More informationDiagonal Principal Component Analysis for Face Recognition
Diagonal Principal Component nalysis for Face Recognition Daoqiang Zhang,2, Zhi-Hua Zhou * and Songcan Chen 2 National Laboratory for Novel Software echnology Nanjing University, Nanjing 20093, China 2
More informationRobust Face Recognition via Sparse Representation Authors: John Wright, Allen Y. Yang, Arvind Ganesh, S. Shankar Sastry, and Yi Ma
Robust Face Recognition via Sparse Representation Authors: John Wright, Allen Y. Yang, Arvind Ganesh, S. Shankar Sastry, and Yi Ma Presented by Hu Han Jan. 30 2014 For CSE 902 by Prof. Anil K. Jain: Selected
More information