Optimization of Observation Membership Function By Particle Swarm Method for Enhancing Performances of Speaker Identification


Proceedings of the 6th WSEAS International Conference on SIGNAL PROCESSING, Dallas, Texas, USA, March 22-24, 2007

Optimization of Observation Membership Function By Particle Swarm Method for Enhancing Performances of Speaker Identification

*Jin Young Kim, *So Hee Min, *Seung You Na, **Seung Ho Choi
*School of Electronics and Computer Engineering, Chonnam National University, Yongbong-Dong 300, Puk-gu, Gwangju 500-757, South Korea
**Dept. of Computer Engineering, Dongshin University, Daeho-Dong 252, Naju-Si, Jeollanam-Do 520-714, South Korea

Abstract: The performance of speaker identification is severely degraded in noisy environments. Kim et al. suggested the concept of observation membership for enhancing the performance of speaker identification with noisy speech [1]. The method weights observation probabilities with observation membership values determined by the SNR (signal-to-noise ratio). In [1], the authors suggested heuristic parameter values for the observation membership function. In this paper we apply particle swarm optimization (PSO) to obtain the optimal parameters. With speaker identification experiments using the VidTimit database we verify that the optimization approach can achieve better performance than the original membership function.

Key-Words: Speaker identification, observation membership, particle swarm optimization

1 Introduction
The need for automatic speaker recognition (ASR) based on the speech signal has increased with Internet and mobile applications. Until now, however, ASR has not been successfully adopted in real service domains, because its performance degrades severely in noisy environments. To cope with the noise problem, many algorithms have been studied [2-6]. They can be classified into two approaches. One is to extract parameters robust to noise, such as CMS (cepstrum mean subtraction) [2] and RASTA [3]. The other is model adaptation, which adjusts the speaker's model to noisy environments [4-5]. Recently, a new concept of observation confidence was introduced.
This method weights observation probabilities depending on the noise quantity. In [1], Kim et al. proposed a method of GMM training and recognition based on observation confidence. In that paper, an observation confidence function was adopted whose shape was determined by the authors' heuristic knowledge. The authors thus proposed a promising new approach, but they did not generalize it from the viewpoint of optimization. In this paper we introduce particle swarm optimization (PSO) to maximize the speaker identification rate through the observation membership function. For the optimization we assume that the training speech is clean, since clean speech is generally used for training. To verify our approach we perform speaker identification experiments based on the Gaussian mixture model (GMM). We evaluate our method with the VidTimit database, which is widely used in audio-visual speech recognition.

2 Speaker Identification Based on Observation Membership
Figure 1 shows the process of speaker identification based on the observation membership function. In the figure, the parts drawn with dotted lines represent the modules that estimate the observation membership and apply it as a weight in the probability calculation. A general

training aspect of the speaker identification process is described as follows.
1) Extract feature parameters of the input speech. In this paper we use the Mel-cepstrum, which is known as one of the best features in speech processing.
2) Perform CMS (cepstrum mean subtraction) on the Mel-cepstrum. CMS can eliminate the effects of channel distortion and partially overcome the noise problem.
3) Train GMM models with the CMS-filtered Mel-cepstrum parameters.

In the testing aspect, the speaker whose GMM model has the highest probability is selected as the identified speaker. In Figure 1, the modules represented by dotted lines show the algorithms proposed in [1]. First, the SNR is calculated from the input speech. Second, the value of the observation membership function (OMF) is calculated from the SNR value. Then the probability of the input speech is calculated by (Eq. 1):

P_k(X) = Π_{n=1}^{N} p(x_n | M_k)^{μ_n},  (Eq. 1)

where P_k(X) is the probability of the k-th speaker for the given observation sequence X = {x_n}, p(x_n | M_k) is the probability of the n-th parameter vector given the k-th speaker model, and μ_n is the value of the observation membership function. The observation membership, as an observation confidence, represents how exact an observation is. According to [1], the OMF is represented as a function of the SNR. In this paper we adopt the Sigmoid function as the mapping from SNR to observation membership:

μ(SNR) = 1 / (1 + e^{-a(SNR - b)}),  (Eq. 2)

where a is a scaling parameter and b is a shift parameter. In [1], 0.25 and -2.5 are used for a and b, respectively.

Figure 1. Speaker identification based on the observation membership function (input speech → Mel-cepstrum → CMS filtering → GMM training / identification against speaker models).
Figure 2. Example of the observation membership function (a = 0.25 and b = -2.5), plotted over SNR values from -5 to 30 dB.
According to [1], the parameter values of a and b were determined empirically, and the authors showed that the performance of speaker identification could be enhanced with that observation membership function. However, there is no evidence that the parameter values used in [1] were optimal with respect to the identification rate. Therefore the scaling and shift parameters of the OMF should be obtained by an optimization method. In this paper we adopt particle swarm optimization to determine the parameter values. In Section 3, we discuss PSO and the optimization of the membership function.
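The SNR-to-membership mapping of (Eq. 2) and the membership-weighted scoring of (Eq. 1) can be sketched as follows. This is a minimal illustration, not the authors' implementation; the per-frame log-likelihoods `frame_loglik` and per-frame SNR estimates `snr_db` are assumed to be computed elsewhere.

```python
import numpy as np

def observation_membership(snr_db, a=0.25, b=-2.5):
    """Sigmoid observation membership function (Eq. 2)."""
    return 1.0 / (1.0 + np.exp(-a * (np.asarray(snr_db, dtype=float) - b)))

def weighted_log_likelihood(frame_loglik, snr_db, a=0.25, b=-2.5):
    """Membership-weighted log-likelihood for one speaker model (log of Eq. 1).

    frame_loglik : per-frame log p(x_n | M_k)
    snr_db       : per-frame SNR estimates in dB
    """
    mu = observation_membership(snr_db, a, b)
    return float(np.sum(mu * np.asarray(frame_loglik, dtype=float)))

# Testing aspect: the identified speaker is the model with the highest score,
# e.g. int(np.argmax([weighted_log_likelihood(ll, snr) for ll in all_logliks]))
```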

3 Particle Swarm Method and Optimization of the Membership Function

3.1 Particle Swarm Optimization
Particle swarm optimization (PSO) was proposed by R. Eberhart and J. Kennedy in 1995 [7]. PSO was devised by imitating the movements of a bird flock. PSO is similar to the genetic algorithm (GA) in that the initial solutions are set to random values. Differently from GA, PSO renews each latent solution by combining previous latent solutions with random velocity vectors. These latent solutions are called the particle swarm. As a general problem, assume that a function f() is to be optimized over the parameter P. The PSO method is then described as follows.
1) Randomize each latent solution {P_i0}.
2) For each j-th iteration, perform the following.
2-1) Evaluate f(P_ij) for each particle P_ij.
2-2) Test for convergence. If converged, break the loop.
2-3) For each i, find the best solution over iterations {0, ..., j}. Call it pbest_i.
2-4) Find the global best solution among the pbest_i. Call it gbest_j.
2-5) Update the velocity of each particle as follows:

v_ij = v_ij + c_1 r_1 (pbest_i - P_ij) + c_2 r_2 (gbest_j - P_ij),  (Eq. 3)

where c_1 and c_2 are constants and r_1 and r_2 are random variables.
2-6) Renew each particle:

P_ij = P_ij + v_ij.  (Eq. 4)

3) Determine the optimal solution as gbest_j.

3.2 Optimization of the Observation Membership Function Based on PSO
To optimize the observation membership function described by the Sigmoid function, we need to define an object function f(). In this paper we define f() as the speaker identification rate. That is, the object function is defined by

f(a, b) = (1 / Σ_{k=1}^{K} L_k) Σ_{k=1}^{K} Σ_{l=1}^{L_k} δ(argmax_m P_m(X_kl), k),  (Eq. 5)

where δ(i, j) is the delta function, X_kl is the l-th speech feature sequence of the k-th speaker, and P_m is the observation probability of the given feature sequence for the m-th speaker, defined by (Eq. 1); argmax_m P_m is the index of the speaker with maximum probability.

The object function defined in (Eq. 5) is nonlinear, so it is impossible to obtain the parameters a and b of the Sigmoid membership function in closed form. Thus we apply the PSO approach to optimize the object function. We expect the parameters obtained by PSO to enhance the performance of speaker identification.

4 Simulation and Discussion

4.1 Outline of the Speaker Identification Experiments
To evaluate the proposed algorithm, we used a speech database constructed by ETRI. The DB was developed for text-dependent speaker recognition in mobile environments. The speech signal was sampled at 8 kHz with an 8-bit μ-law PCM codec. The DB includes the speech of 49 speakers, with 20 utterances per speaker. We divide the DB into a training set and a testing set, each containing 10 utterances per speaker. Figure 3 shows a sample utterance. The length of each utterance is about 3 sec, so a total of about 30 sec of speech is used for training each speaker model.

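The PSO procedure of (Eq. 3) and (Eq. 4) can be sketched as below. This is a generic sketch, not the authors' implementation: an inertia weight `w` is added for stability (a common PSO variant not present in Eq. 3), and a toy quadratic peak stands in for the identification-rate objective of (Eq. 5), which would require the speaker models and test utterances.

```python
import numpy as np

def pso_maximize(objective, dim=2, n_particles=20, n_iter=200,
                 w=0.7, c1=1.5, c2=1.5, bounds=(-20.0, 20.0), seed=0):
    """Particle swarm maximization following Eqs. 3-4 (with inertia weight w)."""
    rng = np.random.default_rng(seed)
    lo, hi = bounds
    P = rng.uniform(lo, hi, (n_particles, dim))      # latent solutions (particles)
    V = np.zeros_like(P)                             # velocities
    pbest = P.copy()                                 # per-particle best positions
    pbest_val = np.array([objective(p) for p in P])
    g = int(np.argmax(pbest_val))
    gbest, gbest_val = pbest[g].copy(), float(pbest_val[g])
    for _ in range(n_iter):
        r1 = rng.random((n_particles, dim))
        r2 = rng.random((n_particles, dim))
        V = w * V + c1 * r1 * (pbest - P) + c2 * r2 * (gbest - P)   # Eq. 3
        P = P + V                                                   # Eq. 4
        vals = np.array([objective(p) for p in P])
        improved = vals > pbest_val
        pbest[improved] = P[improved]
        pbest_val[improved] = vals[improved]
        g = int(np.argmax(pbest_val))
        if pbest_val[g] > gbest_val:
            gbest, gbest_val = pbest[g].copy(), float(pbest_val[g])
    return gbest, gbest_val

# Toy stand-in for f(a, b): a single peak at the reference values a=0.25, b=-2.5
best, _ = pso_maximize(lambda p: -((p[0] - 0.25) ** 2 + (p[1] + 2.5) ** 2))
```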
Figure 3. Waveform of a sample utterance.

In the experiments the frame length is 40 msec and frames are overlapped by 20 msec. A 12th-order Mel-cepstrum is used for analyzing the speech, and log energy is included in the speech features. To compensate for noise effects, CMS is adopted. To model the probability density function of each speaker, a GMM with 10 mixtures is adopted. For GMM training we apply full covariance modeling, with fuzzy C-means clustering used to initialize the GMM models. Table 1 summarizes the text-dependent speaker recognition experiments.

Table 1. Summary of text-dependent speaker identification experiments
Speech DB: ETRI DB for speaker recognition in mobile environments
Sampling rate / speech coding: 8000 Hz / 8-bit μ-law PCM
Speaker number: 49
Number of training files: 10
Number of testing files: 10
Speech features: 12th-order mel-cepstrum, log energy
Frame length / frame increment: 40 ms / 20 ms
Channel compensation: Cepstral Mean Subtraction
GMM modeling: EM algorithm, full covariance
Gaussian mixture number: 10

4.2 Experimental Results and Discussion
To verify the PSO-based optimization of the observation membership function, we performed two kinds of experiments. One is to optimize the parameters separately for each fixed SNR. The other is to obtain a single set of optimized parameters over speech signals with varying SNRs. In this paper we normalized each speech signal so that every utterance has 1 as its maximum absolute value. Gaussian noise was added to the clean speech as additive noise; (Eq. 6) shows this simple process:

s_η(n) = s(n) + α η(n),  (Eq. 6)

where s(n) is the clean signal, η(n) is random noise with unit power, α is a mixing parameter that determines the noise quantity (SNR), and s_η(n) is the noise-corrupted signal. To obtain the optimal parameters a and b, we used the same DB as used for GMM training.
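The corruption process of (Eq. 6), together with the amplitude normalization described above, can be sketched as follows; the SNR computation is added here for illustration and is not part of Eq. 6 itself.

```python
import numpy as np

def corrupt(s, alpha, seed=0):
    """Add scaled Gaussian noise to a clean signal (Eq. 6); return noisy signal and SNR."""
    s = np.asarray(s, dtype=float)
    s = s / np.max(np.abs(s))                        # normalize so max |s(n)| = 1
    rng = np.random.default_rng(seed)
    noise = alpha * rng.standard_normal(len(s))      # alpha * eta(n), eta unit power
    snr_db = 10.0 * np.log10(np.mean(s ** 2) / np.mean(noise ** 2))
    return s + noise, snr_db

# Halving alpha raises the SNR by 20*log10(2), about 6 dB
clean = np.sin(2 * np.pi * 440 * np.arange(8000) / 8000.0)
_, snr1 = corrupt(clean, 0.05)
_, snr2 = corrupt(clean, 0.025)
```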
Of course, the testing is performed with utterances not used in training. Table 2 shows the optimal parameter values for several SNR values.

Table 2. Optimal parameter values depending on SNR
α: 0.05, 0.025, 0.0125, 0.00625
Avg SNR (dB): 7.9, 11.8, 16.3, 20.8
Optimal a: -0.47, -0.5, -0.53, -0.53
Optimal b: 9.89, 10.1, 12.3, 12.1

From Table 2 it is observed that the magnitude of the optimal scaling parameter increases slightly with the SNR; all the values are near 0.5. The shift parameter, however, varies from about 10 to 12 as the SNR increases, a change of around 20%. Figure 4 shows the identification rate with and without optimization. "No weighting" means that weighting based on the observation membership function is not applied. "Without optimization" means that the parameter values of [1] were used in the experiments. The figure shows that SNR-based probability weighting improves the identification performance, and comparing the results of [1] with our optimization confirms that the optimal parameters give better results than the original values. Figure 5 shows the experimental results of α-independent optimization, for which the optimal parameter values are -0.5 and 11.1 for a and b, respectively. According to Figure 5, there is no significant difference between α-independent and α-dependent optimization. As a result, we do not need SNR-varying parameter values for the observation membership function.
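The identification rates reported above are exactly the object function of (Eq. 5): the fraction of test utterances whose best-scoring speaker model matches the true speaker. A minimal sketch, using a hypothetical score matrix in place of the real weighted likelihoods:

```python
import numpy as np

def identification_rate(scores, true_ids):
    """Identification rate (Eq. 5).

    scores   : (n_utterances, n_speakers) array of weighted log-likelihoods
    true_ids : (n_utterances,) true speaker indices
    """
    predicted = np.argmax(scores, axis=1)            # argmax_m P_m(X_kl)
    return float(np.mean(predicted == np.asarray(true_ids)))

# Example: 3 utterances, 2 speaker models; utterances 1 and 2 are scored correctly
scores = np.array([[0.9, 0.1],
                   [0.2, 0.8],
                   [0.6, 0.4]])
rate = identification_rate(scores, [0, 1, 1])        # 2 of 3 correct
```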

Figure 4. Identification results with/without optimization (no weighting; without optimization, ref. [1]; with optimization), plotted against α.

Figure 5. Experimental results of α-independent optimization (a = -0.5, b = 11.1) compared with α-dependent optimization.

5 Concluding Remarks
We verified the recently proposed method of SNR-based probability weighting. We applied a PSO-based approach to optimize the observation membership function, obtaining the scaling and shift parameters of the Sigmoid-type OMF by maximizing the identification rate. Through text-dependent speaker identification experiments we confirmed that PSO-based parameter optimization can enhance the performance of SNR-based probability weighting. In the future we will develop a more general optimization approach that can be applied when the training utterances are also corrupted by noise.

Acknowledgment
This work was supported by Chonnam National University as a sabbatical grant.

References:
[1] Jinyoung Kim et al., "Modified GMM Training for Inexact Observation and Its Application to Speaker Identification," submitted to Speech Science.
[2] A. Rosenberg et al., "Cepstral channel normalization techniques for HMM-based speaker verification," Proc. ICSLP-94, pp. 835-838, 1994.
[3] Zhen Bin, Wu Xihong, Liu Zhimin, and Chi Huisheng, "An enhanced RASTA processing for speaker identification," Proc. ICSLP 2000, pp. 251-254, 2000.
[4] E. Mengusoglu, "Confidence Measure based Model Adaptation for Speaker Verification," Proc. of the 2nd IASTED International Conference on Communications, Internet and Information Technology, 2003.
[5] Chin-Hung Sit, Man-Wai Mak, and Sun-Yuan Kung, "Maximum Likelihood and Maximum A Posteriori Adaptation for Distributed Speaker Recognition Systems," Proc. of 1st Int.
Conf. on Biometric Authentication, 2004.
[6] R. J. Mammone, X. Zhang, and R. P. Ramachandran, "Robust Speaker Recognition: A Feature-based Approach," IEEE Signal Processing Magazine, Vol. 13, No. 5, pp. 58-71, 1996.
[7] R. Eberhart and J. Kennedy, "A New Optimizer Using Particle Swarm Theory," Proc. of the Sixth International Symposium on Micro Machine and Human Science, pp. 39-43, 1995.