EM Algorithm with Split and Merge in Trajectory Clustering for Automatic Speech Recognition


Yan Han and Lou Boves
Department of Language and Speech, Radboud University Nijmegen, The Netherlands
{Y.Han,

Abstract. In this paper, we introduce two reformulated versions of the standard EM algorithm, namely Successive Split EM and Split and Merge EM, to relax the problem of initialization dependence in data-driven Speech Trajectory Clustering. These two algorithms prevent the EM procedure in Trajectory Clustering from ending in a local maximum of the likelihood surface, and therefore generate more coherent trajectory clusters. We applied both methods to develop multiple parallel HMMs for a continuous digit recognition task, and compared their performance with that of knowledge-based context-dependent Head-Body-Tail models. The results show that both data-driven approaches significantly outperform the knowledge-based approach. In addition, in most cases the model based on Split and Merge EM is better than the model based on Successive Split EM.

1 Introduction

Over the past decades, it has been repeatedly shown that modeling pronunciation variation with multiple parallel HMM paths can significantly improve the performance of automatic speech recognition. The idea underlying multiple-HMMs acoustic modeling is to use HMM topologies with multiple parallel paths that account for the structure of the acoustic variability, thus alleviating the so-called trajectory folding problem [1]. Well-known examples are gender-dependent models and context-dependent models. The common feature of these examples is that the training tokens of an acoustic unit (e.g. phoneme, syllable, word) are first clustered into separate subgroups with respect to a priori phonetic and linguistic knowledge, and these subgroups are then used to train separate HMM paths.
The Head-Body-Tail (HBT) model [2] for digit recognition is an example of context-dependent modeling in which phonetic knowledge about the immediate left and/or right neighboring acoustic unit is used as the criterion to split training tokens. However, this top-down method is not necessarily suitable for all sources of variation. First of all, it is very hard to decide what is the most important

source of variation in a certain speech database. Inter-speaker variation, for example, may well be more important than linguistic context variation for a small vocabulary recognition task. Secondly, even within one speech database, the most important variation for different acoustic units may be due to different factors, such as speaking style, speed, or the regional background of the speakers. The use of a single criterion to derive pronunciation variation clusters for all acoustic units might not be appropriate. Finally, some important sources of speech variation may not be amenable to top-down modeling. Speaking style, for instance, is important for many speech recognition tasks, but it is very hard to label utterances in a database for relevant styles. These limitations of the knowledge-based methodology limit the power of conventional multiple-HMM acoustic modeling.

The limitations of the knowledge-based approach might deteriorate the performance of a Chinese speech recognizer even more seriously. Chinese is a syllable-based language. The most natural acoustic units for a Chinese recognizer are syllables, which can model long-term coarticulation in speech well. However, part of the pronunciation variation is still due to factors such as neighboring syllables, speaking rate, dialect, etc. Applying prior knowledge, for instance context, to syllables rather than phonemes may leave too little training data in the subgroups to accurately train the separate HMM paths. To overcome the limitations of the knowledge-based approach, several data-driven approaches have been proposed [3] [4]. Contrary to a knowledge-based approach, a data-driven approach automatically derives the most salient pronunciation variation classes by clustering the training tokens of individual acoustic units. In this way, the most important variants can be uncovered directly from the acoustic data.
However, given the fact that speech tokens are time-series data of different lengths, it is not straightforward to define a distance measure, which is a necessary prerequisite for clustering speech tokens in a data-driven manner. Previous methods to measure the distance include dynamic time warping of tokens [3], and modeling each individual token as an HMM [4]. However, the first method loses the information that successive frames in a speech token are not statistically independent, and the second method loses information about the details of the temporal evolution of the speech patterns. In our previous work, we developed a novel data-driven method to cluster training tokens, namely Trajectory Clustering (TC), and evaluated the method on different types of recognition tasks [5] [6]. In this approach, the training tokens are represented as continuous trajectories over time in the acoustic parameter space. The speech trajectories are then clustered into a number of classes using a Mixture of Polynomial Regressions [7]. In this way, the dependency between neighboring frames and the evolutionary pattern of a training token are preserved. One common but serious problem for TC is that it is highly sensitive to the initial values of its model parameters, because the EM algorithm adopted by TC for parameter estimation can only find a locally optimal solution. The major contribution of this paper is to introduce two clustering strategies, namely Successive Split EM (SSEM) and Split and Merge EM (SMEM), to partly solve the initialization problem. Experiments were carried out on a connected Dutch digit recognition task to evaluate the performance of these approaches, using conventional context-dependent HBT models as a reference. The proposed TC model can also be directly applied to Chinese speech recognition.

This paper is organized as follows: Section 2 introduces the mathematics underlying the TC model, together with the overall clustering strategy based on SSEM and SMEM. Section 3 describes the design and the results of the experiments. Finally, in Section 4, our main conclusions are drawn.

2 Methodology

2.1 Speech Trajectory Clustering

In TC, speech tokens are assumed to be drawn from the components of a Gaussian mixture in which the mean of each component density is a polynomial function of time. For speech token j with a length of N_j frames, the regression equation for component k in a D-dimensional acoustic feature space can be written in matrix form as

Y_j = X_j \beta_k + E_k \qquad (1)

or, written out for each dimension d = 1, \dots, D:

\begin{pmatrix} y_j^{(d)}(0) \\ y_j^{(d)}(1) \\ \vdots \\ y_j^{(d)}(N_j-1) \end{pmatrix} = \begin{pmatrix} 1 & 0 & \cdots & 0 \\ 1 & \frac{1}{N_j-1} & \cdots & \left(\frac{1}{N_j-1}\right)^p \\ \vdots & \vdots & & \vdots \\ 1 & 1 & \cdots & 1 \end{pmatrix} \begin{pmatrix} \beta_{k,0}^{(d)} \\ \beta_{k,1}^{(d)} \\ \vdots \\ \beta_{k,p}^{(d)} \end{pmatrix} + \begin{pmatrix} e_k^{(d)}(0) \\ e_k^{(d)}(1) \\ \vdots \\ e_k^{(d)}(N_j-1) \end{pmatrix}

Here Y_j is the N_j × D feature vector matrix; X_j is an N_j × (p + 1) design matrix whose second column contains the (normalized) frame numbers corresponding to the feature vectors in Y_j, and p is the highest order of the regression model, in our case p = 3; β_k is a (p + 1) × D matrix of regression coefficients; and E_k is the N_j × D residual error matrix, which is assumed to be zero-mean multivariate Gaussian with covariance matrix Σ_k. Since the speech trajectories that we deal with have different durations, we normalize the trajectories to unit length by dividing the frame numbers in the second column of X_j by N_j − 1. Note that this normalization does not change the number of frames in a speech token; it only changes the representation of time.
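As an illustration of this normalized polynomial design matrix, here is a minimal NumPy sketch (the helper name and the use of `np.vander` are our own choices, not from the paper):

```python
import numpy as np

def design_matrix(n_frames, p=3):
    """Build the N_j x (p+1) polynomial design matrix X_j, with frame
    indices normalized to [0, 1] by dividing by N_j - 1 so that tokens
    of different durations share the same time representation."""
    t = np.arange(n_frames) / (n_frames - 1)      # normalized time
    return np.vander(t, p + 1, increasing=True)   # columns: 1, t, t^2, ..., t^p

X = design_matrix(5, p=3)
print(X.shape)    # (5, 4)
print(X[:, 1])    # normalized times: [0.  0.25 0.5  0.75 1. ]
```

Each row of the matrix corresponds to one frame; multiplying it by a coefficient matrix β_k evaluates the polynomial mean trajectory at every (normalized) frame time.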
In [8], we found that this method of handling different durations yields the most coherent clusters. Assume that speech trajectories are modeled by a Gaussian mixture with K components, each of which is a regression model with polynomial mean and

Gaussian residual. Then the probability that a speech trajectory Y_j is generated by the mixture model is a linear combination of the component regression models, which can be written as

P(Y_j \mid X_j, \theta) = \sum_{k=1}^{K} \omega_k \prod_{i=0}^{N_j-1} f_k(y_j(i) \mid x_j(i), \theta_k) \qquad (2)

where the ω_k are the weights of the components, f_k(y_j(i) | x_j(i), θ_k) is the observation density given that Y_j belongs to component k, and θ_k = {β_k, Σ_k} are the model parameters of the k-th regression component. The log-likelihood of the parameters θ given the set S of M speech trajectories is defined as

L(\theta \mid S) = \sum_{j=1}^{M} \log \sum_{k=1}^{K} \omega_k \prod_{i=0}^{N_j-1} f_k(y_j(i) \mid x_j(i), \theta_k) \qquad (3)

To find the maximum likelihood estimates of the parameters of a mixture model, EM is the most general algorithm. The EM algorithm consists of the following two steps. The E-step calculates the membership probability

h_{jk} = \frac{\omega_k \prod_{i=0}^{N_j-1} f_k(y_j(i) \mid x_j(i), \theta_k)}{\sum_{k'=1}^{K} \omega_{k'} \prod_{i=0}^{N_j-1} f_{k'}(y_j(i) \mid x_j(i), \theta_{k'})} \qquad (4)

which is the posterior probability that trajectory Y_j is generated by component k. All the acoustic vectors in Y_j share the same probability. The M-step calculates the new model parameters:

\hat{\beta}_k = (X^T H_k X)^{-1} X^T H_k Y \qquad (5)

\hat{\Sigma}_k = \frac{(Y - X\hat{\beta}_k)^T H_k (Y - X\hat{\beta}_k)}{\sum_{j=1}^{M} N_j h_{jk}} \qquad (6)

\hat{\omega}_k = \frac{1}{M} \sum_{j=1}^{M} h_{jk} \qquad (7)

where X and Y stack the X_j and Y_j of all M trajectories, and H_k is a diagonal matrix with [h_{1k} h_{2k} ... h_{Mk}] on the diagonal, in which h_{jk} stands for a row vector containing N_j copies of the membership probability h_{jk}. By default, the EM algorithm starts with randomly initialized model parameters. The E-step and M-step are then performed iteratively until the log-likelihood of Eq. (3) converges. Finally, each speech trajectory is assigned to the cluster with the highest membership probability h_{jk}.
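The E- and M-steps of Eqs. (2)-(7) can be sketched as follows. This is a minimal illustration under our own implementation choices (function name, random-responsibility initialization instead of random parameters, and a small covariance regularizer for numerical safety), not the authors' code:

```python
import numpy as np
from scipy.stats import multivariate_normal

def em_trajectory_clustering(Ys, Xs, K, n_iter=30, seed=0):
    """EM for a K-component mixture of polynomial regressions (Eqs. (2)-(7)).

    Ys: list of (N_j, D) feature matrices, one per speech token.
    Xs: list of (N_j, p+1) normalized-time design matrices.
    Returns membership probabilities h (M, K) and the model parameters.
    """
    rng = np.random.default_rng(seed)
    M, D = len(Ys), Ys[0].shape[1]
    Xall, Yall = np.vstack(Xs), np.vstack(Ys)
    # start from random responsibilities rather than random parameters,
    # so the first M-step always yields well-defined components
    h = rng.dirichlet(np.ones(K), size=M)
    betas, Sigmas = [None] * K, [None] * K
    w = np.full(K, 1.0 / K)
    for _ in range(n_iter):
        # M-step (Eqs. (5)-(7)): weighted least squares over all stacked frames
        for k in range(K):
            hw = np.concatenate([np.full(len(Ys[j]), h[j, k]) for j in range(M)])
            XtH = Xall.T * hw
            betas[k] = np.linalg.solve(XtH @ Xall, XtH @ Yall)
            R = Yall - Xall @ betas[k]
            Sigmas[k] = (R.T * hw) @ R / hw.sum() + 1e-8 * np.eye(D)
            w[k] = h[:, k].mean()
        # E-step (Eq. (4)): every frame of a token shares the token's membership
        logp = np.empty((M, K))
        for j in range(M):
            for k in range(K):
                resid = Ys[j] - Xs[j] @ betas[k]
                logp[j, k] = np.log(w[k]) + multivariate_normal.logpdf(
                    resid, mean=np.zeros(D), cov=Sigmas[k]).sum()
        logp -= logp.max(axis=1, keepdims=True)   # numerical stabilization
        h = np.exp(logp)
        h /= h.sum(axis=1, keepdims=True)
    return h, betas, Sigmas, w
```

After convergence, each trajectory is assigned to the component with the largest h_{jk}, exactly as described above.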

Table 1. Successive Split EM for Speech Trajectory Clustering

1. Fit one polynomial to the complete data set, compute the model parameters β and Σ, set K = 1 and ω_1 = 1;
2. Select the component k (k ≤ K) with the largest ω_k to split, initialize the parameters of the new components according to Eqs. (8)-(10), and increase K by 1;
3. Run EM on all mixture components until the log-likelihood (Eq. (3)) converges;
4. Loop to step 2; stop when K has reached the desired number.

2.2 Successive Split EM

One of the issues with the EM algorithm is that it often converges to a local maximum of the likelihood surface. As a consequence, the EM procedure for TC is highly sensitive to the initial parameter assignments: different initial values of the model parameters lead to different clusters after EM estimation. One way to tackle this problem is to apply the Linde-Buzo-Gray (LBG) algorithm [9] to TC. We start from the complete set of speech trajectories, then successively split one cluster, selected according to a split criterion, until K clusters are obtained. Assume that in a certain split iteration, component k is selected to be split into k' and j'. The parameters of k' and j' after the split are initialized as follows:

\beta_{k',0} = \beta_{k,0} + \varepsilon \quad \text{and} \quad \beta_{j',0} = \beta_{k,0} - \varepsilon \qquad (8)

where ε is a small noise term sampled from N(0, Σ);

\Sigma_{k'} = \Sigma_{j'} = \det(\Sigma_k)^{1/D} I_D \qquad (9)

where det(Σ) denotes the determinant of the matrix Σ and I_D is the D-dimensional identity matrix; and

\omega_{k'} = \omega_{j'} = \frac{\omega_k}{2} \qquad (10)

The split criterion adopted in this work is to always split the component with the largest weight ω. In all the TC experiments we have conducted so far, the component with the largest ω has always corresponded to the cluster with the largest number of trajectories. Splitting the largest cluster is therefore reasonable, because it makes the resulting K clusters approximately equal in size, so that the speech tokens in all clusters are equally sufficient to train separate HMM paths.
The SSEM algorithm for TC is summarized in Table 1.
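The split initialization of Eqs. (8)-(10) can be sketched as follows. This is a minimal illustration; the function name is ours, and we draw ε from N(0, Σ_k) of the component being split, which is one reading of the Σ in Eq. (8):

```python
import numpy as np

def split_component(beta, Sigma, w, rng):
    """Split one mixture component into two, following Eqs. (8)-(10):
    perturb the constant-term coefficients by +/- eps, reset both new
    covariances to det(Sigma)^(1/D) * I_D, and halve the weight."""
    D = Sigma.shape[0]
    eps = rng.multivariate_normal(np.zeros(D), Sigma)  # small noise term
    beta_a, beta_b = beta.copy(), beta.copy()
    beta_a[0] = beta[0] + eps   # beta_{k',0} = beta_{k,0} + eps
    beta_b[0] = beta[0] - eps   # beta_{j',0} = beta_{k,0} - eps
    Sigma_new = np.linalg.det(Sigma) ** (1.0 / D) * np.eye(D)
    return (beta_a, Sigma_new, w / 2.0), (beta_b, Sigma_new, w / 2.0)
```

The SSEM loop of Table 1 then alternates this split with a full EM pass, always picking the component with the largest weight ω_k as the split candidate.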

Table 2. Split and Merge EM for Speech Trajectory Clustering

1. Randomly initialize the parameters of the K mixture components;
2. Run EM on all mixture components until convergence;
3. Collect a list of merge candidates according to Eq. (14) and a list of split candidates according to ω_k; sort these lists;
4. Select the most promising split and merge triplet {i, j, k} from the sorted candidate list;
5. Perform the split and merge operations, initializing the parameters of the new model according to Eqs. (8)-(13);
6. Run EM on all mixture components until convergence;
7. If the log-likelihood improved after the split and merge, save the newly estimated parameters, discard the other candidates, and go back to step 3; otherwise, reject the candidate;
8. Loop to step 4 until no candidate remains in the list.

2.3 Split and Merge EM

In recent research [10], the idea of performing split and merge operations has been successfully applied to EM for Gaussian mixture models. In the case of mixture models, the local maxima found by EM often involve having too many components of the mixture in one part of the space and too few in another. It is therefore possible to avoid local maxima by introducing a merge operation, which merges components in regions that contain too many highly similar clusters, and a split operation, which splits components in regions where dissimilar tokens are combined in one cluster. This Split and Merge EM algorithm can also be applied to TC. SMEM starts from randomly initialized parameters of a TC model with K components, and the model parameters are then estimated by a standard EM procedure. According to split and merge criteria, a number of candidate triplets {i, j, k} are selected. Here {i, j} denotes the pair of components to be merged, and k is the component to be split.
The most promising candidate is then selected, and the split and merge operations on this candidate are performed simultaneously, so that the total number of components K is unchanged. After the split of component k, the parameters of the new components {k', j'} are initialized by Eqs. (8)-(10). After the merge of components {i, j}, the new component i' is initialized as a linear combination of the original ones before the merge:

\beta_{i',0} = \frac{\omega_i \beta_{i,0} + \omega_j \beta_{j,0}}{\omega_i + \omega_j} \qquad (11)

\Sigma_{i'} = \frac{\omega_i \Sigma_i + \omega_j \Sigma_j}{\omega_i + \omega_j} \qquad (12)

\omega_{i'} = \omega_i + \omega_j \qquad (13)

The newly generated model after the split and merge operations is then subjected to the EM procedure. If the likelihood is better than before the split and merge, the estimated parameters are saved and the procedure returns to candidate selection. Otherwise, the new model is rejected and another candidate is selected for splitting and merging. This procedure is performed iteratively, until no candidate produces a better result than the previous model. Note that in theory, the total number of available split and merge candidates is K(K-1)(K-2)/2. However, experiments have shown that it is only necessary to test about five promising candidates at each iteration.

The merge criterion adopted in this work is defined as follows:

J_{merge}(i, j) = h_i^T h_j \qquad (14)

where h_i = [h_{1i}, h_{2i}, \dots, h_{Mi}]^T is the vector containing the membership probabilities (cf. Eq. (4)) of all trajectories for component i. The idea underlying this merge criterion is that if many trajectories have almost equal membership probabilities for two components, it is reasonable to assume that these two components can be merged. The SMEM algorithm for TC is summarized in Table 2.

2.4 Path Mixture Multiple-HMMs Model

With the results of TC, multiple HMM paths for a speech unit can be trained, based on the training tokens in the different trajectory clusters. We refer to this model topology as the separate path model. An example model topology with two HMM paths is illustrated in Figure 1(a).

Fig. 1. Model topologies for the (a) Separate Path Model and (b) Path Mixture Model.

The prior probabilities of the separate HMM
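The merge side of SMEM (Eqs. (11)-(14)) can be sketched as follows. This is a minimal illustration; the function names are ours, and we apply the weight-proportional combination of Eq. (11) to the full coefficient matrices, which is our reading of that equation:

```python
import numpy as np

def merge_candidates(h):
    """Rank component pairs by the merge criterion J_merge(i, j) = h_i^T h_j
    (Eq. (14)), where h[:, i] holds the membership probabilities of all M
    trajectories for component i. Pairs with the most overlap come first."""
    K = h.shape[1]
    scores = [((i, j), float(h[:, i] @ h[:, j]))
              for i in range(K) for j in range(i + 1, K)]
    return sorted(scores, key=lambda s: -s[1])

def merge_components(beta_i, Sigma_i, w_i, beta_j, Sigma_j, w_j):
    """Initialize the merged component as the weight-proportional
    combination of the originals (Eqs. (11)-(13))."""
    w = w_i + w_j
    beta = (w_i * beta_i + w_j * beta_j) / w
    Sigma = (w_i * Sigma_i + w_j * Sigma_j) / w
    return beta, Sigma, w
```

A full SMEM iteration would pick the top-ranked pair {i, j}, the largest-weight remaining component k as the split candidate, apply both operations, and keep the result only if the log-likelihood after re-running EM improves.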

paths are all equal to one. By introducing two non-emitting states, the separate HMM paths are combined into a single entity with weighted HMM paths (cf. Figure 1(b)). We refer to this model topology as the path mixture model. The difference between the path mixture model and the separate path model lies not only in the additional weights on the parallel HMM paths, but also in the way they are trained. For the separate path models, the HMM paths are trained using separate sets of tokens corresponding to the trajectory clusters, whereas all tokens are used to train the path mixture models by means of the Baum-Welch algorithm. Training the path mixture model is thus equivalent to clustering the tokens again, as in the Mixture of Hidden Markov Models approach, but now with the parameters initialized from the TC-based separate path models. In our previous work [11], it was shown that path mixture models outperform separate path models; we therefore used only path mixture models in this work. In decoding, the Viterbi algorithm can be applied directly to path mixture models. It should be noted that in decoding, when a search path starts in a state of one HMM path, it will also end in that HMM path, thus alleviating the trajectory folding problem.

3 Experiments

3.1 Speech Material

The performance of the proposed TC-based models was evaluated on a connected Dutch digit recognition task. The speech material for our experiments was taken from the Dutch POLYPHONE [12], SESP [13] and CASIMIR [14] corpora. For each of the corpora, speech was recorded over the public switched telephone network in the Netherlands. Among other things, the speakers were asked to read several connected digit strings. The number of digits in a string varied from 1 to 14. For training we used a set of 9,753 strings containing 61,592 digits. All models were evaluated with an independent set of 10,000 test utterances comprising 80,016 digits.
None of the original utterances used for training or testing had a high background noise level. We computed 12 Mel-frequency log-energy coefficients using a 25 ms Hamming window shifted in 10 ms steps, with pre-emphasis. Based on a Fast Fourier Transform, 12 filter band energy values were calculated, with the filter bands triangularly shaped and uniformly distributed on a Mel-frequency scale. Mel-frequency cepstra were computed from the raw Mel-frequency log-energy coefficients using the DCT. Channel normalization was done by means of cepstrum mean subtraction over the entire utterance. Finally, we computed the first and second order time derivatives. Together with log-energy and the first and second order delta log-energy, this yielded 39-dimensional feature vectors.

3.2 Experimental Design

In our experiments we used Head-Body-Tail (HBT) models [2] as the baseline system. HBT models account for pronunciation variation in a knowledge-based
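The channel normalization and derivative steps can be sketched as follows. This is a simplified NumPy illustration of utterance-level cepstrum mean subtraction plus HTK-style regression deltas; the function name, the edge handling, and the window width of 2 are our assumptions, not stated in the paper:

```python
import numpy as np

def cms_and_deltas(cep, width=2):
    """Cepstrum mean subtraction over the whole utterance, followed by
    first- and second-order regression deltas.
    cep: (T, 13) static features (12 cepstra + log-energy) -> (T, 39)."""
    c = cep - cep.mean(axis=0, keepdims=True)    # channel normalization

    def shift(x, t):
        # shift frames by t, replicating the edge frames
        idx = np.clip(np.arange(len(x)) + t, 0, len(x) - 1)
        return x[idx]

    def delta(x):
        num = sum(t * (shift(x, t) - shift(x, -t)) for t in range(1, width + 1))
        den = 2.0 * sum(t * t for t in range(1, width + 1))
        return num / den

    d1 = delta(c)
    d2 = delta(d1)
    return np.hstack([c, d1, d2])                # static + delta + delta-delta
```

With 13 static coefficients this yields the 39-dimensional vectors used for HMM training; for the TC clustering itself only the 12 static MFCCs are kept, as described below.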

manner. Because pronunciation variation at the boundaries of a digit is much larger than in the middle, each digit is split into three parts. The middle part of a digit (the Body) is assumed to be context-independent. The first part (the Head) and the last part (the Tail) depend on the previous and the subsequent digit (or silence), respectively. Thus, each digit is modeled as one context-independent body HMM and 11 context-dependent head and tail HMMs, which can be conflated into models with 11 parallel paths. In all our experiments the head and tail HMMs consisted of three states, whereas the number of states in the body models was based on the mean duration of the digit as observed in the training corpus [14]. In addition to the digit models, one silence and one noise model, both consisting of three states, were built. All the HMM paths have the standard left-to-right no-skip topology.

In addition to the knowledge-based multiple-HMMs, we also built SSEM-TC and SMEM-TC based models. To that end, we used the baseline HBT models to segment the training data by means of forced alignment. This allowed us to cluster the Head and Tail parts of the training tokens of the ten digits. The segmented tokens of each Head or Tail part were then clustered into 11 subgroups with both the SSEM and the SMEM strategies. Because the dependence between frames is explicitly modeled in TC, we used only the 12 MFCCs as the acoustic feature vector for clustering. Based on the clustering results obtained with SSEM-TC and SMEM-TC, the two types of TC-based multiple-HMMs models with separate paths were trained, and then subjected to four passes of Baum-Welch re-estimation to train the path mixture models. The new models had the same three-state left-to-right no-skip topologies as the baseline models. In training the multiple-HMMs models, we made use of the 39-dimensional acoustic feature vectors. All the models in these experiments were trained and evaluated with HTK [15].
In order to study the improvements due to changes in acoustic modeling only, without the risk that the language model could mask the effects, we used a language model that only specifies that all digits have equal prior probability, and that each digit (or silence) can follow each other digit with equal prior probability.

3.3 Results and Discussion

We applied the proposed algorithms to cluster the Head and Tail parts of all 10 digits into 11 subgroups. Table 3 shows the summary statistics (mean, standard deviation (std), maximum, and minimum) of the log-likelihood values obtained by the standard EM, SSEM, and SMEM algorithms over 20 different simulations for the Head and Tail (H1 and T1) of the digit /een/ (one), together with the equivalent log-likelihood values obtained by clustering based on context (Context). For the SSEM method only one result can be obtained with the successive splitting procedure used in this study. Table 3 shows that the log-likelihoods for the TC model based on any of EM, SSEM and SMEM are much larger than those given by the clustering based on context, which strongly suggests that the clusters yielded by TC are

Table 3. Log-likelihoods found after the clustering (×10^5). For the units H1 and T1: mean, std, max, and min over 20 runs of EM, SSEM, and SMEM, and the single value for Context (the numeric entries, and the n.a. markers for the single-solution methods, are not recoverable from this copy).

more coherent than those produced by the linguistic context criteria. As shown in Table 3, the log-likelihoods achieved by the SSEM and SMEM algorithms have lower variance than those achieved by the EM algorithm. Even the worst solution found by the SMEM algorithm was better than the best solution found by the SSEM algorithm. These results indicate that the proposed algorithms work very well in avoiding local maxima of the likelihood surface, and that the SMEM algorithm works even better than SSEM.

Fig. 2. Results of connected Dutch digit recognition.

Fig. 2 illustrates the recognition performance of the baseline context-dependent HBT models, the SSEM-TC based models and the SMEM-TC based models. The results in this figure correspond to models with 1, 2, 4, 8, 16, and 32 Gaussians in each HMM state. The error bars represent the 95% confidence intervals of the measurements. From Fig. 2, it can be seen that the recognition accuracies for

both the SSEM-TC and SMEM-TC based models always significantly outperform the HBT models. At lower model complexity, the advantages of the TC-based models are more obvious. These recognition results again demonstrate the effectiveness of the proposed methods for defining multiple-HMMs acoustic models. Comparing the SMEM-TC and SSEM-TC based models, the recognition performance of the former is significantly better than that of the latter for models with 1, 2, 4, and 8 Gaussians per HMM state. For models with 16 and 32 Gaussians per state, the SSEM-TC models are competitive with the SMEM-TC models. For the 32-Gaussian systems, the performance of SSEM-TC is even slightly better than that of the SMEM-TC models. This is because the variance of the number of tokens in the clusters produced by SMEM-TC is larger than in the case of SSEM-TC. As a consequence, the number of training tokens in some of the clusters yielded by SMEM-TC may have been too small to accurately train the separate HMM paths. However, one should be aware that the level of pronunciation variation in a connected digit recognition task is not very high. For recognition tasks with a high level of variation, SMEM-TC might outperform SSEM-TC even at high model complexity.

4 Conclusion

In this paper, we investigated the effectiveness of the SSEM and SMEM algorithms in solving the problem of initialization dependence of standard EM in Speech Trajectory Clustering. SSEM and SMEM are reformulated versions of the standard EM algorithm that can partly avoid local maxima of the likelihood surface, by incrementally increasing the number of mixture components and by heuristically reallocating the mixture components in the data space, respectively. The clustering results showed that SMEM gave more coherent clusters than the SSEM and knowledge-based methods.
To evaluate the performance of the SSEM-TC and SMEM-TC based multiple-HMM acoustic models, a number of experiments were carried out to compare their performance with context-dependent HBT models in a connected digit recognition task. The results show that both the SSEM-TC and the SMEM-TC based models always significantly outperformed the conventional HBT models. When the model complexity is low, the recognition accuracy of the SMEM-TC based models is significantly better than that of the SSEM-TC based models. For models with high complexity, the SMEM-TC based models are competitive with the SSEM-TC based models. Given the experimental results of the TC-based models in Dutch digit recognition, we believe that this novel data-driven method for multiple-HMMs acoustic modeling can improve the performance of Chinese speech recognition as well. In our future work, we will consider applying the proposed method to Chinese speech. Furthermore, we will investigate the relation among local maxima, the amount of training data, and model complexity. Moreover, a method for automatically deriving the optimal number of parallel HMM paths with respect to some statistical criterion is also a very promising direction for improving TC-based multiple-HMM acoustic modeling.

Acknowledgements

This research is part of the Interactive Multimodal Information eXtraction (IMIX) program, which is funded by the Netherlands Organization for Scientific Research (NWO).

References

1. I. Illina and Y. Gong, "Elimination of trajectory folding phenomenon: HMM, Trajectory Mixture HMM and Mixture Stochastic Trajectory model," in Proceedings of ICASSP-97, vol. 2.
2. W. Chou, C. Lee, and B. Juang, "Minimum error rate training of inter-word context-dependent acoustic model units in speech recognition," in Proceedings of ICSLP-94.
3. J. Picone, "Duration in context clustering for speech recognition," Speech Communication, vol. 9.
4. F. Korkmazskiy, "Generalized mixture of HMMs for continuous speech recognition," in Proceedings of ICASSP-97.
5. Y. Han, J. de Veth, and L. Boves, "Speech trajectory clustering for improved speech recognition," in Proceedings of INTERSPEECH-2005, September 2005.
6. Y. Han, A. Hamalainen, and L. Boves, "Trajectory clustering of syllable-length acoustic models for continuous speech recognition," in Proceedings of ICASSP-2006, April 2006.
7. S. Gaffney and P. Smyth, "Trajectory clustering with mixtures of regression models," in Proceedings of the Fifth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.
8. Y. Han, J. de Veth, and L. Boves, "Trajectory clustering for automatic speech recognition," in Proceedings of EUSIPCO-2005, September 2005.
9. Y. Linde, A. Buzo, and R. M. Gray, "An algorithm for vector quantizer design," IEEE Trans. Commun., vol. COM-28.
10. N. Ueda and R. Nakano, "EM algorithm with split and merge operations for mixture models," Systems and Computers in Japan, vol. 32, pp. 1-11.
11. Y. Han and L. Boves, "Syllable-length path mixture Hidden Markov Models with trajectory clustering for continuous speech recognition," in Proceedings of INTERSPEECH-2006, September 2006.
12. E. den Os, T. Boogaart, L. Boves, and E. Klabbers, "The Dutch Polyphone corpus," in Proceedings of EuroSpeech-95.
13. F. Bimbot, "An overview of the CAVE project research activities in speaker verification," Speech Communication, vol. 31.
14. J. Sturm and E. Sanders, "Modelling phonetic context using Head-Body-Tail acoustic models for connected digit recognition," in Proceedings of ICSLP-2000, vol. 1.
15. S. Young, G. Evermann, and T. Hain, The HTK Book (for HTK version 3.2.1). Cambridge University Engineering Department, 1997.


More information

Speaker Diarization System Based on GMM and BIC

Speaker Diarization System Based on GMM and BIC Speaer Diarization System Based on GMM and BIC Tantan Liu 1, Xiaoxing Liu 1, Yonghong Yan 1 1 ThinIT Speech Lab, Institute of Acoustics, Chinese Academy of Sciences Beijing 100080 {tliu, xliu,yyan}@hccl.ioa.ac.cn

More information

2-2-2, Hikaridai, Seika-cho, Soraku-gun, Kyoto , Japan 2 Graduate School of Information Science, Nara Institute of Science and Technology

2-2-2, Hikaridai, Seika-cho, Soraku-gun, Kyoto , Japan 2 Graduate School of Information Science, Nara Institute of Science and Technology ISCA Archive STREAM WEIGHT OPTIMIZATION OF SPEECH AND LIP IMAGE SEQUENCE FOR AUDIO-VISUAL SPEECH RECOGNITION Satoshi Nakamura 1 Hidetoshi Ito 2 Kiyohiro Shikano 2 1 ATR Spoken Language Translation Research

More information

Note Set 4: Finite Mixture Models and the EM Algorithm

Note Set 4: Finite Mixture Models and the EM Algorithm Note Set 4: Finite Mixture Models and the EM Algorithm Padhraic Smyth, Department of Computer Science University of California, Irvine Finite Mixture Models A finite mixture model with K components, for

More information

Observational Learning with Modular Networks

Observational Learning with Modular Networks Observational Learning with Modular Networks Hyunjung Shin, Hyoungjoo Lee and Sungzoon Cho {hjshin72, impatton, zoon}@snu.ac.kr Department of Industrial Engineering, Seoul National University, San56-1,

More information

application of learning vector quantization algorithms. In Proceedings of the International Joint Conference on

application of learning vector quantization algorithms. In Proceedings of the International Joint Conference on [5] Teuvo Kohonen. The Self-Organizing Map. In Proceedings of the IEEE, pages 1464{1480, 1990. [6] Teuvo Kohonen, Jari Kangas, Jorma Laaksonen, and Kari Torkkola. LVQPAK: A program package for the correct

More information

PARALLEL TRAINING ALGORITHMS FOR CONTINUOUS SPEECH RECOGNITION, IMPLEMENTED IN A MESSAGE PASSING FRAMEWORK

PARALLEL TRAINING ALGORITHMS FOR CONTINUOUS SPEECH RECOGNITION, IMPLEMENTED IN A MESSAGE PASSING FRAMEWORK PARALLEL TRAINING ALGORITHMS FOR CONTINUOUS SPEECH RECOGNITION, IMPLEMENTED IN A MESSAGE PASSING FRAMEWORK Vladimir Popescu 1, 2, Corneliu Burileanu 1, Monica Rafaila 1, Ramona Calimanescu 1 1 Faculty

More information

Optimization of Observation Membership Function By Particle Swarm Method for Enhancing Performances of Speaker Identification

Optimization of Observation Membership Function By Particle Swarm Method for Enhancing Performances of Speaker Identification Proceedings of the 6th WSEAS International Conference on SIGNAL PROCESSING, Dallas, Texas, USA, March 22-24, 2007 52 Optimization of Observation Membership Function By Particle Swarm Method for Enhancing

More information

Discriminative training and Feature combination

Discriminative training and Feature combination Discriminative training and Feature combination Steve Renals Automatic Speech Recognition ASR Lecture 13 16 March 2009 Steve Renals Discriminative training and Feature combination 1 Overview Hot topics

More information

Client Dependent GMM-SVM Models for Speaker Verification

Client Dependent GMM-SVM Models for Speaker Verification Client Dependent GMM-SVM Models for Speaker Verification Quan Le, Samy Bengio IDIAP, P.O. Box 592, CH-1920 Martigny, Switzerland {quan,bengio}@idiap.ch Abstract. Generative Gaussian Mixture Models (GMMs)

More information

HIDDEN Markov model (HMM)-based statistical parametric

HIDDEN Markov model (HMM)-based statistical parametric 1492 IEEE TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING, VOL. 20, NO. 5, JULY 2012 Minimum Kullback Leibler Divergence Parameter Generation for HMM-Based Speech Synthesis Zhen-Hua Ling, Member,

More information

A GENERIC FACE REPRESENTATION APPROACH FOR LOCAL APPEARANCE BASED FACE VERIFICATION

A GENERIC FACE REPRESENTATION APPROACH FOR LOCAL APPEARANCE BASED FACE VERIFICATION A GENERIC FACE REPRESENTATION APPROACH FOR LOCAL APPEARANCE BASED FACE VERIFICATION Hazim Kemal Ekenel, Rainer Stiefelhagen Interactive Systems Labs, Universität Karlsruhe (TH) 76131 Karlsruhe, Germany

More information

Optimization of HMM by the Tabu Search Algorithm

Optimization of HMM by the Tabu Search Algorithm JOURNAL OF INFORMATION SCIENCE AND ENGINEERING 20, 949-957 (2004) Optimization of HMM by the Tabu Search Algorithm TSONG-YI CHEN, XIAO-DAN MEI *, JENG-SHYANG PAN AND SHENG-HE SUN * Department of Electronic

More information

Constraints in Particle Swarm Optimization of Hidden Markov Models

Constraints in Particle Swarm Optimization of Hidden Markov Models Constraints in Particle Swarm Optimization of Hidden Markov Models Martin Macaš, Daniel Novák, and Lenka Lhotská Czech Technical University, Faculty of Electrical Engineering, Dep. of Cybernetics, Prague,

More information

Machine Learning A W 1sst KU. b) [1 P] Give an example for a probability distributions P (A, B, C) that disproves

Machine Learning A W 1sst KU. b) [1 P] Give an example for a probability distributions P (A, B, C) that disproves Machine Learning A 708.064 11W 1sst KU Exercises Problems marked with * are optional. 1 Conditional Independence I [2 P] a) [1 P] Give an example for a probability distribution P (A, B, C) that disproves

More information

Blur Space Iterative De-blurring

Blur Space Iterative De-blurring Blur Space Iterative De-blurring RADU CIPRIAN BILCU 1, MEJDI TRIMECHE 2, SAKARI ALENIUS 3, MARKKU VEHVILAINEN 4 1,2,3,4 Multimedia Technologies Laboratory, Nokia Research Center Visiokatu 1, FIN-33720,

More information

Mixture Models and the EM Algorithm

Mixture Models and the EM Algorithm Mixture Models and the EM Algorithm Padhraic Smyth, Department of Computer Science University of California, Irvine c 2017 1 Finite Mixture Models Say we have a data set D = {x 1,..., x N } where x i is

More information

Learning The Lexicon!

Learning The Lexicon! Learning The Lexicon! A Pronunciation Mixture Model! Ian McGraw! (imcgraw@mit.edu)! Ibrahim Badr Jim Glass! Computer Science and Artificial Intelligence Lab! Massachusetts Institute of Technology! Cambridge,

More information

Machine Learning and Data Mining. Clustering (1): Basics. Kalev Kask

Machine Learning and Data Mining. Clustering (1): Basics. Kalev Kask Machine Learning and Data Mining Clustering (1): Basics Kalev Kask Unsupervised learning Supervised learning Predict target value ( y ) given features ( x ) Unsupervised learning Understand patterns of

More information

Machine Learning. B. Unsupervised Learning B.1 Cluster Analysis. Lars Schmidt-Thieme, Nicolas Schilling

Machine Learning. B. Unsupervised Learning B.1 Cluster Analysis. Lars Schmidt-Thieme, Nicolas Schilling Machine Learning B. Unsupervised Learning B.1 Cluster Analysis Lars Schmidt-Thieme, Nicolas Schilling Information Systems and Machine Learning Lab (ISMLL) Institute for Computer Science University of Hildesheim,

More information

ModelStructureSelection&TrainingAlgorithmsfor an HMMGesture Recognition System

ModelStructureSelection&TrainingAlgorithmsfor an HMMGesture Recognition System ModelStructureSelection&TrainingAlgorithmsfor an HMMGesture Recognition System Nianjun Liu, Brian C. Lovell, Peter J. Kootsookos, and Richard I.A. Davis Intelligent Real-Time Imaging and Sensing (IRIS)

More information

CS839: Probabilistic Graphical Models. Lecture 10: Learning with Partially Observed Data. Theo Rekatsinas

CS839: Probabilistic Graphical Models. Lecture 10: Learning with Partially Observed Data. Theo Rekatsinas CS839: Probabilistic Graphical Models Lecture 10: Learning with Partially Observed Data Theo Rekatsinas 1 Partially Observed GMs Speech recognition 2 Partially Observed GMs Evolution 3 Partially Observed

More information

Combining Audio and Video for Detection of Spontaneous Emotions

Combining Audio and Video for Detection of Spontaneous Emotions Combining Audio and Video for Detection of Spontaneous Emotions Rok Gajšek, Vitomir Štruc, Simon Dobrišek, Janez Žibert, France Mihelič, and Nikola Pavešić Faculty of Electrical Engineering, University

More information

Probabilistic Facial Feature Extraction Using Joint Distribution of Location and Texture Information

Probabilistic Facial Feature Extraction Using Joint Distribution of Location and Texture Information Probabilistic Facial Feature Extraction Using Joint Distribution of Location and Texture Information Mustafa Berkay Yilmaz, Hakan Erdogan, Mustafa Unel Sabanci University, Faculty of Engineering and Natural

More information

Linear Discriminant Analysis in Ottoman Alphabet Character Recognition

Linear Discriminant Analysis in Ottoman Alphabet Character Recognition Linear Discriminant Analysis in Ottoman Alphabet Character Recognition ZEYNEB KURT, H. IREM TURKMEN, M. ELIF KARSLIGIL Department of Computer Engineering, Yildiz Technical University, 34349 Besiktas /

More information

Toward Part-based Document Image Decoding

Toward Part-based Document Image Decoding 2012 10th IAPR International Workshop on Document Analysis Systems Toward Part-based Document Image Decoding Wang Song, Seiichi Uchida Kyushu University, Fukuoka, Japan wangsong@human.ait.kyushu-u.ac.jp,

More information

Multi-Modal Human Verification Using Face and Speech

Multi-Modal Human Verification Using Face and Speech 22 Multi-Modal Human Verification Using Face and Speech Changhan Park 1 and Joonki Paik 2 1 Advanced Technology R&D Center, Samsung Thales Co., Ltd., 2 Graduate School of Advanced Imaging Science, Multimedia,

More information

Audio-visual interaction in sparse representation features for noise robust audio-visual speech recognition

Audio-visual interaction in sparse representation features for noise robust audio-visual speech recognition ISCA Archive http://www.isca-speech.org/archive Auditory-Visual Speech Processing (AVSP) 2013 Annecy, France August 29 - September 1, 2013 Audio-visual interaction in sparse representation features for

More information

Query-by-example spoken term detection based on phonetic posteriorgram Query-by-example spoken term detection based on phonetic posteriorgram

Query-by-example spoken term detection based on phonetic posteriorgram Query-by-example spoken term detection based on phonetic posteriorgram International Conference on Education, Management and Computing Technology (ICEMCT 2015) Query-by-example spoken term detection based on phonetic posteriorgram Query-by-example spoken term detection based

More information

Gender-dependent acoustic models fusion developed for automatic subtitling of Parliament meetings broadcasted by the Czech TV

Gender-dependent acoustic models fusion developed for automatic subtitling of Parliament meetings broadcasted by the Czech TV Gender-dependent acoustic models fusion developed for automatic subtitling of Parliament meetings broadcasted by the Czech TV Jan Vaněk and Josef V. Psutka Department of Cybernetics, West Bohemia University,

More information

Speech User Interface for Information Retrieval

Speech User Interface for Information Retrieval Speech User Interface for Information Retrieval Urmila Shrawankar Dept. of Information Technology Govt. Polytechnic Institute, Nagpur Sadar, Nagpur 440001 (INDIA) urmilas@rediffmail.com Cell : +919422803996

More information

Radial Basis Function Neural Network Classifier

Radial Basis Function Neural Network Classifier Recognition of Unconstrained Handwritten Numerals by a Radial Basis Function Neural Network Classifier Hwang, Young-Sup and Bang, Sung-Yang Department of Computer Science & Engineering Pohang University

More information

A Novel Template Matching Approach To Speaker-Independent Arabic Spoken Digit Recognition

A Novel Template Matching Approach To Speaker-Independent Arabic Spoken Digit Recognition Special Session: Intelligent Knowledge Management A Novel Template Matching Approach To Speaker-Independent Arabic Spoken Digit Recognition Jiping Sun 1, Jeremy Sun 1, Kacem Abida 2, and Fakhri Karray

More information

Dynamic Clustering of Data with Modified K-Means Algorithm

Dynamic Clustering of Data with Modified K-Means Algorithm 2012 International Conference on Information and Computer Networks (ICICN 2012) IPCSIT vol. 27 (2012) (2012) IACSIT Press, Singapore Dynamic Clustering of Data with Modified K-Means Algorithm Ahamed Shafeeq

More information

CS 543: Final Project Report Texture Classification using 2-D Noncausal HMMs

CS 543: Final Project Report Texture Classification using 2-D Noncausal HMMs CS 543: Final Project Report Texture Classification using 2-D Noncausal HMMs Felix Wang fywang2 John Wieting wieting2 Introduction We implement a texture classification algorithm using 2-D Noncausal Hidden

More information

IMPROVED SIDE MATCHING FOR MATCHED-TEXTURE CODING

IMPROVED SIDE MATCHING FOR MATCHED-TEXTURE CODING IMPROVED SIDE MATCHING FOR MATCHED-TEXTURE CODING Guoxin Jin 1, Thrasyvoulos N. Pappas 1 and David L. Neuhoff 2 1 EECS Department, Northwestern University, Evanston, IL 60208 2 EECS Department, University

More information

Introduction to Mobile Robotics

Introduction to Mobile Robotics Introduction to Mobile Robotics Clustering Wolfram Burgard Cyrill Stachniss Giorgio Grisetti Maren Bennewitz Christian Plagemann Clustering (1) Common technique for statistical data analysis (machine learning,

More information

WHO WANTS TO BE A MILLIONAIRE?

WHO WANTS TO BE A MILLIONAIRE? IDIAP COMMUNICATION REPORT WHO WANTS TO BE A MILLIONAIRE? Huseyn Gasimov a Aleksei Triastcyn Hervé Bourlard Idiap-Com-03-2012 JULY 2012 a EPFL Centre du Parc, Rue Marconi 19, PO Box 592, CH - 1920 Martigny

More information

APPLICATION OF MULTIPLE RANDOM CENTROID (MRC) BASED K-MEANS CLUSTERING ALGORITHM IN INSURANCE A REVIEW ARTICLE

APPLICATION OF MULTIPLE RANDOM CENTROID (MRC) BASED K-MEANS CLUSTERING ALGORITHM IN INSURANCE A REVIEW ARTICLE APPLICATION OF MULTIPLE RANDOM CENTROID (MRC) BASED K-MEANS CLUSTERING ALGORITHM IN INSURANCE A REVIEW ARTICLE Sundari NallamReddy, Samarandra Behera, Sanjeev Karadagi, Dr. Anantha Desik ABSTRACT: Tata

More information

Hidden Markov Models. Slides adapted from Joyce Ho, David Sontag, Geoffrey Hinton, Eric Xing, and Nicholas Ruozzi

Hidden Markov Models. Slides adapted from Joyce Ho, David Sontag, Geoffrey Hinton, Eric Xing, and Nicholas Ruozzi Hidden Markov Models Slides adapted from Joyce Ho, David Sontag, Geoffrey Hinton, Eric Xing, and Nicholas Ruozzi Sequential Data Time-series: Stock market, weather, speech, video Ordered: Text, genes Sequential

More information

AN ITERATIVE APPROACH TO DECISION TREE TRAINING FOR CONTEXT DEPENDENT SPEECH SYNTHESIS. Xiayu Chen, Yang Zhang, Mark Hasegawa-Johnson

AN ITERATIVE APPROACH TO DECISION TREE TRAINING FOR CONTEXT DEPENDENT SPEECH SYNTHESIS. Xiayu Chen, Yang Zhang, Mark Hasegawa-Johnson AN ITERATIVE APPROACH TO DECISION TREE TRAINING FOR CONTEXT DEPENDENT SPEECH SYNTHESIS Xiayu Chen, Yang Zhang, Mark Hasegawa-Johnson Department of Electrical and Computer Engineering, University of Illinois

More information

Clustering & Dimensionality Reduction. 273A Intro Machine Learning

Clustering & Dimensionality Reduction. 273A Intro Machine Learning Clustering & Dimensionality Reduction 273A Intro Machine Learning What is Unsupervised Learning? In supervised learning we were given attributes & targets (e.g. class labels). In unsupervised learning

More information

Factorization with Missing and Noisy Data

Factorization with Missing and Noisy Data Factorization with Missing and Noisy Data Carme Julià, Angel Sappa, Felipe Lumbreras, Joan Serrat, and Antonio López Computer Vision Center and Computer Science Department, Universitat Autònoma de Barcelona,

More information

A Multiple-Line Fitting Algorithm Without Initialization Yan Guo

A Multiple-Line Fitting Algorithm Without Initialization Yan Guo A Multiple-Line Fitting Algorithm Without Initialization Yan Guo Abstract: The commonest way to fit multiple lines is to use methods incorporate the EM algorithm. However, the EM algorithm dose not guarantee

More information

A study of large vocabulary speech recognition decoding using finite-state graphs 1

A study of large vocabulary speech recognition decoding using finite-state graphs 1 A study of large vocabulary speech recognition decoding using finite-state graphs 1 Zhijian OU, Ji XIAO Department of Electronic Engineering, Tsinghua University, Beijing Corresponding email: ozj@tsinghua.edu.cn

More information

A Distance-Based Classifier Using Dissimilarity Based on Class Conditional Probability and Within-Class Variation. Kwanyong Lee 1 and Hyeyoung Park 2

A Distance-Based Classifier Using Dissimilarity Based on Class Conditional Probability and Within-Class Variation. Kwanyong Lee 1 and Hyeyoung Park 2 A Distance-Based Classifier Using Dissimilarity Based on Class Conditional Probability and Within-Class Variation Kwanyong Lee 1 and Hyeyoung Park 2 1. Department of Computer Science, Korea National Open

More information

DATA MINING LECTURE 7. Hierarchical Clustering, DBSCAN The EM Algorithm

DATA MINING LECTURE 7. Hierarchical Clustering, DBSCAN The EM Algorithm DATA MINING LECTURE 7 Hierarchical Clustering, DBSCAN The EM Algorithm CLUSTERING What is a Clustering? In general a grouping of objects such that the objects in a group (cluster) are similar (or related)

More information

Confidence Measures: how much we can trust our speech recognizers

Confidence Measures: how much we can trust our speech recognizers Confidence Measures: how much we can trust our speech recognizers Prof. Hui Jiang Department of Computer Science York University, Toronto, Ontario, Canada Email: hj@cs.yorku.ca Outline Speech recognition

More information

Robust color segmentation algorithms in illumination variation conditions

Robust color segmentation algorithms in illumination variation conditions 286 CHINESE OPTICS LETTERS / Vol. 8, No. / March 10, 2010 Robust color segmentation algorithms in illumination variation conditions Jinhui Lan ( ) and Kai Shen ( Department of Measurement and Control Technologies,

More information

A Gaussian Mixture Model Spectral Representation for Speech Recognition

A Gaussian Mixture Model Spectral Representation for Speech Recognition A Gaussian Mixture Model Spectral Representation for Speech Recognition Matthew Nicholas Stuttle Hughes Hall and Cambridge University Engineering Department PSfrag replacements July 2003 Dissertation submitted

More information

A Miniature-Based Image Retrieval System

A Miniature-Based Image Retrieval System A Miniature-Based Image Retrieval System Md. Saiful Islam 1 and Md. Haider Ali 2 Institute of Information Technology 1, Dept. of Computer Science and Engineering 2, University of Dhaka 1, 2, Dhaka-1000,

More information

Bandwidth Selection for Kernel Density Estimation Using Total Variation with Fourier Domain Constraints

Bandwidth Selection for Kernel Density Estimation Using Total Variation with Fourier Domain Constraints IEEE SIGNAL PROCESSING LETTERS 1 Bandwidth Selection for Kernel Density Estimation Using Total Variation with Fourier Domain Constraints Alexander Suhre, Orhan Arikan, Member, IEEE, and A. Enis Cetin,

More information

Image denoising in the wavelet domain using Improved Neigh-shrink

Image denoising in the wavelet domain using Improved Neigh-shrink Image denoising in the wavelet domain using Improved Neigh-shrink Rahim Kamran 1, Mehdi Nasri, Hossein Nezamabadi-pour 3, Saeid Saryazdi 4 1 Rahimkamran008@gmail.com nasri_me@yahoo.com 3 nezam@uk.ac.ir

More information

NOVEL HYBRID GENETIC ALGORITHM WITH HMM BASED IRIS RECOGNITION

NOVEL HYBRID GENETIC ALGORITHM WITH HMM BASED IRIS RECOGNITION NOVEL HYBRID GENETIC ALGORITHM WITH HMM BASED IRIS RECOGNITION * Prof. Dr. Ban Ahmed Mitras ** Ammar Saad Abdul-Jabbar * Dept. of Operation Research & Intelligent Techniques ** Dept. of Mathematics. College

More information

A ROBUST SPEAKER CLUSTERING ALGORITHM

A ROBUST SPEAKER CLUSTERING ALGORITHM A ROBUST SPEAKER CLUSTERING ALGORITHM J. Ajmera IDIAP P.O. Box 592 CH-1920 Martigny, Switzerland jitendra@idiap.ch C. Wooters ICSI 1947 Center St., Suite 600 Berkeley, CA 94704, USA wooters@icsi.berkeley.edu

More information

Clustering CS 550: Machine Learning

Clustering CS 550: Machine Learning Clustering CS 550: Machine Learning This slide set mainly uses the slides given in the following links: http://www-users.cs.umn.edu/~kumar/dmbook/ch8.pdf http://www-users.cs.umn.edu/~kumar/dmbook/dmslides/chap8_basic_cluster_analysis.pdf

More information

Joint Optimisation of Tandem Systems using Gaussian Mixture Density Neural Network Discriminative Sequence Training

Joint Optimisation of Tandem Systems using Gaussian Mixture Density Neural Network Discriminative Sequence Training Joint Optimisation of Tandem Systems using Gaussian Mixture Density Neural Network Discriminative Sequence Training Chao Zhang and Phil Woodland March 8, 07 Cambridge University Engineering Department

More information

Using Gradient Descent Optimization for Acoustics Training from Heterogeneous Data

Using Gradient Descent Optimization for Acoustics Training from Heterogeneous Data Using Gradient Descent Optimization for Acoustics Training from Heterogeneous Data Martin Karafiát Λ, Igor Szöke, and Jan Černocký Brno University of Technology, Faculty of Information Technology Department

More information

SMEM Algorithm for Mixture Models

SMEM Algorithm for Mixture Models SMEM Algorithm for Mixture Models N aonori U eda Ryohei Nakano {ueda, nakano }@cslab.kecl.ntt.co.jp NTT Communication Science Laboratories Hikaridai, Seika-cho, Soraku-gun, Kyoto 619-0237 Japan Zoubin

More information

Random projection for non-gaussian mixture models

Random projection for non-gaussian mixture models Random projection for non-gaussian mixture models Győző Gidófalvi Department of Computer Science and Engineering University of California, San Diego La Jolla, CA 92037 gyozo@cs.ucsd.edu Abstract Recently,

More information

Some questions of consensus building using co-association

Some questions of consensus building using co-association Some questions of consensus building using co-association VITALIY TAYANOV Polish-Japanese High School of Computer Technics Aleja Legionow, 4190, Bytom POLAND vtayanov@yahoo.com Abstract: In this paper

More information

Cluster quality 15. Running time 0.7. Distance between estimated and true means Running time [s]

Cluster quality 15. Running time 0.7. Distance between estimated and true means Running time [s] Fast, single-pass K-means algorithms Fredrik Farnstrom Computer Science and Engineering Lund Institute of Technology, Sweden arnstrom@ucsd.edu James Lewis Computer Science and Engineering University of

More information

Evaluation of Model-Based Condition Monitoring Systems in Industrial Application Cases

Evaluation of Model-Based Condition Monitoring Systems in Industrial Application Cases Evaluation of Model-Based Condition Monitoring Systems in Industrial Application Cases S. Windmann 1, J. Eickmeyer 1, F. Jungbluth 1, J. Badinger 2, and O. Niggemann 1,2 1 Fraunhofer Application Center

More information

An Efficient Model Selection for Gaussian Mixture Model in a Bayesian Framework

An Efficient Model Selection for Gaussian Mixture Model in a Bayesian Framework IEEE SIGNAL PROCESSING LETTERS, VOL. XX, NO. XX, XXX 23 An Efficient Model Selection for Gaussian Mixture Model in a Bayesian Framework Ji Won Yoon arxiv:37.99v [cs.lg] 3 Jul 23 Abstract In order to cluster

More information

GMM-FREE DNN TRAINING. Andrew Senior, Georg Heigold, Michiel Bacchiani, Hank Liao

GMM-FREE DNN TRAINING. Andrew Senior, Georg Heigold, Michiel Bacchiani, Hank Liao GMM-FREE DNN TRAINING Andrew Senior, Georg Heigold, Michiel Bacchiani, Hank Liao Google Inc., New York {andrewsenior,heigold,michiel,hankliao}@google.com ABSTRACT While deep neural networks (DNNs) have

More information

COMP5318 Knowledge Management & Data Mining Assignment 1

COMP5318 Knowledge Management & Data Mining Assignment 1 COMP538 Knowledge Management & Data Mining Assignment Enoch Lau SID 20045765 7 May 2007 Abstract 5.5 Scalability............... 5 Clustering is a fundamental task in data mining that aims to place similar

More information

Efficient Non-domination Level Update Approach for Steady-State Evolutionary Multiobjective Optimization

Efficient Non-domination Level Update Approach for Steady-State Evolutionary Multiobjective Optimization Efficient Non-domination Level Update Approach for Steady-State Evolutionary Multiobjective Optimization Ke Li 1, Kalyanmoy Deb 1, Qingfu Zhang 2, and Sam Kwong 2 1 Department of Electrical and Computer

More information

HARD, SOFT AND FUZZY C-MEANS CLUSTERING TECHNIQUES FOR TEXT CLASSIFICATION

HARD, SOFT AND FUZZY C-MEANS CLUSTERING TECHNIQUES FOR TEXT CLASSIFICATION HARD, SOFT AND FUZZY C-MEANS CLUSTERING TECHNIQUES FOR TEXT CLASSIFICATION 1 M.S.Rekha, 2 S.G.Nawaz 1 PG SCALOR, CSE, SRI KRISHNADEVARAYA ENGINEERING COLLEGE, GOOTY 2 ASSOCIATE PROFESSOR, SRI KRISHNADEVARAYA

More information

Opinion Mining by Transformation-Based Domain Adaptation

Opinion Mining by Transformation-Based Domain Adaptation Opinion Mining by Transformation-Based Domain Adaptation Róbert Ormándi, István Hegedűs, and Richárd Farkas University of Szeged, Hungary {ormandi,ihegedus,rfarkas}@inf.u-szeged.hu Abstract. Here we propose

More information

Simultaneous Design of Feature Extractor and Pattern Classifer Using the Minimum Classification Error Training Algorithm

Simultaneous Design of Feature Extractor and Pattern Classifer Using the Minimum Classification Error Training Algorithm Griffith Research Online https://research-repository.griffith.edu.au Simultaneous Design of Feature Extractor and Pattern Classifer Using the Minimum Classification Error Training Algorithm Author Paliwal,

More information

Data Mining Chapter 9: Descriptive Modeling Fall 2011 Ming Li Department of Computer Science and Technology Nanjing University

Data Mining Chapter 9: Descriptive Modeling Fall 2011 Ming Li Department of Computer Science and Technology Nanjing University Data Mining Chapter 9: Descriptive Modeling Fall 2011 Ming Li Department of Computer Science and Technology Nanjing University Descriptive model A descriptive model presents the main features of the data

More information

OPTIMIZING A VIDEO PREPROCESSOR FOR OCR. MR IBM Systems Dev Rochester, elopment Division Minnesota

OPTIMIZING A VIDEO PREPROCESSOR FOR OCR. MR IBM Systems Dev Rochester, elopment Division Minnesota OPTIMIZING A VIDEO PREPROCESSOR FOR OCR MR IBM Systems Dev Rochester, elopment Division Minnesota Summary This paper describes how optimal video preprocessor performance can be achieved using a software

More information

SEMI-BLIND IMAGE RESTORATION USING A LOCAL NEURAL APPROACH

SEMI-BLIND IMAGE RESTORATION USING A LOCAL NEURAL APPROACH SEMI-BLIND IMAGE RESTORATION USING A LOCAL NEURAL APPROACH Ignazio Gallo, Elisabetta Binaghi and Mario Raspanti Universitá degli Studi dell Insubria Varese, Italy email: ignazio.gallo@uninsubria.it ABSTRACT

More information

Part I. Hierarchical clustering. Hierarchical Clustering. Hierarchical clustering. Produces a set of nested clusters organized as a

Part I. Hierarchical clustering. Hierarchical Clustering. Hierarchical clustering. Produces a set of nested clusters organized as a Week 9 Based in part on slides from textbook, slides of Susan Holmes Part I December 2, 2012 Hierarchical Clustering 1 / 1 Produces a set of nested clusters organized as a Hierarchical hierarchical clustering

More information

Hierarchical Mixture Models for Nested Data Structures
