Clustering Sequence Data using Hidden Markov Model Representation

Cen Li and Gautam Biswas
Box 1679 Station B, Department of Computer Science, Vanderbilt University, Nashville, TN, USA

ABSTRACT

This paper proposes a clustering methodology for sequence data using a hidden Markov model (HMM) representation. The proposed methodology improves upon existing HMM-based clustering methods in two ways: (i) it enables HMMs to dynamically change their model structure to obtain a better fitting model for the data during the clustering process, and (ii) it provides an objective criterion function for selecting the optimal clustering partition. The algorithm is presented in terms of four nested levels of search: (i) the search for the optimal number of clusters in a partition, (ii) the search for the optimal structure for a given partition, (iii) the search for the optimal HMM structure for each cluster, and (iv) the search for the optimal HMM parameters for each HMM. Preliminary results are given to support the proposed methodology.

Keywords: clustering, hidden Markov model, model selection, Bayesian Information Criterion (BIC), mutual information

1. INTRODUCTION

Clustering assumes data is not labeled with class information. The goal is to create structure for the data by objectively partitioning it into homogeneous groups, where the within-group object similarity and the between-group object dissimilarity are optimized. The technique has been used extensively and successfully by data mining researchers for discovering structure in databases where domain knowledge is not available or is incomplete. 1

In the past, the focus of clustering analysis has been on data described with static features, 1,2,3 i.e., values of the features that do not change during the observation period. Examples of static features include an employee's educational level and salary, or a patient's age, gender, and weight. In the real world, most systems are dynamic and can often be best described by temporal features, whose values change significantly during the observation period. Examples of temporal features include the monthly ATM transactions and account balances of bank customers, and the blood pressure, temperature, and respiratory rate of patients under intensive care. This paper addresses the problem of clustering data described by temporal features. Clustering temporal data is inherently more complex than clustering static data because (i) the dimensionality of the data is significantly larger in the dynamic case, and (ii) the complexity of cluster definition (modeling) and interpretation increases by orders of magnitude with dynamic data. 5

We choose a hidden Markov model representation for our temporal data clustering problem. There are a number of advantages of the HMM representation for our problem:

- There are direct links between the HMM states and real world situations for the problem under consideration. The hidden states of an HMM can be used to effectively model the set of potentially valid states of a dynamic process. While the exact sequence of stages gone through by a dynamic system may not be observed, it can be estimated from the observable behavior of the system.
- HMMs represent a well-defined probabilistic model. The parameters of an HMM can be determined in a precise, well-defined manner, using methods such as maximum likelihood estimation or the maximum mutual information criterion.
- HMMs are graphical models of the underlying dynamic processes that govern system behavior. Graphical models may aid the interpretation task.

Clustering using HMMs was first mentioned by Rabiner et al. 6 for speech recognition problems. The idea has been further explored by other researchers, including Lee, 7 Dermatas and Kokkinakis, 8 Kosaka et al., 9 and Smyth. 10 Two main problems that have been identified in these works are: (i) no objective criterion measure is used for determining the optimal size of the clustering partition, and (ii) a uniform, pre-specified HMM structure is assumed for the different clusters of each partition. This paper describes an HMM clustering methodology that tries to remedy these two problems by developing an objective partition criterion measure based on model mutual information, and by developing an explicit HMM model refinement procedure that dynamically modifies HMM structures during the clustering process.

2. PROPOSED HMM CLUSTERING METHODOLOGY

The proposed HMM clustering method can be summarized in terms of four levels of nested searches. From the outermost to the innermost level, the four searches are: the search for

1. the optimal number of clusters in a partition,
2. the optimal structure for a given partition,
3. the optimal HMM structure for each cluster, and
4. the optimal HMM parameters for each cluster.

Starting from the innermost level of search, each of these four search steps is described in more detail next.

2.1. Search Level 4: HMM Parameter Reestimation

This step tries to find the maximum likelihood parameters for an HMM of a fixed size. The well-known Baum-Welch parameter reestimation procedure 11 is used for this purpose. The Baum-Welch procedure is a variation of the more general EM algorithm, 12 which iterates between two steps: (i) the expectation step (E-step), and (ii) the maximization step (M-step). The E-step assumes the current parameters of the model and computes the expected values of the necessary statistics. The M-step uses these statistics to update the model parameters so as to maximize the expected likelihood of the parameters. 13 The procedure is implemented using the forward-backward computations.

2.2. Search Level 3: the optimal HMM structure

This step attempts to replace the existing model for a group of objects with a more accurate and refined HMM. Stolcke and Omohundro 14 described a technique for inducing the structure of HMMs from data based on a general "model merging" strategy. Takami and Sagayama 16 proposed the Successive State Splitting (SSS) algorithm to model context-dependent phonetic variations. Ostendorf and Singer 17 further expanded the basic SSS algorithm by choosing the node and the candidate split at the same time based on the likelihood gains. Casacuberta et al. 18 proposed deriving the structure of an HMM through error-correcting grammatical inference techniques. Our HMM refinement procedure combines ideas from this past work. We start with an initial model configuration and incrementally grow or shrink the model through HMM state splitting and merging operations to choose the right-sized model. The goal is to obtain a model that better accounts for the data, i.e., one having a higher model posterior probability. For both merge and split operations, we assume the Viterbi path does not change after each operation; that is, for the split operation, the observations that were in state s will reside in one of the two new states, q_0 or q_1. The same is true for the merge operation. This assumption greatly simplifies the parameter estimation process for the new states.
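To make the fixed-Viterbi-path assumption concrete, the sketch below splits one state of a diagonal-Gaussian CDHMM: the observations that the Viterbi path assigned to the old state are partitioned into two groups (here simply by thresholding on the feature with the largest variance, an assumption of this sketch rather than the paper's rule), and the two new states' emission parameters are re-estimated directly from their groups. Transition parameters are omitted to keep the sketch short.

import numpy as np

def split_state_emissions(obs_in_state):
    # obs_in_state: (n_obs, n_features) observations assigned to the state
    #               being split by the (fixed) Viterbi path.
    variances = obs_in_state.var(axis=0)
    dim = int(np.argmax(variances))              # most variable feature
    threshold = np.median(obs_in_state[:, dim])
    in_q0 = obs_in_state[:, dim] <= threshold    # membership in new state q_0
    groups = (obs_in_state[in_q0], obs_in_state[~in_q0])
    # Because the Viterbi path is assumed unchanged, the new states' emission
    # parameters follow directly from the two groups of observations.
    return [(g.mean(axis=0), g.var(axis=0)) for g in groups]

rng = np.random.default_rng(0)
obs = np.vstack([rng.normal([0.0, 0.0], 0.3, size=(40, 2)),
                 rng.normal([2.0, 0.0], 0.3, size=(40, 2))])
for mean, var in split_state_emissions(obs):
    print(mean, var)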
The choice of which state(s) to apply the split (merge) operation to depends on the state emission probabilities. For the split operation, the state with the highest variance is split. For the merge operation, the two states with the closest mean vectors are considered for merging. Next we describe the two criterion measures we propose to use for HMM model selection: (i) the Posterior Probability Measure (PPM), and (ii) the Bayesian Information Criterion (BIC).
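A minimal sketch of this selection heuristic, assuming diagonal Gaussian emissions: the state with the largest total emission variance is proposed for splitting, and the pair of states whose mean vectors are closest is proposed for merging. The data structures are illustrative, not the paper's implementation.

import numpy as np

def choose_split_state(variances):
    # variances: (n_states, n_features) diagonal emission variances.
    # Return the index of the state with the largest total variance.
    return int(np.argmax(variances.sum(axis=1)))

def choose_merge_pair(means):
    # means: (n_states, n_features) emission mean vectors.
    # Return the pair of states whose means are closest (Euclidean distance).
    n = len(means)
    dists = np.linalg.norm(means[:, None, :] - means[None, :, :], axis=2)
    dists[np.diag_indices(n)] = np.inf           # ignore self-distances
    i, j = np.unravel_index(np.argmin(dists), dists.shape)
    return (int(i), int(j))

# Example with a 4-state model and 2 features.
means = np.array([[0.0, 0.0], [0.1, 0.0], [3.0, 3.0], [6.0, 1.0]])
variances = np.array([[1.0, 1.0], [0.5, 0.5], [4.0, 2.0], [0.2, 0.2]])
print(choose_split_state(variances))   # state 2, the largest total variance
print(choose_merge_pair(means))        # states (0, 1), the closest means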

2.2.1. Posterior probabilities for HMMs

The computation of the Posterior Probability of an HMM model (PPM) is based on the Bayesian model merging criterion in. 14 The Bayesian model merging criterion trades the model likelihood against a bias towards a simpler model. Assume the prior probability of a fully parameterized model λ, comprising the model structure and the model parameters, is uniformly distributed. Given some data X, using Bayes' rule, the posterior probability of the model, P(λ|X), can be expressed as:

  P(λ|X) = P(λ)P(X|λ) / P(X) ∝ P(λ)P(X|λ),

where P(X|λ) is the likelihood function. We propose to extend Stolcke and Omohundro's P(λ|X) computation for Discrete Density HMMs (DDHMMs) to our Continuous Density HMM (CDHMM) model. We decompose the model λ into three independent components: its global structure, λ_G, the transitions from each state q, θ_trans^(q), and the emissions within each state, (μ^(q), σ^(q)). Assuming the parameters associated with one state are independent of those in another state, the model prior can be written as

  P(λ) = P(λ_G) ∏_{q∈Q} P(θ_trans^(q) | λ_G) ∏_{q∈Q} P((μ^(q), σ^(q)) | λ_G).

The structure of the model is modeled with an exponential distribution which explicitly biases towards smaller models: P(λ_G) ∝ C^{−N}, where C is a constant and C > 1. Since the transitions represent discrete, finite probabilistic choices of the next state, a Dirichlet distribution is used for calculating the probability of the transitions from each state 14:

  P(θ_trans^(q) | λ_G) = (1 / B(α_t, ..., α_t)) ∏_{i=1}^{n_q} θ_{qi}^{α_t − 1},

where the θ_{qi} are the transition probabilities at state q, with i ranging over the states that can follow q, and α_t is the prior weight, which can be chosen to introduce more or less bias towards a uniform assignment of the parameters. This prior has the desirable characteristic that it favors state configurations with fewer yet more significant outgoing transitions. For our single-component CDHMM case, we propose to use the Jeffreys prior for the location-scale parameters, i.e., the mean vector and variance matrix associated with each state 19:

  P((μ^(q), σ^(q)) | λ_G) = (σ^(q))^{−1}.

This location-scale prior shows that data having a smaller σ lead to a more accurate determination of the parameter μ. In the case of the CDHMM state configuration, this prior rewards CDHMMs with clearly defined states, i.e., states whose variances, σ, are small.

2.2.2. Bayesian Information Criterion for HMMs

One problem with the PPM criterion is that it depends heavily on the base value, C, of the exponential distribution for the global model structure probability. Currently, we do not have a strategy for selecting the exponential base value for different problems, and the model selection performance deteriorates if the right base value is not used. An alternative scheme is the Bayesian model selection approach. A criterion that is often used for Bayesian model selection is the relative model posterior probability, P(λ, X), given by P(λ, X) = P(λ)P(X|λ). By assuming a uniform prior probability for the different models, P(λ, X) ∝ P(X|λ), where P(X|λ) is the marginal likelihood. The goal of this approach is to select the model that gives the highest marginal likelihood. Computing the marginal likelihood for complex models has been an active research area. Established approaches include Monte Carlo methods, e.g., Gibbs sampling, 23,24 and various approximation methods, e.g., the Laplace approximation and the approximation based on the Bayesian information criterion. 21
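The three prior components above combine additively in log form. The following is a minimal sketch of that combination under simplifying assumptions that are ours, not the paper's: N in the structure prior is taken to be the number of states, a single shared Dirichlet weight alpha_t is used for all transitions, and emissions are diagonal Gaussians so the Jeffreys term reduces to a sum of -log(sigma) over the per-state standard deviations. The constant C and the example values are arbitrary.

import numpy as np
from math import lgamma, log

def log_model_prior(trans, sigmas, C=2.0, alpha_t=1.0):
    # log P(lambda) = log P(lambda_G)
    #               + sum_q log P(theta_trans^(q) | lambda_G)
    #               + sum_q log P((mu^(q), sigma^(q)) | lambda_G)   (up to constants)
    # trans:  (n_states, n_states) transition matrix.
    # sigmas: (n_states, n_features) per-state emission standard deviations.
    n_states = trans.shape[0]
    # Structure prior P(lambda_G) proportional to C^(-N); penalizes larger models.
    log_structure = -n_states * log(C)
    # Symmetric Dirichlet prior over each state's outgoing transition probabilities.
    log_trans = 0.0
    for q in range(n_states):
        row = trans[q][trans[q] > 0]                       # allowed successors only
        k = len(row)
        log_B = k * lgamma(alpha_t) - lgamma(k * alpha_t)  # Dirichlet normalizer
        log_trans += (alpha_t - 1.0) * np.sum(np.log(row)) - log_B
    # Jeffreys location-scale prior 1/sigma for each state's emission parameters.
    log_emissions = -np.sum(np.log(sigmas))
    return log_structure + log_trans + log_emissions

trans = np.array([[0.7, 0.3], [0.4, 0.6]])
sigmas = np.array([[0.5, 0.8], [1.2, 0.9]])
print(log_model_prior(trans, sigmas))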
It has been well documented that the Monte Carlo methods are very accurate, but they are computationally inefficient, especially for large databases. It has also been shown that, under certain regularity conditions, the Laplace approximation can be quite accurate, but its computation can be expensive, especially the computation of its Hessian matrix component.

A widely used and very efficient approximation method for the marginal likelihood is the Bayesian Information Criterion, where, in log form, the marginal likelihood of a model given the data is computed as:

  log P(λ|X) ≈ log P(X|λ̂) − (d/2) log N,

where λ̂ is the Maximum Likelihood (ML) configuration of the model, d is the dimensionality of the model parameter space, and N is the number of cases in the data. We choose BIC as our alternative HMM model selection criterion.

2.3. Search Level 2: the optimal partition structure

The two most commonly used distance measures in the context of the HMM representation are the sequence-to-model likelihood measure and the symmetrized distance measure between pairs of models. 26 We choose the sequence-to-model likelihood distance measure for our HMM clustering algorithm. The sequence-to-HMM likelihood, P(O|λ), measures the probability that a sequence, O, is generated by a given model, λ. When the sequence-to-HMM likelihood distance measure is used for object-to-cluster assignments, it automatically enforces the criterion of maximizing within-group similarity. A K-means style clustering control structure and a depth-first binary divisive clustering control structure are proposed to generate partitions having different numbers of clusters. For each partition, the initial object-to-cluster memberships are determined by the sequence-to-HMM likelihood (see Section 2.2.1) distance measure. The objects are subsequently redistributed after HMM parameter reestimation and HMM model refinement have been applied to the intermediate clusters. For the K-means algorithm, the redistribution is global over all clusters. For binary hierarchical clustering, the redistribution is carried out between the child clusters of the current cluster. Thus the algorithm is not guaranteed to produce the maximally probable partition of the data set. If the goal is to have a single partition of the data, the K-means style control structure may be used. If one wants to look at partitions at various levels of detail, binary divisive clustering may be more suitable.
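The Level 2 control structure can be sketched as a K-means style loop: fit one HMM per cluster to its current members (Levels 3 and 4), reassign every object to the model under which its sequence is most likely, and repeat until the memberships stabilize. The sketch below uses the third-party hmmlearn package purely as a stand-in for the paper's own Baum-Welch implementation; it also treats each object as a single sequence, keeps the HMM size fixed, and does not handle empty clusters, all of which are simplifying assumptions of the sketch.

import numpy as np
from hmmlearn.hmm import GaussianHMM   # third-party Baum-Welch (EM) implementation

def fit_cluster_hmm(sequences, n_states=5, n_iter=20):
    # Level 4: Baum-Welch re-estimation for one cluster's fixed-size CDHMM.
    X = np.concatenate(sequences)
    lengths = [len(s) for s in sequences]
    model = GaussianHMM(n_components=n_states, covariance_type="diag", n_iter=n_iter)
    return model.fit(X, lengths)

def kmeans_style_clustering(objects, k, n_states=5, max_iter=10, seed=0):
    # objects: list of observation sequences, one per object.
    rng = np.random.default_rng(seed)
    labels = rng.integers(0, k, size=len(objects))   # random initial memberships
    for _ in range(max_iter):
        models = [fit_cluster_hmm([o for o, c in zip(objects, labels) if c == j],
                                  n_states) for j in range(k)]
        # Level 2: reassign each object by the sequence-to-HMM log-likelihood.
        new_labels = np.array([np.argmax([m.score(o) for m in models])
                               for o in objects])
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
    return labels, models

# Usage: labels, models = kmeans_style_clustering(list_of_sequences, k=4)

The binary divisive variant would apply the same redistribution step only between the two child clusters of the node currently being split.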
2.4. Search Level 1: the optimal number of clusters in the partition

The quality of a clustering is measured in terms of its within-cluster similarity and between-cluster dissimilarity. A common criterion measure used by a number of HMM clustering schemes is the overall likelihood of the data given the models of the set of clusters. Since our distance measure does well in maximizing the homogeneity of objects within each cluster, we want a criterion measure that is good at comparing partitions in terms of their between-cluster distances. We use the Partition Mutual Information (PMI) measure 27 for this task. From Bayes' rule, the posterior probability of a model, λ_i, trained on data, O_i, is given by:

  P(λ_i | O_i) = P(O_i | λ_i) P(λ_i) / P(O_i) = P(O_i | λ_i) P(λ_i) / Σ_{j=1}^{J} P(O_i | λ_j) P(λ_j),

where P(λ_i) is the prior probability of an object coming from cluster i before the feature values are inspected, and P(O_i | λ_i) is the conditional probability of displaying the features O_i given that the object comes from cluster i. Let MI_i represent the average mutual information between the observation sequence O_i and the complete set of models Λ = (λ_1, ..., λ_J):

  MI_i = log P(λ_i | O_i) = log(P(O_i | λ_i) P(λ_i)) − log Σ_{j=1}^{J} P(O_i | λ_j) P(λ_j).

Maximizing this value is equivalent to separating the correct model λ_i from all the other models on the training sequence O_i. The overall information of the partition with J models is then computed by summing the mutual information over all training sequences:

  PMI = Σ_{j=1}^{J} Σ_{i=1}^{n_j} MI_i,

where n_j is the number of objects in cluster j, and J is the total number of clusters in the partition. PMI is maximized when the J models are the most separated set of models, without fragmentation.
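A minimal sketch of the PMI computation, assuming each object contributes the log posterior of the cluster model that explains it best (the paper sums the MI_i over the objects belonging to each cluster): the inputs are a matrix of per-object, per-model log-likelihoods log P(O_i | λ_j) and the cluster log-priors, and logaddexp is used to form the log denominator stably. The example numbers are arbitrary.

import numpy as np

def partition_mutual_information(loglik, log_prior):
    # loglik:    (n_objects, n_clusters) matrix of log P(O_i | lambda_j).
    # log_prior: (n_clusters,) log prior probabilities of the clusters.
    joint = loglik + log_prior                         # log P(O_i|lambda_j)P(lambda_j)
    log_evidence = np.logaddexp.reduce(joint, axis=1)  # log sum_j P(O_i|lambda_j)P(lambda_j)
    own = joint.max(axis=1)                            # term for each object's own cluster
    return float(np.sum(own - log_evidence))           # sum of the MI_i values

loglik = np.array([[-100.0, -140.0],
                   [-135.0, -95.0],
                   [-110.0, -112.0]])
log_prior = np.log([0.5, 0.5])
print(partition_mutual_information(loglik, log_prior))

A partition is preferred over another when it yields a higher PMI value; this is the comparison used to decide branch terminations in Experiment One below.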

Figure 1. Objects generated from different HMMs.

3. PRELIMINARY RESULTS

We have conducted preliminary experiments with HMM clustering on artificially generated data. Since we have not finished implementing the HMM refinement procedure, in the following experiments we assume the correct model structure is known and fixed throughout the clustering process. Therefore, uniform prior distributions are assumed for all HMMs in the computation. For these experiments, the objective of the HMM clustering is to derive a good partition with the optimal number of clusters and object-cluster memberships. To generate data with K clusters, we first manually create K HMMs. From each of these K HMMs, we generate N_k objects, each described by M temporal sequences. The length of each temporal sequence is L. The total number of data points for such a data set is K × N_k × M × L. In these experiments, we choose K = 4 and M = 2, and the HMM for each cluster has 5 states. Figure 1 shows four example data objects from these models. It is observed that, from the feature values alone, it is quite difficult to differentiate which objects are generated from the same model. In fact, objects 1 and 3 are generated from the same model, and objects 2 and 4 are generated from a different model.

3.1. Experiment One

In this experiment, we illustrate the binary HMM clustering process and the effects of the PMI criterion measure. In the first part of the experiment, the PMI criterion measure was not incorporated in the binary clustering tree building process. The branches of the tree are terminated either because there are too few objects in a node, or because the object redistribution process in a node ends with a one-cluster partition. The full binary clustering tree, as well as the PMI scores for the intermediate and final partitions, are computed and shown in Figure 2(a). The PMI scores to the right of the tree indicate the quality of the current partition, which includes all nodes at the frontier of the current tree. For example, the PMI score for the partition having clusters C_4 and C_123 is 0.0, and the PMI score for the partition having clusters C_4, 4C_2, 26C_2, and C_13 is −1.752. The result of this clustering process is a 7-cluster partition with six fragmented clusters, i.e., cluster C_2 is fragmented into 4C_2 and 26C_2, cluster C_3 is fragmented into 1C_3 and 29C_3, and cluster C_1 is fragmented into 4C_1 and 26C_1. Figure 2(b) shows the binary HMM clustering tree where the PMI criterion measure is used for determining branch terminations. The dotted lines cut off branches of the search tree where the split of the parent cluster results in a decrease in the PMI score. This clustering process rediscovers the correct 4-cluster partition.
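The data generation procedure described at the start of this section can be reproduced with a small sampler: define each of the K HMMs by hand (initial, transition, and Gaussian emission parameters) and draw N_k objects of M sequences of length L from each. The parameter values below are arbitrary placeholders rather than the ones used in the experiments.

import numpy as np

def sample_hmm_sequence(startprob, transmat, means, stds, length, rng):
    # Draw one observation sequence of the given length from a Gaussian HMM.
    obs = np.empty((length, means.shape[1]))
    state = rng.choice(len(startprob), p=startprob)
    for t in range(length):
        obs[t] = rng.normal(means[state], stds[state])        # emit from current state
        state = rng.choice(len(transmat), p=transmat[state])  # move to next state
    return obs

rng = np.random.default_rng(0)
# One hand-built 2-state, 2-feature HMM (placeholder parameters).
startprob = np.array([0.6, 0.4])
transmat = np.array([[0.8, 0.2],
                     [0.3, 0.7]])
means = np.array([[0.0, 0.0], [3.0, 1.0]])
stds = np.array([[1.0, 1.0], [0.5, 0.5]])
# N_k = 5 objects from this cluster, each with M = 2 sequences of length L = 50.
cluster = [[sample_hmm_sequence(startprob, transmat, means, stds, 50, rng)
            for _ in range(2)] for _ in range(5)]
print(len(cluster), cluster[0][0].shape)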

Figure 2. The binary HMM clustering tree: (a) without the PMI criterion, (b) with the PMI criterion used to terminate branches.

Figure 3. HMM clustering results: (a) data having different levels of noise, (b) clustering starting with different size HMMs.

3.2. Experiment Two

In this experiment, we study the performance of the HMM clustering system when the data are corrupted by different levels of noise. White Gaussian noise was added to the data at different signal-to-noise ratios. 28 More noise is successively added to the original 4-cluster data, i.e., the signal-to-noise ratio is successively decreased from 35 to 1. Figure 3(a) shows the clustering results in terms of the misclassification counts versus the signal-to-noise ratio. We observe that noise does not seem to have much effect on the clustering results until it is very large, i.e., S/N < 5 dB. Beyond that point, the clustering process fails to separate out the objects from three of the HMMs.

3.3. Experiment Three

In this experiment, we study the effect of different initial HMM structures on clustering performance. The four original HMMs all have 5 states; the initial HMMs in this experiment have numbers of states ranging from 2 to 8. Figure 3(b) shows the results in terms of the misclassification counts versus the number of states in the initial HMMs. The clustering results remain the same for initial HMMs having 3, 4, and 5 states. For initial HMMs having 2 states, the misclassification is high: the algorithm fails to separate objects from HMM models 2 and 3. For initial HMMs having 6, 7, and 8 states, the clustering partition generated is close to optimal. This result agrees with the intuition that initial HMMs having too few states will result in a worse clustering partition than cases where the initial HMMs have too many states. The reason for this is that when a model is too small, multiple state definitions have to be squeezed into one state, which makes the state definitions less specific and the model less accurate. On the other hand, when there are extra states in the model, the model can be made more accurate by dividing a single state definition into multiple state definitions. At the very least, the original model can be retained by setting the transitions to the extra states to very small values, effectively ignoring those states.

REFERENCES

1. P. Cheeseman and J. Stutz, "Bayesian classification (AutoClass): Theory and results," in Advances in Knowledge Discovery and Data Mining, U. M. Fayyad, G. Piatetsky-Shapiro, P. Smyth, and R. Uthurusamy, eds., ch. 6, pp. 153-180, AAAI/MIT Press, 1996.
2. G. Biswas, J. Weinberg, and C. Li, "ITERATE: A conceptual clustering method for knowledge discovery in databases," in Artificial Intelligence in the Petroleum Industry: Symbolic and Computational Applications, B. Braunschweig and R. Day, eds., Editions Technip.
3. D. Fisher, "Knowledge acquisition via incremental conceptual clustering," Machine Learning 2, pp. 139-172, 1987.
4. C. S. Wallace and D. L. Dowe, "Intrinsic classification by MML - the Snob program," in Proceedings of the Seventh Australian Joint Conference on Artificial Intelligence, pp. 37-44, World Scientific, 1994.
5. C. Li, "Unsupervised classification on temporal data." Survey paper, Department of Computer Science, Vanderbilt University, Apr.
6. L. R. Rabiner, C. H. Lee, B. H. Juang, and J. G. Wilpon, "HMM clustering for connected word recognition," in Proceedings of the International Conference on Acoustics, Speech, and Signal Processing, 1989.
7. K. F. Lee, "Context-dependent phonetic hidden Markov models for speaker-independent continuous speech recognition," IEEE Transactions on Acoustics, Speech, and Signal Processing 38(4), pp. 599-609, 1990.
8. E. Dermatas and G. Kokkinakis, "Algorithm for clustering continuous density HMM by recognition error," IEEE Transactions on Speech and Audio Processing 4, pp. 231-234, May 1996.
9. T. Kosaka, S. Masunaga, and M. Kuraoka, "Speaker-independent phone modeling based on speaker-dependent HMM's composition and clustering," in Proceedings of ICASSP '95, 1995.
10. P. Smyth, "Clustering sequences with hidden Markov models," Advances in Neural Information Processing Systems 9, 1997.
11. L. E. Baum, T. Petrie, G. Soules, and N. Weiss, "A maximization technique occurring in the statistical analysis of probabilistic functions of Markov chains," The Annals of Mathematical Statistics 41(1), pp. 164-171, 1970.
12. A. P. Dempster, N. M. Laird, and D. B. Rubin, "Maximum likelihood from incomplete data via the EM algorithm," Journal of the Royal Statistical Society, Series B (Methodological) 39, pp. 1-38, 1977.

13. Z. Ghahramani and M. I. Jordan, "Factorial hidden Markov models," Tech. Rep. 9502, MIT Computational Cognitive Science, Aug.
14. A. Stolcke and S. M. Omohundro, "Best-first model merging for hidden Markov model induction," Tech. Rep. TR-94-003, International Computer Science Institute, 1947 Center St., Suite 600, Berkeley, CA, Jan. 1994.
15. S. M. Omohundro, "Best-first model merging for dynamic learning and recognition," Advances in Neural Information Processing Systems, pp. 958-965, 1992.
16. J. Takami and S. Sagayama, "A successive state splitting algorithm for efficient allophone modeling," in Proceedings of the International Conference on Acoustics, Speech, and Signal Processing, vol. 1, pp. 573-576, 1992.
17. M. Ostendorf and H. Singer, "HMM topology design using maximum likelihood successive state splitting," Computer Speech and Language 11, pp. 17-41, 1997.
18. F. Casacuberta, E. Vidal, and B. Mas, "Learning the structure of HMM's through grammatical inference techniques," in Proceedings of the International Conference on Acoustics, Speech, and Signal Processing, pp. 717-720, 1990.
19. G. Box and G. C. Tiao, Bayesian Inference in Statistical Analysis, Addison-Wesley Publishing Co., 1973.
20. R. E. Kass and A. E. Raftery, "Bayes factors," Journal of the American Statistical Association 90, pp. 773-795, June 1995.
21. D. Heckerman, "A tutorial on learning with Bayesian networks," Tech. Rep. MSR-TR-95-06, Microsoft Research, Advanced Technology Division, One Microsoft Way, Redmond, WA 98052, 1995.
22. G. F. Cooper and E. Herskovits, "A Bayesian method for the induction of probabilistic networks from data," Machine Learning 9, pp. 309-347, 1992.
23. S. Chib, "Marginal likelihood from the Gibbs output," Journal of the American Statistical Association, pp. 1313-1321, Dec. 1995.
24. G. Casella and E. I. George, "Explaining the Gibbs sampler," The American Statistician 46, pp. 167-174, Aug. 1992.
25. L. R. Rabiner, "A tutorial on hidden Markov models and selected applications in speech recognition," Proceedings of the IEEE 77, pp. 257-286, Feb. 1989.
26. B. H. Juang and L. R. Rabiner, "A probabilistic distance measure for hidden Markov models," AT&T Technical Journal 64, pp. 391-408, Feb. 1985.
27. L. R. Bahl, P. F. Brown, P. V. De Souza, and R. L. Mercer, "Maximum mutual information estimation of hidden Markov model parameters," in Proceedings of the IEEE-IECEJ-ASJ International Conference on Acoustics, Speech, and Signal Processing, vol. 1, pp. 49-52, 1986.
28. D. J. Mashao, Computations and Evaluations of an Optimal Feature-set for an HMM-based Recognizer. PhD thesis, Brown University, May 1996.
