Hidden Markov Models in the context of genetic analysis
1 Hidden Markov Models in the context of genetic analysis. Vincent Plagnol, UCL Genetics Institute, November 22, 2012.
2 Outline
1 Introduction
2 Two basic problems: Forward/backward; Baum-Welch algorithm; Viterbi algorithm
3 When the parameters are unknown
4 Two applications: Gene prediction; CNV detection from SNP arrays
5 Two extensions to the basic HMM: Stochastic EM; Semi-Markov models
4 The problem. Many applications of statistics can be seen as categorisation tasks: we try to fit complex patterns into discrete boxes in order to apprehend them better. Clustering approaches are typical of this: inference of an individual's ancestry as a mix of populations X and Y, or separation between high-risk and low-risk disease groups... Hidden Markov Models try to achieve exactly this purpose in a different context.
5 Basic framework
6 An example: gene discovery from DNA sequence
7 An example: gene discovery from DNA sequence. We will start with this simplest example. We assume that the hidden chain X has two states: gene, or intergenic. To be complete there should be a third state, gene on the reverse strand. For now we assume that the emission probabilities $P(Y_i \mid X_i)$ are independent conditionally on the hidden chain X. This may not be good enough for most applications, but it is a place to start.
8 Notations. $(Y_i)_{i=1}^n$ represents the sequence of observed data points. The $Y_i$ can be discrete or continuous, but we will assume discrete for now. $(X_i)_{i=1}^n$ is the sequence of hidden states: for all $i$, $X_i \in \{1, \dots, S\}$, so we have $S$ discrete hidden states. We also assume that we know the distribution $P(Y \mid X)$, but this set of parameters may also be unknown.
9 Basic description of Markov chains (1). A discrete stochastic process X is Markovian if $P(X_1^n \mid X_i) = P(X_1^{i-1} \mid X_i)\,P(X_{i+1}^n \mid X_i)$. Essentially, the future and the past are independent conditionally on the present: the process is memory-less. One can easily make a continuous version of this. If the Markov model has $S$ states, then the process can be described using an $S \times S$ transition matrix. The diagonal values $p_{ii}$ describe the probability of staying in state $i$.
10 Basic description of Markov chains (2). The probability of spending exactly $k$ units of time in state $i$ is: $P(X \text{ spends } k \text{ units in } i) = p_{ii}^k\,(1 - p_{ii})$. This is the definition of a geometric variable; in continuous time it would be an exponential distribution. The definition of the present can also be modified: $X_i$ may for example depend on the previous $k$ states instead of only the last one. This increases the size of the parameter space but makes the model richer.
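To make the geometric-stay property concrete, here is a minimal R sketch; the transition matrix and all numerical values are illustrative choices, not taken from the lecture. It simulates a two-state chain and checks that the average run length in state 1 is close to $1/(1 - p_{11})$.

```r
# Minimal sketch: simulate a two-state Markov chain and check that run
# lengths in state 1 behave geometrically (all values illustrative).
set.seed(1)
P <- matrix(c(0.95, 0.05,
              0.10, 0.90), nrow = 2, byrow = TRUE)  # transition matrix

simulate_chain <- function(P, n, start = 1) {
  x <- integer(n)
  x[1] <- start
  for (i in 2:n) x[i] <- sample(nrow(P), 1, prob = P[x[i - 1], ])
  x
}

x <- simulate_chain(P, 1e5)
runs <- rle(x)
mean(runs$lengths[runs$values == 1])  # mean run length, close to 1/(1 - 0.95) = 20
```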
11 Basics for hidden Markov chains. The hidden Markov chain framework adds one layer (denoted Y) to the Markovian process described previously. The conditional distribution $P(Y_j \mid X_j = s)$ may be unknown, completely specified or partially specified. Typically the number of hidden states $S$ is relatively small (no more than a few hundred states). But $n$ may be very large, i.e. X and Y may be very long sequences (think DNA sequences).
12 Slightly more general version. Without complicating anything, we can most of the time assume that $P(Y_j \mid X_j)$ also varies with $j$. Y could also itself be a Markov chain. Non-Markovian stays can, to some extent, be mimicked by using a sequence of hidden states: first part of the gene, middle of the gene, end of the gene.
13 The set of parameters Θ. 1. $(P_{st})$ is the transition matrix for the hidden states. 2. $Q_{sk} = P(Y = k \mid X = s)$ is the emission probability distribution for the observed chain Y given X. 3. Lastly, we need a vector μ of initial probabilities for the hidden chain X.
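As a concrete illustration, a parameter set Θ for the gene/intergenic toy example could be written as follows in R; the numerical values are invented for illustration only, not taken from the lecture.

```r
# Hypothetical parameter set Theta for the gene/intergenic toy example.
states <- c("intergenic", "gene")
P <- matrix(c(0.999, 0.001,                    # transition matrix (P_st)
              0.002, 0.998),
            nrow = 2, byrow = TRUE, dimnames = list(states, states))
Q <- matrix(c(0.25, 0.25, 0.25, 0.25,          # emission probabilities (Q_sk)
              0.15, 0.35, 0.35, 0.15),
            nrow = 2, byrow = TRUE,
            dimnames = list(states, c("A", "C", "G", "T")))
mu <- c(intergenic = 0.9, gene = 0.1)          # initial distribution of X_1
```

A real gene finder would of course estimate these quantities from annotated sequence rather than fix them by hand.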
14 Two related problems. 1. At a given point $i$ in the sequence, what is the most likely hidden state $X_i$? 2. What is the most likely hidden sequence $(X_i)_{i=1}^n$? The first question relates to marginal probabilities and the second to the joint likelihood.
17 What we can compute at this stage. At this stage our tools are limited. Given a sequence $x = (x_1, \dots, x_n)$ we can compute $P(X = x, Y = y) = P(X = x)\,P(Y = y \mid X = x)$. This is the full joint likelihood for $(X, Y)$.
18 Why problem 1 is difficult.
$P(X_i = x_i \mid Y) = \frac{P(X_i = x_i, Y)}{P(Y)} = \frac{P(X_i = x_i, Y)}{\sum_{s=1}^{S} P(X_i = s, Y)}$
So the problem amounts to estimating $P(X_i = s, Y)$. A direct computation would sum over all possible hidden sequences: $P(X_i = s, Y) = \sum_{x : x_i = s} P(X = x, Y)$. With $S$ hidden states we need to sum over $S^n$ terms, which is not practical. We need to be smarter.
19 We need to use the Markovian assumption.
$P(X_i = s, Y) = P(X_i = s)\,P(Y \mid X_i = s)$
$= P(X_i = s) \sum_x P(Y, X = x \mid X_i = s)$
$= P(X_i = s) \Big( \sum_{x_1^{i-1}} P(Y_1^i, X_1^{i-1} = x_1^{i-1} \mid X_i = s) \Big) \Big( \sum_{x_{i+1}^n} P(Y_{i+1}^n, X_{i+1}^n = x_{i+1}^n \mid X_i = s) \Big)$
$= P(X_i = s)\,P(Y_1^i \mid X_i = s)\,P(Y_{i+1}^n \mid X_i = s)$
$= P(Y_1^i, X_i = s)\,P(Y_{i+1}^n \mid X_i = s)$
$= \alpha_s(i)\,\beta_s(i)$
20 A new computation. We have shown that:
$P(X_i = s \mid Y) = \frac{\alpha_s(i)\,\beta_s(i)}{\sum_{t=1}^{S} \alpha_t(i)\,\beta_t(i)}$
where $\alpha_s(i) = P(Y_1^i, X_i = s)$ and $\beta_s(i) = P(Y_{i+1}^n \mid X_i = s)$. And it is actually possible to compute, recursively, the quantities $\alpha_s(i)$ and $\beta_s(i)$.
21 Two recursive computations. The (forward) recursion for α is:
$\alpha_s(i+1) = P(Y_{i+1} \mid X_{i+1} = s) \sum_{t=1}^{S} \alpha_t(i)\,P_{ts}$
The (backward) recursion for β is:
$\beta_s(i-1) = \sum_t P_{st}\,\beta_t(i)\,P(Y_i \mid X_i = t)$
22 Proof for the first recursion.
$\alpha_s(i+1) = P(Y_1^{i+1}, X_{i+1} = s)$
$= \sum_t P(Y_1^{i+1}, X_{i+1} = s \mid X_i = t)\,P(X_i = t)$
$= \sum_t P(Y_1^{i+1} \mid X_{i+1} = s, X_i = t)\,P(X_{i+1} = s \mid X_i = t)\,P(X_i = t)$
$= P(Y_{i+1} \mid X_{i+1} = s) \sum_t P_{ts}\,P(Y_1^i \mid X_i = t, X_{i+1} = s)\,P(X_i = t)$
$= P(Y_{i+1} \mid X_{i+1} = s) \sum_t P_{ts}\,P(Y_1^i, X_i = t)$
$= P(Y_{i+1} \mid X_{i+1} = s) \sum_t P_{ts}\,\alpha_t(i)$
A similar proof is used for the backward recursion.
23 Computational considerations. The algorithm requires storing $n \times S$ floats. In terms of computation time, the requirements are in $O(S^2 n)$. Linearity in $n$ is the key feature because it enables the analysis of very long DNA sequences. Note that probabilities rapidly become vanishingly small: everything needs to be done on the log scale (be careful when implementing it). Various R packages are available for hidden Markov chains (google it!).
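Putting the two recursions and the log-scale advice together, here is a sketch of the forward/backward computation in R. It assumes the toy parameters P, Q, mu defined earlier and an observation vector y coded as integers 1..K; a minimal illustration, not an optimised implementation.

```r
# Sketch of the forward/backward recursions on the log scale.
logsumexp <- function(v) { m <- max(v); m + log(sum(exp(v - m))) }

forward_backward <- function(y, P, Q, mu) {
  n <- length(y); S <- nrow(P)
  la <- lb <- matrix(0, S, n)              # log alpha_s(i), log beta_s(i)
  la[, 1] <- log(mu) + log(Q[, y[1]])      # initialisation of alpha
  for (i in 2:n)                           # forward recursion
    for (s in 1:S)
      la[s, i] <- log(Q[s, y[i]]) + logsumexp(la[, i - 1] + log(P[, s]))
  lb[, n] <- 0                             # beta_s(n) = 1
  for (i in (n - 1):1)                     # backward recursion
    for (s in 1:S)
      lb[s, i] <- logsumexp(log(P[s, ]) + log(Q[, y[i + 1]]) + lb[, i + 1])
  post <- apply(la + lb, 2, function(v) exp(v - logsumexp(v)))  # P(X_i = s | Y)
  list(log_alpha = la, log_beta = lb, posterior = post)
}

y  <- sample(4, 500, replace = TRUE)       # fake observed sequence for testing
fb <- forward_backward(y, P, Q, mu)
```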
25 Problem 2: finding the most likely hidden sequence $\hat{X}$. A different problem consists of finding the most likely hidden sequence $\hat{X}$. Indeed, the most likely $X_i$ under the marginal distribution may be quite different from $\hat{X}_i$. An algorithm exists to achieve this maximisation: the Viterbi algorithm.
26 The Viterbi algorithm. Define $V_s(i) = \max_{x_1^{i-1}} P(Y_1^i, X_1^{i-1} = x_1^{i-1}, X_i = s)$. Similarly to the previous problem, a forward recursion can be defined for $V_s(i+1)$ as a function of the $V_t(i)$. Following this forward computation, a reverse parsing of the Markov chain can identify the most likely sequence.
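Here is a minimal R sketch of the Viterbi recursion in the same log-scale setting, reusing the toy parameters assumed above: back-pointers are stored during the forward pass and the most likely path is recovered by reverse parsing.

```r
# Sketch of the Viterbi algorithm: forward recursion on V_s(i), then
# reverse parsing of the back-pointers to recover the most likely path.
viterbi <- function(y, P, Q, mu) {
  n <- length(y); S <- nrow(P)
  V   <- matrix(-Inf, S, n)          # V_s(i) on the log scale
  ptr <- matrix(0L, S, n)            # back-pointers
  V[, 1] <- log(mu) + log(Q[, y[1]])
  for (i in 2:n)
    for (s in 1:S) {
      cand <- V[, i - 1] + log(P[, s])
      ptr[s, i] <- which.max(cand)
      V[s, i]   <- log(Q[s, y[i]]) + max(cand)
    }
  path <- integer(n)
  path[n] <- which.max(V[, n])
  for (i in (n - 1):1) path[i] <- ptr[path[i + 1], i + 1]
  path
}
```

Running viterbi() and forward_backward() on the same data lets one compare the joint-likelihood path with the pointwise marginal maximisers, which illustrates the difference between the two problems.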
27 An exercise. Here is a table that shows the probability of the data for three states (one state per row, six points in the chain); the matrix gives the log-likelihood of the data given the position in the chain and the hidden state (which can be 1, 2 or 3). [The table of log-likelihood values was not preserved in the transcription.] Assume that remaining in the same state costs no log-likelihood, but transitioning from one state to another costs one unit of log-likelihood. The initial distribution over the three states is uniform. Compute $V_s(i) = \max_{x_1^{i-1}} P(Y_1^i, X_1^{i-1} = x_1^{i-1}, X_i = s)$ and estimate the most likely Viterbi path.
28 A few words about Andrew Viterbi. Andrew James Viterbi (born in Bergamo in 1935) is an Italian-American electrical engineer and businessman. In addition to his academic work he co-founded Qualcomm. Viterbi made a very large donation to the University of Southern California, which named its engineering school the Viterbi School of Engineering.
29 Computational considerations. Requirements are the same as before: the algorithm requires storing $n \times S$ floats, and computation time is in $O(S^2 n)$. Linearity in $n$ is the key feature because it enables the analysis of very long DNA sequences. Easy to code (in C or R, see example and R libraries).
31 Unknown parameters case. Often we do not know the distribution $P(Y \mid X)$. We may also not know the transition probabilities of the hidden Markov chain X. If the parameters Θ are not known, how can we estimate them?
32 What if we knew X? If we knew X, the problem would become straightforward. For example, the maximum-likelihood estimate would be: $\hat{P}(Y = k \mid X = s) = \frac{\sum_i 1_{Y_i = k, X_i = s}}{\sum_i 1_{X_i = s}}$. More sophisticated (but still straightforward) versions of this could be used if Y were an $n$-th order Markov chain.
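With both chains observed, these counting estimates are one-liners in R; a sketch assuming integer vectors x (hidden states) and y (observations) of equal length.

```r
# Maximum-likelihood estimates from a fully observed (x, y) pair:
# simple normalised counts.
Q_hat <- prop.table(table(state = x, symbol = y), margin = 1)
P_hat <- prop.table(table(from = head(x, -1), to = tail(x, -1)), margin = 1)
```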
33 A typical missing data problem. In this missing-data context, a widely used algorithm is the Expectation-Maximisation (EM) algorithm. The EM algorithm is set up to find the parameters that maximise the likelihood of the observed data Y in the presence of missing data X. At each step the likelihood is guaranteed to increase. However, the algorithm can easily get stuck in a local maximum of the likelihood surface.
34 The basic idea of the EM. The EM is a general iterative algorithm with multiple applications. It first computes the expected value of the log-likelihood given the current parameters (essentially imputing the hidden chain X): $Q(\theta, \theta_n) = E_{X \mid Y, \theta_n}\left[\log L(X, Y, \theta)\right]$. It then maximises the quantity $Q(\theta, \theta_n)$ as a function of θ: $\theta_{n+1} = \operatorname{argmax}_\theta Q(\theta, \theta_n)$.
35 EM in the context of HMMs. The parameter updates are:
$\hat{P}_{st} = \frac{\sum_i P(X_i = s, X_{i+1} = t \mid Y)}{\sum_i P(X_i = s \mid Y)} \qquad \hat{Q}_{sk} = \frac{\sum_i 1_{Y_i = k}\,P(X_i = s \mid Y)}{\sum_i P(X_i = s \mid Y)}$
The updated probabilities can be estimated using the sequences $\alpha_s$, $\beta_s$ computed previously. This special case of the EM for HMMs is called the Baum-Welch algorithm.
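A sketch of one Baum-Welch update built on the forward_backward() function above. For readability it converts back to plain probabilities, so it assumes a sequence short enough that underflow is not an issue; a production implementation would stay on the log scale throughout.

```r
# One Baum-Welch iteration: re-estimate P, Q and mu from the posteriors.
baum_welch_step <- function(y, P, Q, mu) {
  fb <- forward_backward(y, P, Q, mu)
  a <- exp(fb$log_alpha); b <- exp(fb$log_beta); g <- fb$posterior
  n <- length(y); S <- nrow(P)
  num <- matrix(0, S, S)             # sum_i P(X_i = s, X_{i+1} = t | Y)
  for (i in 1:(n - 1)) {
    xi  <- outer(a[, i], Q[, y[i + 1]] * b[, i + 1]) * P
    num <- num + xi / sum(xi)        # sum(xi) = P(Y), so xi/sum(xi) is the posterior
  }
  P_new <- num / rowSums(g[, 1:(n - 1), drop = FALSE])
  Q_new <- sapply(1:ncol(Q), function(k)
             rowSums(g[, y == k, drop = FALSE])) / rowSums(g)
  list(P = P_new, Q = Q_new, mu = g[, 1])
}
```

Iterating baum_welch_step() until the parameters stabilise gives the full algorithm; in exact arithmetic each iteration cannot decrease the likelihood.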
38 Gene prediction. [Figure from Zhang, Nat Rev Genetics, 2002.]
39 Some drawbacks of this approach. The number of hidden states can be very large. Modelling codons takes three states, plus probably three states for the first codon and three for the last. So about nine states just for the exons, and probably nine more on the reverse strand. Some alternatives exist (using semi-Markov models).
41 Copy number variant detection from SNP arrays. [Figure: signal intensities for Allele 1 and Allele 2.]
42 Copy number variant detection from SNP arrays. [Figure from Wang et al, Genome Research 2007.]
45 Stochastic EM (SEM). The EM/Baum-Welch algorithm essentially uses the conditional distribution of X given Y. Another way to compute this expectation is to use a Monte Carlo approach: simulate X given Y and take an average. This is a trade-off: we of course lose the certainty that the likelihood is increasing (as provided by the EM), but the added randomness may avoid the pitfall of the estimator getting stuck in a local maximum (a major issue with the EM).
46 Stochastic EM (SEM). A simulation of X conditionally on Y would use the following decomposition:
$P(X_1^N \mid Y_1^N) = P(X_1 \mid Y_1^N)\,P(X_2 \mid Y_1^N, X_1) \cdots P(X_N \mid Y_1^N, X_1^{N-1})$
This relies on being able to compute the marginal probabilities, but this is what Baum-Welch does. Once the α, β have been computed, the simulation is linear in time and multiple sequences can be simulated rapidly.
47 How to simulate in practice. The simulation uses the equality:
$P(X_{i+1} = t \mid Y, X_i = s) = \frac{P_{st}\,P(Y_{i+1} \mid X_{i+1} = t)\,P(Y_{i+2}^n \mid X_{i+1} = t)}{P(Y_{i+1}^n \mid X_i = s)} = \frac{P_{st}\,P(Y_{i+1} \mid X_{i+1} = t)\,\beta_t(i+1)}{\beta_s(i)}$
Note that this is a forward-backward algorithm as well, but the forward step is built into the simulation step, unlike in the traditional Baum-Welch.
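A sketch of this sampling step in R, reusing forward_backward() from above; the first state is drawn from $P(X_1 = s \mid Y) \propto \mu_s\,Q_{s,y_1}\,\beta_s(1)$, which is consistent with the definitions of α and β used here.

```r
# Simulate one hidden path X conditionally on Y (the stochastic EM step).
sample_path <- function(y, P, Q, mu) {
  lb <- forward_backward(y, P, Q, mu)$log_beta
  n <- length(y); x <- integer(n)
  w <- log(mu) + log(Q[, y[1]]) + lb[, 1]          # log P(X_1 = s | Y) + const
  x[1] <- sample(nrow(P), 1, prob = exp(w - max(w)))
  for (i in 1:(n - 1)) {                           # forward sampling step
    w <- log(P[x[i], ]) + log(Q[, y[i + 1]]) + lb[, i + 1]
    x[i + 1] <- sample(nrow(P), 1, prob = exp(w - max(w)))
  }
  x                                                # sample() renormalises prob
}
```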
48 Estimation issues. Using a single simulated run of the hidden chain X is necessarily less efficient than relying on the expected probability. The number of data points must be very large to make the estimation precise. One could potentially take an average over multiple simulated runs; with a sufficient number of simulations one actually gets very close to the EM. Like most practical estimation procedures, one has to find the right combination of tools, and there is no single answer.
50 Semi-Markov models (HSMM). In the context of gene prediction, using three states per codon is not satisfying; we would like something that takes groups of 3 bp into account jointly. Semi-Markov models do exactly this. When entering a state $s$, a random variable $T_s$ is drawn for the duration of the stay in state $s$. The emission probability for Y can then be defined over the entire duration of the stay. So codons are naturally defined as groups of 3 bp instead of dealing with multiple hidden states.
51 Backward recursion for SEM applied to semi-Markov hidden chains. We are interested in computing, for $n \in [1, N-1]$ and $i \in [1, S]$, the quantities:
$\beta_i(n) = P(Y_{n+1}^N \mid Y_1^n, X_n = i), \qquad \beta_i(N) = 1$
$\beta_i(n) = \sum_j \sum_{l < N-n} P_{ij}\,P(T_j = l)\,P(Y_{n+1}^{n+l} \mid X_{n+1}^{n+l} = j)\,\beta_j(n+l)$
Note the complexity is now in $N\,S^2\,\max(l)$, as opposed to $N\,S^2$ before.
52 Forward simulations for SEM. One can simulate a new hidden sequence recursively with the formula:
$P(X_{n+1}^{n+l} = j \mid Y_1^N, X_n = i) = \frac{P_{ij}\,P(T_j = l)\,P(Y_{n+1}^{n+l} \mid X_{n+1}^{n+l} = j)\,\beta_j(n+l)}{\beta_i(n)}$
This is very much analogous to the basic HMM situation, with the extra complication generated by the variable state length.
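A compact sketch of the semi-Markov backward recursion in R. The duration matrix dur, with dur[j, l] = P(T_j = l) truncated at some Lmax, is a hypothetical ingredient introduced for illustration; emissions are taken independent within a segment, plain probabilities are used for readability, and end-of-sequence effects are handled crudely.

```r
# Backward recursion for a hidden semi-Markov chain (illustrative only;
# a real implementation would work on the log scale).
hsmm_backward <- function(y, P, Q, dur) {
  n <- length(y); S <- nrow(P); Lmax <- ncol(dur)
  beta <- matrix(0, S, n)
  beta[, n] <- 1
  for (m in (n - 1):1)
    for (i in 1:S) {
      acc <- 0
      for (j in 1:S)
        for (l in 1:min(Lmax, n - m)) {
          emit <- prod(Q[j, y[(m + 1):(m + l)]])   # P(Y_{m+1}^{m+l} | X = j)
          acc  <- acc + P[i, j] * dur[j, l] * emit * beta[j, m + l]
        }
      beta[i, m] <- acc
    }
  beta
}
```

Each position costs $S^2 \max(l)$ work, which matches the complexity remark above.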
53 Estimation for semi-Markov models. It is possible to run a Viterbi algorithm using the same recursion derived for the Markovian case. It is also possible to use an SEM algorithm to simulate the hidden sequence X and use it to estimate the parameters of the model. A full EM is also possible, but I never implemented it. The computational requirements may become challenging, but it all depends on the application.