Graphical Models & HMMs

Graphical Models & HMMs
Henrik I. Christensen
Robotics & Intelligent Machines @ GT, Georgia Institute of Technology, Atlanta, GA 30332-0280
hic@cc.gatech.edu

Outline
1 Introduction
2 Bayesian Networks
3 Conditional Independence
4 Inference
5 Factor Graphs
6 Sum-Product Algorithm
7 HMM Introduction
8 Markov Model
9 Hidden Markov Model
10 ML solution for the HMM
11 Forward-Backward
12 Viterbi
13 Example
14 Summary

Introduction
Bayesian inference can be described through repeated use of the sum and product rules.
Using a graphical / diagrammatical representation is often useful:
1 It gives a way to visualize the structure of a model and consider the relations between variables
2 It provides insight into a model, including possible conditional independences
3 It allows us to leverage the many graph algorithms available
We will consider both directed and undirected models.
This is a very rich area with numerous good references.

Bayesian Networks
Consider the joint probability p(a, b, c).
Using the product rule we can rewrite it as
p(a, b, c) = p(c | a, b) p(a, b)
which again can be changed to
p(a, b, c) = p(c | a, b) p(b | a) p(a)
[figure: directed graph over a, b, c with arrows a → b, a → c, b → c]

Bayesian Networks
- Nodes represent variables
- Arcs/links represent conditional dependences
- We can use the decomposition for any joint distribution; the direct / brute-force application generates a fully connected graph
- We can represent much more general relations

Example with sparse connections
We can represent relations such as
p(x_1) p(x_2) p(x_3) p(x_4 | x_1, x_2, x_3) p(x_5 | x_1, x_3) p(x_6 | x_4) p(x_7 | x_4, x_5)
[figure: the corresponding directed graph over x_1, ..., x_7]

The general case
We can think of this as coding the factors p(x_k | pa_k), where pa_k is the set of parents of variable x_k.
The joint distribution then factorizes as
p(x) = \prod_k p(x_k | pa_k)
we will refer to this as the factorization.
There can be no directed cycles in the graph.
The general form is termed a directed acyclic graph - DAG.
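As a minimal sketch (not from the slides), the factorization can be evaluated directly by multiplying one conditional table per node; the three-variable example p(a, b, c) = p(a) p(b | a) p(c | a, b) with made-up binary tables looks like this:

```python
# Minimal sketch (illustrative values): evaluating a DAG factorization
# p(x) = prod_k p(x_k | pa_k) for p(a, b, c) = p(a) p(b | a) p(c | a, b).
p_a = {0: 0.6, 1: 0.4}
p_b_given_a = {0: {0: 0.7, 1: 0.3}, 1: {0: 0.2, 1: 0.8}}              # p(b | a)
p_c_given_ab = {(0, 0): {0: 0.9, 1: 0.1}, (0, 1): {0: 0.5, 1: 0.5},
                (1, 0): {0: 0.4, 1: 0.6}, (1, 1): {0: 0.1, 1: 0.9}}    # p(c | a, b)

def joint(a, b, c):
    """p(a, b, c) as the product of the per-node factors."""
    return p_a[a] * p_b_given_a[a][b] * p_c_given_ab[(a, b)][c]

# Sanity check: the factorization defines a proper distribution.
total = sum(joint(a, b, c) for a in (0, 1) for b in (0, 1) for c in (0, 1))
print(total)  # 1.0
```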

Basic example
We have seen the polynomial regression before:
p(\mathbf{t}, w) = p(w) \prod_{n=1}^{N} p(t_n | w)
[figure: directed graph with node w pointing to t_1, ..., t_N]

Bayesian Regression
We can make the parameters and variables explicit:
p(\mathbf{t}, w | \mathbf{x}, \alpha, \sigma^2) = p(w | \alpha) \prod_{n=1}^{N} p(t_n | w, x_n, \sigma^2)
[figure: plate over n = 1, ..., N containing x_n and t_n, with \alpha, w, and \sigma^2 outside the plate]

Bayesian Regression - Learning
Once data are observed we can condition the inference on them:
p(w | \mathbf{t}) \propto p(w) \prod_{n=1}^{N} p(t_n | w)
[figure: the same plate model with the observed t_n shaded]

Generative Models - Example
Image synthesis: object identity, position, and orientation jointly generate the image.
[figure: directed graph with Object, Position, and Orientation as parents of Image]

Discrete Variables - 1
A general joint distribution over two variables has K^2 - 1 parameters (for K possible outcomes):
p(x_1, x_2 | \mu) = \prod_{i=1}^{K} \prod_{j=1}^{K} \mu_{ij}^{x_{1i} x_{2j}}
If the two variables are independent, only 2(K - 1) parameters are needed:
p(x_1, x_2 | \mu) = \prod_{i=1}^{K} \mu_{1i}^{x_{1i}} \prod_{j=1}^{K} \mu_{2j}^{x_{2j}}

Discrete Variables - 2
A general joint distribution over M variables has K^M - 1 parameters.
A Markov chain with M nodes has K - 1 + (M - 1) K (K - 1) parameters.
[figure: chain x_1 → x_2 → ... → x_M]
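To make the scaling concrete, here is a quick check of the two counts for small, illustrative values of K and M (not from the slides):

```python
# Parameter counts for K-state variables over M positions:
# full joint vs. first-order Markov chain (formulas from the slide above).
K, M = 3, 5
full_joint = K**M - 1                             # general joint distribution
markov_chain = (K - 1) + (M - 1) * K * (K - 1)    # initial dist. + (M-1) transition tables
print(full_joint, markov_chain)                   # 242 26
```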

Discrete Variables - Bayesian Parameters
[figure: chain x_1 → x_2 → ... → x_M with a separate parameter node \mu_m attached to each x_m]
The parameters can be modelled explicitly:
p(\{x_m, \mu_m\}) = p(x_1 | \mu_1) p(\mu_1) \prod_{m=2}^{M} p(x_m | x_{m-1}, \mu_m) p(\mu_m)
It is assumed that each p(\mu_m) is a Dirichlet distribution.

Discrete Variables - Bayesian Parameters (2)
[figure: the same chain with a single shared parameter node \mu for all transitions]
For shared parameters the situation is simpler:
p(\{x_m\}, \mu_1, \mu) = p(x_1 | \mu_1) p(\mu_1) p(\mu) \prod_{m=2}^{M} p(x_m | x_{m-1}, \mu)

Extension to Linear Gaussian Models
The model can be extended so that each node is a Gaussian variable that is a linear function of its parents:
p(x_i | pa_i) = \mathcal{N}\left( x_i \,\middle|\, \sum_{j \in pa_i} w_{ij} x_j + b_i, \; v_i \right)
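As an illustration, a minimal sketch (assumed toy parameters, not from the slides) of ancestral sampling in a small linear-Gaussian chain x_1 → x_2 → x_3, where each node follows the conditional above:

```python
# Minimal sketch: ancestral sampling in a linear-Gaussian chain x1 -> x2 -> x3,
# each node distributed as p(x_i | pa_i) = N(x_i | sum_j w_ij x_j + b_i, v_i).
import numpy as np

rng = np.random.default_rng(0)

def sample_chain(n_samples, w=(0.8, -0.5), b=(0.0, 1.0, 0.3), v=(1.0, 0.5, 0.2)):
    x1 = rng.normal(b[0], np.sqrt(v[0]), n_samples)
    x2 = rng.normal(w[0] * x1 + b[1], np.sqrt(v[1]))   # mean is a linear function of the parent
    x3 = rng.normal(w[1] * x2 + b[2], np.sqrt(v[2]))
    return np.stack([x1, x2, x3], axis=1)

samples = sample_chain(100_000)
print(np.cov(samples.T))  # the joint over (x1, x2, x3) is itself Gaussian
```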

Conditional Independence
Considerations of independence are important as part of the analysis and setup of a system.
As an example, a is independent of b given c:
p(a | b, c) = p(a | c)
Or equivalently:
p(a, b | c) = p(a | b, c) p(b | c) = p(a | c) p(b | c)
Frequent notation in statistics: a ⊥⊥ b | c

Conditional Independence - Case 1
[figure: tail-to-tail node c with arrows to a and b]
p(a, b, c) = p(a | c) p(b | c) p(c)
p(a, b) = \sum_c p(a | c) p(b | c) p(c)
which in general does not factorize into p(a) p(b), so a ⊥̸⊥ b | ∅

Conditional Independence - Case 1 (c observed)
p(a, b | c) = p(a, b, c) / p(c) = p(a | c) p(b | c)
Hence a ⊥⊥ b | c

Conditional Independence - Case 2
[figure: head-to-tail chain a → c → b]
p(a, b, c) = p(a) p(c | a) p(b | c)
p(a, b) = p(a) \sum_c p(c | a) p(b | c) = p(a) p(b | a)
so in general a ⊥̸⊥ b | ∅

Conditional Independence - Case 2 (c observed)
p(a, b | c) = p(a, b, c) / p(c) = p(a) p(c | a) p(b | c) / p(c) = p(a | c) p(b | c)
Hence a ⊥⊥ b | c

Conditional Independence - Case 3
[figure: head-to-head node c with arrows from a and b]
p(a, b, c) = p(a) p(b) p(c | a, b)
p(a, b) = p(a) p(b)
Hence a ⊥⊥ b | ∅. This is the opposite of Case 1 - when c is unobserved.

Conditional Independence - Case 3 (c observed)
p(a, b | c) = p(a, b, c) / p(c) = p(a) p(b) p(c | a, b) / p(c)
which in general does not factorize, so a ⊥̸⊥ b | c. This is the opposite of Case 1 - when c is observed.

Diagnostics - Out of fuel?
[figure: B → G ← F]
B = Battery, F = Fuel Tank, G = Fuel Gauge
p(G = 1 | B = 1, F = 1) = 0.8
p(G = 1 | B = 1, F = 0) = 0.2
p(G = 1 | B = 0, F = 1) = 0.2
p(G = 1 | B = 0, F = 0) = 0.1
p(B = 1) = 0.9
p(F = 1) = 0.9, p(F = 0) = 0.1

Diagnostics - Out of fuel?
p(F = 0 | G = 0) = \frac{p(G = 0 | F = 0) p(F = 0)}{p(G = 0)} \approx 0.257
Observing G = 0 increased the probability of an empty tank (the prior was p(F = 0) = 0.1).

Diagnostics - Out of fuel?
p(F = 0 | G = 0, B = 0) = \frac{p(G = 0 | B = 0, F = 0) p(F = 0)}{\sum_F p(G = 0 | B = 0, F) p(F)} \approx 0.111
Additionally observing B = 0 makes an empty tank less likely: the flat battery explains away the empty reading on the gauge.
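A quick numeric check of the two posteriors, written as a minimal sketch directly from the tables above (the variable and function names are my own):

```python
# Verify the explaining-away numbers from the fuel-gauge example.
# p(G=1 | B, F) from the table; priors p(B=1)=0.9, p(F=1)=0.9.
pG1 = {(1, 1): 0.8, (1, 0): 0.2, (0, 1): 0.2, (0, 0): 0.1}
pB = {1: 0.9, 0: 0.1}
pF = {1: 0.9, 0: 0.1}

def p_joint(b, f, g):
    pg = pG1[(b, f)] if g == 1 else 1.0 - pG1[(b, f)]
    return pB[b] * pF[f] * pg

# p(F=0 | G=0): marginalize the battery state out.
num = sum(p_joint(b, 0, 0) for b in (0, 1))
den = sum(p_joint(b, f, 0) for b in (0, 1) for f in (0, 1))
print(round(num / den, 3))  # 0.257

# p(F=0 | G=0, B=0): the flat battery explains away the empty reading.
num = p_joint(0, 0, 0)
den = sum(p_joint(0, f, 0) for f in (0, 1))
print(round(num / den, 3))  # 0.111
```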

D-separation
Consider non-intersecting subsets A, B, and C in a directed graph.
A path between subsets A and B is considered blocked if it contains a node such that:
1 the arrows meet head-to-tail or tail-to-tail at the node, and the node is in the set C, or
2 the arrows meet head-to-head at the node, and neither the node nor any of its descendants is in the set C.
If all paths between A and B are blocked, then A is d-separated from B by C.
If A is d-separated from B by C, then the joint distribution over all the variables in the graph satisfies A ⊥⊥ B | C.

D-Separation - Discussion
[figure: two example graphs over nodes a, b, c, e, f illustrating blocked and unblocked paths]

Inference in graphical models
Consider inference for p(x, y); we can formulate this as
p(x, y) = p(x | y) p(y) = p(y | x) p(x)
We can further marginalize:
p(y) = \sum_{x'} p(y | x') p(x')
Using Bayes' rule we can reverse the inference:
p(x | y) = \frac{p(y | x) p(x)}{p(y)}
These identities are the basic mechanisms for inference.

Inference in graphical models
[figure: three two-node graphs over x and y, panels (a)-(c), illustrating the joint model, conditioning on an observed y, and the reversed model]

Inference on a chain
[figure: undirected chain x_1 - x_2 - ... - x_{N-1} - x_N]
p(x) = \frac{1}{Z} \psi_{1,2}(x_1, x_2) \psi_{2,3}(x_2, x_3) \cdots \psi_{N-1,N}(x_{N-1}, x_N)
p(x_n) = \sum_{x_1} \cdots \sum_{x_{n-1}} \sum_{x_{n+1}} \cdots \sum_{x_N} p(x)

Inference on a chain
[figure: messages \mu_\alpha passed forward and \mu_\beta passed backward along the chain]
p(x_n) = \frac{1}{Z} \left[ \sum_{x_{n-1}} \psi_{n-1,n}(x_{n-1}, x_n) \cdots \left[ \sum_{x_1} \psi_{1,2}(x_1, x_2) \right] \cdots \right] \left[ \sum_{x_{n+1}} \psi_{n,n+1}(x_n, x_{n+1}) \cdots \left[ \sum_{x_N} \psi_{N-1,N}(x_{N-1}, x_N) \right] \cdots \right]
where the first bracketed term is \mu_\alpha(x_n) and the second is \mu_\beta(x_n).

Inference on a chain
The messages can be computed recursively:
\mu_\alpha(x_n) = \sum_{x_{n-1}} \psi_{n-1,n}(x_{n-1}, x_n) \left[ \cdots \sum_{x_1} \psi_{1,2}(x_1, x_2) \right] = \sum_{x_{n-1}} \psi_{n-1,n}(x_{n-1}, x_n) \, \mu_\alpha(x_{n-1})
\mu_\beta(x_n) = \sum_{x_{n+1}} \psi_{n,n+1}(x_n, x_{n+1}) \left[ \cdots \sum_{x_N} \psi_{N-1,N}(x_{N-1}, x_N) \right] = \sum_{x_{n+1}} \psi_{n,n+1}(x_n, x_{n+1}) \, \mu_\beta(x_{n+1})

Inference on a chain
The recursions are initialized with
\mu_\alpha(x_2) = \sum_{x_1} \psi_{1,2}(x_1, x_2), \qquad \mu_\beta(x_{N-1}) = \sum_{x_N} \psi_{N-1,N}(x_{N-1}, x_N)
and the normalization constant is
Z = \sum_{x_n} \mu_\alpha(x_n) \mu_\beta(x_n)

Inference on a chain
To compute local marginals:
- Compute and store the forward messages \mu_\alpha(x_n)
- Compute and store the backward messages \mu_\beta(x_n)
- Compute Z at any node
- Compute for all variables
p(x_n) = \frac{1}{Z} \mu_\alpha(x_n) \mu_\beta(x_n)
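A minimal numpy sketch of this forward/backward message passing on a discrete chain (pairwise potentials stored as K×K matrices; the function and variable names are my own, not from the slides):

```python
# Minimal sketch: exact marginals on a discrete chain MRF
# p(x) ∝ prod_n psi_{n,n+1}(x_n, x_{n+1}) via forward/backward messages.
import numpy as np

def chain_marginals(psis):
    """psis: list of (K, K) pairwise potentials psi_{n,n+1}; returns p(x_n) for all n."""
    N = len(psis) + 1
    K = psis[0].shape[0]
    alpha = [np.ones(K) for _ in range(N)]   # forward messages mu_alpha
    beta = [np.ones(K) for _ in range(N)]    # backward messages mu_beta
    for n in range(1, N):
        alpha[n] = psis[n - 1].T @ alpha[n - 1]      # sum over x_{n-1}
    for n in range(N - 2, -1, -1):
        beta[n] = psis[n] @ beta[n + 1]              # sum over x_{n+1}
    Z = np.dot(alpha[0], beta[0])                    # same value at every node
    return [alpha[n] * beta[n] / Z for n in range(N)]

# Example: 4-node chain with random positive potentials.
rng = np.random.default_rng(0)
psis = [rng.random((3, 3)) for _ in range(3)]
for p in chain_marginals(psis):
    print(p, p.sum())  # each marginal sums to 1
```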

Inference in trees
[figure: examples of an undirected tree, a directed tree, and a directed polytree]

Factor Graphs
[figure: variable nodes x_1, x_2, x_3 connected to factor nodes f_a, f_b, f_c, f_d]
p(x) = f_a(x_1, x_2) f_b(x_1, x_2) f_c(x_2, x_3) f_d(x_3)
In general: p(x) = \prod_s f_s(x_s)

Factor Graphs from Directed Graphs
The directed graph encodes p(x) = p(x_1) p(x_2) p(x_3 | x_1, x_2).
As a single factor: f(x_1, x_2, x_3) = p(x_1) p(x_2) p(x_3 | x_1, x_2)
As one factor per conditional: f_a(x_1) = p(x_1), f_b(x_2) = p(x_2), f_c(x_1, x_2, x_3) = p(x_3 | x_1, x_2)
[figure: the directed graph and the two corresponding factor graphs]

Factor Graphs from Undirected Graphs
The undirected graph has a single clique potential \psi(x_1, x_2, x_3).
As a single factor: f(x_1, x_2, x_3) = \psi(x_1, x_2, x_3)
As a product of factors: f_a(x_1, x_2, x_3) f_b(x_2, x_3) = \psi(x_1, x_2, x_3)
[figure: the undirected graph and the two corresponding factor graphs]

The Sum-Product Algorithm
Objective: an exact, efficient algorithm for computing marginals, which also allows multiple marginals to be computed efficiently by sharing computation.
Key idea: the distributive law, ab + ac = a(b + c)

The Sum-Product Algorithm
[figure: a variable node x with neighbouring factor subtrees F_s(x, X_s) and message \mu_{f_s \to x}(x)]
p(x) = \sum_{\mathbf{x} \setminus x} p(\mathbf{x})
p(\mathbf{x}) = \prod_{s \in \mathrm{Ne}(x)} F_s(x, X_s)

The Sum-Product Algorithm
p(x) = \prod_{s \in \mathrm{Ne}(x)} \left[ \sum_{X_s} F_s(x, X_s) \right] = \prod_{s \in \mathrm{Ne}(x)} \mu_{f_s \to x}(x)
where \mu_{f_s \to x}(x) = \sum_{X_s} F_s(x, X_s)

The Sum-Product Algorithm
[figure: factor node f_s with neighbouring variables x_1, ..., x_M and attached subtrees G_m(x_m, X_{sm})]
F_s(x, X_s) = f_s(x, x_1, \ldots, x_M) G_1(x_1, X_{s1}) \cdots G_M(x_M, X_{sM})

The Sum-Product Algorithm
\mu_{f_s \to x}(x) = \sum_{x_1} \cdots \sum_{x_M} f_s(x, x_1, \ldots, x_M) \prod_{m \in \mathrm{Ne}(f_s) \setminus x} \left[ \sum_{X_{sm}} G_m(x_m, X_{sm}) \right]
= \sum_{x_1} \cdots \sum_{x_M} f_s(x, x_1, \ldots, x_M) \prod_{m \in \mathrm{Ne}(f_s) \setminus x} \mu_{x_m \to f_s}(x_m)

The Sum-Product Algorithm
\mu_{x_m \to f_s}(x_m) = \sum_{X_{sm}} G_m(x_m, X_{sm}) = \prod_{l \in \mathrm{Ne}(x_m) \setminus f_s} \left[ \sum_{X_{lm}} F_l(x_m, X_{lm}) \right] = \prod_{l \in \mathrm{Ne}(x_m) \setminus f_s} \mu_{f_l \to x_m}(x_m)

The Sum-Product Algorithm - Initialization
For leaf variable nodes: \mu_{x \to f}(x) = 1
For leaf factor nodes: \mu_{f \to x}(x) = f(x)
[figure: a leaf variable node sending its message to a factor, and a leaf factor node sending its message to a variable]

The Sum-Product Algorithm
To compute local marginals:
- Pick an arbitrary node as root
- Compute and propagate messages from the leaves to the root (store the messages)
- Compute and propagate messages from the root to the leaves (store the messages)
- At each node, compute the product of received messages and normalize as required
This single pass up the tree and down again yields all marginals (as needed).

Introduction
There are many examples where we are processing sequential data. Everyday examples include:
- Processing of stock data
- Speech analysis
- Traffic patterns
- Gesture analysis
- Manufacturing flow
We cannot assume the data are i.i.d. as a basis for analysis.
The Hidden Markov Model is frequently used for such data.

Apple Stock Quote Example
[figure: Apple stock price over time as an example of sequential data]

Markov model
We have already talked about the Markov model.
Estimating the joint probability is a sequential challenge:
p(x_1, \ldots, x_N) = \prod_{n=1}^{N} p(x_n | x_{n-1}, \ldots, x_1)
An mth-order model conditions each term on only the m most recent values.
A 1st-order model is then
p(x_1, \ldots, x_N) = p(x_1) \prod_{n=2}^{N} p(x_n | x_{n-1})
i.e., p(x_n | x_{n-1}, \ldots, x_1) = p(x_n | x_{n-1})
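A minimal sketch of the first-order factorization in code, for a discrete chain with an assumed initial distribution pi and transition matrix A (the names anticipate the HMM notation introduced later):

```python
# Minimal sketch: log-likelihood of an observed discrete sequence under a
# first-order Markov model p(x_1,...,x_N) = p(x_1) * prod_n p(x_n | x_{n-1}).
import numpy as np

def markov_log_likelihood(seq, pi, A):
    """seq: list of state indices; pi: initial distribution; A[j, k] = p(next=k | current=j)."""
    ll = np.log(pi[seq[0]])
    for prev, cur in zip(seq[:-1], seq[1:]):
        ll += np.log(A[prev, cur])
    return ll

pi = np.array([0.5, 0.5])
A = np.array([[0.9, 0.1],
              [0.2, 0.8]])
print(markov_log_likelihood([0, 0, 1, 1, 1], pi, A))
```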

Markov chains
[figure: graphical models of Markov chains (first- and second-order) over x_1, ..., x_4]

Hidden Markov Model
[figure: latent chain z_1 → z_2 → ... → z_{n-1} → z_n → z_{n+1}, with each z_n emitting an observation x_n]

Modelling of HMMs
We can model the transition probabilities as a table:
A_{jk} = p(z_{nk} = 1 | z_{n-1,j} = 1)
The conditionals are then (with a 1-of-K coding)
p(z_n | z_{n-1}, A) = \prod_{k=1}^{K} \prod_{j=1}^{K} A_{jk}^{z_{n-1,j} z_{nk}}
The initial state probabilities are expressed by \pi_k = p(z_{1k} = 1), with \sum_k \pi_k = 1, so that
p(z_1 | \pi) = \prod_{k=1}^{K} \pi_k^{z_{1k}}
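To make the notation concrete, a minimal sketch of ancestral sampling from an HMM (assumed toy parameters; 1-D Gaussian emissions, as in the ML slides below):

```python
# Minimal sketch: sample a state/observation sequence from an HMM with
# initial distribution pi, transition matrix A, and 1-D Gaussian emissions.
import numpy as np

rng = np.random.default_rng(1)

def sample_hmm(N, pi, A, means, stds):
    z = np.empty(N, dtype=int)
    x = np.empty(N)
    for n in range(N):
        if n == 0:
            z[n] = rng.choice(len(pi), p=pi)               # p(z_1 | pi)
        else:
            z[n] = rng.choice(A.shape[1], p=A[z[n - 1]])   # p(z_n | z_{n-1})
        x[n] = rng.normal(means[z[n]], stds[z[n]])         # p(x_n | z_n)
    return z, x

pi = np.array([0.6, 0.4])
A = np.array([[0.95, 0.05],
              [0.10, 0.90]])
z, x = sample_hmm(200, pi, A, means=np.array([0.0, 3.0]), stds=np.array([1.0, 0.5]))
print(z[:10], np.round(x[:10], 2))
```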

Illustration of HMM
[figure: trellis/lattice view of the HMM: the states k = 1, ..., 3 unrolled over time steps n-2, ..., n+1, with transition probabilities A_{jk} on the edges]

Maximum likelihood for the HMM
If we observe a set of data X = {x_1, ..., x_N} we can estimate the parameters using ML:
p(X | \theta) = \sum_Z p(X, Z | \theta)
i.e., a summation over paths through the lattice.
We can use EM as a strategy to find a solution:
- E-step: estimate p(Z | X, \theta^{old})
- M-step: maximize the expected complete-data log likelihood over \theta

ML solution to HMM
Define
Q(\theta, \theta^{old}) = \sum_Z p(Z | X, \theta^{old}) \ln p(X, Z | \theta)
\gamma(z_n) = p(z_n | X, \theta^{old})
\gamma(z_{nk}) = \mathbb{E}[z_{nk}] = \sum_{\mathbf{z}} \gamma(\mathbf{z}) \, z_{nk}
\xi(z_{n-1}, z_n) = p(z_{n-1}, z_n | X, \theta^{old})
\xi(z_{n-1,j}, z_{nk}) = \sum_{\mathbf{z}} \gamma(\mathbf{z}) \, z_{n-1,j} \, z_{nk}

ML solution to HMM
The quantities can be computed as
\pi_k = \frac{\gamma(z_{1k})}{\sum_{j=1}^{K} \gamma(z_{1j})}
A_{jk} = \frac{\sum_{n=2}^{N} \xi(z_{n-1,j}, z_{nk})}{\sum_{l=1}^{K} \sum_{n=2}^{N} \xi(z_{n-1,j}, z_{nl})}
Assume Gaussian emissions p(x | \phi_k) = \mathcal{N}(x | \mu_k, \Sigma_k), so that
\mu_k = \frac{\sum_n \gamma(z_{nk}) x_n}{\sum_n \gamma(z_{nk})}
\Sigma_k = \frac{\sum_n \gamma(z_{nk}) (x_n - \mu_k)(x_n - \mu_k)^T}{\sum_n \gamma(z_{nk})}
How do we efficiently compute \gamma(z_{nk})?
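A minimal numpy sketch of these M-step updates, assuming the responsibilities gamma (shape N×K) and pairwise marginals xi (shape (N-1)×K×K) have already been computed by the forward-backward pass described next, and 1-D Gaussian emissions:

```python
# Minimal sketch: HMM M-step updates from gamma (N, K) and xi (N-1, K, K),
# with 1-D Gaussian emissions on observations x (N,).
import numpy as np

def m_step(x, gamma, xi):
    pi = gamma[0] / gamma[0].sum()                 # initial state distribution
    A = xi.sum(axis=0)
    A /= A.sum(axis=1, keepdims=True)              # normalize each row j over k
    Nk = gamma.sum(axis=0)                         # effective counts per state
    means = (gamma * x[:, None]).sum(axis=0) / Nk
    variances = (gamma * (x[:, None] - means) ** 2).sum(axis=0) / Nk
    return pi, A, means, variances
```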

Outline 1 Introduction 2 Bayesian Networks 3 Conditional Independence 4 Inference 5 Factor Graphs 6 Sum-Product Algorithm 7 HMM Introduction 8 Markov Model 9 Hidden Markov Model 10 ML solution for the HMM 11 Forward-Backward 12 Viterbi 13 Example 14 Summary Henrik I. Christensen (RIM@GT) Graphical Models & HMMs 67 / 83

Forward-Backward / Baum-Welch
How can we efficiently compute \gamma(\cdot) and \xi(\cdot, \cdot)?
Remember that the HMM is a tree model, so using message passing we can evaluate these quantities efficiently (remember the earlier discussion).
There are two parts to the message passing: a forward pass and a backward pass.
From Bayes' rule we have
\gamma(z_n) = p(z_n | X) = \frac{p(X | z_n) p(z_n)}{p(X)}
and in terms of the messages
\gamma(z_n) = \frac{\alpha(z_n) \beta(z_n)}{p(X)}

Forward-Backward
We can then compute
\alpha(z_n) = p(x_n | z_n) \sum_{z_{n-1}} \alpha(z_{n-1}) p(z_n | z_{n-1})
\beta(z_n) = \sum_{z_{n+1}} \beta(z_{n+1}) p(x_{n+1} | z_{n+1}) p(z_{n+1} | z_n)
p(X) = \sum_{z_n} \alpha(z_n) \beta(z_n)
\xi(z_{n-1}, z_n) = \frac{\alpha(z_{n-1}) p(x_n | z_n) p(z_n | z_{n-1}) \beta(z_n)}{p(X)}
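A minimal numpy sketch of the alpha-beta recursions (unnormalized, so only suitable for short sequences; in practice the messages are rescaled or kept in the log domain). Emission likelihoods are passed in as a matrix B with B[n, k] = p(x_n | z_n = k); the names are my own:

```python
# Minimal sketch: unnormalized forward-backward pass for a discrete-state HMM.
import numpy as np

def forward_backward(pi, A, B):
    N, K = B.shape
    alpha = np.zeros((N, K))
    beta = np.ones((N, K))
    alpha[0] = pi * B[0]                          # alpha(z_1) = p(z_1) p(x_1 | z_1)
    for n in range(1, N):
        alpha[n] = B[n] * (alpha[n - 1] @ A)      # forward recursion
    for n in range(N - 2, -1, -1):
        beta[n] = A @ (beta[n + 1] * B[n + 1])    # backward recursion
    pX = alpha[-1].sum()                          # p(X) = sum_{z_N} alpha(z_N)
    gamma = alpha * beta / pX
    xi = alpha[:-1, :, None] * A[None] * (B[1:] * beta[1:])[:, None, :] / pX
    return gamma, xi, pX
```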

Sum-product algorithms for the HMM
- Given that the HMM is a tree structure
- We can use the sum-product rule to compute marginals
- We can derive a simple factor graph for the tree
[figure: factor graph for the HMM with factors linking consecutive latent variables z_{n-1}, z_n and factors g_n linking each z_n to its observation x_n]

Sum-product algorithms for the HMM
We can then compute the factors
h(z_1) = p(z_1) p(x_1 | z_1)
f_n(z_{n-1}, z_n) = p(z_n | z_{n-1}) p(x_n | z_n)
The factor-to-variable messages \mu_{f_n \to z_n}(z_n) can be used to derive message passing in terms of \alpha(\cdot) and \beta(\cdot).

Viterbi Algorithm
Using the message-passing framework (max-sum instead of sum-product) it is possible to determine the most likely state sequence, i.e. the best recognition.
Intuitively:
- Keep track only of the most probable path into each state as we move through the graph
- At any time there are only K possible paths to maintain
- This amounts to a dynamic-programming evaluation of the best overall path
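A minimal sketch of the Viterbi recursion in the same notation as the forward pass above (log domain; B[n, k] = p(x_n | z_n = k); names are my own):

```python
# Minimal sketch: Viterbi decoding of the most probable state sequence.
import numpy as np

def viterbi(pi, A, B):
    N, K = B.shape
    logA = np.log(A)
    delta = np.log(pi) + np.log(B[0])          # best log-prob of any path ending in each state
    backptr = np.zeros((N, K), dtype=int)
    for n in range(1, N):
        scores = delta[:, None] + logA         # scores[j, k]: extend the best path at j with j -> k
        backptr[n] = scores.argmax(axis=0)
        delta = scores.max(axis=0) + np.log(B[n])
    path = np.zeros(N, dtype=int)
    path[-1] = delta.argmax()
    for n in range(N - 2, -1, -1):             # backtrack through the stored pointers
        path[n] = backptr[n + 1, path[n + 1]]
    return path, delta.max()
```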

Outline 1 Introduction 2 Bayesian Networks 3 Conditional Independence 4 Inference 5 Factor Graphs 6 Sum-Product Algorithm 7 HMM Introduction 8 Markov Model 9 Hidden Markov Model 10 ML solution for the HMM 11 Forward-Backward 12 Viterbi 13 Example 14 Summary Henrik I. Christensen (RIM@GT) Graphical Models & HMMs 74 / 83

Small example of gesture tracking
Tracking of hands using an HMM to interpret the motion tracks:
- Pre-process images to generate tracks: color segmentation, then track regions using a Kalman filter
- Interpret the tracks using an HMM

Pre-process architecture
[figure: processing pipeline - camera → RGB image → color segmentation via a lookup table (initialized from HSV thresholds, adapted through a color model / histogram generation) → binary image → downscaling → density map → thresholding and connected components → N largest blobs → tracker → head, left-hand, and right-hand blobs]

Basic idea
[figure]

Tracking
[figure: motion measure per frame over frames 0-200, with segments labeled A, B, and C and a detection threshold]

Motion Patterns
Gesture classes: Attention, Idle, Forward, Back, Left, Right, Turn left, Turn right, Faster, Slower, Stop
[figure]

Evaluation
- Acquired 2230 image sequences covering 5 people in a normal living room
- 1115 sequences were used for training, 1115 for evaluation
- Captured both position and velocity data

Recognition rates:
Features     Position   Velocity   Combined
Result [%]   96.6       88.7       99.5

Example Timing
Phase            Time/frame [ms]
Image Transfer   4.3
Segmentation     0.6
Density Est.     2.1
Connect Comp.    2.1
Kalman Filter    0.3
HMM              21.0
Total            30.4

Summary
- The Hidden Markov Model (HMM)
- There are many uses for sequential data models; the HMM is one possible formulation
- Autoregressive models are also common in data processing