Graphical Models & HMMs

Graphical Models & HMMs
Henrik I. Christensen
Robotics & Intelligent Machines @ GT, Georgia Institute of Technology, Atlanta, GA 30332-0280
hic@cc.gatech.edu

Outline
1 Introduction
2 Bayesian Networks
3 Conditional Independence
4 Inference
5 Factor Graphs
6 Sum-Product Algorithm
7 HMM Introduction
8 Markov Model
9 Hidden Markov Model
10 ML solution for the HMM
11 Forward-Backward
12 Viterbi
13 Example
14 Summary

Introduction
Bayesian inference can be described through repeated use of the sum and product rules.
Using a graphical / diagrammatical representation is often useful:
1 It gives a way to visualize the structure of a model and consider the relations between variables
2 It provides insight into a model, including possible conditional independences
3 It allows us to leverage the many graph algorithms available
We will consider both directed and undirected models.
This is a very rich area with numerous good references.

Bayesian Networks
Consider the joint probability p(a, b, c).
Using the product rule we can rewrite it as
p(a, b, c) = p(c | a, b) p(a, b)
which again can be changed to
p(a, b, c) = p(c | a, b) p(b | a) p(a)
[figure: directed graph over a, b, c with arrows a → b, a → c, b → c]

Bayesian Networks
- Nodes represent variables
- Arcs/links represent conditional dependences
- We can use the decomposition for any joint distribution; the direct / brute-force application generates a fully connected graph
- We can represent much more general relations

Example with sparse connections
We can represent relations such as
p(x_1) p(x_2) p(x_3) p(x_4 | x_1, x_2, x_3) p(x_5 | x_1, x_3) p(x_6 | x_4) p(x_7 | x_4, x_5)
[figure: the corresponding directed graph over x_1, ..., x_7]

The general case
We can think of this as coding the factors p(x_k | pa_k), where pa_k is the set of parents of variable x_k.
The joint distribution then factorizes as
p(x) = \prod_k p(x_k | pa_k)
we will refer to this as the factorization.
There can be no directed cycles in the graph.
The general form is termed a directed acyclic graph - DAG.
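As a minimal sketch (not from the slides), the factorization can be evaluated directly by multiplying one conditional table per node; the three-variable example p(a, b, c) = p(a) p(b | a) p(c | a, b) with made-up binary tables looks like this:

```python
# Minimal sketch (illustrative values): evaluating a DAG factorization
# p(x) = prod_k p(x_k | pa_k) for p(a, b, c) = p(a) p(b | a) p(c | a, b).
p_a = {0: 0.6, 1: 0.4}
p_b_given_a = {0: {0: 0.7, 1: 0.3}, 1: {0: 0.2, 1: 0.8}}              # p(b | a)
p_c_given_ab = {(0, 0): {0: 0.9, 1: 0.1}, (0, 1): {0: 0.5, 1: 0.5},
                (1, 0): {0: 0.4, 1: 0.6}, (1, 1): {0: 0.1, 1: 0.9}}    # p(c | a, b)

def joint(a, b, c):
    """p(a, b, c) as the product of the per-node factors."""
    return p_a[a] * p_b_given_a[a][b] * p_c_given_ab[(a, b)][c]

# Sanity check: the factorization defines a proper distribution.
total = sum(joint(a, b, c) for a in (0, 1) for b in (0, 1) for c in (0, 1))
print(total)  # 1.0
```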

Basic example
We have seen the polynomial regression before:
p(\mathbf{t}, w) = p(w) \prod_{n=1}^{N} p(t_n | w)
[figure: directed graph with node w pointing to t_1, ..., t_N]

Bayesian Regression
We can make the parameters and variables explicit:
p(\mathbf{t}, w | \mathbf{x}, \alpha, \sigma^2) = p(w | \alpha) \prod_{n=1}^{N} p(t_n | w, x_n, \sigma^2)
[figure: plate over n = 1, ..., N containing x_n and t_n, with \alpha, w, and \sigma^2 outside the plate]

Bayesian Regression - Learning
Once data are observed we can condition the inference on them:
p(w | \mathbf{t}) \propto p(w) \prod_{n=1}^{N} p(t_n | w)
[figure: the same plate model with the observed t_n shaded]

Generative Models - Example
Image synthesis: object identity, position, and orientation jointly generate the image.
[figure: directed graph with Object, Position, and Orientation as parents of Image]

Discrete Variables - 1
A general joint distribution over two variables has K^2 - 1 parameters (for K possible outcomes):
p(x_1, x_2 | \mu) = \prod_{i=1}^{K} \prod_{j=1}^{K} \mu_{ij}^{x_{1i} x_{2j}}
If the two variables are independent, only 2(K - 1) parameters are needed:
p(x_1, x_2 | \mu) = \prod_{i=1}^{K} \mu_{1i}^{x_{1i}} \prod_{j=1}^{K} \mu_{2j}^{x_{2j}}

Discrete Variables - 2
A general joint distribution over M variables has K^M - 1 parameters.
A Markov chain with M nodes has K - 1 + (M - 1) K (K - 1) parameters.
[figure: chain x_1 → x_2 → ... → x_M]
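To make the scaling concrete, here is a quick check of the two counts for small, illustrative values of K and M (not from the slides):

```python
# Parameter counts for K-state variables over M positions:
# full joint vs. first-order Markov chain (formulas from the slide above).
K, M = 3, 5
full_joint = K**M - 1                             # general joint distribution
markov_chain = (K - 1) + (M - 1) * K * (K - 1)    # initial dist. + (M-1) transition tables
print(full_joint, markov_chain)                   # 242 26
```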

Discrete Variables - Bayesian Parameters
[figure: chain x_1 → x_2 → ... → x_M with a separate parameter node \mu_m attached to each x_m]
The parameters can be modelled explicitly:
p(\{x_m, \mu_m\}) = p(x_1 | \mu_1) p(\mu_1) \prod_{m=2}^{M} p(x_m | x_{m-1}, \mu_m) p(\mu_m)
It is assumed that each p(\mu_m) is a Dirichlet distribution.

Discrete Variables - Bayesian Parameters (2)
[figure: the same chain with a single shared parameter node \mu for all transitions]
For shared parameters the situation is simpler:
p(\{x_m\}, \mu_1, \mu) = p(x_1 | \mu_1) p(\mu_1) p(\mu) \prod_{m=2}^{M} p(x_m | x_{m-1}, \mu)

Extension to Linear Gaussian Models
The model can be extended so that each node is a Gaussian variable that is a linear function of its parents:
p(x_i | pa_i) = \mathcal{N}\left( x_i \,\middle|\, \sum_{j \in pa_i} w_{ij} x_j + b_i, \; v_i \right)
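As an illustration, a minimal sketch (assumed toy parameters, not from the slides) of ancestral sampling in a small linear-Gaussian chain x_1 → x_2 → x_3, where each node follows the conditional above:

```python
# Minimal sketch: ancestral sampling in a linear-Gaussian chain x1 -> x2 -> x3,
# each node distributed as p(x_i | pa_i) = N(x_i | sum_j w_ij x_j + b_i, v_i).
import numpy as np

rng = np.random.default_rng(0)

def sample_chain(n_samples, w=(0.8, -0.5), b=(0.0, 1.0, 0.3), v=(1.0, 0.5, 0.2)):
    x1 = rng.normal(b[0], np.sqrt(v[0]), n_samples)
    x2 = rng.normal(w[0] * x1 + b[1], np.sqrt(v[1]))   # mean is a linear function of the parent
    x3 = rng.normal(w[1] * x2 + b[2], np.sqrt(v[2]))
    return np.stack([x1, x2, x3], axis=1)

samples = sample_chain(100_000)
print(np.cov(samples.T))  # the joint over (x1, x2, x3) is itself Gaussian
```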

Conditional Independence
Considerations of independence are important as part of the analysis and setup of a system.
As an example, a is independent of b given c:
p(a | b, c) = p(a | c)
Or equivalently:
p(a, b | c) = p(a | b, c) p(b | c) = p(a | c) p(b | c)
Frequent notation in statistics: a ⊥⊥ b | c

Conditional Independence - Case 1
[figure: tail-to-tail node c with arrows to a and b]
p(a, b, c) = p(a | c) p(b | c) p(c)
p(a, b) = \sum_c p(a | c) p(b | c) p(c)
which in general does not factorize into p(a) p(b), so a ⊥̸⊥ b | ∅

Conditional Independence - Case 1 (c observed)
p(a, b | c) = p(a, b, c) / p(c) = p(a | c) p(b | c)
Hence a ⊥⊥ b | c

Conditional Independence - Case 2
[figure: head-to-tail chain a → c → b]
p(a, b, c) = p(a) p(c | a) p(b | c)
p(a, b) = p(a) \sum_c p(c | a) p(b | c) = p(a) p(b | a)
so in general a ⊥̸⊥ b | ∅

Conditional Independence - Case 2 (c observed)
p(a, b | c) = p(a, b, c) / p(c) = p(a) p(c | a) p(b | c) / p(c) = p(a | c) p(b | c)
Hence a ⊥⊥ b | c

Conditional Independence - Case 3
[figure: head-to-head node c with arrows from a and b]
p(a, b, c) = p(a) p(b) p(c | a, b)
p(a, b) = p(a) p(b)
Hence a ⊥⊥ b | ∅. This is the opposite of Case 1 - when c is unobserved.

Conditional Independence - Case 3 (c observed)
p(a, b | c) = p(a, b, c) / p(c) = p(a) p(b) p(c | a, b) / p(c)
which in general does not factorize, so a ⊥̸⊥ b | c. This is the opposite of Case 1 - when c is observed.

Diagnostics - Out of fuel?
[figure: B → G ← F]
B = Battery, F = Fuel Tank, G = Fuel Gauge
p(G = 1 | B = 1, F = 1) = 0.8
p(G = 1 | B = 1, F = 0) = 0.2
p(G = 1 | B = 0, F = 1) = 0.2
p(G = 1 | B = 0, F = 0) = 0.1
p(B = 1) = 0.9
p(F = 1) = 0.9, p(F = 0) = 0.1

Diagnostics - Out of fuel?
p(F = 0 | G = 0) = \frac{p(G = 0 | F = 0) p(F = 0)}{p(G = 0)} \approx 0.257
Observing G = 0 increased the probability of an empty tank (the prior was p(F = 0) = 0.1).

Diagnostics - Out of fuel?
p(F = 0 | G = 0, B = 0) = \frac{p(G = 0 | B = 0, F = 0) p(F = 0)}{\sum_F p(G = 0 | B = 0, F) p(F)} \approx 0.111
Additionally observing B = 0 makes an empty tank less likely: the flat battery explains away the empty reading on the gauge.
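A quick numeric check of the two posteriors, written as a minimal sketch directly from the tables above (the variable and function names are my own):

```python
# Verify the explaining-away numbers from the fuel-gauge example.
# p(G=1 | B, F) from the table; priors p(B=1)=0.9, p(F=1)=0.9.
pG1 = {(1, 1): 0.8, (1, 0): 0.2, (0, 1): 0.2, (0, 0): 0.1}
pB = {1: 0.9, 0: 0.1}
pF = {1: 0.9, 0: 0.1}

def p_joint(b, f, g):
    pg = pG1[(b, f)] if g == 1 else 1.0 - pG1[(b, f)]
    return pB[b] * pF[f] * pg

# p(F=0 | G=0): marginalize the battery state out.
num = sum(p_joint(b, 0, 0) for b in (0, 1))
den = sum(p_joint(b, f, 0) for b in (0, 1) for f in (0, 1))
print(round(num / den, 3))  # 0.257

# p(F=0 | G=0, B=0): the flat battery explains away the empty reading.
num = p_joint(0, 0, 0)
den = sum(p_joint(0, f, 0) for f in (0, 1))
print(round(num / den, 3))  # 0.111
```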

D-separation
Consider non-intersecting subsets A, B, and C in a directed graph.
A path between subsets A and B is considered blocked if it contains a node such that:
1 the arrows meet head-to-tail or tail-to-tail at the node, and the node is in the set C, or
2 the arrows meet head-to-head at the node, and neither the node nor any of its descendants is in the set C.
If all paths between A and B are blocked, then A is d-separated from B by C.
If A is d-separated from B by C, then the joint distribution over all the variables in the graph satisfies A ⊥⊥ B | C.

D-Separation - Discussion
[figure: two example graphs over nodes a, b, c, e, f illustrating blocked and unblocked paths]

Inference in graphical models
Consider inference for p(x, y); we can formulate this as
p(x, y) = p(x | y) p(y) = p(y | x) p(x)
We can further marginalize:
p(y) = \sum_{x'} p(y | x') p(x')
Using Bayes' rule we can reverse the inference:
p(x | y) = \frac{p(y | x) p(x)}{p(y)}
These identities are the basic mechanisms for inference.

Inference in graphical models
[figure: three two-node graphs over x and y, panels (a)-(c), illustrating the joint model, conditioning on an observed y, and the reversed model]

Inference on a chain
[figure: undirected chain x_1 - x_2 - ... - x_{N-1} - x_N]
p(x) = \frac{1}{Z} \psi_{1,2}(x_1, x_2) \psi_{2,3}(x_2, x_3) \cdots \psi_{N-1,N}(x_{N-1}, x_N)
p(x_n) = \sum_{x_1} \cdots \sum_{x_{n-1}} \sum_{x_{n+1}} \cdots \sum_{x_N} p(x)

Inference on a chain
[figure: messages \mu_\alpha passed forward and \mu_\beta passed backward along the chain]
p(x_n) = \frac{1}{Z} \left[ \sum_{x_{n-1}} \psi_{n-1,n}(x_{n-1}, x_n) \cdots \left[ \sum_{x_1} \psi_{1,2}(x_1, x_2) \right] \cdots \right] \left[ \sum_{x_{n+1}} \psi_{n,n+1}(x_n, x_{n+1}) \cdots \left[ \sum_{x_N} \psi_{N-1,N}(x_{N-1}, x_N) \right] \cdots \right]
where the first bracketed term is \mu_\alpha(x_n) and the second is \mu_\beta(x_n).

Inference on a chain
The messages can be computed recursively:
\mu_\alpha(x_n) = \sum_{x_{n-1}} \psi_{n-1,n}(x_{n-1}, x_n) \left[ \cdots \sum_{x_1} \psi_{1,2}(x_1, x_2) \right] = \sum_{x_{n-1}} \psi_{n-1,n}(x_{n-1}, x_n) \, \mu_\alpha(x_{n-1})
\mu_\beta(x_n) = \sum_{x_{n+1}} \psi_{n,n+1}(x_n, x_{n+1}) \left[ \cdots \sum_{x_N} \psi_{N-1,N}(x_{N-1}, x_N) \right] = \sum_{x_{n+1}} \psi_{n,n+1}(x_n, x_{n+1}) \, \mu_\beta(x_{n+1})

Inference on a chain
The recursions are initialized with
\mu_\alpha(x_2) = \sum_{x_1} \psi_{1,2}(x_1, x_2), \qquad \mu_\beta(x_{N-1}) = \sum_{x_N} \psi_{N-1,N}(x_{N-1}, x_N)
and the normalization constant is
Z = \sum_{x_n} \mu_\alpha(x_n) \mu_\beta(x_n)

Inference on a chain
To compute local marginals:
- Compute and store the forward messages \mu_\alpha(x_n)
- Compute and store the backward messages \mu_\beta(x_n)
- Compute Z at any node
- Compute for all variables
p(x_n) = \frac{1}{Z} \mu_\alpha(x_n) \mu_\beta(x_n)
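A minimal numpy sketch of this forward/backward message passing on a discrete chain (pairwise potentials stored as K×K matrices; the function and variable names are my own, not from the slides):

```python
# Minimal sketch: exact marginals on a discrete chain MRF
# p(x) ∝ prod_n psi_{n,n+1}(x_n, x_{n+1}) via forward/backward messages.
import numpy as np

def chain_marginals(psis):
    """psis: list of (K, K) pairwise potentials psi_{n,n+1}; returns p(x_n) for all n."""
    N = len(psis) + 1
    K = psis[0].shape[0]
    alpha = [np.ones(K) for _ in range(N)]   # forward messages mu_alpha
    beta = [np.ones(K) for _ in range(N)]    # backward messages mu_beta
    for n in range(1, N):
        alpha[n] = psis[n - 1].T @ alpha[n - 1]      # sum over x_{n-1}
    for n in range(N - 2, -1, -1):
        beta[n] = psis[n] @ beta[n + 1]              # sum over x_{n+1}
    Z = np.dot(alpha[0], beta[0])                    # same value at every node
    return [alpha[n] * beta[n] / Z for n in range(N)]

# Example: 4-node chain with random positive potentials.
rng = np.random.default_rng(0)
psis = [rng.random((3, 3)) for _ in range(3)]
for p in chain_marginals(psis):
    print(p, p.sum())  # each marginal sums to 1
```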

Inference in trees
[figure: examples of an undirected tree, a directed tree, and a directed polytree]

Factor Graphs
[figure: variable nodes x_1, x_2, x_3 connected to factor nodes f_a, f_b, f_c, f_d]
p(x) = f_a(x_1, x_2) f_b(x_1, x_2) f_c(x_2, x_3) f_d(x_3)
In general: p(x) = \prod_s f_s(x_s)

Factor Graphs from Directed Graphs
The directed graph encodes p(x) = p(x_1) p(x_2) p(x_3 | x_1, x_2).
As a single factor: f(x_1, x_2, x_3) = p(x_1) p(x_2) p(x_3 | x_1, x_2)
As one factor per conditional: f_a(x_1) = p(x_1), f_b(x_2) = p(x_2), f_c(x_1, x_2, x_3) = p(x_3 | x_1, x_2)
[figure: the directed graph and the two corresponding factor graphs]

Factor Graphs from Undirected Graphs
The undirected graph has a single clique potential \psi(x_1, x_2, x_3).
As a single factor: f(x_1, x_2, x_3) = \psi(x_1, x_2, x_3)
As a product of factors: f_a(x_1, x_2, x_3) f_b(x_2, x_3) = \psi(x_1, x_2, x_3)
[figure: the undirected graph and the two corresponding factor graphs]

The Sum-Product Algorithm
Objective: an exact, efficient algorithm for computing marginals, which also allows multiple marginals to be computed efficiently by sharing computation.
Key idea: the distributive law, ab + ac = a(b + c)

The Sum-Product Algorithm
[figure: a variable node x with neighbouring factor subtrees F_s(x, X_s) and message \mu_{f_s \to x}(x)]
p(x) = \sum_{\mathbf{x} \setminus x} p(\mathbf{x})
p(\mathbf{x}) = \prod_{s \in \mathrm{Ne}(x)} F_s(x, X_s)

The Sum-Product Algorithm
p(x) = \prod_{s \in \mathrm{Ne}(x)} \left[ \sum_{X_s} F_s(x, X_s) \right] = \prod_{s \in \mathrm{Ne}(x)} \mu_{f_s \to x}(x)
where \mu_{f_s \to x}(x) = \sum_{X_s} F_s(x, X_s)

The Sum-Product Algorithm
[figure: factor node f_s with neighbouring variables x_1, ..., x_M and attached subtrees G_m(x_m, X_{sm})]
F_s(x, X_s) = f_s(x, x_1, \ldots, x_M) G_1(x_1, X_{s1}) \cdots G_M(x_M, X_{sM})

The Sum-Product Algorithm
\mu_{f_s \to x}(x) = \sum_{x_1} \cdots \sum_{x_M} f_s(x, x_1, \ldots, x_M) \prod_{m \in \mathrm{Ne}(f_s) \setminus x} \left[ \sum_{X_{sm}} G_m(x_m, X_{sm}) \right]
= \sum_{x_1} \cdots \sum_{x_M} f_s(x, x_1, \ldots, x_M) \prod_{m \in \mathrm{Ne}(f_s) \setminus x} \mu_{x_m \to f_s}(x_m)

The Sum-Product Algorithm
\mu_{x_m \to f_s}(x_m) = \sum_{X_{sm}} G_m(x_m, X_{sm}) = \prod_{l \in \mathrm{Ne}(x_m) \setminus f_s} \left[ \sum_{X_{lm}} F_l(x_m, X_{lm}) \right] = \prod_{l \in \mathrm{Ne}(x_m) \setminus f_s} \mu_{f_l \to x_m}(x_m)

The Sum-Product Algorithm - Initialization
For leaf variable nodes: \mu_{x \to f}(x) = 1
For leaf factor nodes: \mu_{f \to x}(x) = f(x)
[figure: a leaf variable node sending its message to a factor, and a leaf factor node sending its message to a variable]

The Sum-Product Algorithm
To compute local marginals:
- Pick an arbitrary node as root
- Compute and propagate messages from the leaves to the root (store the messages)
- Compute and propagate messages from the root to the leaves (store the messages)
- At each node, compute the product of received messages and normalize as required
This single pass up the tree and down again yields all marginals (as needed).

Introduction
There are many examples where we are processing sequential data. Everyday examples include:
- Processing of stock data
- Speech analysis
- Traffic patterns
- Gesture analysis
- Manufacturing flow
We cannot assume the data are i.i.d. as a basis for analysis.
The Hidden Markov Model is frequently used for such data.

Apple Stock Quote Example
[figure: Apple stock price over time as an example of sequential data]

Markov model
We have already talked about the Markov model.
Estimating the joint probability is a sequential challenge:
p(x_1, \ldots, x_N) = \prod_{n=1}^{N} p(x_n | x_{n-1}, \ldots, x_1)
An mth-order model conditions each term on only the m most recent values.
A 1st-order model is then
p(x_1, \ldots, x_N) = p(x_1) \prod_{n=2}^{N} p(x_n | x_{n-1})
i.e., p(x_n | x_{n-1}, \ldots, x_1) = p(x_n | x_{n-1})
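A minimal sketch of the first-order factorization in code, for a discrete chain with an assumed initial distribution pi and transition matrix A (the names anticipate the HMM notation introduced later):

```python
# Minimal sketch: log-likelihood of an observed discrete sequence under a
# first-order Markov model p(x_1,...,x_N) = p(x_1) * prod_n p(x_n | x_{n-1}).
import numpy as np

def markov_log_likelihood(seq, pi, A):
    """seq: list of state indices; pi: initial distribution; A[j, k] = p(next=k | current=j)."""
    ll = np.log(pi[seq[0]])
    for prev, cur in zip(seq[:-1], seq[1:]):
        ll += np.log(A[prev, cur])
    return ll

pi = np.array([0.5, 0.5])
A = np.array([[0.9, 0.1],
              [0.2, 0.8]])
print(markov_log_likelihood([0, 0, 1, 1, 1], pi, A))
```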

Markov chains
[figure: graphical models of Markov chains (first- and second-order) over x_1, ..., x_4]

Hidden Markov Model
[figure: latent chain z_1 → z_2 → ... → z_{n-1} → z_n → z_{n+1}, with each z_n emitting an observation x_n]

Modelling of HMMs
We can model the transition probabilities as a table:
A_{jk} = p(z_{nk} = 1 | z_{n-1,j} = 1)
The conditionals are then (with a 1-of-K coding)
p(z_n | z_{n-1}, A) = \prod_{k=1}^{K} \prod_{j=1}^{K} A_{jk}^{z_{n-1,j} z_{nk}}
The initial state probabilities are expressed by \pi_k = p(z_{1k} = 1), with \sum_k \pi_k = 1, so that
p(z_1 | \pi) = \prod_{k=1}^{K} \pi_k^{z_{1k}}
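To make the notation concrete, a minimal sketch of ancestral sampling from an HMM (assumed toy parameters; 1-D Gaussian emissions, as in the ML slides below):

```python
# Minimal sketch: sample a state/observation sequence from an HMM with
# initial distribution pi, transition matrix A, and 1-D Gaussian emissions.
import numpy as np

rng = np.random.default_rng(1)

def sample_hmm(N, pi, A, means, stds):
    z = np.empty(N, dtype=int)
    x = np.empty(N)
    for n in range(N):
        if n == 0:
            z[n] = rng.choice(len(pi), p=pi)               # p(z_1 | pi)
        else:
            z[n] = rng.choice(A.shape[1], p=A[z[n - 1]])   # p(z_n | z_{n-1})
        x[n] = rng.normal(means[z[n]], stds[z[n]])         # p(x_n | z_n)
    return z, x

pi = np.array([0.6, 0.4])
A = np.array([[0.95, 0.05],
              [0.10, 0.90]])
z, x = sample_hmm(200, pi, A, means=np.array([0.0, 3.0]), stds=np.array([1.0, 0.5]))
print(z[:10], np.round(x[:10], 2))
```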

Illustration of HMM
[figure: trellis/lattice view of the HMM: the states k = 1, ..., 3 unrolled over time steps n-2, ..., n+1, with transition probabilities A_{jk} on the edges]

Maximum likelihood for the HMM
If we observe a set of data X = {x_1, ..., x_N} we can estimate the parameters using ML:
p(X | \theta) = \sum_Z p(X, Z | \theta)
i.e., a summation over paths through the lattice.
We can use EM as a strategy to find a solution:
- E-step: estimate p(Z | X, \theta^{old})
- M-step: maximize the expected complete-data log likelihood over \theta

ML solution to HMM
Define
Q(\theta, \theta^{old}) = \sum_Z p(Z | X, \theta^{old}) \ln p(X, Z | \theta)
\gamma(z_n) = p(z_n | X, \theta^{old})
\gamma(z_{nk}) = \mathbb{E}[z_{nk}] = \sum_{\mathbf{z}} \gamma(\mathbf{z}) \, z_{nk}
\xi(z_{n-1}, z_n) = p(z_{n-1}, z_n | X, \theta^{old})
\xi(z_{n-1,j}, z_{nk}) = \sum_{\mathbf{z}} \gamma(\mathbf{z}) \, z_{n-1,j} \, z_{nk}

ML solution to HMM
The quantities can be computed as
\pi_k = \frac{\gamma(z_{1k})}{\sum_{j=1}^{K} \gamma(z_{1j})}
A_{jk} = \frac{\sum_{n=2}^{N} \xi(z_{n-1,j}, z_{nk})}{\sum_{l=1}^{K} \sum_{n=2}^{N} \xi(z_{n-1,j}, z_{nl})}
Assume Gaussian emissions p(x | \phi_k) = \mathcal{N}(x | \mu_k, \Sigma_k), so that
\mu_k = \frac{\sum_n \gamma(z_{nk}) x_n}{\sum_n \gamma(z_{nk})}
\Sigma_k = \frac{\sum_n \gamma(z_{nk}) (x_n - \mu_k)(x_n - \mu_k)^T}{\sum_n \gamma(z_{nk})}
How do we efficiently compute \gamma(z_{nk})?
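A minimal numpy sketch of these M-step updates, assuming the responsibilities gamma (shape N×K) and pairwise marginals xi (shape (N-1)×K×K) have already been computed by the forward-backward pass described next, and 1-D Gaussian emissions:

```python
# Minimal sketch: HMM M-step updates from gamma (N, K) and xi (N-1, K, K),
# with 1-D Gaussian emissions on observations x (N,).
import numpy as np

def m_step(x, gamma, xi):
    pi = gamma[0] / gamma[0].sum()                 # initial state distribution
    A = xi.sum(axis=0)
    A /= A.sum(axis=1, keepdims=True)              # normalize each row j over k
    Nk = gamma.sum(axis=0)                         # effective counts per state
    means = (gamma * x[:, None]).sum(axis=0) / Nk
    variances = (gamma * (x[:, None] - means) ** 2).sum(axis=0) / Nk
    return pi, A, means, variances
```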

Outline 1 Introduction 2 Bayesian Networks 3 Conditional Independence 4 Inference 5 Factor Graphs 6 Sum-Product Algorithm 7 HMM Introduction 8 Markov Model 9 Hidden Markov Model 10 ML solution for the HMM 11 Forward-Backward 12 Viterbi 13 Example 14 Summary Henrik I. Christensen (RIM@GT) Graphical Models & HMMs 67 / 83

Forward-Backward / Baum-Welch
How can we efficiently compute \gamma(\cdot) and \xi(\cdot, \cdot)?
Remember that the HMM is a tree model, so using message passing we can evaluate these quantities efficiently (remember the earlier discussion).
There are two parts to the message passing: a forward pass and a backward pass.
From Bayes' rule we have
\gamma(z_n) = p(z_n | X) = \frac{p(X | z_n) p(z_n)}{p(X)}
and in terms of the messages
\gamma(z_n) = \frac{\alpha(z_n) \beta(z_n)}{p(X)}

Forward-Backward
We can then compute
\alpha(z_n) = p(x_n | z_n) \sum_{z_{n-1}} \alpha(z_{n-1}) p(z_n | z_{n-1})
\beta(z_n) = \sum_{z_{n+1}} \beta(z_{n+1}) p(x_{n+1} | z_{n+1}) p(z_{n+1} | z_n)
p(X) = \sum_{z_n} \alpha(z_n) \beta(z_n)
\xi(z_{n-1}, z_n) = \frac{\alpha(z_{n-1}) p(x_n | z_n) p(z_n | z_{n-1}) \beta(z_n)}{p(X)}
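A minimal numpy sketch of the alpha-beta recursions (unnormalized, so only suitable for short sequences; in practice the messages are rescaled or kept in the log domain). Emission likelihoods are passed in as a matrix B with B[n, k] = p(x_n | z_n = k); the names are my own:

```python
# Minimal sketch: unnormalized forward-backward pass for a discrete-state HMM.
import numpy as np

def forward_backward(pi, A, B):
    N, K = B.shape
    alpha = np.zeros((N, K))
    beta = np.ones((N, K))
    alpha[0] = pi * B[0]                          # alpha(z_1) = p(z_1) p(x_1 | z_1)
    for n in range(1, N):
        alpha[n] = B[n] * (alpha[n - 1] @ A)      # forward recursion
    for n in range(N - 2, -1, -1):
        beta[n] = A @ (beta[n + 1] * B[n + 1])    # backward recursion
    pX = alpha[-1].sum()                          # p(X) = sum_{z_N} alpha(z_N)
    gamma = alpha * beta / pX
    xi = alpha[:-1, :, None] * A[None] * (B[1:] * beta[1:])[:, None, :] / pX
    return gamma, xi, pX
```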

Sum-product algorithms for the HMM
- Given that the HMM is a tree structure
- We can use the sum-product rule to compute marginals
- We can derive a simple factor graph for the tree
[figure: factor graph for the HMM with factors linking consecutive latent variables z_{n-1}, z_n and factors g_n linking each z_n to its observation x_n]

Sum-product algorithms for the HMM
We can then compute the factors
h(z_1) = p(z_1) p(x_1 | z_1)
f_n(z_{n-1}, z_n) = p(z_n | z_{n-1}) p(x_n | z_n)
The factor-to-variable messages \mu_{f_n \to z_n}(z_n) can be used to derive message passing in terms of \alpha(\cdot) and \beta(\cdot).

Viterbi Algorithm
Using the message-passing framework (max-sum instead of sum-product) it is possible to determine the most likely state sequence, i.e. the best recognition.
Intuitively:
- Keep track only of the most probable path into each state as we move through the graph
- At any time there are only K possible paths to maintain
- This amounts to a dynamic-programming evaluation of the best overall path
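A minimal sketch of the Viterbi recursion in the same notation as the forward pass above (log domain; B[n, k] = p(x_n | z_n = k); names are my own):

```python
# Minimal sketch: Viterbi decoding of the most probable state sequence.
import numpy as np

def viterbi(pi, A, B):
    N, K = B.shape
    logA = np.log(A)
    delta = np.log(pi) + np.log(B[0])          # best log-prob of any path ending in each state
    backptr = np.zeros((N, K), dtype=int)
    for n in range(1, N):
        scores = delta[:, None] + logA         # scores[j, k]: extend the best path at j with j -> k
        backptr[n] = scores.argmax(axis=0)
        delta = scores.max(axis=0) + np.log(B[n])
    path = np.zeros(N, dtype=int)
    path[-1] = delta.argmax()
    for n in range(N - 2, -1, -1):             # backtrack through the stored pointers
        path[n] = backptr[n + 1, path[n + 1]]
    return path, delta.max()
```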

Outline 1 Introduction 2 Bayesian Networks 3 Conditional Independence 4 Inference 5 Factor Graphs 6 Sum-Product Algorithm 7 HMM Introduction 8 Markov Model 9 Hidden Markov Model 10 ML solution for the HMM 11 Forward-Backward 12 Viterbi 13 Example 14 Summary Henrik I. Christensen (RIM@GT) Graphical Models & HMMs 74 / 83

Small example of gesture tracking
Tracking of hands using an HMM to interpret the motion tracks:
- Pre-process images to generate tracks: color segmentation, then track regions using a Kalman filter
- Interpret the tracks using an HMM

Pre-process architecture
[figure: processing pipeline - camera → RGB image → color segmentation via a lookup table (initialized from HSV thresholds, adapted through a color model / histogram generation) → binary image → downscaling → density map → thresholding and connected components → N largest blobs → tracker → head, left-hand, and right-hand blobs]

Basic idea
[figure]

Tracking
[figure: motion measure per frame over frames 0-200, with segments labeled A, B, and C and a detection threshold]

Motion Patterns
Gesture classes: Attention, Idle, Forward, Back, Left, Right, Turn left, Turn right, Faster, Slower, Stop
[figure]

Evaluation
- Acquired 2230 image sequences covering 5 people in a normal living room
- 1115 sequences were used for training, 1115 for evaluation
- Captured both position and velocity data

Recognition rates:
Features     Position   Velocity   Combined
Result [%]   96.6       88.7       99.5

Example Timing
Phase            Time/frame [ms]
Image Transfer   4.3
Segmentation     0.6
Density Est.     2.1
Connect Comp.    2.1
Kalman Filter    0.3
HMM              21.0
Total            30.4

Summary
- The Hidden Markov Model (HMM)
- There are many uses for sequential data models; the HMM is one possible formulation
- Autoregressive models are also common in data processing