ECE521 Lecture 18 Graphical Models Hidden Markov Models


Outline
- Graphical models
- Conditional independence
- Conditional independence after marginalization
- Sequence models: hidden Markov models

Graphical models. There are two ways to represent a joint probability distribution over a set of random variables: express the joint distribution as a product of conditional distributions, or use an energy-based representation, i.e. a product of potential functions (or factors). The conditional independence structure among the random variables is explicitly captured by the graphical model.
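In standard notation (a generic sketch; pa(·), ψ_c, and Z are the usual symbols rather than ones taken from this slide), the two representations are

\[
p(x_1, \dots, x_D) = \prod_{d=1}^{D} p\big(x_d \mid \mathrm{pa}(x_d)\big)
\qquad \text{(product of conditionals, e.g. a Bayesian network)},
\]
\[
p(x_1, \dots, x_D) = \frac{1}{Z} \prod_{c} \psi_c(\mathbf{x}_c)
= \frac{1}{Z} \exp\!\Big(-\sum_{c} E_c(\mathbf{x}_c)\Big)
\qquad \text{(product of potentials / energies, e.g. a Markov random field)},
\]

where pa(x_d) are the parents of x_d, ψ_c is a potential over the variables x_c in clique (or factor) c, E_c = -log ψ_c is the corresponding energy, and Z is the normalization constant.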

Graphical models. Advantages and disadvantages of the energy-based representation:
Good: it is very easy to define a proper probability distribution using almost any potential functions; we can turn any loss function into a probability distribution through the energy-based representation.
Bad: evaluating the PDF involves computing the normalization constant Z (the partition function), which can be very expensive, i.e. a summation over all joint states of the random variables. (If we are clever about the conditional independence structure, computing this summation can be efficient.)
Bad: learning by maximum likelihood with the energy-based representation requires taking the gradient of the normalization constant Z w.r.t. the model parameters. (Similarly, exact gradient-based learning can be performed efficiently for some graphical models.)
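To make the cost of Z concrete, here is a minimal sketch (illustrative only, not code from the course) that computes the partition function of a tiny pairwise binary MRF by brute force. The loop over all joint states is exactly the exponential summation mentioned above.

```python
import itertools
import numpy as np

def partition_function(pairwise_potentials, num_vars):
    """Sum the unnormalized probability over all 2^num_vars joint states."""
    Z = 0.0
    for state in itertools.product([0, 1], repeat=num_vars):
        unnorm = 1.0
        for (i, j), psi in pairwise_potentials.items():
            unnorm *= psi[state[i], state[j]]  # potential value for this edge
        Z += unnorm
    return Z

# Example: a 3-node chain x0 - x1 - x2 with identical edge potentials
# that favour equal neighbouring states.
psi = np.array([[2.0, 1.0],
                [1.0, 2.0]])
potentials = {(0, 1): psi, (1, 2): psi}
print(partition_function(potentials, num_vars=3))  # normalizer for the product of potentials
```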

Graphical models. Bayesian networks represent the product of conditionals; Markov random fields (and factor graphs) represent the sum of energies. (Figure: example diagrams of a Bayesian network, a Markov random field, and a factor graph.)

Graphical models. Conversion between the graphical model types: the naive Bayes model, i.e. a mixture of Gaussians with diagonal covariances. (Figure: the model drawn as a Bayesian network, a Markov random field, and a factor graph, together with its conditional independence and marginal independence statements.)
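As one concrete, standard instance of such a conversion (a sketch in the usual notation, not copied from the slide), the naive Bayes factorization maps directly onto a product of potentials with Z = 1:

\[
p(z, x_1, \dots, x_D) = p(z) \prod_{d=1}^{D} p(x_d \mid z)
= \frac{1}{Z} \prod_{d=1}^{D} \psi_d(z, x_d),
\qquad \psi_d(z, x_d) = p(x_d \mid z)\, p(z)^{1/D}, \quad Z = 1.
\]

The conditional independence statement is x_i ⟂ x_j | z, while after marginalizing out z the observations are in general dependent (no marginal independence).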

Graphical models. Conversion between the graphical model types: Markov chain models. (Figure: the chain drawn as a Bayesian network, a Markov random field, and a factor graph, together with its conditional independence and marginal independence statements.)

Graphical models. Conversion between the graphical model types: the explaining-away structure. (Figure: the v-structure drawn as a Bayesian network, a Markov random field, and a factor graph, together with its conditional independence and marginal independence statements; after conversion the marginal independence is different!)
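For reference, the usual explaining-away example (standard notation, not recovered from the slide) is

\[
p(x_1, x_2, x_3) = p(x_1)\, p(x_2)\, p(x_3 \mid x_1, x_2),
\]

in which x_1 and x_2 are marginally independent but become dependent once x_3 is observed. Converting this Bayesian network to an undirected model places x_1, x_2, x_3 in a single clique, so the undirected graph can no longer express the marginal independence between x_1 and x_2, hence the "different marginal" independence structure.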

Graphical models. Representing factorizations of joint distributions using graphical models. (Figure: the families of factorizations representable by MRFs, BNs, and FGs, within the set of all possible factorizations.)

Outline
- Graphical models
- Conditional independence
- Conditional independence after marginalization
- Sequence models: hidden Markov models

Markov models. Recall the Markov chain model for a sequence (an ordered list) of observed random variables, in which each observation depends only on the most immediate previous observation: e.g. the stock price today is conditionally independent of the entire history given yesterday's price. (Figure: chain x1 → x2 → ... → xd.) Although this model is memory efficient and cheap to compute, the conditional independence assumptions on the observations are too restrictive.
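In standard notation, the first-order Markov chain factorizes as

\[
p(x_1, \dots, x_D) = p(x_1) \prod_{d=2}^{D} p(x_d \mid x_{d-1}),
\]

so each factor conditions only on the single most recent observation.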

High-order Markov models. One way to incorporate longer-term dependencies is to construct a Markov model of degree N that depends on the N-1 most immediate previous observations. (Figure: degree-3 example, in which each observation such as x6 depends on x4 and x5.)
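Written out (standard notation), the degree-N definition is

\[
p(x_1, \dots, x_D) = \prod_{d} p\big(x_d \mid x_{d-1}, \dots, x_{d-N+1}\big),
\qquad \text{e.g. degree } 3: \; p(x_d \mid x_{d-1}, x_{d-2}),
\]

with the first few factors conditioning on whatever shorter history is available.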

High-order Markov models. Sequence (language) modelling is one of the fundamental tasks in natural language processing (NLP). Chain-structured graphical models, i.e. Markov models, naturally capture the causal process of generating words: we generate (write) the current word given the previous words. One of the simplest language models is the N-gram model, which is a degree-N Markov model.

High-order Markov models. The N-gram model defines a frequency count table over all N-tuples of words that appear in a dataset (e.g. Wikipedia or millions of printed books):

previous words (w_{k-2}, w_{k-1})   current word (w_k)   frequency P(w_k | w_{k-1}, w_{k-2})
this is                             a                    0.0002
is a                                sentence             1e-8
...                                 ...                  ...
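A minimal sketch (illustrative only, not code from the course) of how such a count table is built and normalized into conditional probabilities for a trigram (degree-3) model:

```python
from collections import Counter, defaultdict

def trigram_model(tokens):
    """Count trigrams and normalize into P(w_k | w_{k-2}, w_{k-1})."""
    counts = defaultdict(Counter)  # (w_{k-2}, w_{k-1}) -> Counter over w_k
    for a, b, c in zip(tokens, tokens[1:], tokens[2:]):
        counts[(a, b)][c] += 1
    return {
        context: {w: n / sum(counter.values()) for w, n in counter.items()}
        for context, counter in counts.items()
    }

tokens = "this is a sentence and this is a test".split()
model = trigram_model(tokens)
print(model[("this", "is")])  # {'a': 1.0}: both trigrams starting "this is" continue with "a"
```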

High-order Markov models. The Google N-grams viewer is one of the largest N-gram models in existence. Application: studying the evolution of language and culture over time. Source: Quantitative Analysis of Culture Using Millions of Digitized Books, 2011.


High-order Markov models. High-order Markov models can capture short-range dependencies in a sequence of random variables very well. Problem: the computational cost grows exponentially with the degree N (for discrete observations with K states, the conditional probability table has on the order of K^N entries), so it becomes impractical to model long-range dependencies with high-order Markov models. So far we have tried to model dependencies among the observations directly; one idea for modelling more interesting and complicated dependencies is to incorporate latent variables into the sequence model.

Hidden Markov models. The simplest latent variable model that captures long-range dependencies is the naive Bayes model: a single latent variable Z shared by all observations x1, ..., xd. Problem: the conditional independence assumptions on all observations are too restrictive. Naive Bayes is non-Markovian (i.e. marginally, the value of the current observation also depends on x4, x5, ...) and cannot model the sequential nature of the input data (the future depends on the past). Solution: construct a hidden Markov chain over the latent variables.

Hidden Markov models. Formally, a hidden Markov model is defined on a sequence of observed random variables through a corresponding latent sequence. (Figure: state variables z1, z2, z3, ..., zd form the hidden Markov chain; each state emits one of the observations x1, ..., xd.)
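In standard notation, the HMM joint distribution factorizes as

\[
p(x_1, \dots, x_D, z_1, \dots, z_D)
= p(z_1) \prod_{t=2}^{D} p(z_t \mid z_{t-1}) \prod_{t=1}^{D} p(x_t \mid z_t),
\]

i.e. a Markov chain over the hidden states plus one emission distribution per time step.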

Hidden Markov models. Conditional independence properties:
- The hidden states are connected through a degree-2 Markov chain (each state depends only on the previous state).
- The future observations and the past observations are conditionally independent given the current hidden state.
- Marginally, the present observation depends on the entire history of past observations.
(Figure: state variables z1, z2, z3, ..., zd and observations x1, ..., xd.)
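A minimal sketch (illustrative only, not from the course) of ancestral sampling from a small discrete HMM; it makes the two assumptions above explicit in code: z_t is drawn given only z_{t-1}, and x_t is drawn given only z_t.

```python
import numpy as np

rng = np.random.default_rng(0)

pi = np.array([0.6, 0.4])            # initial state distribution p(z_1)
A = np.array([[0.9, 0.1],            # transition matrix, rows are p(z_t | z_{t-1})
              [0.2, 0.8]])
B = np.array([[0.7, 0.2, 0.1],       # emission matrix, rows are p(x_t | z_t)
              [0.1, 0.3, 0.6]])

def sample_hmm(length):
    """Draw one (state sequence, observation sequence) pair from the HMM."""
    states, observations = [], []
    z = rng.choice(len(pi), p=pi)                             # z_1 ~ p(z_1)
    for _ in range(length):
        states.append(z)
        observations.append(rng.choice(B.shape[1], p=B[z]))   # x_t ~ p(x_t | z_t)
        z = rng.choice(len(pi), p=A[z])                        # z_{t+1} ~ p(z_{t+1} | z_t)
    return states, observations

print(sample_hmm(10))
```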