Massachusetts Institute of Technology
Department of Electrical Engineering and Computer Science
6.438 Algorithms for Inference
Fall 2014

1 Course Overview

This course is about performing inference in complex engineering settings, providing a mathematical take on an engineering subject. While driven by applications, the course is not about these applications. Rather, it primarily provides a common foundation for inference-related questions arising in various fields, including but not limited to machine learning, signal processing, artificial intelligence, computer vision, control, and communication. Our focus is on the general problem of inference: learning about the (generally hidden) state of the world that we care about from the available observations.

This course is part of a two-course sequence consisting of 6.437 and 6.438:

6.437 Inference and Information is about fundamental principles and concepts of inference. It is offered in the spring.

6.438 Algorithms for Inference is about fundamental structures and algorithms for inference. It is offered (now) in the fall.

A key theme is that constraints imposed on system design are what make engineering challenging and interesting. To this end, 6.438 Algorithms for Inference is about recognizing structure in inference problems that leads to efficient algorithms.

1.1 Examples

To motivate the course, we begin with some highly successful applications of graphical models and simple, efficient inference algorithms.

1. Navigation. Consider control and navigation of spacecraft (e.g., lunar landings, guidance of shuttles) with noisy observations from various sensors, as depicted in Figure 1. Abstractly, these scenarios can be viewed as a setup in which noisy observations of a hidden state are obtained. Accurately inferring the hidden state allows us to control the spacecraft and achieve a desired trajectory. Formally, such scenarios are well modeled by the undirected Gaussian graphical model shown in Figure 2. An efficient inference algorithm for this graphical model is the Kalman filter, developed in the early 1960s.

Figure 1: Navigation feedback control in the presence of noisy observations (spacecraft dynamics, noisy sensors, state estimation, controller).
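The Kalman filter itself is just a pair of linear-Gaussian predict/update rules applied along the chain. Below is a minimal sketch for a generic linear dynamical model with Gaussian noise; the matrices, noise covariances, and function name are illustrative assumptions, not anything specified in the lecture.

```python
import numpy as np

def kalman_filter(A, C, Q, R, mu0, P0, ys):
    """Filtered means/covariances of p(x_t | y_1, ..., y_t) for the model
    x_t = A x_{t-1} + w_t, w_t ~ N(0, Q);  y_t = C x_t + v_t, v_t ~ N(0, R)."""
    mu, P = mu0, P0
    means, covs = [], []
    for y in ys:
        # Predict: push the current belief through the dynamics.
        mu_pred = A @ mu
        P_pred = A @ P @ A.T + Q
        # Update: correct the prediction using the new observation.
        S = C @ P_pred @ C.T + R              # innovation covariance
        K = P_pred @ C.T @ np.linalg.inv(S)   # Kalman gain
        mu = mu_pred + K @ (y - C @ mu_pred)
        P = (np.eye(P.shape[0]) - K @ C) @ P_pred
        means.append(mu)
        covs.append(P)
    return means, covs
```

Note that the cost grows only linearly in the number of time steps, which is exactly the efficiency that the chain structure of Figure 2 makes possible.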

Figure 2: Undirected Gaussian graphical model for a linear dynamical system (hidden states s_1, ..., s_4 and observations y_1, ..., y_4), used to model control and navigation of spacecraft. Efficient inference algorithm: Kalman filtering.

2. Error correcting codes. In a communication system, we want to send a message, encoded as k information bits, over a noisy channel. Because the channel may corrupt the message, we add redundancy by encoding each message as an n-bit code sequence, or codeword, where n > k. The receiver's task is to recover, or decode, the original message bits from the received, corrupted codeword bits. A diagram of this system is shown in Figure 3. The decoding task is an inference problem, and the structure of the coded bits plays an important role in that inference.

Figure 3: Simple digital communication system: k info bits -> coder -> n code bits -> channel (corruption) -> decoder -> recovered k info bits.

An example of a scheme to encode message bits as codeword bits is the Hamming (7, 4) code, represented as a graphical model in Figure 4. Here, each 4-bit message is first mapped to a 7-bit codeword before transmission over the noisy channel. Note that there are 2^4 = 16 different 7-bit codewords (among 2^7 = 128 possible 7-bit sequences), each codeword corresponding to one of the 16 possible 4-bit messages. The 16 possible codewords can be described by means of constraints on the codeword bits. These constraints are represented via the graphical model in Figure 4, an example of a factor graph. Note that this code requires 7 bits to be transmitted to convey 4 bits, so the effective code rate is 4/7. The task of the receiver, as explained above, is to decode the 4-bit message that was sent based on observations of the 7 received, possibly corrupted coded bits. The goal of the decoding algorithm is to infer the 4 message bits using the constraints in the Hamming code and some noise model for the channel. Given the graphical model representation, efficient decoding can be done using the loopy belief propagation algorithm, even for large codeword lengths (e.g., one million transmitted bits rather than 7).
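To make the constraint structure concrete, here is a small sketch of one standard systematic Hamming (7, 4) construction. The particular generator and parity-check matrices, and hence the bit ordering, are an assumption; the lecture's Figure 4 does not fix a convention, but any valid choice defines the same kind of parity-check factors.

```python
import numpy as np

# Generator matrix in systematic form [I | P]: the first 4 bits are the message.
G = np.array([[1, 0, 0, 0, 1, 1, 1],
              [0, 1, 0, 0, 1, 1, 0],
              [0, 0, 1, 0, 1, 0, 1],
              [0, 0, 0, 1, 0, 1, 1]])

# Matching parity-check matrix H = [P^T | I]; codewords satisfy H c = 0 (mod 2).
H = np.array([[1, 1, 1, 0, 1, 0, 0],
              [1, 1, 0, 1, 0, 1, 0],
              [1, 0, 1, 1, 0, 0, 1]])

def encode(message_bits):
    """Map a 4-bit message to its 7-bit codeword (all arithmetic mod 2)."""
    return (np.array(message_bits) @ G) % 2

def is_codeword(word_bits):
    """A 7-bit word is a codeword iff every parity (factor) constraint holds."""
    return bool(np.all((H @ np.array(word_bits)) % 2 == 0))

c = encode([1, 0, 1, 1])
print(c, is_codeword(c))             # e.g. [1 0 1 1 0 0 1] True
flipped = c.copy(); flipped[2] ^= 1  # a single channel flip violates a constraint
print(is_codeword(flipped))          # False
```

Each row of H corresponds to one factor node in the factor graph of Figure 4. A full decoder would combine these constraints with a channel noise model (e.g., via loopy belief propagation), which is beyond this sketch.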

Figure 4: Factor graph representing the Hamming (7, 4) code on codeword bits x_1, ..., x_7. Efficient inference algorithm: loopy belief propagation.

3. Voice recognition. Consider the task of recognizing the voice of specific individuals based on an audio signal. One way to do this is to take the audio input, segment it into 10 ms (or similarly small) time intervals, and then capture an appropriate signature or structure (in the form of a frequency response) for each of these time segments (the so-called cepstral coefficient vector, or the "features"). Speech has structure that is captured through correlation in time, i.e., what one says now and soon after are correlated. A succinct way to represent this correlation is via an undirected graphical model called a hidden Markov model, shown in Figure 5. This model is similar to the graphical model considered for the navigation example earlier; the difference is that the hidden states in this speech example are discrete rather than continuous. The Viterbi algorithm provides an efficient, message-passing implementation for inference in such settings.

Figure 5: Hidden Markov model for voice recognition (hidden states x_1, ..., x_4 and observations y_1, ..., y_4). Efficient inference algorithm: Viterbi.
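For a discrete hidden Markov model, the Viterbi algorithm finds the most probable hidden-state sequence by dynamic programming along the chain. A minimal sketch follows; the function name and the array conventions (initial, transition, and emission probabilities) are illustrative assumptions.

```python
import numpy as np

def viterbi(pi, A, B, obs):
    """Most probable hidden-state sequence for a discrete HMM.
    pi[i]: initial probability of state i; A[i, j]: transition prob i -> j;
    B[i, k]: probability of emitting symbol k from state i; obs: observed symbols.
    Works in log space to avoid numerical underflow on long sequences."""
    T, S = len(obs), len(pi)
    logp = np.full((T, S), -np.inf)     # best log-prob of a path ending in state s at time t
    back = np.zeros((T, S), dtype=int)  # argmax backpointers
    logp[0] = np.log(pi) + np.log(B[:, obs[0]])
    for t in range(1, T):
        for s in range(S):
            scores = logp[t - 1] + np.log(A[:, s]) + np.log(B[s, obs[t]])
            back[t, s] = int(np.argmax(scores))
            logp[t, s] = scores[back[t, s]]
    # Trace the best final state back to the start.
    path = [int(np.argmax(logp[-1]))]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]
```

The cost is O(T S^2) rather than the S^T cost of enumerating every state sequence, which is exactly the saving that the chain structure of Figure 5 buys.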

Figure 6: Markov random field for image processing (a nearest-neighbor grid over pixel variables x_1, x_2, x_3, x_4). Efficient inference algorithm: loopy belief propagation.

4. Computer vision / image processing. Suppose we have a low-resolution image and want to produce a higher-resolution version of it. This is an inference problem: we wish to predict, or infer, from the low-resolution image (the observation) how a high-resolution image may appear (the inference). Similarly, if an image is corrupted by noise, then denoising the image can be viewed as an inference problem. The key observation here is that images have structural properties; specifically, in many real-world images, nearby pixels look similar. Such a similarity constraint can be captured by a nearest-neighbor graphical model of the image, also known as a Markov random field (MRF). An example of an MRF is shown in Figure 6. The loopy belief propagation algorithm provides an efficient inference solution for such scenarios, as sketched below.
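As a concrete illustration of this last example, here is a minimal sketch of sum-product loopy belief propagation for binary denoising on a nearest-neighbor MRF. The potential functions, the parameter values (coupling, fidelity, iters), and the function name are assumptions chosen for illustration; the lecture does not specify a particular model.

```python
import numpy as np
from itertools import product

def loopy_bp_denoise(noisy, coupling=0.8, fidelity=1.2, iters=30):
    """Sum-product loopy BP on a nearest-neighbor grid MRF for binary denoising.
    `noisy` is an HxW numpy array with entries in {0, 1}."""
    H, W = noisy.shape

    def neighbors(i, j):
        for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            if 0 <= i + di < H and 0 <= j + dj < W:
                yield (i + di, j + dj)

    # Unary potentials phi_i(x_i): favor agreement with the observed pixel.
    unary = {(i, j): np.exp(fidelity * (np.arange(2) == noisy[i, j]))
             for i, j in product(range(H), range(W))}
    # Pairwise potential psi(x_i, x_j): favor equal neighboring labels.
    psi = np.exp(coupling * (2 * np.eye(2) - 1))

    # Messages m_{u->v}(x_v), one per directed edge, initialized uniformly.
    msgs = {(u, v): np.ones(2)
            for u in product(range(H), range(W)) for v in neighbors(*u)}

    for _ in range(iters):
        new_msgs = {}
        for (u, v) in msgs:
            # Combine the unary at u with all incoming messages except v's.
            b = unary[u].copy()
            for w in neighbors(*u):
                if w != v:
                    b *= msgs[(w, u)]
            m = psi.T @ b                   # sum over x_u
            new_msgs[(u, v)] = m / m.sum()  # normalize for numerical stability
        msgs = new_msgs

    # Approximate marginals; pick the more probable label per pixel.
    denoised = np.zeros((H, W), dtype=int)
    for u in product(range(H), range(W)):
        b = unary[u].copy()
        for w in neighbors(*u):
            b *= msgs[(w, u)]
        denoised[u] = int(np.argmax(b))
    return denoised
```

Each pixel repeatedly exchanges messages with its four neighbors; after the messages settle, each pixel combines its own noisy observation with what its neighbors report to form an approximate posterior belief.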

1.2 Inference, Complexity, and Graphs

Here we preview the key topics that will be the focus of this course: the inference problems of interest in graphical models and the role played by the structure of graphical models in designing inference algorithms.

Inference problems. Consider a collection of random variables x = (x_1, ..., x_N) and let the observations about them be represented by random variables y = (y_1, ..., y_N). Let each random variable x_i, 1 <= i <= N, take on a value in X (e.g., X = {0, 1} or R), and let each observation variable y_i, 1 <= i <= N, take on a value in Y. Given an observation y = y, the goal is to say something meaningful about possible realizations x = x for x in X^N. There are two primary computational problems of interest:

1. Calculating (posterior) beliefs:

    p_{x|y}(x | y) = \frac{p_{x,y}(x, y)}{p_y(y)} = \frac{p_{x,y}(x, y)}{\sum_{x' \in X^N} p_{x,y}(x', y)}.    (1)

The denominator in equation (1) is also called the partition function, and in general computing it is expensive. Computing it is an instance of the marginalization operation. For example, suppose X is a discrete set. Then

    p_{x_1}(x_1) = \sum_{x_2 \in X} p_{x_1, x_2}(x_1, x_2).    (2)

This requires |X| operations for a fixed x_1. There are |X| possible values of x_1, so overall the number of operations scales as |X|^2. More generally, with N variables the cost scales like |X|^N. This is not surprising: without any additional structure, a distribution over N variables, each taking values in X, requires storing a table of size |X|^N, where each entry contains the probability of a particular realization. So without structure, computing posterior beliefs has complexity exponential in the number of variables N.

2. Calculating the most probable configuration, also called the maximum a posteriori (MAP) estimate:

    \hat{x} \in \arg\max_{x \in X^N} p_{x|y}(x | y).    (3)

Since the denominator p_y(y) does not depend on x,

    \hat{x} \in \arg\max_{x \in X^N} p_{x|y}(x | y) = \arg\max_{x \in X^N} \frac{p_{x,y}(x, y)}{p_y(y)} = \arg\max_{x \in X^N} p_{x,y}(x, y).    (4)

Without any additional structure, this optimization problem requires searching over the entire space X^N, again giving complexity exponential in the number of variables N. A brute-force implementation of both computations is sketched below.
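To see the exponential blow-up concretely, here is a brute-force sketch that treats the joint p(x, y = y) as an explicit table over X^N. The function name and the table-as-function representation are illustrative assumptions; this is exactly the computation the rest of the course aims to avoid.

```python
import itertools

def brute_force_inference(joint, N, X):
    """`joint(x)` returns the unnormalized value p(x, y=y) for an assignment x,
    a length-N tuple over the alphabet X. Enumerates all |X|**N assignments to
    compute the partition function, single-variable marginals, and the MAP."""
    Z = 0.0
    marginals = [dict.fromkeys(X, 0.0) for _ in range(N)]
    best, best_val = None, float("-inf")
    for x in itertools.product(X, repeat=N):   # |X|**N iterations
        p = joint(x)
        Z += p
        for i, xi in enumerate(x):
            marginals[i][xi] += p
        if p > best_val:
            best, best_val = x, p
    # Normalize to obtain posterior beliefs p(x_i | y).
    marginals = [{v: m[v] / Z for v in X} for m in marginals]
    return Z, marginals, best
```

With binary variables this already means about 10^6 evaluations at N = 20 and is hopeless at N = 100.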

Structure. Suppose now that the N random variables from before are independent, meaning that we have the following factorization:

    p_{x_1, x_2, ..., x_N}(x_1, x_2, ..., x_N) = p_{x_1}(x_1) p_{x_2}(x_2) \cdots p_{x_N}(x_N).    (5)

Then posterior belief calculation can be done separately for each variable, and computing the posterior belief of a particular variable has complexity |X|. Similarly, MAP estimation can be done by finding, for each variable, the assignment that maximizes its own probability; as there are N variables, the computational complexity of MAP estimation is thus N|X|. Thus, independence, or more generally some form of factorization, enables efficient computation of both posterior beliefs (marginalization) and MAP estimates. By exploiting factorizations of joint probability distributions and representing these factorizations via graphical models, we will achieve huge computational efficiency gains. A small sketch contrasting the factorized computation with the brute-force one appears below.

Graphical models. In this course, we will examine the following types of graphical models:

1. Directed acyclic graphs (DAGs), also called Bayesian networks.

2. Undirected graphs, also called Markov random fields.

3. Factor graphs, which more explicitly capture algebraic constraints.

Recall that a graph consists of a collection of nodes and edges; in the directed case, each edge has an orientation. The nodes represent variables, and the edges represent constraints between variables. Later on, we will see that for some inference algorithms, we can also think of each node as a processor and each edge as a communication link.
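Continuing the fully independent case of equation (5), here is a minimal sketch of the factorized computation; representing each factor p_{x_i} as a small per-variable table is an illustrative assumption.

```python
def factorized_inference(factors):
    """`factors[i]` is a dict mapping each value of x_i to p_{x_i}(x_i).
    Under the factorization (5), the marginals are the factors themselves and
    the MAP assignment maximizes each factor independently: total cost ~ N*|X|,
    instead of the |X|**N enumeration in the brute-force sketch above."""
    marginals = factors
    map_estimate = tuple(max(f, key=f.get) for f in factors)
    return marginals, map_estimate

# Example with N = 3 binary variables (numbers chosen arbitrarily).
factors = [{0: 0.9, 1: 0.1}, {0: 0.4, 1: 0.6}, {0: 0.5, 1: 0.5}]
print(factorized_inference(factors)[1])   # (0, 1, 0)
```

Real models are of course not fully independent; the graphical models listed above let us exploit partial factorizations in the same spirit.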

MIT OpenCourseWare
http://ocw.mit.edu

6.438 Algorithms for Inference
Fall 2014

For information about citing these materials or our Terms of Use, visit: http://ocw.mit.edu/terms.