Machine Learning. Sourangshu Bhattacharya


Bayesian Networks Directed Acyclic Graph (DAG)

Bayesian Networks General Factorization
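The general factorization referred to here is the standard one for a directed acyclic graph, in which each node is conditioned on its parents (Bishop Eq. 8.5); a LaTeX restatement:

```latex
% Joint distribution of a Bayesian network over x = (x_1, ..., x_K),
% where pa_k denotes the set of parents of node x_k in the DAG:
p(\mathbf{x}) = \prod_{k=1}^{K} p\left(x_k \mid \mathrm{pa}_k\right)
```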

Curve Fitting Re-visited

Maximum Likelihood Determine $\mathbf{w}_{\mathrm{ML}}$ by minimizing the sum-of-squares error $E(\mathbf{w}) = \frac{1}{2}\sum_{n=1}^{N}\{y(x_n, \mathbf{w}) - t_n\}^2$.

Curve Fitting Polynomial

Curve Fitting Plate

Predictive Distribution
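The formula on this slide did not survive transcription; assuming it matches Bishop's slide of the same title (§1.2.5, Eq. 1.64), the point-estimate predictive distribution is:

```latex
% Predictive distribution under the maximum-likelihood point estimates;
% w_ML and beta_ML are the fitted weights and noise precision.
p(t \mid x, \mathbf{w}_{\mathrm{ML}}, \beta_{\mathrm{ML}})
  = \mathcal{N}\!\left(t \mid y(x, \mathbf{w}_{\mathrm{ML}}),\; \beta_{\mathrm{ML}}^{-1}\right)
```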

MAP: A Step towards Bayes Determine $\mathbf{w}_{\mathrm{MAP}}$ by minimizing the regularized sum-of-squares error $\widetilde{E}(\mathbf{w}) = \frac{1}{2}\sum_{n=1}^{N}\{y(x_n, \mathbf{w}) - t_n\}^2 + \frac{\lambda}{2}\|\mathbf{w}\|^2$, with $\lambda = \alpha/\beta$.

Bayesian Curve Fitting (3) Input variables and explicit hyperparameters

Bayesian Curve Fitting Learning: condition on the data.

Generative Models Causal process for generating images

Discrete Variables (1) General joint distribution over two K-state variables: $K^2 - 1$ parameters. Independent joint distribution: $2(K - 1)$ parameters.

Discrete Variables (2) General joint distribution over M variables: $K^M - 1$ parameters. M-node Markov chain: $K - 1 + (M - 1)K(K - 1)$ parameters. For example, with $K = 2$ and $M = 3$: 7 parameters in general versus 5 for the chain.

Conditional Independence $a$ is independent of $b$ given $c$: $p(a \mid b, c) = p(a \mid c)$. Equivalently, $p(a, b \mid c) = p(a \mid c)\,p(b \mid c)$. Notation: $a \perp\!\!\!\perp b \mid c$.

Conditional Independence: Example 1

Conditional Independence: Example 1

Conditional Independence: Example 2

Conditional Independence: Example 2

Conditional Independence: Example 3 Note: this is the opposite of Example 1, with c unobserved.

Conditional Independence: Example 3 Note: this is the opposite of Example 1, with c observed.

Am I out of fuel? B = Battery (0=flat, 1=fully charged), F = Fuel Tank (0=empty, 1=full), G = Fuel Gauge Reading (0=empty, 1=full), and hence the joint distribution factorizes as $p(B, F, G) = p(B)\,p(F)\,p(G \mid B, F)$.

Am I out of fuel? The probability of an empty tank is increased by observing G = 0: $p(F=0 \mid G=0) \approx 0.257 > p(F=0) = 0.1$.

Am I out of fuel? The probability of an empty tank is reduced by further observing B = 0: $p(F=0 \mid G=0, B=0) \approx 0.111$. This is referred to as explaining away.
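A quick numerical check of these two posteriors; a minimal sketch in plain Python, using the CPT values from Bishop's version of this example (p(B=1)=0.9, p(F=1)=0.9, and the gauge table below):

```python
# Fuel-gauge network: B and F are independent priors, G depends on both.
pB = {1: 0.9, 0: 0.1}                      # battery charged?
pF = {1: 0.9, 0: 0.1}                      # tank full?
pG = {(1, 1): 0.8, (1, 0): 0.2,            # p(G=1 | B, F)
      (0, 1): 0.2, (0, 0): 0.1}

def joint(b, f, g):
    """p(B=b, F=f, G=g) = p(B) p(F) p(G | B, F)."""
    pg1 = pG[(b, f)]
    return pB[b] * pF[f] * (pg1 if g == 1 else 1.0 - pg1)

# p(F=0 | G=0): marginalize B out of the joint, then normalize.
num = sum(joint(b, 0, 0) for b in (0, 1))
den = sum(joint(b, f, 0) for b in (0, 1) for f in (0, 1))
print(num / den)                # ~0.257, up from the prior p(F=0) = 0.1

# p(F=0 | G=0, B=0): observing a flat battery "explains away" G=0.
num = joint(0, 0, 0)
den = sum(joint(0, f, 0) for f in (0, 1))
print(num / den)                # ~0.111
```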

D-separation A, B, and C are non-intersecting subsets of nodes in a directed graph. A path from A to B is blocked if it contains a node such that either a) the arrows on the path meet either head-to-tail or tail-to-tail at the node, and the node is in the set C, or b) the arrows meet head-to-head at the node, and neither the node, nor any of its descendants, are in the set C. If all paths from A to B are blocked, A is said to be d-separated from B by C. If A is d-separated from B by C, the joint distribution over all variables in the graph satisfies $A \perp\!\!\!\perp B \mid C$.
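An equivalent operational test uses the ancestral moral graph instead of path blocking: restrict the DAG to the ancestors of A ∪ B ∪ C, moralize (marry co-parents and drop arrow directions), delete C, and check whether A and B are disconnected. A minimal sketch; the {child: [parents]} encoding is my own convention here:

```python
from itertools import combinations

def ancestors(parents, nodes):
    """All nodes with a directed path into `nodes`, plus `nodes` itself."""
    seen, stack = set(nodes), list(nodes)
    while stack:
        for p in parents.get(stack.pop(), ()):
            if p not in seen:
                seen.add(p)
                stack.append(p)
    return seen

def d_separated(parents, A, B, C):
    """True iff A and B are d-separated by C in the DAG {child: [parents]}."""
    keep = ancestors(parents, set(A) | set(B) | set(C))
    # Moralize the ancestral subgraph: link each node to its parents,
    # and marry every pair of co-parents.
    adj = {v: set() for v in keep}
    for v in keep:
        ps = [p for p in parents.get(v, ()) if p in keep]
        for p in ps:
            adj[v].add(p); adj[p].add(v)
        for p, q in combinations(ps, 2):
            adj[p].add(q); adj[q].add(p)
    # Delete the conditioning set, then test reachability from A to B.
    blocked = set(C)
    stack, seen = [a for a in A if a not in blocked], set(A)
    while stack:
        v = stack.pop()
        if v in B:
            return False
        for w in adj[v] - blocked - seen:
            seen.add(w)
            stack.append(w)
    return True

# Fuel example: B -> G <- F. B and F are marginally independent, but
# become dependent once the gauge G (a head-to-head node) is observed.
dag = {"G": ["B", "F"]}
print(d_separated(dag, {"B"}, {"F"}, set()))   # True
print(d_separated(dag, {"B"}, {"F"}, {"G"}))   # False
```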

D-separation: Example

D-separation: Example

D-separation: I.I.D. Data

Markov Random Fields Markov Blanket

Cliques and Maximal Cliques (figure: an undirected graph with a clique and a maximal clique highlighted)

Joint Distribution $p(\mathbf{x}) = \frac{1}{Z} \prod_{C} \psi_C(\mathbf{x}_C)$, where $\psi_C(\mathbf{x}_C)$ is the potential over clique $C$ and $Z = \sum_{\mathbf{x}} \prod_{C} \psi_C(\mathbf{x}_C)$ is the normalization coefficient; note: for M K-state variables there are $K^M$ terms in Z. Energies and the Boltzmann distribution: $\psi_C(\mathbf{x}_C) = \exp\{-E(\mathbf{x}_C)\}$.
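The $K^M$ blow-up in Z is easy to see by brute force, and equally easy to see graph structure defeat; a tiny sketch for a pairwise-potential chain (the potential tables are made-up illustrative values):

```python
import itertools
import numpy as np

K, M = 3, 8                       # M variables, K states each
rng = np.random.default_rng(0)
# One arbitrary positive potential per edge of the chain x1 - x2 - ... - xM.
psis = [rng.random((K, K)) + 0.1 for _ in range(M - 1)]

# Brute force: K**M terms in Z.
Z = 0.0
for x in itertools.product(range(K), repeat=M):
    p = 1.0
    for i, psi in enumerate(psis):
        p *= psi[x[i], x[i + 1]]
    Z += p
print(Z)                          # exact, but cost grows as K**M

# The chain structure lets Z be computed in O(M K^2) instead:
# v(x_i) accumulates the sum over all variables to the right of x_i.
v = np.ones(K)
for psi in reversed(psis):
    v = psi @ v
print(v.sum())                    # same value as the brute force
```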

Illustration: Image De-Noising (1) Original Image Noisy Image

Illustration: Image De-Noising (2)

Illustration: Image De-Noising (3) Noisy Image Restored Image (ICM)
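Bishop's de-noising model uses binary pixels $x_i \in \{-1, +1\}$ with energy $E(\mathbf{x}, \mathbf{y}) = h\sum_i x_i - \beta\sum_{\{i,j\}} x_i x_j - \eta\sum_i x_i y_i$, and ICM (iterated conditional modes) greedily sets one pixel at a time to its lower-energy state. A minimal numpy sketch, assuming the coefficient values from Bishop §8.3.3 ($h=0$, $\beta=1.0$, $\eta=2.1$); the noisy input is simulated rather than a real image:

```python
import numpy as np

def icm_denoise(y, h=0.0, beta=1.0, eta=2.1, sweeps=10):
    """ICM for E(x,y) = h*sum(x) - beta*sum_edges(x_i x_j) - eta*sum(x_i y_i),
    pixels in {-1,+1}: greedy coordinate descent on the energy."""
    x = y.copy()
    H, W = x.shape
    for _ in range(sweeps):
        for i in range(H):
            for j in range(W):
                # Sum of the 4-neighbourhood (missing neighbours contribute 0).
                nb = sum(x[a, b]
                         for a, b in ((i-1, j), (i+1, j), (i, j-1), (i, j+1))
                         if 0 <= a < H and 0 <= b < W)
                # Set x_ij to whichever of +1/-1 has the lower local energy.
                x[i, j] = 1 if (-h + beta * nb + eta * y[i, j]) > 0 else -1
    return x

# Demo on a synthetic binary image with 10% flip noise.
rng = np.random.default_rng(1)
clean = -np.ones((64, 64), dtype=int)
clean[16:48, 16:48] = 1
noisy = np.where(rng.random(clean.shape) < 0.1, -clean, clean)
restored = icm_denoise(noisy)
print((restored != clean).mean())   # typically well below the 10% flip rate
```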

Converting Directed to Undirected Graphs (1)

Converting Directed to Undirected Graphs (2) Additional links: marry the parents of each node and drop arrow directions (moralization); the resulting undirected graph is the moral graph.

Directed vs. Undirected Graphs (1)

Directed vs. Undirected Graphs (2)

Inference in Graphical Models

Inference in Graphical Models Posterior marginalization: $P(x_n \mid y_1, \ldots, y_M) = \sum_{x_1, \ldots, x_{n-1}, x_{n+1}, \ldots, x_N} P(x_1, \ldots, x_N \mid y_1, \ldots, y_M)$. MAP: $(x_1^{*}, \ldots, x_N^{*}) = \arg\max_{x_1, \ldots, x_N} P(x_1, \ldots, x_N \mid y_1, \ldots, y_M)$.

Conditional Random Fields Conditional probability: $P(X \mid Y) = \frac{1}{Z} \exp\left(\mathbf{w}^{\mathsf{T}} f(X, Y)\right)$, where $f(X, Y) = [\psi_{12}(x_1, x_2, Y), \ldots, \psi_{n-1,n}(x_{n-1}, x_n, Y)]$. (Figure: chain $X_1 - X_2 - \cdots - X_n$, each $X_i$ connected to the observations $Y_1, \ldots, Y_n$.)

Hidden Markov Model Observation sequence: $Y_1, \ldots, Y_n$; state sequence: $X_1, \ldots, X_n$. Transition prob.: $P(X_{i+1} \mid X_i)$; emission prob.: $P(Y_i \mid X_i)$. $P(X, Y) = P(X_1) \prod_{i=1}^{n-1} P(X_{i+1} \mid X_i) \prod_{j=1}^{n} P(Y_j \mid X_j)$. (Figure: chain $X_1 \to X_2 \to \cdots \to X_n$ with emissions $X_i \to Y_i$.)
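The factorization translates directly into code. A minimal sketch with made-up transition and emission tables (all array values below are illustrative assumptions), plus the O(nK²) forward recursion for the observation likelihood P(Y):

```python
import numpy as np
from itertools import product

pi = np.array([0.6, 0.4])            # P(X_1)
A = np.array([[0.7, 0.3],            # A[i, j] = P(X_{t+1}=j | X_t=i)
              [0.2, 0.8]])
B = np.array([[0.9, 0.1],            # B[i, k] = P(Y_t=k | X_t=i)
              [0.3, 0.7]])

def joint(states, obs):
    """P(X, Y) = P(X_1) * prod P(X_{i+1}|X_i) * prod P(Y_j|X_j)."""
    p = pi[states[0]]
    for s, s_next in zip(states, states[1:]):
        p *= A[s, s_next]
    for s, o in zip(states, obs):
        p *= B[s, o]
    return p

def likelihood(obs):
    """P(Y) via the forward recursion alpha_t(j) = P(y_1..y_t, X_t=j)."""
    alpha = pi * B[:, obs[0]]
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]
    return alpha.sum()

obs = [0, 0, 1]
print(likelihood(obs))
# Sanity check: summing the joint over all 2**3 state paths gives the same.
print(sum(joint(s, obs) for s in product([0, 1], repeat=3)))
```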

Inference on a Chain

Inference on a Chain

Inference on a Chain

Inference on a Chain

Inference on a Chain To compute local marginals: Compute and store all forward messages $\mu_\alpha(x_n)$. Compute and store all backward messages $\mu_\beta(x_n)$. Compute $Z$ at any node $x_m$: $Z = \sum_{x_m} \mu_\alpha(x_m)\,\mu_\beta(x_m)$. Compute $p(x_n) = \frac{1}{Z}\,\mu_\alpha(x_n)\,\mu_\beta(x_n)$ for all variables required.
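A minimal numpy sketch of the $\mu_\alpha$/$\mu_\beta$ recursions for a discrete chain MRF with pairwise potentials (the potential tables are arbitrary illustrative values):

```python
import numpy as np

def chain_marginals(psis):
    """Marginals of p(x) = (1/Z) prod_n psi_n(x_n, x_{n+1}) on a chain.
    psis[n] is a (K, K) table; returns Z and one marginal per node."""
    K = psis[0].shape[0]
    # Forward messages: alpha[n](x_n) sums out everything left of x_n.
    alpha = [np.ones(K)]
    for psi in psis:
        alpha.append(alpha[-1] @ psi)
    # Backward messages: beta[n](x_n) sums out everything right of x_n.
    beta = [np.ones(K)]
    for psi in reversed(psis):
        beta.append(psi @ beta[-1])
    beta.reverse()
    Z = float(alpha[0] @ beta[0])        # same value at every node
    return Z, [a * b / Z for a, b in zip(alpha, beta)]

rng = np.random.default_rng(0)
Z, marg = chain_marginals([rng.random((3, 3)) for _ in range(4)])
print(Z)
print(marg[2], marg[2].sum())            # each marginal sums to 1
```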

Trees (figure: an undirected tree, a directed tree, and a polytree)

Factor Graphs

Factor Graphs from Directed Graphs

Factor Graphs from Undirected Graphs

The Sum-Product Algorithm (1) Objective: i. to obtain an efficient, exact inference algorithm for finding marginals; ii. in situations where several marginals are required, to allow computations to be shared efficiently. Key idea: Distributive Law
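The distributive law is what allows the sharing: sums are pushed past factors that do not depend on the summation variable. A small worked instance:

```latex
% Elementary form: 3 operations become 2.
ab + ac = a(b + c)
% Applied to marginalization on a chain: the inner sum over x_2 is
% computed once and reused, rather than once per term of the outer sum.
\sum_{x_1} \sum_{x_2} f_a(x_1)\, f_b(x_1, x_2)
  = \sum_{x_1} f_a(x_1) \sum_{x_2} f_b(x_1, x_2)
```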

The Sum-Product Algorithm (2)

The Sum-Product Algorithm (3)

The Sum-Product Algorithm (4)

The Sum-Product Algorithm (5)

The Sum-Product Algorithm (6)

The Sum-Product Algorithm (7) Initialization: a leaf variable node sends $\mu_{x \to f}(x) = 1$; a leaf factor node sends $\mu_{f \to x}(x) = f(x)$.

The Sum-Product Algorithm (8) To compute local marginals: Pick an arbitrary node as root Compute and propagate messages from the leaf nodes to the root, storing received messages at every node. Compute and propagate messages from the root to the leaf nodes, storing received messages at every node. Compute the product of received messages at each node for which the marginal is required, and normalize if necessary.
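To make the two-pass description concrete, here is a compact recursive sum-product for a small tree-structured factor graph. It is a sketch under assumptions of my own: memoized recursion from the queried node replaces the explicit leaf-to-root/root-to-leaf schedule (it stores the same messages), and the factor tables are random illustrative values, checked against brute-force enumeration:

```python
import numpy as np
from functools import lru_cache
from itertools import product

# Tree factor graph over N variables with K states each. Each factor is
# (table, vars); the table has one axis per variable in vars.
# Example: a 3-variable chain with f_a(x0, x1), f_b(x1, x2), f_c(x1).
K, N = 2, 3
rng = np.random.default_rng(0)
factors = [(rng.random((K, K)), [0, 1]),
           (rng.random((K, K)), [1, 2]),
           (rng.random(K), [1])]

@lru_cache(maxsize=None)
def msg_v2f(v, f):
    """Variable-to-factor message: product of messages from v's other factors."""
    m = np.ones(K)
    for g, (_, vs) in enumerate(factors):
        if g != f and v in vs:
            m = m * msg_f2v(g, v)
    return m

@lru_cache(maxsize=None)
def msg_f2v(f, v):
    """Factor-to-variable message: sum the factor over all its other
    variables, each weighted by its incoming variable-to-factor message."""
    table, vs = factors[f]
    m = np.zeros(K)
    for assign in product(range(K), repeat=len(vs)):
        w = table[assign]
        for u, xu in zip(vs, assign):
            if u != v:
                w = w * msg_v2f(u, f)[xu]
        m[assign[vs.index(v)]] += w
    return m

def marginal(v):
    """p(x_v) is proportional to the product of all incoming factor messages."""
    m = np.ones(K)
    for f, (_, vs) in enumerate(factors):
        if v in vs:
            m = m * msg_f2v(f, v)
    return m / m.sum()

# Compare against brute-force enumeration of the joint distribution.
joint = np.zeros((K,) * N)
for x in product(range(K), repeat=N):
    joint[x] = np.prod([t[tuple(x[u] for u in vs)] for t, vs in factors])
joint /= joint.sum()
print(marginal(0), joint.sum(axis=(1, 2)))   # should match
```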

Sum-Product: Example (1)

Sum-Product: Example (2)

Sum-Product: Example (3)

Sum-Product: Example (4)

The Max-Sum Algorithm (1) Objective: an efficient algorithm for finding i. the value $\mathbf{x}^{\max}$ that maximises $p(\mathbf{x})$; ii. the value of $p(\mathbf{x}^{\max})$. In general, maximizing each marginal separately does not give the joint maximum: maximum of marginals $\neq$ joint maximum.

The Max-Sum Algorithm (2) Maximizing over a chain (max-product)

The Max-Sum Algorithm (3) Generalizes to tree-structured factor graphs, maximizing as close to the leaf nodes as possible.

The Max-Sum Algorithm (4) Max-Product → Max-Sum: for numerical reasons, work with $\ln p(\mathbf{x})$, turning products into sums (using $\ln \max = \max \ln$). Again, use the distributive law: $\max(a + b, a + c) = a + \max(b, c)$.

The Max-Sum Algorithm (5) Initialization (leaf nodes): $\mu_{x \to f}(x) = 0$, $\mu_{f \to x}(x) = \ln f(x)$. Recursion: $\mu_{f \to x}(x) = \max_{x_1, \ldots, x_M}\left[\ln f(x, x_1, \ldots, x_M) + \sum_{m \in \mathrm{ne}(f) \setminus x} \mu_{x_m \to f}(x_m)\right]$ and $\mu_{x \to f}(x) = \sum_{l \in \mathrm{ne}(x) \setminus f} \mu_{f_l \to x}(x)$, storing back-pointers $\phi(x)$ to the maximizing configurations.

The Max-Sum Algorithm (6) Termination (root node): $p^{\max} = \max_x \left[\sum_{l \in \mathrm{ne}(x)} \mu_{f_l \to x}(x)\right]$ and $x^{\max} = \arg\max_x \left[\sum_{l \in \mathrm{ne}(x)} \mu_{f_l \to x}(x)\right]$. Back-track through the stored $\phi$ pointers, from the root out to all nodes $i$ lying $l$ factor nodes from the root ($l = 0$ at the root).
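On a chain the max-sum recursion with back-tracking is exactly the Viterbi algorithm. A minimal numpy sketch, redefining the illustrative HMM tables assumed in the earlier sketch (pi, A, B are made-up values):

```python
import numpy as np

pi = np.array([0.6, 0.4])            # P(X_1)
A = np.array([[0.7, 0.3],            # A[i, j] = P(X_{t+1}=j | X_t=i)
              [0.2, 0.8]])
B = np.array([[0.9, 0.1],            # B[i, k] = P(Y_t=k | X_t=i)
              [0.3, 0.7]])

def viterbi(obs):
    """Max-sum on the HMM chain: returns (x_max, log p(x_max, y))."""
    # omega_t(j) = max over x_1..x_{t-1} of log P(x_1..x_{t-1}, X_t=j, y_1..y_t)
    omega = np.log(pi) + np.log(B[:, obs[0]])
    back = []                        # back-pointers phi_t(j)
    for o in obs[1:]:
        scores = omega[:, None] + np.log(A)      # scores[i, j]
        back.append(scores.argmax(axis=0))       # best predecessor of each j
        omega = scores.max(axis=0) + np.log(B[:, o])
    # Termination at the end of the chain, then back-track.
    path = [int(omega.argmax())]
    for phi in reversed(back):
        path.append(int(phi[path[-1]]))
    path.reverse()
    return path, float(omega.max())

print(viterbi([0, 0, 1, 1]))         # most probable state path and its log-prob
```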