Lecture 5: Exact inference

Lecture 5: Exact inference
- Queries
- Inference in chains
- Variable elimination: without evidence, with evidence
- Complexity of variable elimination

Queries
Bayesian networks can answer questions about the underlying probability distribution:
- Likelihood: what is the probability of a given value assignment for a subset of variables?
- Conditional probability query: what is the probability of different value assignments for query variables, given evidence about other variables? I.e., compute the conditional distribution of the query variables given the evidence.
- Most probable explanation (MPE): given evidence, find an instantiation of all other variables in the network which has the highest probability.

Queries (continued)
- Maximum a posteriori (MAP) query: given evidence E = e and a subset of variables Y, find the most likely assignment of values to the variables in Y given that E = e.
Examples of MAP queries:
- In speech recognition, given a speech signal, one can attempt to reconstruct the most likely sequence of words that could have generated the signal.
- In classification, given the training data and a new example, we want to determine the most probable class label of the new example.

Complexity of inference
Given a Bayesian network and a random variable X, deciding whether P(X = x) > 0 is NP-hard (see Friedman and Koller's notes for details).
This implies that there is no general inference procedure that will work efficiently for all network configurations. But for particular families of networks, inference can be done efficiently.

Likelihood inference in simple chains
Consider a simple chain of nodes: X1 → X2 → ... → Xn.
How do we compute P(X2)? All the numbers required are in the CPTs: P(X2) = Σ_{x1} P(x1) P(X2 | x1).
If X1 has k1 possible values and X2 has k2 possible values, this requires about k1·k2 operations: k1 multiplications and additions for each of the k2 values of X2.
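
To make the arithmetic concrete, here is a minimal sketch of this computation in Python. It is not from the lecture; it assumes binary variables, and the CPT numbers are invented purely for illustration.

```python
import numpy as np

# P(X1): one entry per value of X1 (illustrative numbers).
p_x1 = np.array([0.3, 0.7])

# P(X2 | X1): row a is the distribution of X2 given X1 = a.
p_x2_given_x1 = np.array([[0.9, 0.1],
                          [0.2, 0.8]])

# P(X2) = sum_{x1} P(x1) P(X2 | x1): k1 multiplications and additions per value of X2.
p_x2 = p_x1 @ p_x2_given_x1
print(p_x2)  # [0.41 0.59]
```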

Inference in simple chains (2)
Now how do we compute P(X3)? We use P(X2), which is already computed, and the local CPT of node X3: P(X3) = Σ_{x2} P(x2) P(X3 | x2).

Inference in simple chains (3)
How do we compute P(Xn)? We compute P(X2), P(X3), ..., P(Xn) iteratively. Each step only takes O(k^2) operations (where k is the number of possible values of a variable), and the algorithm is linear in the number of variables.
If we had generated the whole joint distribution and summed out, we would have needed O(k^n) operations!

Elimination of variables in chains
Let us examine the chain example again. Suppose we want to compute P(Xn):
P(Xn) = Σ_{x1, ..., x_{n-1}} P(x1) P(x2 | x1) ... P(Xn | x_{n-1}).
The innermost summation depends only on the value of X2. So we can compute a factor f1(X2) = Σ_{x1} P(x1) P(X2 | x1), with one entry for each value of X2.
Then we can use this to compute a factor f2(X3), etc.
This is a form of dynamic programming.
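
The dynamic-programming view translates directly into a loop over the CPTs. The sketch below is my own, not the lecture's: it assumes every variable has k values and that each CPT is stored as a k-by-k row-stochastic matrix.

```python
import numpy as np

def chain_marginal(p_x1, cpts):
    """P(Xn) for a chain X1 -> X2 -> ... -> Xn.

    p_x1: array of shape (k,) with P(X1).
    cpts: list of (k, k) arrays; cpts[i][a, b] = P(X_{i+2} = b | X_{i+1} = a).
    Each loop iteration eliminates one variable in O(k^2) time.
    """
    factor = p_x1                # f_1(x1) = P(x1)
    for cpt in cpts:             # f_{i+1}(x_{i+1}) = sum_{x_i} f_i(x_i) P(x_{i+1} | x_i)
        factor = factor @ cpt
    return factor

# Tiny usage example with random CPTs (each row sums to 1).
rng = np.random.default_rng(0)
k, n = 3, 6
cpts = [rng.dirichlet(np.ones(k), size=k) for _ in range(n - 1)]
print(chain_marginal(rng.dirichlet(np.ones(k)), cpts))
```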

Pooling
Consider the case when a node has more than one parent. How do we compute the marginal probability of such a node? We have to pool the information coming from all of its parents.
A Bayes network is called a polytree if the underlying undirected graph is a tree.

What if the network is not a polytree?
Suppose we want to compute a marginal in such a network. Note that in this case we have a more complex factor, which depends on two variables.

Variable elimination without evidence
Given: A Bayes network and a set of query variables.
1. Initialize the set of factors to the CPTs of the network.
2. Let Z1, ..., Zk be an ordering of the non-query variables.
3. For i = 1, ..., k do:
   (a) Extract from the set all factors mentioning Zi.
   (b) Let f be the product of these factors.
   (c) Let g be the factor obtained by summing Zi out of f.
   (d) Insert g into the set of factors.
4. Return the product of the remaining factors.
Steps (a)-(d) eliminate variable Zi; this is where the computations actually take place.
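
Here is a compact, hedged sketch of the procedure above in Python. It is not the lecture's code: it assumes discrete variables and represents each factor as a pair (list of variable names, numpy array indexed in that order); the helper names are my own.

```python
import numpy as np
from functools import reduce

def multiply(f, g):
    """Pointwise product of two factors, aligning shared variables via einsum."""
    f_vars, f_arr = f
    g_vars, g_arr = g
    out_vars = f_vars + [v for v in g_vars if v not in f_vars]
    letter = {v: chr(ord('a') + i) for i, v in enumerate(out_vars)}  # assumes <= 26 variables
    spec = (''.join(letter[v] for v in f_vars) + ',' +
            ''.join(letter[v] for v in g_vars) + '->' +
            ''.join(letter[v] for v in out_vars))
    return out_vars, np.einsum(spec, f_arr, g_arr)

def sum_out(f, var):
    """Marginalize a single variable out of a factor."""
    f_vars, f_arr = f
    axis = f_vars.index(var)
    return [v for v in f_vars if v != var], f_arr.sum(axis=axis)

def variable_elimination(factors, order):
    """factors: list of (vars, array) pairs (the CPTs); order: the non-query variables,
    in the order in which they should be eliminated. Returns a factor over the query variables."""
    factors = list(factors)
    for z in order:
        mentioning = [f for f in factors if z in f[0]]                  # (a) extract factors mentioning z
        product = reduce(multiply, mentioning)                          # (b) multiply them together
        new_factor = sum_out(product, z)                                # (c) sum z out
        factors = [f for f in factors if z not in f[0]] + [new_factor]  # (d) insert the result
    return reduce(multiply, factors)                                    # step 4: combine what is left
```

For a chain A → B → C, for instance, calling variable_elimination([(['A'], p_a), (['A', 'B'], p_b_given_a), (['B', 'C'], p_c_given_b)], order=['A', 'B']), where p_b_given_a[a, b] = P(B = b | A = a), returns a factor over C equal to P(C).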

Example: Asia network
This example is taken from Koller and Friedman's notes. The network contains the variables Visit to Asia, Smoking, Tuberculosis, Lung Cancer, Abnormality in Chest, Bronchitis, X-ray, and Dyspnea.
Suppose we want to compute the marginal of one of the variables; we eliminate the other variables one at a time, as in the procedure above.
Note that we can use any ordering of the variables during elimination.

Predictive inference with evidence in chains
Suppose we know that X1 = x1. How do we compute P(Xn | x1)?
Without knowing the evidence, computing P(Xn) required another factor: P(X1). Computing P(Xn | x1) requires using P(X2 | X1 = x1) instead of Σ_{x1} P(x1) P(X2 | x1). We eliminated the part of the factor inconsistent with the evidence.

Causal inference with evidence in chains
Again the chain example: X1 → X2 → ... → Xn. Suppose we know that Xn = xn. How do we compute P(X1 | xn)?
We apply Bayes rule: P(X1 | xn) = P(xn | X1) P(X1) / P(xn).
We do not need to compute P(xn); it comes out of summing the numerators for all values of X1.
P(xn | X1) can be computed by working along the chain. This can be viewed as a message passing from Xn to X1.
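
As a tiny illustration of the Bayes-rule step, here is a sketch for a two-node chain X1 → X2 with X2 observed; the numbers are invented and not from the lecture.

```python
import numpy as np

p_x1 = np.array([0.3, 0.7])              # P(X1)
p_x2_given_x1 = np.array([[0.9, 0.1],    # P(X2 | X1 = 0)
                          [0.2, 0.8]])   # P(X2 | X1 = 1)
x2 = 1                                   # observed value of X2

numerators = p_x2_given_x1[:, x2] * p_x1       # P(x2 | X1) P(X1), one entry per value of X1
p_x1_given_x2 = numerators / numerators.sum()  # normalizing replaces dividing by P(x2)
print(p_x1_given_x2)                           # approx. [0.051 0.949]
```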

Causal inference with evidence in chains (2)
Suppose we know that X1 = x1 and X3 = x3. How do we compute P(X2 | x1, x3)?
P(x3 | X2) is known from the CPT of node X3. P(X2 | x1) can be computed using forward inference, just like before.
Node X2 receives some information from X3 (backward pass) and some from X1 (forward pass), and performs the computation.

Inference with evidence in polytrees
How do we compute a conditional query in a polytree? As in the chain case, we need to combine the information flowing towards the query node from the different branches of the tree.

Example: Asia network
Suppose we observe that the person smokes and that the X-ray comes out negative. How does this change the variable elimination we did before?
(Network variables: Visit to Asia, Smoking, Tuberculosis, Lung Cancer, Abnormality in Chest, Bronchitis, X-ray, Dyspnea.)
First we reduce all factors to eliminate the parts inconsistent with the evidence.

Variable elimination with evidence
Given: A Bayes network, a set of query variables, and evidence.
1. Initialize the set of factors to the CPTs of the network.
2. For each factor, if it contains an evidence variable, retain only the appropriate portion (to be consistent with the evidence).
3. Let Z1, ..., Zk be an ordering of the remaining non-query variables.
4. For i = 1, ..., k do:
   (a) Extract from the set all factors mentioning Zi.
   (b) Let f be the product of these factors.
   (c) Let g be the factor obtained by summing Zi out of f.
   (d) Insert g into the set of factors.
5. Return the (normalized) product of the remaining factors.
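
Step 2 is the only new ingredient compared to the earlier sketch. One hedged way to implement it, using the same (variables, array) factor representation and a dict of observed value indices (both my own conventions, not the lecture's):

```python
import numpy as np

def reduce_factor(factor, evidence):
    """Restrict a factor to the observed values, dropping evidence variables from its scope.

    factor:   (list_of_variable_names, numpy array indexed in that order)
    evidence: dict mapping an observed variable name to the index of its observed value
    """
    f_vars, f_arr = factor
    for var, val in evidence.items():
        if var in f_vars:
            axis = f_vars.index(var)
            f_arr = np.take(f_arr, val, axis=axis)   # keep only the slice consistent with var = val
            f_vars = [v for v in f_vars if v != var]
    return f_vars, f_arr
```

Running the elimination loop on the reduced factors yields an unnormalized factor proportional to P(query, evidence); normalizing it gives the conditional query.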

Complexity of variable elimination
We need at most one multiplication per factor being combined to create one entry in a new factor, and additions over the values of the eliminated variable to sum it out. The size of a factor containing m variables is d^m, where d is the maximum arity of a variable; for example, a factor over 10 binary variables already has 2^10 = 1024 entries.
So to be efficient, it is important to have small factors.

Induced graph
We will try to understand the size of the factors in terms of the graph structure.
Given a Bayes net structure G and an elimination ordering, the induced graph generated by variable elimination is an undirected graph over the variables of G, where two variables are connected by an edge if they both appear in an intermediate factor.
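
The induced graph can be built without running the numerical elimination at all, by simulating which variables would end up in the same factor. A sketch under the assumption that the moralized graph is given as a list of undirected edges (the representation and function name are my own):

```python
from itertools import combinations

def induced_graph(moral_edges, order):
    """Return an adjacency dict for the induced graph of a given elimination ordering.

    moral_edges: iterable of undirected edges (u, v) of the moralized graph
    order:       elimination ordering over the variables to be eliminated
    """
    adj = {}
    def connect(u, v):
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    for u, v in moral_edges:
        connect(u, v)
    eliminated = set()
    for z in order:
        # The factor created when z is eliminated mentions all of z's remaining neighbours,
        # so they all become pairwise connected ("fill-in" edges).
        neighbours = [v for v in adj.get(z, ()) if v not in eliminated]
        for u, v in combinations(neighbours, 2):
            connect(u, v)
        eliminated.add(z)
    return adj   # original moral edges plus the fill-in edges
```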

Example: Asia network
For our previous example, let us construct the induced graph (over the variables Visit to Asia, Smoking, Tuberculosis, Lung Cancer, Abnormality in Chest, Bronchitis, X-ray, and Dyspnea).
Note that the moralized graph is always a subgraph of the induced graph.

Cliques
(Figure: the Asia network example, with the variables Visit to Asia, Smoking, Tuberculosis, Lung Cancer, Abnormality in Chest, Bronchitis, X-ray, and Dyspnea.)
A complete subgraph of a graph is a subset of vertices such that each vertex is connected to every other vertex.
A clique is a maximal complete subgraph (one to which no vertices can be added).

Complexity of variable elimination
Theorem:
1. Every clique in the induced graph corresponds to an intermediate factor in the computation.
2. Every factor generated during variable elimination is a subset of some maximal clique.
See Koller and Friedman's notes for the proof details.
Therefore, complexity is exponential in the size of the largest clique of the induced graph.

Consequence: Polytree inference
For the class of polytree networks, the problem of computing P(X | e) for any variable X can be solved in time linear in the size of the network (which includes all the CPTs).
Proof: We can order the nodes from the leaves inward. The induced graph is then exactly the moral graph of the tree. So the largest clique is the largest family in the graph. The conclusion follows.

Heuristics for node ordering
- Maximum cardinality: Number the nodes from 1 to n, always assigning the next number to the vertex having the largest set of previously numbered neighbors. Then eliminate the nodes from n down to 1.
- Minimum discrepancy: Always eliminate the node that causes the fewest edges to be added to the induced graph (see the sketch after this list).
- Minimum size/weight: Eliminate the node that causes the smallest clique to be created (either in terms of number of nodes, or in terms of number of entries).
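
The second heuristic above (eliminate the node whose removal adds the fewest edges, often called "min-fill" in the literature) is easy to sketch as a greedy loop over an undirected adjacency dict like the one built earlier; the code and its names are my own illustration, not the lecture's.

```python
from itertools import combinations

def min_fill_order(adjacency):
    """Greedy elimination ordering: repeatedly pick the node whose elimination adds fewest edges."""
    adj = {v: set(nbrs) for v, nbrs in adjacency.items()}   # work on a copy
    order = []
    while adj:
        def fill_cost(v):
            # Number of missing edges among v's current neighbours.
            return sum(1 for a, b in combinations(adj[v], 2) if b not in adj[a])
        z = min(adj, key=fill_cost)
        for a, b in combinations(adj[z], 2):   # add the fill-in edges z's elimination creates
            adj[a].add(b)
            adj[b].add(a)
        for nbr in adj[z]:                     # then remove z from the graph
            adj[nbr].discard(z)
        del adj[z]
        order.append(z)
    return order
```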

Summary
- General exact inference in Bayesian networks is NP-hard.
- Variable elimination is a general algorithm for exact inference.
- By analyzing variable elimination we can see the easy cases for inference: when the network is a polytree, and when the maximum clique of the induced graph is small.
- Heuristics for ordering work pretty well in practice.