Lecture 5: Exact inference

Outline: queries; inference in chains; variable elimination (without and with evidence); complexity of variable elimination.

Queries

Bayesian networks can answer questions about the underlying probability distribution:
- Likelihood: what is the probability of a given value assignment for a subset of the variables?
- Conditional probability query: what is the probability of different value assignments for the query variables, given evidence about some other variables?
- Most probable explanation (MPE): given evidence, find an instantiation of all the other variables in the network which has the highest probability.
- Maximum a posteriori (MAP) query: given evidence, find the most likely assignment of values to a designated subset of the variables (continued below).

Queries (continued)

- Maximum a posteriori (MAP) query: given evidence and a subset of variables, find the most likely assignment of values to the variables in that subset.
- Examples of MAP queries:
  - In speech recognition, given a speech signal, one can attempt to reconstruct the most likely sequence of words that could have generated the signal.
  - In classification, given the training data and a new example, we want to determine the most probable class label of the new example.

Complexity of inference

- Given a Bayesian network and a random variable X, deciding whether P(X = x) > 0 is NP-hard (see Friedman and Koller's notes for details).
- This implies that there is no general inference procedure that will work efficiently for all network configurations.
- But for particular families of networks, inference can be done efficiently.
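To make the query types concrete, here is a small Python sketch (not part of the original notes; the network, the variable names and all the numbers are made up) that answers a likelihood query, a conditional probability query and an MPE query on a tiny network by brute-force enumeration of the joint distribution. Enumeration costs time exponential in the number of variables, which is exactly why the rest of the lecture develops better algorithms.

    import itertools

    # A tiny Bayes net, Rain -> WetGrass <- Sprinkler, with made-up numbers.
    p_rain = {True: 0.2, False: 0.8}
    p_sprinkler = {True: 0.1, False: 0.9}
    p_wet = {  # P(WetGrass = True | Rain, Sprinkler)
        (True, True): 0.99, (True, False): 0.90,
        (False, True): 0.80, (False, False): 0.05,
    }

    def joint(r, s, w):
        """P(Rain=r, Sprinkler=s, WetGrass=w), via the chain rule of the network."""
        pw = p_wet[(r, s)]
        return p_rain[r] * p_sprinkler[s] * (pw if w else 1.0 - pw)

    # Likelihood query: P(WetGrass = True), summing out the other variables.
    likelihood = sum(joint(r, s, True)
                     for r, s in itertools.product([True, False], repeat=2))

    # Conditional probability query: P(Rain = True | WetGrass = True).
    posterior = sum(joint(True, s, True) for s in [True, False]) / likelihood

    # MPE query: most probable assignment of the non-evidence variables given WetGrass = True.
    mpe = max(itertools.product([True, False], repeat=2),
              key=lambda rs: joint(rs[0], rs[1], True))

    print(likelihood, posterior, mpe)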

Likelihood inference in simple chains

- Consider a simple chain of n nodes: X1 -> X2 -> ... -> Xn.
- Suppose we want to compute P(X2). All the numbers required are in the CPTs:
  P(X2) = sum_{x1} P(x1) P(X2 | x1)
- If each variable has k possible values, this requires O(k^2) operations: k multiplications and k - 1 additions for each of the k values of X2.

Inference in simple chains (2)

- Now how do we compute P(X3)? We use P(X2), which is already computed, and the local CPT of node X3, which has one entry for each value of X2 and X3:
  P(X3) = sum_{x2} P(x2) P(X3 | x2)

Inference in simple chains (3)

- We compute P(Xi) iteratively. Each step only takes O(k^2) operations (where k is the number of possible values of a variable), and the algorithm is linear in the number of variables.
- If we had generated the whole joint distribution and summed out, we would have needed O(k^n) operations.

Elimination of variables in chains

- Let us examine the chain example again. Suppose we want to compute P(Xn):
  P(Xn) = sum_{x_{n-1}} ... sum_{x2} sum_{x1} P(x1) P(x2 | x1) ... P(Xn | x_{n-1})
- The innermost summation depends only on the value of X2, so we can compute a factor f1(x2) = sum_{x1} P(x1) P(x2 | x1).
- Then we can use this to compute a factor f2(x3) = sum_{x2} f1(x2) P(x3 | x2), etc.
- This is a form of dynamic programming.
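The iterative computation can be written in a few lines. The sketch below is my own illustration, not code from the notes: the helper name chain_marginals and the CPT numbers are made up, and variables are assumed binary only to keep the example short.

    import numpy as np

    def chain_marginals(prior, cpts):
        """Marginals P(X1), ..., P(Xn) of a chain X1 -> X2 -> ... -> Xn.

        prior is P(X1); cpts[i][a, b] = P(X_{i+2} = b | X_{i+1} = a).
        Each step costs O(k^2), so the whole pass is linear in the chain length.
        """
        marginals = [np.asarray(prior, dtype=float)]
        for cpt in cpts:
            # P(X_{i+1} = b) = sum_a P(X_i = a) * P(X_{i+1} = b | X_i = a)
            marginals.append(marginals[-1] @ np.asarray(cpt, dtype=float))
        return marginals

    # Toy example with made-up numbers: four binary variables sharing one CPT.
    prior = [0.6, 0.4]
    cpt = [[0.9, 0.1], [0.2, 0.8]]
    for i, m in enumerate(chain_marginals(prior, [cpt, cpt, cpt]), start=1):
        print(f"P(X{i}) = {m}")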

Pooling

- Consider the case when a node has more than one parent (the slide shows an example network in which information from several parents must be pooled).
- A Bayes network is called a polytree if the underlying undirected graph is a tree.

What if the network is not a polytree?

- Note that in this case we have a more complex factor, which depends on two variables.

Variable elimination without evidence

- Given: A Bayes network and a set of query variables Y. Suppose we want to compute P(Y).
  1. Initialize the set of factors: F = the CPTs of the network.
  2. Let Z1, ..., Zk be an ordering of the variables not in Y.
  3. For i = 1, ..., k do:
     (a) Extract from F all factors mentioning Zi.
     (b) Let f be the product of these factors.
     (c) Let g = sum_{Zi} f.
     (d) Insert g in F.
  4. Return the product of the factors remaining in F.
- The steps inside the loop eliminate variable Zi; this is where the computations actually take place.

Example: Asia network

- This example is taken from Koller and Friedman's notes. (The slide shows the Asia network; its nodes include the X-ray result.)
- Suppose we want to compute the marginal of one of the variables by eliminating all the others.
- Note that we can use any ordering of the variables during elimination.
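The following is a minimal sketch of the algorithm above, under my own assumptions: variables are binary, a factor is represented as a (vars, table) pair, and the function names (multiply, sum_out, variable_elimination) and the toy chain are mine, not the lecture's.

    import numpy as np
    from functools import reduce

    # A factor is a pair (vars, table): vars is a tuple of variable names and
    # table is a numpy array with one axis per variable.

    def multiply(f, g):
        """Pointwise product of two factors, aligning the variables they share."""
        fvars, ftab = f
        gvars, gtab = g
        allvars = tuple(dict.fromkeys(fvars + gvars))   # union, keeping first-seen order

        def expand(vars_, tab):
            # Give tab one axis per variable in allvars (length 1 where it is absent).
            perm = [vars_.index(v) for v in allvars if v in vars_]
            shape = [tab.shape[vars_.index(v)] if v in vars_ else 1 for v in allvars]
            return np.transpose(tab, perm).reshape(shape)

        return allvars, expand(fvars, ftab) * expand(gvars, gtab)

    def sum_out(var, f):
        """Marginalize var out of factor f."""
        fvars, ftab = f
        return tuple(v for v in fvars if v != var), ftab.sum(axis=fvars.index(var))

    def variable_elimination(factors, query_vars, order):
        """Eliminate the variables in `order`; return the factor over the query variables."""
        assert all(z not in query_vars for z in order)
        factors = list(factors)
        for z in order:
            mentioning = [f for f in factors if z in f[0]]   # (a) extract factors mentioning z
            factors = [f for f in factors if z not in f[0]]
            product = reduce(multiply, mentioning)           # (b) take their product
            factors.append(sum_out(z, product))              # (c)+(d) sum out z, put result back
        return reduce(multiply, factors)                     # product of what remains

    # Toy chain A -> B -> C with made-up CPTs; query P(C), eliminating A then B.
    pA = (("A",), np.array([0.3, 0.7]))
    pB_A = (("A", "B"), np.array([[0.9, 0.1], [0.4, 0.6]]))  # rows: A, columns: B
    pC_B = (("B", "C"), np.array([[0.8, 0.2], [0.3, 0.7]]))
    print(variable_elimination([pA, pB_A, pC_B], {"C"}, order=["A", "B"]))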

Predictive inference with evidence in chains

- Suppose we know that X1 = x1. How do we compute P(Xn | x1)?
- Without knowing the evidence, the computation required another factor, obtained by summing over all values of X1.
- With the evidence, we eliminate the entries of that factor inconsistent with the evidence instead, and proceed forward along the chain as before.

Causal inference with evidence in chains

- Again the chain example: suppose we know that X2 = x2 and want the posterior over the cause, P(X1 | x2)?
- We apply Bayes rule: P(X1 | x2) = P(x2 | X1) P(X1) / P(x2).
- We do not need to compute P(x2) separately: it comes out of summing the numerators over all values of X1.
- P(x2 | X1) is known from the CPT of node X2.
- This can be viewed as a message passing from X2 to X1.

Causal inference with evidence in chains (2)

- Suppose we know the value of a variable further down the chain, say X3 = x3?
- P(x3 | X1) can be computed using forward inference, just like before, and Bayes rule is then applied as above.

Inference with evidence in polytrees

- In a polytree, a node may have evidence both above and below it.
- A node receives some information from below (backward pass) and some from above (forward pass), and performs the computation combining the two.
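A small sketch (again my own, with made-up numbers and a hypothetical helper name) of the Bayes rule computation for a chain: a forward pass gives P(Xn = xn | X1 = a) for every value a, and normalizing the numerators avoids computing P(xn) separately.

    import numpy as np

    def posterior_given_last(prior, cpts, xn):
        """P(X1 | Xn = xn) for a chain X1 -> ... -> Xn, via Bayes rule.

        prior is P(X1); cpts[i][a, b] = P(X_{i+2} = b | X_{i+1} = a).
        """
        prior = np.asarray(prior, dtype=float)
        # Forward message: message[a, b] = P(X_last = b | X1 = a), built one CPT at a time.
        message = np.eye(len(prior))
        for cpt in cpts:
            message = message @ np.asarray(cpt, dtype=float)
        numerators = prior * message[:, xn]      # P(X1 = a) * P(Xn = xn | X1 = a)
        # P(Xn = xn) never has to be computed separately: it is the sum of the numerators.
        return numerators / numerators.sum()

    # Toy example with made-up numbers: three binary variables, evidence X3 = 1.
    cpt = [[0.9, 0.1], [0.2, 0.8]]
    print(posterior_given_last([0.6, 0.4], [cpt, cpt], xn=1))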

Variable elimination with evidence

- Given: A Bayes network, a set of query variables Y, and evidence.
  1. Initialize the set of factors: F = the CPTs of the network.
  2. For each factor, if it contains an evidence variable, retain only the appropriate portion (to be consistent with the evidence); see the code sketch below.
  3. Let Z1, ..., Zk be an ordering of the variables not in Y.
  4. For i = 1, ..., k do:
     (a) Extract from F all factors mentioning Zi.
     (b) Let f be the product of these factors.
     (c) Let g = sum_{Zi} f.
     (d) Insert g in F.
  5. Return the product of the factors remaining in F, normalized.

Example: Asia network

- Suppose we observe that the person smokes and that the X-ray comes out negative. How does this change the variable elimination we did before?
- First we reduce all factors to eliminate the parts inconsistent with the evidence.

Complexity of variable elimination

- We need at most one multiplication per participating factor to create one entry in a product factor.
- The size of a factor over m variables is k^m, where k is the maximum arity of a variable.
- Summing a variable out of a factor needs roughly k additions per entry of the result.
- So to be efficient, it is important to have small factors.

Induced graph

- We will try to understand the size of the factors in terms of the graph structure.
- Given a Bayes net structure and an elimination ordering, the induced graph is an undirected graph over the variables of the network, in which two variables are connected by an edge if they both appear in an intermediate factor generated by variable elimination with that ordering.
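Going back to step 2 of the algorithm above, the evidence-reduction step can be illustrated as follows. This is an assumed sketch, reusing the (vars, table) factor representation of the earlier example; reduce_factor is a name I introduce here, not the lecture's.

    import numpy as np

    def reduce_factor(factor, evidence):
        """Keep only the entries of a (vars, table) factor consistent with the evidence.

        evidence is a dict mapping observed variable names to their observed values.
        """
        fvars, table = factor
        for var, value in evidence.items():
            if var in fvars:
                table = np.take(table, indices=value, axis=fvars.index(var))
                fvars = tuple(v for v in fvars if v != var)
        return fvars, table

    # Toy factor P(C | B) with made-up numbers, reduced by the evidence C = 0.
    pC_B = (("B", "C"), np.array([[0.8, 0.2], [0.3, 0.7]]))
    print(reduce_factor(pC_B, {"C": 0}))   # (("B",), array([0.8, 0.3]))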

Example: Asia network

- For our previous example, let us construct the induced graph. (The slide shows the induced graph of the Asia network, including the X-ray node.)
- Note that the moralized graph is always a subgraph of the induced graph.

Cliques

- A complete subgraph of a graph is a subset of vertices such that each vertex is connected to every other vertex.
- A clique is a maximal complete subgraph (one to which no vertices can be added).

Complexity of variable elimination

- Theorem:
  1. Every clique in the induced graph corresponds to an intermediate factor in the computation.
  2. Every factor generated during variable elimination is a subset of some maximal clique.
- See Koller and Friedman's notes for the proof details.
- Therefore, complexity is exponential in the size of the largest clique.

Consequence: Polytree inference

- For the class of polytree networks, the problem of computing P(X) for any variable X can be solved in time linear in the size of the network (which includes all the CPTs).
- Proof: We can order the nodes from the leaves inward. The induced graph is exactly the moral graph of the tree. So the largest clique is the largest family in the graph (a node together with its parents). The conclusion follows.
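Since the theorem ties the cost of variable elimination to the largest clique of the induced graph, it is useful to be able to measure that clique for a candidate ordering. The sketch below is my own, assuming the moralized graph is given as an adjacency dictionary: it simulates elimination and reports the size of the largest clique created (that size minus one is often called the induced width of the ordering).

    from itertools import combinations

    def induced_graph(adj, order):
        """Simulate elimination with the given ordering on an undirected (moralized) graph.

        adj maps each node to the set of its neighbors.  Returns the induced graph
        and the size of the largest clique created during elimination.
        """
        adj = {v: set(nbrs) for v, nbrs in adj.items()}       # working copy
        induced = {v: set(nbrs) for v, nbrs in adj.items()}   # original edges plus fill-ins
        largest = 0
        for v in order:
            nbrs = adj[v]
            largest = max(largest, len(nbrs) + 1)             # factor over v and its neighbors
            for a, b in combinations(nbrs, 2):                # fill-in edges: the neighbors
                adj[a].add(b); adj[b].add(a)                  # must form a clique
                induced[a].add(b); induced[b].add(a)
            for n in nbrs:
                adj[n].discard(v)
            del adj[v]
        return induced, largest

    # Moralized graph of a made-up network: a hub B connected to A, C and D.
    moral = {"A": {"B"}, "B": {"A", "C", "D"}, "C": {"B"}, "D": {"B"}}
    print(induced_graph(moral, order=["A", "C", "D", "B"])[1])  # 2: all factors stay small
    print(induced_graph(moral, order=["B", "A", "C", "D"])[1])  # 4: eliminating the hub first hurts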

Heuristics for node ordering

- Maximum cardinality: Number the nodes from 1 to n, always assigning the next number to the vertex having the largest set of previously numbered neighbors. Then eliminate the nodes from n down to 1.
- Minimum discrepancy: Always eliminate the node that causes the fewest edges to be added to the induced graph. (A sketch of this greedy rule follows the summary below.)
- Minimum size/weight: Eliminate the node that causes the smallest clique to be created (either in terms of number of nodes, or in terms of number of table entries).

Summary

- General exact inference in Bayesian networks is NP-hard.
- Variable elimination is a general algorithm for exact inference.
- By analyzing variable elimination we can see the easy cases for inference:
  - when the net is a polytree;
  - when the maximum clique of the induced graph is small.
- Heuristics for ordering work pretty well in practice.
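As promised above, here is a sketch of one greedy ordering heuristic in the spirit of the minimum discrepancy rule: repeatedly eliminate the node whose elimination adds the fewest fill-in edges, breaking ties by the size of the clique it would create. This is my own illustration, under the same adjacency-dictionary assumption as the previous sketch.

    from itertools import combinations

    def greedy_order(adj):
        """Greedy elimination ordering: fewest fill-in edges first, smallest clique as tie-break."""
        adj = {v: set(nbrs) for v, nbrs in adj.items()}       # working copy

        def fill_in(v):
            # Number of edges that eliminating v right now would add to the graph.
            return sum(1 for a, b in combinations(adj[v], 2) if b not in adj[a])

        order = []
        while adj:
            v = min(adj, key=lambda u: (fill_in(u), len(adj[u])))
            order.append(v)
            for a, b in combinations(adj[v], 2):              # add the fill-in edges
                adj[a].add(b); adj[b].add(a)
            for n in adj[v]:
                adj[n].discard(v)
            del adj[v]
        return order

    # Same made-up moralized graph as before: the hub B is eliminated last.
    moral = {"A": {"B"}, "B": {"A", "C", "D"}, "C": {"B"}, "D": {"B"}}
    print(greedy_order(moral))   # ['A', 'C', 'D', 'B']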