Lecture 5: Exact inference

Queries
Inference in chains
Variable elimination
  Without evidence
  With evidence
Complexity of variable elimination
Queries

Bayesian networks can answer questions about the underlying probability distribution:
Likelihood: what is the probability of a given value assignment for a subset of variables?
Conditional probability query: what is the probability of different value assignments for query variables, given evidence about other variables? I.e., compute P(Y | E = e).
Most probable explanation (MPE): given evidence, find an instantiation of all other variables in the network which has the highest probability.
Queries (continued)

Maximum a posteriori (MAP) query: given evidence E = e and a subset of variables Y, find the most likely assignment of values to the variables in Y given that E = e.
Examples of MAP queries:
In speech recognition, given a speech signal, one can attempt to reconstruct the most likely sequence of words that could have generated the signal.
In classification, given the training data and a new example, we want to determine the most probable class label of the new example.
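One way to state these queries precisely, with Y denoting the query variables, E = e the evidence, and W (resp. Z) the remaining variables (the notation here is ours, not the lecture's):

Likelihood: P(y) = Σ_w P(y, w)
Conditional probability: P(Y | e) = P(Y, e) / P(e)
MPE: argmax_w P(w, e), where w ranges over assignments to all non-evidence variables
MAP: argmax_y P(y | e) = argmax_y Σ_z P(y, z | e)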
Complexity of inference

Given a Bayesian network and a random variable X, deciding whether P(X = x) > 0 is NP-hard (see Koller and Friedman's notes for details).
This implies that there is no general inference procedure that will work efficiently for all network configurations. But for particular families of networks, inference can be done efficiently.
Likelihood inference in simple chains

Consider a simple chain of nodes: X_1 → X_2 → ... → X_n.
How do we compute P(X_2)? All the numbers required are in the CPTs:
P(X_2) = Σ_{x_1} P(X_2 | x_1) P(x_1)
If X_1 has k_1 possible values and X_2 has k_2 possible values, this requires O(k_1 k_2) operations: k_1 multiplications and k_1 - 1 additions for each of the k_2 values of X_2.
Inference in simple chains (2)

Now how do we compute P(X_3)? We use P(X_2), which is already computed, and the local CPT of node X_3:
P(X_3) = Σ_{x_2} P(X_3 | x_2) P(x_2)
Inference in simple chains (3)

How do we compute P(X_n)? We compute P(X_2), P(X_3), ..., P(X_n) iteratively. Each step only takes O(k^2) operations (where k is the number of possible values of a variable), so the algorithm is linear in the number of variables.
If we had generated the whole joint distribution and summed out, we would have needed O(k^n) operations!
Elimination of variables in chains

Let us examine the chain example again. Suppose we want to compute P(X_n):
P(X_n) = Σ_{x_{n-1}} ... Σ_{x_2} Σ_{x_1} P(x_1) P(x_2 | x_1) ... P(X_n | x_{n-1})
The innermost summation depends only on the value of X_2. So we can compute a factor f_1(X_2) = Σ_{x_1} P(x_1) P(X_2 | x_1), with one entry for each value of X_2. Then we can use this to compute a factor f_2(X_3), etc.
This is a form of dynamic programming.
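A minimal sketch of this forward computation, assuming each variable takes k discrete values and the CPTs are given as k x k arrays (the function name and the toy numbers below are illustrative):

```python
import numpy as np

# Forward inference in a chain X_1 -> X_2 -> ... -> X_n, assuming each
# variable has k values. `prior` is P(X_1); `cpts[i]` is a k x k array with
# cpts[i][a, b] = P(X_{i+2} = b | X_{i+1} = a).

def chain_marginal(prior, cpts):
    """Compute P(X_n) by summing out X_1, X_2, ... in turn (dynamic programming)."""
    f = np.asarray(prior, dtype=float)
    for cpt in cpts:
        # f_i(X_{i+1}) = sum over x_i of f_{i-1}(x_i) * P(X_{i+1} | x_i)
        f = f @ cpt
    return f

# Toy example with binary variables: P(X_1 = 1) = 0.3.
prior = [0.7, 0.3]
cpt = np.array([[0.9, 0.1],
                [0.2, 0.8]])
print(chain_marginal(prior, [cpt, cpt]))  # P(X_3), computed in O(n k^2) time
```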
Pooling

Consider the case when a node has more than one parent, e.g. a node X_3 with parents X_1 and X_2. How do we compute P(X_3)?
A Bayes network is called a polytree if the underlying undirected graph is a tree.
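In a polytree, the parents X_1 and X_2 share no common ancestors, so they are marginally independent and their separately computed marginals can be pooled (assuming the parent-pair structure sketched above):

P(X_3) = Σ_{x_1} Σ_{x_2} P(X_3 | x_1, x_2) P(x_1) P(x_2)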
What if the network is not a polytree?

Suppose we want to compute the marginal of a node in a network with undirected cycles. Note that in this case summing out a variable produces a more complex factor, which depends on two variables.
Variable elimination without evidence

Given: A Bayes network and a set of query variables Y.
1. Initialize the set of factors: F = { the CPTs of the network }.
2. Let Z_1, ..., Z_k be an ordering of the variables not in Y.
3. For i = 1, ..., k do:
   (a) Extract from F all factors mentioning Z_i.
   (b) Let f be the product of these factors.
   (c) Let g = Σ_{z_i} f.
   (d) Insert g in F.
4. Return the product of the factors in F.
Steps (b) and (c) eliminate variable Z_i; this is where the computations actually take place.
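A minimal sketch of this procedure in Python, assuming all variables are binary and representing a factor as a pair (variable names, table); the helper names are illustrative, not from the lecture:

```python
import itertools
from functools import reduce

# A factor is a pair (vars, table): vars is a tuple of variable names, and table
# maps each tuple of values (one per variable, in the same order) to a number.

def multiply(f, g):
    """Pointwise product of two factors, over the union of their variables."""
    (fv, ft), (gv, gt) = f, g
    vs = fv + tuple(v for v in gv if v not in fv)
    table = {}
    for vals in itertools.product((0, 1), repeat=len(vs)):
        asg = dict(zip(vs, vals))
        table[vals] = (ft[tuple(asg[v] for v in fv)]
                       * gt[tuple(asg[v] for v in gv)])
    return (vs, table)

def sum_out(f, var):
    """Marginalize var out of factor f."""
    fv, ft = f
    i = fv.index(var)
    vs = fv[:i] + fv[i + 1:]
    table = {}
    for vals, p in ft.items():
        key = vals[:i] + vals[i + 1:]
        table[key] = table.get(key, 0.0) + p
    return (vs, table)

def eliminate(factors, order):
    """Sum out the variables in `order`, one at a time; return the final product."""
    factors = list(factors)
    for z in order:
        mentioning = [f for f in factors if z in f[0]]   # step (a)
        factors = [f for f in factors if z not in f[0]]
        f = reduce(multiply, mentioning)                 # step (b)
        factors.append(sum_out(f, z))                    # steps (c) and (d)
    return reduce(multiply, factors)                     # step 4
```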
Example: Asia network

This example is taken from Koller and Friedman's notes.
[Network: Visit to Asia, Smoking, Tuberculosis, Lung Cancer, Abnormality in Chest, Bronchitis, X-ray, Dyspnea]
Suppose we want to compute P(Dyspnea). So we eliminate the other variables one by one.
Note that we can use any ordering of the variables during elimination.
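Using the sketch above on a two-node fragment of the network (the CPT numbers are made up for illustration; the full Asia CPTs are not reproduced here):

```python
# Hypothetical fragment S -> B (Smoking -> Bronchitis):
pS = (('S',), {(0,): 0.5, (1,): 0.5})
pBgS = (('S', 'B'), {(0, 0): 0.7, (0, 1): 0.3, (1, 0): 0.4, (1, 1): 0.6})

vars_, table = eliminate([pS, pBgS], order=['S'])  # query P(B)
print(vars_, table)  # ('B',) {(0,): 0.55, (1,): 0.45}
```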
Predictive inference with evidence in chains

Suppose we know that X_1 = x_1. How do we compute P(X_2 | x_1)?
Without knowing the evidence, computing P(X_2) required another factor: P(X_1). Computing P(X_2 | x_1) requires using the CPT entry P(X_2 | x_1) instead of summing over X_1. We eliminated the part of the factor inconsistent with the evidence.
Causal inference with evidence in chains

Again the chain example: X_1 → X_2. Suppose we know that X_2 = x_2. How do we compute P(X_1 | x_2)?
We apply Bayes rule:
P(X_1 | x_2) = P(x_2 | X_1) P(X_1) / P(x_2)
We do not need to compute P(x_2); it comes out of summing the numerators over all values of X_1.
This can be viewed as a message passing from X_2 to X_1.
Causal inference with evidence in chains (2)

Suppose we know that X_1 = x_1 and X_3 = x_3. How do we compute P(X_2 | x_1, x_3)?
P(x_3 | X_2) is known from the CPT of node X_3. P(X_2 | x_1) can be computed using forward inference, just like before.
X_2 receives some information from X_1 (forward pass) and some from X_3 (backward pass), and performs the computation.
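Putting the two messages together (for the chain X_1 → X_2 → X_3, using the conditional independence of X_3 and X_1 given X_2):

P(X_2 | x_1, x_3) = P(x_3 | X_2) P(X_2 | x_1) / P(x_3 | x_1) ∝ P(x_3 | X_2) P(X_2 | x_1)

As before, the normalization constant comes out of summing the numerator over the values of X_2.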
Inference with evidence in polytrees

How do we compute P(X | e) in a polytree? We need to combine the evidence arriving along the different branches that meet at X, pooling the messages from parents and children as in the chain case.
Example: Asia network

Suppose we observe that the person smokes and that the X-ray comes out negative. How does this change the variable elimination we did before?
[Network as before: Visit to Asia, Smoking, Tuberculosis, Lung Cancer, Abnormality in Chest, Bronchitis, X-ray, Dyspnea]
First we reduce all factors to eliminate the parts inconsistent with the evidence.
Variable elimination with evidence

Given: A Bayes network, a set of query variables Y, and evidence E = e.
1. Initialize the set of factors: F = { the CPTs of the network }.
2. For each factor, if it contains an evidence variable, retain only the appropriate portion (to be consistent with the evidence).
3. Let Z_1, ..., Z_k be an ordering of the variables not in Y.
4. For i = 1, ..., k do:
   (a) Extract from F all factors mentioning Z_i.
   (b) Let f be the product of these factors.
   (c) Let g = Σ_{z_i} f.
   (d) Insert g in F.
5. Return the product of the factors in F, normalized to sum to 1.
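Continuing the earlier sketch, step 2 can be implemented by zeroing out the rows inconsistent with the evidence, and step 5 by normalizing the result (again assuming binary variables; `reduce_evidence` and `normalize` are illustrative names):

```python
def reduce_evidence(f, var, value):
    """Zero out the rows of factor f inconsistent with var = value (step 2)."""
    fv, ft = f
    if var not in fv:
        return f
    i = fv.index(var)
    return (fv, {vals: (p if vals[i] == value else 0.0) for vals, p in ft.items()})

def normalize(f):
    """Rescale a factor so its entries sum to 1, turning P(Y, e) into P(Y | e)."""
    fv, ft = f
    z = sum(ft.values())
    return (fv, {vals: p / z for vals, p in ft.items()})

# P(B | S = 1) in the fragment used earlier:
factors = [reduce_evidence(f, 'S', 1) for f in (pS, pBgS)]
print(normalize(eliminate(factors, order=['S'])))  # ('B',) {(0,): 0.4, (1,): 0.6}
```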
Complexity of variable elimination

We need at most m multiplications to create one entry in a factor, where m is the number of factors being multiplied. We need at most k additions per entry when summing out a variable, where k is the maximum arity of a variable. The size of a factor containing j variables is k^j; for example, a factor over 10 binary variables already has 2^10 = 1024 entries.
So to be efficient, it is important to have small factors.
Induced graph

We will try to understand the size of the factors in terms of the graph structure.
Given a Bayes net structure G and an elimination ordering, the induced graph generated by variable elimination is an undirected graph over the variables of G, where two variables are connected by an edge if they both appear in an intermediate factor.
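A small sketch of how the induced graph can be built by simulating elimination on variable scopes only (graph returned as a set of edges; the function name is illustrative):

```python
def induced_graph(scopes, order):
    """Simulate variable elimination on factor scopes and record the edges of
    the induced graph. `scopes` is a list of sets of variable names (one per
    CPT); `order` is the elimination ordering."""
    edges = set()
    scopes = [set(s) for s in scopes]
    for z in order:
        mentioning = [s for s in scopes if z in s]
        scopes = [s for s in scopes if z not in s]
        merged = set().union(*mentioning)   # scope of the product factor
        for a in merged:
            for b in merged:
                if a < b:
                    edges.add((a, b))       # co-occur in an intermediate factor
        merged.discard(z)                   # scope after summing out z
        scopes.append(merged)
    for s in scopes:                        # factors that were never multiplied
        for a in s:
            for b in s:
                if a < b:
                    edges.add((a, b))
    return edges
```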
Example: Asia network

For our previous example, let us construct the induced graph.
[Network as before: Visit to Asia, Smoking, Tuberculosis, Lung Cancer, Abnormality in Chest, Bronchitis, X-ray, Dyspnea]
Note that the moralized graph is always a subgraph of the induced graph.
Cliques

[Network as before: Visit to Asia, Smoking, Tuberculosis, Lung Cancer, Abnormality in Chest, Bronchitis, X-ray, Dyspnea]
A complete subgraph of a graph is a subset of vertices such that each vertex is connected to every other vertex.
A clique is a maximal complete subgraph (one to which no vertices can be added).
Complexity of variable elimination

Theorem:
1. Every clique in the induced graph corresponds to an intermediate factor in the computation.
2. Every factor generated during variable elimination is a subset of some maximal clique.
See Koller and Friedman's notes for the proof details.
Therefore, the complexity is exponential in the size of the largest clique.
Consequence: Polytree inference

For the class of polytree networks, the problem of computing P(X) for any X can be solved in time linear in the size of the network (which includes all the CPTs).
Proof: We can order the nodes from the leaves inward. The induced graph is then exactly the moral graph of the tree, so the largest clique is the largest family (a node together with its parents) in the graph. The conclusion follows.
Heuristics for node ordering

Maximum cardinality: Number the nodes from 1 to n, always assigning the next number to the vertex having the largest set of previously numbered neighbors. Then eliminate nodes from n down to 1.
Minimum discrepancy: Always eliminate the node that causes the fewest edges to be added to the induced graph.
Minimum size/weight: Eliminate the node that causes the smallest clique to be created (either in terms of number of nodes, or in terms of number of table entries). A greedy sketch of this heuristic follows.
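A greedy sketch of the minimum size heuristic, choosing at each step the node whose elimination creates the smallest clique (adjacency given as a dict of neighbor sets; names are illustrative):

```python
def min_size_ordering(adj, keep=()):
    """Greedy elimination ordering: repeatedly eliminate the node whose
    neighborhood (the clique its elimination creates) is smallest.
    `adj` maps each node to the set of its neighbors in the moral graph;
    nodes in `keep` (e.g. query variables) are never eliminated."""
    adj = {v: set(ns) for v, ns in adj.items()}
    order = []
    while len(adj) > len(keep):
        v = min((u for u in adj if u not in keep), key=lambda u: len(adj[u]))
        for a in adj[v]:            # connect v's neighbors pairwise,
            for b in adj[v]:        # as eliminating v would do
                if a != b:
                    adj[a].add(b)
        for a in adj[v]:
            adj[a].discard(v)
        del adj[v]
        order.append(v)
    return order
```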
Summary

General exact inference in Bayesian networks is NP-hard.
Variable elimination is a general algorithm for exact inference.
By analyzing variable elimination we can see the easy cases for inference:
  when the net is a polytree
  when the maximum clique of the induced graph is small
Heuristics for ordering work pretty well in practice.