Lecture 5: Exact inference

Outline:
- Queries
- Inference in chains
- Variable elimination: without evidence, with evidence
- Complexity of variable elimination

Queries

Bayesian networks can answer questions about the underlying probability distribution:
- Likelihood: what is the probability of a given value assignment for a subset of variables?
- Conditional probability query: what is the probability of different value assignments for query variables, given evidence about a subset of variables?
- Most probable explanation (MPE): given evidence, find the instantiation of all other variables in the network which has the highest probability.
- Maximum a posteriori (MAP) query: given evidence and a subset of variables Y, find the most likely assignment of values to the variables in Y.

Queries (continued)

Examples of MAP queries:
- In speech recognition, given a speech signal, one can attempt to reconstruct the most likely sequence of words that could have generated the signal.
- In classification, given the training data and a new example, we want to determine the most probable class label of the new example.

Complexity of inference

Given a Bayesian network and a random variable X, deciding whether P(X = x) > 0 is NP-hard (see Koller and Friedman's notes for details). This implies that there is no general inference procedure that will work efficiently for all network configurations. But for particular families of networks, inference can be done efficiently.
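On a toy network, the query types above can be answered by brute-force enumeration of the joint distribution, which is the baseline that exact inference improves on. The chain A -> B -> C and all CPT numbers below are hypothetical:

```python
from itertools import product

# Hypothetical 3-node chain A -> B -> C with binary variables; all numbers are toy values.
p_a = {0: 0.6, 1: 0.4}                                      # P(A)
p_b_given_a = {0: {0: 0.7, 1: 0.3}, 1: {0: 0.2, 1: 0.8}}    # P(B | A)
p_c_given_b = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.4, 1: 0.6}}    # P(C | B)

def joint(a, b, c):
    """P(A=a, B=b, C=c) via the chain factorization."""
    return p_a[a] * p_b_given_a[a][b] * p_c_given_b[b][c]

# Likelihood query: P(C = 1), summing the joint over all other variables.
likelihood = sum(joint(a, b, 1) for a, b in product([0, 1], repeat=2))

# Conditional probability query: P(A = 1 | C = 1) = P(A = 1, C = 1) / P(C = 1).
conditional = sum(joint(1, b, 1) for b in [0, 1]) / likelihood

# MPE query: given evidence C = 1, the most probable joint assignment to (A, B).
mpe = max(product([0, 1], repeat=2), key=lambda ab: joint(ab[0], ab[1], 1))
```

Enumeration touches all k^n joint entries, which is exactly the exponential cost the rest of the lecture avoids.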
Likelihood inference in simple chains

Consider a simple chain of nodes: X1 -> X2 -> X3 -> ... -> Xn. All the numbers required are in the CPTs. To compute P(X2):
P(X2) = Σ_{x1} P(x1) P(X2 | x1)
If X1 has k possible values, this requires k multiplications and additions for each of the values of X2.

Inference in simple chains (2)

Now how do we compute P(X3)? We use P(X2), which is already computed, and the local CPT of node X3:
P(X3) = Σ_{x2} P(x2) P(X3 | x2),
with one entry for each value of X3.

Inference in simple chains (3)

We compute P(X2), P(X3), ..., P(Xn) iteratively. Each step only takes O(k^2) operations (where k is the number of possible values of a variable), and the algorithm is linear in the number of variables. If we had generated the whole joint distribution and summed variables out, we would have needed O(k^n) operations.

Elimination of variables in chains

Let us examine the chain example again. Suppose we want to compute P(X4):
P(X4) = Σ_{x3} Σ_{x2} Σ_{x1} P(x1) P(x2 | x1) P(x3 | x2) P(X4 | x3)
The innermost summation depends only on the value of x2. So we can compute a factor f1(X2) = Σ_{x1} P(x1) P(X2 | x1). Then we can use this to compute a factor f2(X3), etc. This is a form of dynamic programming.
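The iterative chain computation above can be sketched as a sequence of vector-matrix products, one per CPT. The chain length, arity, and random CPTs below are arbitrary toy choices:

```python
import numpy as np

# Chain X1 -> X2 -> ... -> Xn, each variable with k values; sizes and CPTs are
# hypothetical. cpts[t][i, j] = P(X_{t+2} = j | X_{t+1} = i).
k, n = 3, 5
rng = np.random.default_rng(0)

def random_cpt(rows, cols):
    m = rng.random((rows, cols))
    return m / m.sum(axis=1, keepdims=True)  # each row is a conditional distribution

prior = np.full(k, 1.0 / k)                  # P(X1)
cpts = [random_cpt(k, k) for _ in range(n - 1)]

def chain_marginal(prior, cpts):
    """P(Xn) by eliminating X1, X2, ... in turn: O(n k^2) instead of O(k^n)."""
    p = prior
    for cpt in cpts:
        p = p @ cpt                          # P(X_{t+1}) = sum_x P(X_t = x) P(X_{t+1} | x)
    return p
```

Each `p @ cpt` is exactly one elimination step: k multiplications and additions per entry of the next marginal.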
Pooling

Consider the case when a node has more than one parent, e.g. a node with two parents. A Bayes network is called a polytree if the underlying undirected graph is a tree.

What if the network is not a polytree?

Note that in this case we have a more complex factor, which depends on two variables.

Variable elimination without evidence

Given: a Bayes network and a set of query variables Y. Suppose we want to compute P(Y).
1. Initialize the set of factors: F <- the CPTs of the network.
2. Let Z1, ..., Zk be an ordering of the variables not in Y.
3. For i = 1, ..., k do:
(a) Extract from F all factors mentioning Zi.
(b) Let f be the product of these factors.
(c) Let g = Σ_{Zi} f.
(d) Insert g in F.
4. Return the product of the factors in F.
Steps (a)-(d) eliminate variable Zi; the product and summation in (b) and (c) are where the computations actually take place.

Example: Asia network

This example is taken from Koller and Friedman's notes. Suppose we want to compute P(Xray). Note that we can use any ordering of the variables during elimination.
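The elimination loop above can be sketched with factors stored as (scope, table) pairs. This is a minimal illustration, not a full implementation; the chain A -> B -> C and its numbers are hypothetical:

```python
from itertools import product

# A factor is a (scope, table) pair: scope is a tuple of variable names and
# table maps value tuples to numbers. All variables here are binary.

def multiply(f, g):
    """Pointwise product of two factors over the union of their scopes."""
    (fs, ft), (gs, gt) = f, g
    scope = fs + tuple(v for v in gs if v not in fs)
    table = {}
    for vals in product([0, 1], repeat=len(scope)):
        a = dict(zip(scope, vals))
        table[vals] = ft[tuple(a[v] for v in fs)] * gt[tuple(a[v] for v in gs)]
    return scope, table

def sum_out(var, f):
    """Sum a factor over one of its variables."""
    scope, table = f
    i = scope.index(var)
    new = {}
    for vals, p in table.items():
        key = vals[:i] + vals[i + 1:]
        new[key] = new.get(key, 0.0) + p
    return scope[:i] + scope[i + 1:], new

def eliminate(factors, order):
    """For each Z in order: pool the factors mentioning Z, sum Z out, and put
    the result back. Returns the product of whatever factors remain."""
    factors = list(factors)
    for z in order:
        pooled = [f for f in factors if z in f[0]]
        factors = [f for f in factors if z not in f[0]]
        prod = pooled[0]
        for f in pooled[1:]:
            prod = multiply(prod, f)
        factors.append(sum_out(z, prod))
    result = factors[0]
    for f in factors[1:]:
        result = multiply(result, f)
    return result

# P(C) in the chain A -> B -> C, eliminating A then B.
f_a = (("A",), {(0,): 0.6, (1,): 0.4})
f_b = (("A", "B"), {(0, 0): 0.7, (0, 1): 0.3, (1, 0): 0.2, (1, 1): 0.8})
f_c = (("B", "C"), {(0, 0): 0.9, (0, 1): 0.1, (1, 0): 0.4, (1, 1): 0.6})
scope, table = eliminate([f_a, f_b, f_c], ["A", "B"])
```

On this chain the algorithm reproduces the forward dynamic program: eliminating A creates the factor f1(B), and eliminating B yields P(C) directly.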
Causal inference with evidence in chains

Again the chain example: suppose we know that X1 = x1, and we want P(X4 | x1). We apply Bayes rule:
P(X4 | x1) = P(X4, x1) / P(x1)
We do not need to compute P(x1) separately: it comes out of summing the numerators P(X4, x1) over all values of X4. This can be viewed as a message passing from X1 to X4.

Causal inference with evidence in chains (2)

Suppose we know that X1 = x1. P(X2 | x1) is known from the CPT of node X2. P(X3 | x1) can be computed using forward inference, just like before:
P(X3 | x1) = Σ_{x2} P(X3 | x2) P(x2 | x1)

Predictive inference with evidence in chains

Suppose we know that X2 = x2. Without knowing the evidence, computing P(X4) required another factor, f1(X2). Computing P(X4 | x2) requires only the entries of the factors consistent with the evidence: we eliminate the portion of each factor inconsistent with X2 = x2, instead of summing X2 out.

Inference with evidence in polytrees

We need P(X | e). Each node receives some information from its children (backward pass) and some from its parents (forward pass), and performs the computation combining the two.
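The patterns above (a forward pass from upstream evidence, and a Bayes-rule query against downstream evidence with normalization replacing the explicit denominator) can be sketched for a binary chain X1 -> X2 -> X3 -> X4; all CPT numbers are hypothetical:

```python
import numpy as np

# Binary chain X1 -> X2 -> X3 -> X4; all CPT numbers are toy values.
prior = np.array([0.6, 0.4])                       # P(X1)
cpts = [np.array([[0.7, 0.3], [0.2, 0.8]]),        # P(X2 | X1)
        np.array([[0.9, 0.1], [0.4, 0.6]]),        # P(X3 | X2)
        np.array([[0.5, 0.5], [0.1, 0.9]])]        # P(X4 | X3)

def predictive(evidence_index, value, cpts):
    """P(X4 | X_evidence_index = value): forward pass starting at the evidence node."""
    p = np.zeros(2)
    p[value] = 1.0                                  # factor reduced to the observed value
    for cpt in cpts[evidence_index - 1:]:
        p = p @ cpt
    return p

def diagnostic(x4, prior, cpts):
    """P(X1 | X4 = x4): Bayes rule, P(x4 | X1) P(X1), then normalize over X1.
    P(x4) is never computed explicitly; it comes out of summing the numerators."""
    numer = np.array([predictive(1, x1, cpts)[x4] for x1 in (0, 1)]) * prior
    return numer / numer.sum()
```

`predictive(1, x1, cpts)` is the causal query P(X4 | x1); starting the pass at index 2 handles evidence in the middle of the chain.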
Variable elimination with evidence

Given: a Bayes network, a set of query variables Y, and evidence e.
1. Initialize the set of factors: F <- the CPTs of the network.
2. For each factor, if it contains an evidence variable, retain only the appropriate portion (to be consistent with the evidence).
3. Let Z1, ..., Zk be an ordering of the remaining variables.
4. For i = 1, ..., k do:
(a) Extract from F all factors mentioning Zi.
(b) Let f be the product of these factors.
(c) Let g = Σ_{Zi} f.
(d) Insert g in F.
5. Return the product of the factors in F (this gives P(Y, e); normalize over Y to obtain P(Y | e)).

Example: Asia network

Suppose we observe that the person smokes and that the Xray comes out negative. How does this change the variable elimination we did before? First we reduce all factors to eliminate the parts inconsistent with the evidence.

Complexity of variable elimination

Creating one entry of a product factor takes at most one multiplication per factor being multiplied. A factor over m variables has k^m entries, where k is the maximum arity of a variable. Summing a variable out takes up to k additions per entry of the resulting factor. So to be efficient, it is important to have small factors.

Induced graph

We will try to understand the size of the factors in terms of the graph structure. Given a Bayes net structure and an elimination ordering, the induced graph is an undirected graph over the variables X1, ..., Xn in which Xi and Xj are connected by an edge if they both appear in an intermediate factor generated by variable elimination.
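Step 2 above (reducing factors to be consistent with the evidence) can be sketched with the same (scope, table) factor representation used earlier; the CPT below is a hypothetical toy:

```python
# A factor is a (scope, table) pair: scope is a tuple of variable names,
# table maps value tuples to numbers. Variables are binary; numbers are toy values.

def reduce_factor(f, var, value):
    """Retain only the entries of a factor consistent with var = value, dropping var."""
    scope, table = f
    if var not in scope:
        return f
    i = scope.index(var)
    new_scope = scope[:i] + scope[i + 1:]
    new_table = {vals[:i] + vals[i + 1:]: p
                 for vals, p in table.items() if vals[i] == value}
    return new_scope, new_table

# Reducing P(B | A) with evidence A = 1 keeps only the row for A = 1.
f_ba = (("A", "B"), {(0, 0): 0.7, (0, 1): 0.3, (1, 0): 0.2, (1, 1): 0.8})
scope, table = reduce_factor(f_ba, "A", 1)
# scope == ("B",), table == {(0,): 0.2, (1,): 0.8}
```

After reducing every factor this way, elimination proceeds exactly as in the no-evidence case, followed by a final normalization.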
Example: Asia network

For our previous example, let us construct the induced graph. Note that the moralized graph is always a subgraph of the induced graph.

Cliques

A complete subgraph of a graph is a subset of vertices such that each vertex is connected to every other vertex. A clique is a maximal complete subgraph (one to which no vertices can be added).

Complexity of variable elimination

Theorem:
1. Every clique in the induced graph corresponds to an intermediate factor in the computation.
2. Every factor generated during variable elimination is a subset of some maximal clique.
See Koller and Friedman's notes for the proof details. Therefore, the complexity is exponential in the size of the largest clique of the induced graph.

Consequence: Polytree inference

For the class of polytree networks, the problem of computing P(X) for any variable X can be solved in time linear in the size of the network (which includes all the CPTs).
Proof: We can order the nodes from the leaves inward. The induced graph is exactly the moral graph of the tree, so the largest clique is the largest family in the graph. The conclusion follows.
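The induced graph and its largest clique can be computed by simulating the elimination while tracking only factor scopes. This is a sketch; the three factor scopes below correspond to the hypothetical chain A -> B -> C:

```python
# Factors are represented only by their scopes (sets of variable names).

def induced_graph(scopes, order):
    """Simulate elimination of the variables in `order`; return the edge set of
    the induced graph and the size of the largest intermediate factor (clique)."""
    edges, largest = set(), 0
    scopes = [set(s) for s in scopes]
    for z in order:
        mentioning = [s for s in scopes if z in s]
        pooled = set().union(*mentioning) if mentioning else {z}
        largest = max(largest, len(pooled))
        for u in pooled:                       # all variables sharing an
            for v in pooled:                   # intermediate factor become
                if u != v:                     # pairwise connected
                    edges.add(frozenset((u, v)))
        scopes = [s for s in scopes if z not in s] + [pooled - {z}]
    return edges, largest

# Eliminating A then B creates factors with scopes {A, B} and {B, C}:
# the induced graph is the chain itself and the largest clique has size 2.
edges, largest = induced_graph([{"A"}, {"A", "B"}, {"B", "C"}], ["A", "B"])
```

Running this for several orderings on the same network shows directly how the ordering changes the largest clique, and hence the cost of elimination.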
Heuristics for node ordering

- Maximum cardinality: number the nodes from 1 to n, always assigning the next number to the vertex having the largest set of previously numbered neighbors. Then eliminate the nodes from n down to 1.
- Minimum discrepancy: always eliminate the node that causes the fewest edges to be added to the induced graph.
- Minimum size/weight: eliminate the node that causes the smallest clique to be created (either in terms of number of nodes, or in terms of number of entries).

Summary

- General exact inference in Bayesian networks is NP-hard.
- Variable elimination is a general algorithm for exact inference.
- By analyzing variable elimination we can see the easy cases for inference: when the network is a polytree, or when the maximum clique of the induced graph is small.
- Heuristics for ordering work pretty well in practice.
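The minimum-discrepancy heuristic from the list above can be sketched as a greedy loop; the dict-of-neighbor-sets graph representation and the example graph in the test are illustrative choices:

```python
# Graphs are dicts mapping each node to the set of its neighbors
# (e.g. a moral graph of a Bayes network).

def fill_in_count(graph, node):
    """Number of edges that eliminating `node` would add between its neighbors."""
    nbrs = list(graph[node])
    return sum(1 for i in range(len(nbrs)) for j in range(i + 1, len(nbrs))
               if nbrs[j] not in graph[nbrs[i]])

def min_fill_order(graph):
    """Greedy elimination ordering: always pick the node with the fewest fill-in edges."""
    graph = {v: set(ns) for v, ns in graph.items()}   # work on a copy
    order = []
    while graph:
        node = min(graph, key=lambda v: fill_in_count(graph, v))
        for u in graph[node]:                         # add the fill-in edges,
            graph[u] |= graph[node] - {u}             # connecting the neighbors,
            graph[u].discard(node)                    # then detach the node
        del graph[node]
        order.append(node)
    return order
```

Greedy ordering gives no optimality guarantee (finding the best ordering is itself NP-hard), but, as noted above, such heuristics work well in practice.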