INFERENCE IN BAYESIAN NETWORKS


1 INFERENCE IN BAYESIAN NETWORKS Concha Bielza, Pedro Larrañaga Computational Intelligence Group Departamento de Inteligencia Artificial Universidad Politécnica de Madrid Master Universitario en Inteligencia Artificial

2 C.Bielza, P.Larrañaga -UPM- 2 Basic concepts. Inference in Bayesian networks. Types of queries. Exact inference: brute-force computation, variable elimination algorithm, message passing algorithm. Approximate inference: logic sampling, likelihood weighting, Markov chain Monte Carlo (MCMC).

3 C.Bielza, P.Larrañaga -UPM- 3 Types of queries. Queries: posterior probabilities. Given some evidence e (observations), compute the posterior probability of one or more target variables X, P(X | e); the answer is a vector of probabilities, one per state of X. Other names: probability propagation, belief updating or revision. [Figure: Burglary-Earthquake-Alarm network with News and WCalls nodes.]

4 C.Bielza, P.Larrañaga -UPM- 4 Types of queries. Semantically, for any kind of reasoning. Predictive or deductive reasoning (causal inference): predict effects from causes (disease to symptoms); the target variable is usually a descendant of the evidence. Diagnostic reasoning (diagnostic inference): diagnose the causes from observed effects (symptoms to disease); the target variable is usually an ancestor of the evidence. [Figures: Burglary-Earthquake-Alarm network queried in both directions.]

5 C.Bielza, P.Larrañaga -UPM- 5 Types of queries, for any kind of reasoning. Intercausal reasoning: between causes of a common effect. In the Burglary-Earthquake-Alarm network, B and E are independent of each other. Suppose that A=yes: this raises the probability of both possible causes B and E. Suppose then that B=yes: this explains the observed A, which in turn lowers the probability that E=yes. Two causes are initially independent; if the effect is known, the presence of one explanatory cause renders the alternative cause less likely (it is explained away).

6 C.Bielza, P.Larrañaga -UPM- 6 Types of queries, for any kind of reasoning. Bidirectional reasoning (mixed inference): combine 2 or more of the above, e.g. diagnostic and predictive reasoning, or diagnostic and intercausal reasoning. The arc direction between variables does not restrict the type of query to be asked: probabilistic inference can combine evidence from all parts of the network.

7 C.Bielza, P.Larrañaga -UPM- 7 Types of queries. More queries: joint and likelihood. Posterior joint: conditional probability of several variables, P(X_1,...,X_k | e). The size of the answer to this query is exponential in the number of variables in the joint. Likelihood of the evidence: the simplest query, i.e. the probability of the evidence, P(e).

8 C.Bielza, P.Larrañaga -UPM- 8 Types of queries. More queries: maximum a posteriori (MAP). Most likely configurations (abductive inference): the event that best explains the evidence. Total abduction: search over all the unobserved variables; in general this cannot be computed component-wise with max P(x_i | e). Partial abduction: search over a subset of the unobserved variables (the explanation set). One may also ask for the K most likely explanations. [Figures: Burglary-Earthquake-Alarm network with different sets of query nodes.]

9 C.Bielza, P.Larrañaga -UPM- 9 Types of queries. More queries: maximum a posteriori (MAP). MAP is equivalent to maximizing P(x, e) over the configurations x of the explanation set (the normalizing constant P(e) does not change the argmax). Use MAP for: Classification: find the most likely label, given the evidence. Explanation: what is the most likely scenario, given the evidence.

10 C.Bielza, P.Larrañaga -UPM- 10 Types of queries More queries: decision-making Optimal decisions (of maximum expected utility), with influence diagrams

11 [Pearl 88; Lauritzen & Spiegelhalter 88] Brute-force computation of P(X | e). First, consider P(X_i), without observed evidence e. Conceptually simple but computationally complex. For a BN with n variables, each with its CPT P(X_j | Pa(X_j)), the brute-force approach marginalizes the joint: P(X_i) = Σ over all the other variables of Π_j P(X_j | Pa(X_j)). But this amounts to computing the JPD, often very inefficient and even computationally intractable. CHALLENGE: without computing the JPD, exploit the factorization encoded by the BN and the distributive law (local computations). C.Bielza, P.Larrañaga -UPM- 11

12 C.Bielza, P.Larrañaga -UPM- 12 Easy inference cases: simple forward inference. Computing a prior requires simple forward propagation of probabilities: P(J) = Σ_{M,E} P(J | M, E) P(M, E) (marginalization) = Σ_{M,E} P(J | M) P(M | E) P(E) (chain rule and cond. indep.) = Σ_M P(J | M) Σ_E P(M | E) P(E) (distributive law). All terms used are CPTs in the BN; only ancestors of J are considered.
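As a minimal sketch (not from the slides), the same forward propagation can be coded directly for a chain E → M → J; all CPT numbers below are hypothetical illustrations:

```python
# Minimal sketch of the forward propagation above for a chain E -> M -> J.
# The CPT numbers are hypothetical, purely for illustration.
P_E = {1: 0.1, 0: 0.9}                      # P(E)
P_M_given_E = {(1, 1): 0.7, (1, 0): 0.2,    # P(M=m | E=e), keyed by (m, e)
               (0, 1): 0.3, (0, 0): 0.8}
P_J_given_M = {(1, 1): 0.9, (1, 0): 0.05,   # P(J=j | M=m), keyed by (j, m)
               (0, 1): 0.1, (0, 0): 0.95}

# Inner sum (distributive law): P(M=m) = sum_e P(M=m | E=e) P(E=e)
P_M = {m: sum(P_M_given_E[(m, e)] * P_E[e] for e in (0, 1)) for m in (0, 1)}

# Outer sum: P(J=j) = sum_m P(J=j | M=m) P(M=m)
P_J = {j: sum(P_J_given_M[(j, m)] * P_M[m] for m in (0, 1)) for j in (0, 1)}

print(P_J)  # the two values sum to 1
```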

13 C.Bielza, P.Larrañaga -UPM- 13 Easy inference cases: simple forward inference. The same idea applies when we have upstream evidence: P(J | E) = Σ_M P(J | M, E) P(M | E) = Σ_M P(J | M) P(M | E).

14 C.Bielza, P.Larrañaga -UPM- 14 Improving brute-force. Use the JPD factorization and the distributive law. With five binary variables, the brute-force table (the JPD) has 32 entries.

15 C.Bielza, P.Larrañaga -UPM- 15 Improving brute-force. Arrange the computations effectively, moving some additions inside: first over X_5 and X_3, then over X_4. The biggest table created has 8 entries (like the largest CPT in the BN).

16 C.Bielza, P.Larrañaga -UPM- 16 Improving brute-force. Iteratively: move all irrelevant terms outside of the innermost sum; perform the innermost sum, getting a new term; insert the new term into the product.

17 C.Bielza, P.Larrañaga -UPM- 17 Improving brute-force. Comparing both: (1) Brute-force approach: a table with 32 entries; 52 multiplications (combining the tables in a suitable way) and 30 additions (marginalizations: 16, 8, 4, 2). (2) Factorization & distributive law: one table with 8 entries and three with 4 entries; 14 multiplications and 14 additions (marginalizations).

18 C.Bielza, P.Larrañaga -UPM- 18 Complexity of exact inference in BNs. In BNs without loops (cycles in the underlying undirected graph), i.e. polytrees, inference is easy: you can arrange the additions so as not to create tables bigger than those included in the BN. The complexity of the previous method is exponential in the width (number of variables) of the biggest CPT used in the process. Otherwise, in general BNs, inference is NP-complete [Cooper 1990]. This does not mean we cannot solve inference; it implies that we cannot find a general procedure that works efficiently for all networks. The key to efficient inference lies in finding a good summation order (elimination/deletion order).

19 C.Bielza, P.Larrañaga -UPM- 19 Recall Cycle Loop

20 C.Bielza, P.Larrañaga -UPM- 20 Recall: Polytree = DAG without loops; there is only one path between any pair of nodes (= singly connected graph). Tree = each node has one parent, except the root node.

21 C.Bielza, P.Larrañaga -UPM- 21 Recall: types of directed graphs

22 C.Bielza, P.Larrañaga -UPM- 22 Variable elimination algorithm. Wanted: P(X_i). Input: a list with all the functions (CPTs) of the problem. Select an elimination order of all variables (except X_i). For each X_k in that order, if F is the set of functions in the list that involve X_k: delete F from the list; eliminate the variable X_k, i.e. combine (multiply) all the functions that contain this variable and marginalize out X_k, obtaining a new function f; add f to the list. Output: the combination (multiplication) of all the functions in the current list.
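A rough, generic Python sketch of this elimination loop, assuming binary variables and factors stored as (variables, table) pairs; the helper names (multiply, sum_out, variable_elimination) are ours, not part of the slides:

```python
from itertools import product

def multiply(f, g):
    """Combine two factors by pointwise multiplication over the union of their domains."""
    fv, ft = f
    gv, gt = g
    vars_ = tuple(dict.fromkeys(fv + gv))          # union of variables, preserving order
    table = {}
    for assig in product((0, 1), repeat=len(vars_)):
        point = dict(zip(vars_, assig))
        table[assig] = (ft[tuple(point[v] for v in fv)] *
                        gt[tuple(point[v] for v in gv)])
    return vars_, table

def sum_out(f, var):
    """Marginalize 'var' out of factor f."""
    fv, ft = f
    keep = tuple(v for v in fv if v != var)
    idx = fv.index(var)
    table = {}
    for assig, value in ft.items():
        reduced = tuple(x for i, x in enumerate(assig) if i != idx)
        table[reduced] = table.get(reduced, 0.0) + value
    return keep, table

def variable_elimination(factors, order):
    """Eliminate the variables in 'order'; return the combination of the remaining factors."""
    factors = list(factors)
    for var in order:
        involved = [f for f in factors if var in f[0]]   # the set F of the slide
        if not involved:
            continue
        rest = [f for f in factors if var not in f[0]]
        combined = involved[0]
        for f in involved[1:]:
            combined = multiply(combined, f)
        rest.append(sum_out(combined, var))              # the new function f
        factors = rest
    result = factors[0]
    for f in factors[1:]:
        result = multiply(result, f)
    return result
```

To obtain P(X_i) without evidence, pass all the CPTs of the BN as factors together with an elimination order over every variable except X_i.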

23 C.Bielza, P.Larrañaga -UPM- 23 Variable elimination algorithm Repeat the algorithm for each target variable

24 C.Bielza, P.Larrañaga -UPM- 24 Example with Asia network Visit to Asia (A) Smoking (S) Tuberculosis (T) Lung Cancer (L) Tub. or Lung Canc (E) Bronchitis (B) X-Ray (X) Dyspnea (D)

25 C.Bielza, P.Larrañaga -UPM- 25 Brute-force approach. Compute P(D) by brute force: P(d) = Σ_x Σ_b Σ_e Σ_l Σ_t Σ_s Σ_a P(a, s, t, l, e, b, x, d). Complexity is exponential in the size of the graph (number of variables × number of states for each variable).

26 C.Bielza, P.Larrañaga -UPM- 26 [VE steps on the Asia network.] Note: an intermediate factor is not necessarily a probability term.

27 C.Bielza, P.Larrañaga -UPM- 27 [VE steps on the Asia network, continued.]

28 Variable elimination algorithm. The biggest table has size 8. Local computations (due to moving the additions). The elimination ordering matters, but finding an optimal (minimum-cost) one is NP-hard [Arnborg et al. 87]; heuristics are used to find good sequences. Discard parts of the net that are irrelevant for the query: we can prune all variables that, given e, are c.i. of the target variable (Bayes-Ball algorithm, Shachter 98). Complexity is exponential in the maximum number of variables in the factors of the summation. C.Bielza, P.Larrañaga -UPM- 28

29 C.Bielza, P.Larrañaga -UPM- 29 Now, with evidence e. We have observed E = e. For each evidence variable E_i, identify the functions f in which it appears and restrict them to the observed value E_i = e_i.

30 C.Bielza, P.Larrañaga -UPM- 30 VE algorithm: dealing with evidence. [Asia network with nodes V, S, T, L, A, B, X, D.] Suppose we get evidence e (instantiation to an observed value): V = t, S = f, D = t. We want to compute P(L, V = t, S = f, D = t). The joint factorizes as P(V) P(S) P(T | V) P(L | S) P(B | S) P(A | T, L) P(X | A) P(D | A, B); after instantiating the evidence, it is a function of T, L, B, A, X only.

31 C.Bielza, P.Larrañaga -UPM- 31 VE algorithm: dealing with evidence. Since we know that V = t, we don't need to eliminate V. Instead, we can replace the factors P(V) and P(T | V) with f_{P(V)} = P(V = t) and f_{P(T|V)}(T) = P(T | V = t). These select the appropriate parts of the original factors given the evidence. Note that f_{P(V)} is a constant, and thus does not appear in the elimination of other variables.
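A small sketch of this restriction step, reusing the factor representation of the VE sketch above (the function name restrict is ours):

```python
def restrict(f, var, value):
    """Fix 'var' to 'value' in factor f; 'var' disappears from the factor's domain."""
    fv, ft = f
    if var not in fv:
        return f
    idx = fv.index(var)
    keep = tuple(v for v in fv if v != var)
    table = {tuple(x for i, x in enumerate(a) if i != idx): p
             for a, p in ft.items() if a[idx] == value}
    return keep, table

# E.g. restricting P(T | V) to V = 1 yields a factor over T only (the column P(T | V=1)),
# and restricting P(V) to V = 1 yields a constant factor with an empty domain.
```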

32 C.Bielza, P.Larrañaga -UPM- 32 VE algorithm: dealing with evidence. Initial factors, after setting the evidence: f_{P(v)} f_{P(s)} f_{P(t|v)}(T) f_{P(l|s)}(L) f_{P(b|s)}(B) P(A | T, L) P(X | A) f_{P(d|a,b)}(A, B). Eliminating X, we get: f_{P(v)} f_{P(s)} f_{P(t|v)}(T) f_{P(l|s)}(L) f_{P(b|s)}(B) P(A | T, L) f_X(A) f_{P(d|a,b)}(A, B), where f_X(A) = Σ_x P(x | A).

33 C.Bielza, P.Larrañaga -UPM- 33 VE algorithm: dealing with evidence. Eliminating T, we get: f_{P(v)} f_{P(s)} f_{P(l|s)}(L) f_{P(b|s)}(B) f_T(A, L) f_X(A) f_{P(d|a,b)}(A, B). Eliminating A, we get: f_{P(v)} f_{P(s)} f_{P(l|s)}(L) f_{P(b|s)}(B) f_A(L, B). Eliminating B, we get: f_{P(v)} f_{P(s)} f_{P(l|s)}(L) f_B(L).

34 C.Bielza, P.Larrañaga -UPM- 34 Message passing algorithm. Operates by passing messages among the nodes of the network; nodes act as processors that receive, compute and send information. These are called propagation algorithms. Clique tree propagation is based on the same principle as VE but with a sophisticated caching strategy that: enables computing the posterior probability distribution of all variables in twice the time it takes to compute that of one single variable; works in an intuitively appealing fashion, namely message propagation.

35 C.Bielza, P.Larrañaga -UPM- 35 Basic operations for a node. Ask-info(i,j): the target node i asks node j for information; it does so for all its neighbors j, and they do the same until there are no nodes left to ask. Send-message(j,i): each node sends a message to the node that asked it for information, until the target node is reached. A message is defined over the intersection S_ij of the domains of f_i and f_j; it is computed by combining f_j with the messages received from j's other neighbors and marginalizing onto S_ij. Finally, we calculate locally at each node i: the target node combines all the received information with its own function and marginalizes over the target variable.
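A sketch of the Send-message operation, again reusing the multiply and sum_out helpers of the VE sketch above (names and representation are ours):

```python
def send_message(f_j, incoming, separator):
    """Message from node j to node i: combine j's own function with the messages
    received from j's other neighbors, then marginalize onto the separator."""
    msg = f_j
    for m in incoming:                  # messages from every neighbor k != i
        msg = multiply(msg, m)
    for var in msg[0]:                  # sum out everything outside the separator
        if var not in separator:
            msg = sum_out(msg, var)
    return msg
```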

36 C.Bielza, P.Larrañaga -UPM- 36 CollectEvidence Procedure for X 2 Ask

37 C.Bielza, P.Larrañaga -UPM- 37 P(X 2 ) as a message passing algorithm?

38 C.Bielza, P.Larrañaga -UPM- 38 Correspondence between VE and the message passing algorithm: there is a direct correspondence between the messages and the intermediate factors created by VE. [Figure]

39 C.Bielza, P.Larrañaga -UPM- 39 Computing the probabilities P(X_i | e) of all (unobserved) variables at the same time. We could perform the previous process for each node, but many messages would be repeated! Instead, we can use 2 rounds of messages as follows: select a node as the root (or pivot); ask for or collect evidence from the leaves toward the root (messages in the downward direction), as in VE; distribute evidence from the root toward the leaves (messages in the upward direction); calculate the marginal distribution at each node by local computation, i.e. using its incoming messages. This algorithm never constructs tables larger than those in the BN.

40 C.Bielza, P.Larrañaga -UPM- 40 Message passing algorithm. First sweep: CollectEvidence toward the root node X_4. Second sweep: DistributeEvidence. [Figure]

41 C.Bielza, P.Larrañaga -UPM- 41 Networks with loops. If the net is not a polytree, the algorithm does not work: requests/messages go around a cycle indefinitely (information travels through 2 paths and is counted twice), and the independence assumptions applied in the algorithm cannot be used here (it no longer holds that any node separates the graph into 2 unconnected parts, i.e. polytrees). Alternatives?

42 C.Bielza, P.Larrañaga -UPM- 42 Alternative 1: conditioning method. Cut the multiple paths between nodes by instantiating some variables included in the loops; we will then have a polytree and its algorithms may be applied. Alternative 2: clustering methods. Group variables into an auxiliary, simpler representation and structure the clusters so that we finally have a polytree over this secondary structure, usually a clique tree or junction tree.

43 C.Bielza, P.Larrañaga -UPM- 43 Complexity. The complexity of propagation algorithms in polytrees is linear in the size (nodes + arcs) of the network [brute force is exponential]. In multiply-connected BNs it is an NP-complete problem (both alternatives have this complexity, and neither of them is uniformly better; they are complementary, and mixed algorithms exist).

44 C.Bielza, P.Larrañaga -UPM- 44 Alternative 1: conditioning method. Without loops, any node D can separate the graph into 2 unconnected parts: 1. its parents and the nodes to which it is connected passing through its parents; 2. its children and the nodes to which it is connected passing through its children. Both sets of nodes are c.i. given D. This idea is used in the message passing algorithm. [Figure]

45 C.Bielza, P.Larrañaga -UPM- 45 Alternative 1: conditioning method. With loops we cannot, but we can cut the loops: 1. Fix the (arbitrary) state of some nodes, called the cutset, e.g. {C} with C=c; the network becomes a polytree. 2. Absorb the evidence; the topology changes: the arc from C is removed and P(F | C=c, D) is placed at F. 3. Apply any polytree algorithm, computing as a polytree for each value c (and combining the results over the cutset values). Minimize the cutset size (heuristics; the problem is NP-complete).

46 C.Bielza, P.Larrañaga -UPM- 46 Alternative 1: conditioning method. Another example of a cutset: a loop A-B-C. We can cut the loop by taking {A} as the cutset. Option 1: replace the prior of B with P(B | A=a). Option 2: replace the prior of C with P(C | A=a). Only one arc is absorbed (otherwise the graph becomes unconnected). [Figure]

47 C.Bielza, P.Larrañaga -UPM- 47 Alternative 2: clustering methods [Lauritzen & Spiegelhalter 88]. The method implemented in the main BN software packages. Transform the BN into a probabilistically equivalent polytree by merging nodes, removing the multiple paths between two nodes. Example: metastatic cancer (M) is a possible cause of brain tumors (B) and an explanation for increased total serum calcium (S); in turn, either of these could explain a patient falling into a coma (C); severe headache (H) is also associated with brain tumors. Create a new node Z = (S, B) that combines S and B, with states {tt, ft, tf, ff}. Then P(Z | M) = P(S | M) P(B | M), since S and B are c.i. given M, and P(H | Z) = P(H | B), since H is c.i. of S given B.

48 C.Bielza, P.Larrañaga -UPM- 48 Alternative 2: clustering methods. Steps of the JUNCTION TREE CLUSTERING ALGORITHM: a COMPILATION phase transforms the BN into a (junction) tree (slow, much memory if the graph is dense, but done only once); belief updating is then fast. 1. Moralize the BN. 2. Triangulate the moral graph and obtain the cliques. 3. Create the junction tree and its separators. 4. Compute the new parameters. 5. Run the message passing algorithm.

49 C.Bielza, P.Larrañaga -UPM- 49 Alternative 2: clustering methods. 1. MORALIZE the BN: connect ('marry') all parents with a common child and remove the arrow directions to obtain the moral graph. [Figure: DAG over M, S, B, C, H and its moral graph, with the added edge S-B.] This keeps the dependencies that would otherwise be lost when transforming the DAG into an undirected graph (independence in the moral graph implies independence in the BN).
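A minimal sketch of moralization in Python, assuming the DAG is given as a dict mapping each node to its list of parents (representation and names are ours):

```python
from itertools import combinations

def moralize(parents):
    """Return the set of undirected edges of the moral graph."""
    edges = set()
    for child, pa in parents.items():
        for p in pa:                          # drop directions on parent-child links
            edges.add(frozenset((p, child)))
        for p, q in combinations(pa, 2):      # 'marry' parents with a common child
            edges.add(frozenset((p, q)))
    return edges

# Example (cancer network): M -> S, M -> B, S -> C, B -> C, B -> H
dag = {"M": [], "S": ["M"], "B": ["M"], "C": ["S", "B"], "H": ["B"]}
print(moralize(dag))   # contains the moral edge {S, B}
```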

50 C.Bielza, P.Larrañaga -UPM- 50 Alternative 2: clustering methods. 2. TRIANGULATE the moral graph (needed for having a junction tree): add edges so that every cycle of length > 3 contains a chord, an edge between 2 nonconsecutive nodes (i.e. there is a subcycle composed of exactly 3 of its nodes); this produces a triangulated or chordal graph, so that we do not create functions defined over non-joined groups of nodes. Not necessary here (the moral graph is already triangulated). Different triangulations produce different clusters (and different table sizes at the compound nodes). Finding an optimal triangulation is NP-complete, so heuristics are used: preserve the original topology as much as possible, add few edges.

51 C.Bielza, P.Larrañaga -UPM- 51 Alternative 2: clustering methods. 2. TRIANGULATE the moral graph: the added edges are called fill-ins, obtained by the fill-in process guided by a deletion sequence: before deleting a node X and all its edges, we add new edges to make the subgraph given by X and its neighbors complete. [Figure: moral graph, triangulation-via-elimination with ordering {1,2,3,4,5,6}, and the resulting triangulated graph.]
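A sketch of triangulation-via-elimination, assuming the (moral) graph is given as a dict of adjacency sets and a deletion sequence is supplied (names are ours):

```python
from itertools import combinations

def triangulate(adj, order):
    """Return the fill-in edges added while eliminating nodes in 'order'."""
    adj = {v: set(nbrs) for v, nbrs in adj.items()}   # work on a copy
    fill_ins = []
    for x in order:
        nbrs = list(adj[x])
        for u, v in combinations(nbrs, 2):            # complete the subgraph of x's neighbors
            if v not in adj[u]:
                adj[u].add(v)
                adj[v].add(u)
                fill_ins.append((u, v))
        for n in nbrs:                                # delete x and its edges
            adj[n].discard(x)
        del adj[x]
    return fill_ins

# The triangulated graph is the original graph plus the returned fill-in edges.
```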

52 C.Bielza, P.Larrañaga -UPM- 52 Alternative 2: clustering methods. Triangulating a graph does not mean dividing it into triangles. [Figure: a correctly triangulated graph vs. an incorrect 'division into triangles'; don't do the latter!]

53 C.Bielza, P.Larrañaga -UPM- 53 Alternative 2: clustering methods Not triangulated Triangulated

54 C.Bielza, P.Larrañaga -UPM- 54 Alternative 2: clustering methods. 2. Triangulate the moral graph and obtain the cliques: a clique is a maximal complete subgraph (all its nodes are pairwise linked and it is not a subset of another complete set). Identify them during the fill-in process (complete subgraphs that are maximal). In the examples, the cliques are {M,S,B}, {S,B,C}, {B,H} and, for the numbered graph, {1,2,3}, {2,3,4}, {3,4,5}, {4,5,6}.

55 C.Bielza, P.Larrañaga -UPM- 55 Alternative 2: clustering methods. 3. Create the JUNCTION TREE and its separators: the JT is an undirected tree that contains all the cliques as nodes. The JT must satisfy the following property: given two clique nodes X and Y, X ∩ Y must be contained in all the nodes on the path between X and Y. Separator: the intersection of adjacent clique nodes. [Figure: a hypergraph of cliques, a tree that is not a JT, and a valid JT.]

56 C.Bielza, P.Larrañaga -UPM- 56 Alternative 2: clustering methods. In the examples: {M,S,B} - [S,B] - {S,B,C} - [B] - {B,H}, and {1,2,3} - [2,3] - {2,3,4} - [3,4] - {3,4,5} - [4,5] - {4,5,6} (separators in brackets). Order the cliques and try to link them so as to create the biggest separators.
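A sketch of this linking step: the cliques of a triangulated graph can be joined into a junction tree by greedily picking links with the biggest separators (a maximum-weight spanning tree); the code and names below are ours:

```python
from itertools import combinations

def junction_tree(cliques):
    """cliques: list of frozensets. Returns a list of (i, j, separator) links."""
    candidates = sorted(((len(ci & cj), i, j)
                         for (i, ci), (j, cj) in combinations(enumerate(cliques), 2)),
                        reverse=True)                  # biggest separators first
    component = list(range(len(cliques)))              # simple union-find by relabeling
    links = []
    for weight, i, j in candidates:
        if component[i] != component[j] and weight > 0:
            links.append((i, j, cliques[i] & cliques[j]))
            old, new = component[j], component[i]
            component = [new if c == old else c for c in component]
    return links

cliques = [frozenset("MSB"), frozenset("SBC"), frozenset("BH")]
print(junction_tree(cliques))   # links {M,S,B}-{S,B,C} (sep {S,B}) and {S,B,C}-{B,H} (sep {B})
```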

57 C.Bielza, P.Larrañaga -UPM- 57 Alternative 2: clustering methods. 4. Compute the NEW PARAMETERS (new potentials): each potential (CPT) is attached to a clique node containing its domain. If a node is not attached to any function, attach the identity function to it. Whenever more than one potential is attached, the potential at the node is the product of all of them. Result: the product of all the node potentials on the junction tree is the product of all the CPTs in the original BN (same information, the JPD, in a different representation).

58 C.Bielza, P.Larrañaga -UPM- 58 Alternative 2: clustering methods. In the examples: cliques C_1 = {M,S,B}, C_2 = {S,B,C}, C_3 = {B,H} with separators {S,B} and {B}; and cliques C_1 = {1,2,3}, C_2 = {2,3,4}, C_3 = {3,4,5}, C_4 = {4,5,6} with separators {2,3}, {3,4}, {4,5}.

59 C.Bielza, P.Larrañaga -UPM- 59 Alternative 2: clustering methods Another example:

60 C.Bielza, P.Larrañaga -UPM- 60 Alternative 2: clustering methods. 5. MESSAGE passing algorithm over the JT. Applying the propagation algorithm over the JT gives the Shenoy-Shafer architecture: store 2 messages at each separator (one for each direction). Computing messages: the message from a clique to a neighbor is obtained by multiplying the clique potential by the messages from all its other neighbors and summing out the residual set (the clique variables not in the separator S_ij). After a full propagation (upward + downward) all the separators are full and each clique potential multiplied by its incoming messages gives the marginal over that clique; then marginalize to obtain the distributions of individual variables.

61 C.Bielza, P.Larrañaga -UPM- 61 Alternative 2: clustering methods. With evidence, as always, the potentials are first restricted to the observed values. Suppose A=y, X=y. [Figure]

62 C.Bielza, P.Larrañaga -UPM- 62 Alternative 2: clustering methods. If there is only one query variable Q, find a clique C_Q that contains Q and use it as the pivot in inference. E.g.: compute P(L | A=y, X=y). [Figure: possible pivot cliques.]

63 C.Bielza, P.Larrañaga -UPM- 63 Alternative 2: clustering methods. Message passing from the leaves to the pivot (Shenoy-Shafer): 1. Collect evidence. [Figure: messages toward the pivot; the answer is read off at the pivot.]

64 C.Bielza, P.Larrañaga -UPM- 64 Alternative 2: clustering methods. Message passing from the pivot to the leaves (Shenoy-Shafer): 2. Distribute evidence. [Figure: the distributed messages; 'not f5', 'not f1' indicate factors excluded from the corresponding messages.] Complexity is exponential in the maximum clique size.

65 C.Bielza, P.Larrañaga -UPM- 65 Alternative 2: clustering methods. Summary: DAG → moral graph → triangulated graph → identifying cliques → junction tree → message passing.

66 C.Bielza, P.Larrañaga -UPM- 66 Approximate inference. Why? Because exact inference is intractable (NP-complete): with large (40+ variables) and densely connected BNs, the cliques associated with the junction tree algorithm or the intermediate factors in the VE algorithm grow in size, generating an exponential blowup in the number of computations performed. Both deterministic methods and stochastic simulation are used to find approximate answers.

67 C.Bielza, P.Larrañaga -UPM- 67 Approximate inference. Deterministic algorithms simplify the model: eliminate arcs that encode almost independent nodes (weak dependences measured using the Kullback-Leibler divergence) [Engelen 97]; eliminate nodes that are far away from the target node (localized partial evaluation algorithm) [Draper 95]; replace low probabilities by zeros [Jensen and Andersen 90]; reduce the cardinality of the CPTs (state space abstraction) [Wellman & Liu 94]; use alternative representations of the CPTs that join similar probabilities: rules [Poole 98] or probability trees [Cano et al 03].

68 C.Bielza, P.Larrañaga -UPM- 68 Approximate inference. Stochastic simulation: uses the network to generate a large number of cases (full instantiations) from the network distribution. P(X_i | e) is estimated from these cases by counting observed frequencies in the samples; by the Law of Large Numbers, the estimate converges to the exact probability as more cases are generated. Approximate propagation in BNs within an arbitrary tolerance or accuracy is itself an NP-complete problem. In practice, if e is not too unlikely, convergence is quick.

69 C.Bielza, P.Larrañaga -UPM- 69 Approximate inference. Probabilistic logic sampling [Henrion 88]. Given an ancestral ordering of the nodes (parents before children), sample each variable X once its parents have been sampled (i.e. from the root nodes down to the leaves), using its conditional probability given the known values of its parents. When all the nodes have been visited, we have a case, an instantiation of all the nodes in the BN. A forward sampling algorithm. Repeat and use the observed frequencies to estimate P(X_i | e).
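A minimal sketch of probabilistic logic sampling for binary nodes, assuming the network is given as an ancestral node ordering, a parents dict and a CPT giving P(node = 1 | parent values); all data structures and names are ours:

```python
import random

def logic_sample(nodes, parents, cpt):
    """Generate one full instantiation, sampling parents before children."""
    case = {}
    for x in nodes:                                    # ancestral ordering
        pa_values = tuple(case[p] for p in parents[x])
        p1 = cpt[x][pa_values]                         # P(x = 1 | pa(x) = pa_values)
        case[x] = 1 if random.random() < p1 else 0
    return case

def estimate(nodes, parents, cpt, target, evidence, n=10000):
    """Estimate P(target = 1 | evidence) by keeping only evidence-consistent samples."""
    kept = hits = 0
    for _ in range(n):
        case = logic_sample(nodes, parents, cpt)
        if all(case[v] == val for v, val in evidence.items()):
            kept += 1
            hits += case[target]
    return hits / kept if kept else float("nan")
```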

70 C.Bielza, P.Larrañaga -UPM- 70 Approximate inference. Probabilistic logic sampling. Suppose we obtain the following samples of (X_1,...,X_6): (0,1,1,1,1,1), (0,1,0,1,1,1), (1,0,0,1,1,1), (0,0,1,1,1,0), (1,1,1,1,0,0). Then the probabilities are estimated by the observed relative frequencies (e.g. P(X_1=1) ≈ 2/5). With evidence, e.g. X_2=1, we discard the third and fourth samples (which have X_2=0) and repeat until having a sample of size 5 as desired, e.g.: (0,1,1,1,1,1), (0,1,0,1,1,1), (1,1,0,0,1,1), (1,1,1,1,1,0), (1,1,1,1,0,0).

71 C.Bielza, P.Larrañaga -UPM- 71 Approximate inference. Probabilistic logic sampling. It works because there is a general simulation scheme with the following idea to simulate from (X_1,...,X_r): if each factor of the chain-rule factorization is simple to sample from, then for i = 1 to r, generate x_i ~ X_i | x_1,...,x_{i-1}, and return (x_1,...,x_r).

72 C.Bielza, P.Larrañaga -UPM- 72 Approximate inference. Likelihood weighting [Fung & Chang 90; Shachter & Peot 90]. PLS is easily generalized to more than one query node. When approximating P(X_i | e), it rejects all the samples not consistent with e. Problem: if e is unlikely, most of the cases are discarded (they do not contribute to the frequency counts), which is inefficient. Example: if we observe X_2=1 and P(X_2=1)=0.0064, we need about 10,000 trials to get 64 valid samples; obtaining a significant number of samples becomes intractable. Likelihood weighting avoids so many rejections of PLS.

73 C.Bielza, P.Larrañaga -UPM- 73 Approximate inference. Likelihood weighting: don't sample the evidence variables E; fix their values E = e. Sample the rest as in PLS. Instead of adding 1 to the run count, the CPTs of the evidence nodes are used to determine how likely that evidence combination is: for a sample i, assign a weight w_i given by the likelihood of the evidence given its parents, w_i = Π_j P(e_j | pa(E_j)). In PLS, by contrast, w_i = 1 for samples consistent with e and w_i = 0 otherwise.
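A sketch of likelihood weighting in the same representation as the logic sampling sketch above; evidence nodes are fixed and each sample gets the weight just described (names are ours):

```python
import random

def likelihood_weighting(nodes, parents, cpt, target, evidence, n=10000):
    """Estimate P(target = 1 | evidence) with weighted samples."""
    num = den = 0.0
    for _ in range(n):
        case, w = {}, 1.0
        for x in nodes:                                    # ancestral ordering
            pa_values = tuple(case[p] for p in parents[x])
            p1 = cpt[x][pa_values]                         # P(x = 1 | pa(x))
            if x in evidence:
                case[x] = evidence[x]                      # don't sample: fix E = e
                w *= p1 if evidence[x] == 1 else 1.0 - p1  # weight by its likelihood
            else:
                case[x] = 1 if random.random() < p1 else 0
        num += w * case[target]
        den += w
    return num / den if den else float("nan")
```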

74 C.Bielza, P.Larrañaga -UPM- 74 Approximate inference. Likelihood weighting: example. Query P(C=n | B=y, E=y), with evidence e = {B=y, E=y}. [Figure: network over A, B, C, D, E with CPT entries P(A=y)=0.2, P(B=y|A=n)=0.4, P(C=y|A=n)=0.7, P(D=y|B=y)=0.7, P(E=y|C=n,B=y)=0.8.] Sampled cases and weights: (A=n,B=y,C=n,D=y,E=y) with w_1 = 0.4 * 0.8 = 0.32; (A=n,B=y,C=y,D=n,E=y) with w_2 = 0.88; (A=y,B=y,C=y,D=y,E=y) with w_3 = 0.80.

75 C.Bielza, P.Larrañaga -UPM- 75 Approximate inference. Markov chain Monte Carlo (MCMC): basics. Designed for cases in which sampling from a distribution π(θ) is not easy, i.e. with MCMC we simulate draws from complex probability distributions. General description: select a Markov chain on the state space with stationary distribution π(θ); start at a point θ^0 and generate θ^1,...,θ^n from the chain until convergence; eliminate an initial transient θ^1,...,θ^k and use θ^{k+1},...,θ^n as an approximate sample from π(θ). Two issues: how to design a Markov chain with stationary distribution π (the Metropolis-Hastings algorithm and its special cases; we only see the Gibbs sampler), and how to judge the convergence of the Markov chain (a number of criteria exist).

76 C.Bielza, P.Larrañaga -UPM- 76 Approximate inference. MCMC: Gibbs sampler.

77 C.Bielza, P.Larrañaga -UPM- 77 Approximate inference. MCMC: Gibbs sampler. By sampling from the full conditional probabilities, we obtain samples from the JPD. The chain moves from θ^i to θ^{i+1} one coordinate at a time (or one group of coordinates at a time, which gives less correlation among parameters). [Figure: bivariate example θ = (θ_1, θ_2), with coordinate-wise moves starting at θ^0.]

78 C.Bielza, P.Larrañaga -UPM- 78 Approximate inference. MCMC in BNs. In BNs, Gibbs sampling means, for each X_i not in E, sampling from P(x_i | all the other variables) = P(x_i | Markov blanket of X_i), which is proportional to P(x_i | pa(X_i)) Π_{Y in children(X_i)} P(y | pa(Y)) (theorem [Pearl 97]): only its Markov blanket is involved. Example (cancer network M, S, B, C, H): a patient with severe headache and not in a coma; query P(B=b | H=h, C=¬c).
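A sketch of Gibbs sampling over binary nodes using only Markov blanket terms, in the same representation as the earlier sampling sketches plus a children dict; names, the burn-in length and the number of draws are ours:

```python
import random

def gibbs(nodes, parents, children, cpt, evidence, target, burn_in=500, draws=1000):
    """Estimate P(target = 1 | evidence) by Gibbs sampling over the unobserved nodes."""
    def p_value(x, value, state):
        pa = tuple(state[p] for p in parents[x])
        p1 = cpt[x][pa]                                    # P(x = 1 | pa(x))
        return p1 if value == 1 else 1.0 - p1

    state = {x: evidence.get(x, random.randint(0, 1)) for x in nodes}
    count = 0
    for it in range(burn_in + draws):
        for x in nodes:
            if x in evidence:
                continue                                   # only visit unobserved nodes
            score = []
            for v in (0, 1):
                state[x] = v
                s = p_value(x, v, state)                   # P(x = v | pa(x))
                for y in children[x]:                      # times prod_y P(y | pa(y))
                    s *= p_value(y, state[y], state)
                score.append(s)
            p1 = score[1] / (score[0] + score[1])          # normalize over the 2 values
            state[x] = 1 if random.random() < p1 else 0
        if it >= burn_in:
            count += state[target]
    return count / draws
```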

79 C.Bielza, P.Larrañaga -UPM- 79 Approximate inference. Markov chain Monte Carlo (MCMC). Analytically, the exact posterior P can be computed for this small network, for comparison. Gibbs sampling over the M, S, B, C, H network: only visit the unobserved nodes; the normalizing constants are only computed once.

80 C.Bielza, P.Larrañaga -UPM- 80 Approximate inference. Markov chain Monte Carlo (MCMC). E.g., one cycle of the sampler would be: [detailed in the figure].

81 C.Bielza, P.Larrañaga -UPM- 81 Approximate inference. Markov chain Monte Carlo (MCMC). The estimate is ≈ 0.032: after 500 iterations, accumulate 1000 values.

82 C.Bielza, P.Larrañaga -UPM- 82 Approximate inference. Assessing approximate inference algorithms: measure the quality of different approximations (compare algorithms). Kullback-Leibler divergence between the true distribution P and the estimated distribution P' of a node with states i: KL(P, P') = Σ_i P(i) log(P(i)/P'(i)); KL = 0 if P = P'. For several query nodes, X and Y, and evidence Z, we should use KL(P(X,Y | Z), P'(X,Y | Z)).
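A small sketch of this measure (our code):

```python
from math import log

def kl(p, p_hat):
    """KL divergence between a true distribution p and an estimate p_hat (lists of probabilities)."""
    return sum(pi * log(pi / qi) for pi, qi in zip(p, p_hat) if pi > 0)

print(kl([0.2, 0.8], [0.25, 0.75]))   # small positive value; 0 only if both distributions match
```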

83 Software C.Bielza, P.Larrañaga -UPM- 83

84 Software C.Bielza, P.Larrañaga -UPM- 84

85 C.Bielza, P.Larrañaga -UPM- 85 Software genie.sis.pitt.edu

86 Software C.Bielza, P.Larrañaga -UPM- 86

87 C.Bielza, P.Larrañaga -UPM- 87 Software http.cs.berkeley.edu/~murphyk/

88 C.Bielza, P.Larrañaga -UPM- 88 Software leo.ugr.es/elvira

89 Examples C.Bielza, P.Larrañaga -UPM- 89

90 C.Bielza, P.Larrañaga -UPM- 90 Examples Increase if S=yes

91 Examples C.Bielza, P.Larrañaga -UPM- 91

92 C.Bielza, P.Larrañaga -UPM- 92 Examples Increase

93 C.Bielza, P.Larrañaga -UPM- 93 Examples Increase

94 Examples C.Bielza, P.Larrañaga -UPM- 94

95 Examples Increase Increase C.Bielza, P.Larrañaga -UPM- 95

96 C.Bielza, P.Larrañaga -UPM- 96 Texts and readings: general. T. Verma, J. Pearl (1990) Causal networks: Semantics and expressiveness, UAI-4. S. Lauritzen, D. Spiegelhalter (1988) Local computations with probabilities on graphical structures and their application to expert systems, J. of the Royal Stat. Soc., Series B.

97 C.Bielza, P.Larrañaga -UPM- 97 Texts and readings: deterministic algorithms for approx inference

98 C.Bielza, P.Larrañaga -UPM- 98 Texts and readings: stochastic simulation for approx inference. M. Henrion (1988) Propagating uncertainty in BNs by logic sampling, UAI-2. R. Fung, K. Chang (1990) Weighing and integrating evidence for stochastic simulation in Bayesian networks, UAI-5. R. Shachter, M. Peot (1990) Simulation approaches to general probabilistic inference on belief networks, UAI-5. D. Gamerman (1997) Markov Chain Monte Carlo, Chapman & Hall. G. Casella, E. George (1992) Explaining the Gibbs sampler, The Amer. Statistician 46. M. K. Cowles, B. P. Carlin (1996) MCMC convergence diagnostics: A comparative review, J. of the Amer. Statist. Assoc. 91. G. O. Roberts, A. F. M. Smith (1994) Simple conditions for the convergence of the Gibbs sampler and M-H algorithms, Stochastic Processes and their Applications 49.

99 C.Bielza, P.Larrañaga -UPM- 99 Possible projects/readings. 1. Canonical models for the CPTs: noisy-OR models. S. Srinivas (1993) A generalization of the noisy OR model, UAI-93. F. J. Díez (1993) Parameter adjustment in Bayes networks. The generalized noisy OR-gate, UAI-93; also in Neapolitan's book. 2. Context-specific independence: X and Y are c.i. given Z in context C=c if P(X | Y, Z, C=c) = P(X | Z, C=c). C. Boutilier, N. Friedman, M. Goldszmidt, D. Koller (1996) Context-specific independence in Bayesian networks, UAI-96. 3. Modeling tricks: parent divorcing, time-stamped models, expert disagreements, interventions; Section 2.3 in Jensen's book.

100 C.Bielza, P.Larrañaga -UPM- 100 Possible projects/readings. 4. Abductive inference. J. A. Gámez (2004) Abductive inference in Bayesian networks: A review. In Gámez, J.A., Moral, S., Salmerón, A., eds.: Advances in Bayesian Networks, Springer. 5. Partial abduction. L. M. de Campos, J. A. Gámez, S. Moral (2002) Partial abductive inference in Bayesian belief networks: An evolutionary computation approach by using problem-specific genetic operators, IEEE Trans. Evolutionary Computation 6(2). R. Marinescu, R. Dechter (2009) AND/OR branch-and-bound search for combinatorial optimization in graphical models, Artificial Intelligence 173.

101 Possible projects/readings. 6. Approximate inference. L. Hernández, S. Moral, A. Salmerón (1998) A Monte Carlo algorithm for probabilistic propagation in belief networks based on importance sampling and stratified simulation techniques, Int. J. of Approx. Reasoning 18. C. Yuan, M. Druzdzel (2005) Importance sampling algorithms for Bayesian networks: Principles and performance, Mathematical and Computer Modeling 43. S. Moral, A. Salmerón (2005) Dynamic importance sampling in BNs based on probability trees, International Journal of Approximate Reasoning 38(3). A. Cano, M. Gómez, S. Moral, C. Pérez-Ariza (2009) Recursive probability trees for Bayesian networks, Proceedings XIII CAEPIA, 1-10 (decomposition of potentials). T. Heskes, O. Zoeter (2002) Expectation propagation for approximate inference in dynamic BNs, Proc. 18th Conf. UAI-02. A. Cano, M. Gómez, S. Moral (2011) Approximate inference in Bayesian networks using binary probability trees, International Journal of Approximate Reasoning 52. C.Bielza, P.Larrañaga -UPM- 101

102 C.Bielza, P.Larrañaga -UPM- 102 Possible projects/readings. 7. Hugin architecture: the potentials in the cliques are changed dynamically and there is a division operation in the separators. Lauritzen and Spiegelhalter (1988). F. Jensen, S. Lauritzen, K. Olesen (1990) Bayesian updating in causal probabilistic networks by local computations, Computational Statistics Quarterly 4. 8. Lazy propagation: dissolves the differences between Shenoy-Shafer and Hugin propagation. A. Madsen, F. Jensen (1999) Lazy evaluation of symmetric Bayesian decision problems, UAI-99.

103 C.Bielza, P.Larrañaga -UPM- 103 Possible projects/readings. 9. More on graph theory: properties of c.i., equivalence between graphs, lists of c.i. statements and factorizations of the JPD. Chapters 5 (5.3, 5.4, 5.6) and 6 of Castillo et al.'s book; Chapter 5 of Jensen's book. 10. Inference in hybrid networks (discrete & continuous variables). T. Heskes, O. Zoeter (2003) Generalized belief propagation for approximate inference in hybrid Bayesian networks, Proc. 9th Int. Workshop on AI and Statistics. R. Rumí, A. Salmerón (2007) Approximate probability propagation with mixtures of truncated exponentials, Int. J. Approx. Reas. 45.



More information

Review I" CMPSCI 383 December 6, 2011!

Review I CMPSCI 383 December 6, 2011! Review I" CMPSCI 383 December 6, 2011! 1 General Information about the Final" Closed book closed notes! Includes midterm material too! But expect more emphasis on later material! 2 What you should know!

More information

BAYESIAN NETWORKS STRUCTURE LEARNING

BAYESIAN NETWORKS STRUCTURE LEARNING BAYESIAN NETWORKS STRUCTURE LEARNING Xiannian Fan Uncertainty Reasoning Lab (URL) Department of Computer Science Queens College/City University of New York http://url.cs.qc.cuny.edu 1/52 Overview : Bayesian

More information

Graphical Models and Markov Blankets

Graphical Models and Markov Blankets Stephan Stahlschmidt Ladislaus von Bortkiewicz Chair of Statistics C.A.S.E. Center for Applied Statistics and Economics Humboldt-Universität zu Berlin Motivation 1-1 Why Graphical Models? Illustration

More information

Exam Topics. Search in Discrete State Spaces. What is intelligence? Adversarial Search. Which Algorithm? 6/1/2012

Exam Topics. Search in Discrete State Spaces. What is intelligence? Adversarial Search. Which Algorithm? 6/1/2012 Exam Topics Artificial Intelligence Recap & Expectation Maximization CSE 473 Dan Weld BFS, DFS, UCS, A* (tree and graph) Completeness and Optimality Heuristics: admissibility and consistency CSPs Constraint

More information

Search Algorithms for Solving Queries on Graphical Models & the Importance of Pseudo-trees in their Complexity.

Search Algorithms for Solving Queries on Graphical Models & the Importance of Pseudo-trees in their Complexity. Search Algorithms for Solving Queries on Graphical Models & the Importance of Pseudo-trees in their Complexity. University of California, Irvine CS199: Individual Study with Rina Dechter Héctor Otero Mediero

More information

Graphical Probability Models for Inference and Decision Making

Graphical Probability Models for Inference and Decision Making Graphical Probability Models for Inference and Decision Making Unit 4: Inference in Graphical Models The Junction Tree Algorithm Instructor: Kathryn Blackmond Laskey Unit 4 (v2) - 1 - Learning Objectives

More information

These notes present some properties of chordal graphs, a set of undirected graphs that are important for undirected graphical models.

These notes present some properties of chordal graphs, a set of undirected graphs that are important for undirected graphical models. Undirected Graphical Models: Chordal Graphs, Decomposable Graphs, Junction Trees, and Factorizations Peter Bartlett. October 2003. These notes present some properties of chordal graphs, a set of undirected

More information

The Basics of Graphical Models

The Basics of Graphical Models The Basics of Graphical Models David M. Blei Columbia University September 30, 2016 1 Introduction (These notes follow Chapter 2 of An Introduction to Probabilistic Graphical Models by Michael Jordan.

More information

Dynamic Bayesian network (DBN)

Dynamic Bayesian network (DBN) Readings: K&F: 18.1, 18.2, 18.3, 18.4 ynamic Bayesian Networks Beyond 10708 Graphical Models 10708 Carlos Guestrin Carnegie Mellon University ecember 1 st, 2006 1 ynamic Bayesian network (BN) HMM defined

More information

Joint Entity Resolution

Joint Entity Resolution Joint Entity Resolution Steven Euijong Whang, Hector Garcia-Molina Computer Science Department, Stanford University 353 Serra Mall, Stanford, CA 94305, USA {swhang, hector}@cs.stanford.edu No Institute

More information

18 October, 2013 MVA ENS Cachan. Lecture 6: Introduction to graphical models Iasonas Kokkinos

18 October, 2013 MVA ENS Cachan. Lecture 6: Introduction to graphical models Iasonas Kokkinos Machine Learning for Computer Vision 1 18 October, 2013 MVA ENS Cachan Lecture 6: Introduction to graphical models Iasonas Kokkinos Iasonas.kokkinos@ecp.fr Center for Visual Computing Ecole Centrale Paris

More information

A Fast Learning Algorithm for Deep Belief Nets

A Fast Learning Algorithm for Deep Belief Nets A Fast Learning Algorithm for Deep Belief Nets Geoffrey E. Hinton, Simon Osindero Department of Computer Science University of Toronto, Toronto, Canada Yee-Whye Teh Department of Computer Science National

More information

Part I: Sum Product Algorithm and (Loopy) Belief Propagation. What s wrong with VarElim. Forwards algorithm (filtering) Forwards-backwards algorithm

Part I: Sum Product Algorithm and (Loopy) Belief Propagation. What s wrong with VarElim. Forwards algorithm (filtering) Forwards-backwards algorithm OU 56 Probabilistic Graphical Models Loopy Belief Propagation and lique Trees / Join Trees lides from Kevin Murphy s Graphical Model Tutorial (with minor changes) eading: Koller and Friedman h 0 Part I:

More information

Ch9: Exact Inference: Variable Elimination. Shimi Salant, Barak Sternberg

Ch9: Exact Inference: Variable Elimination. Shimi Salant, Barak Sternberg Ch9: Exact Inference: Variable Elimination Shimi Salant Barak Sternberg Part 1 Reminder introduction (1/3) We saw two ways to represent (finite discrete) distributions via graphical data structures: Bayesian

More information

A Factor Tree Inference Algorithm for Bayesian Networks and its Applications

A Factor Tree Inference Algorithm for Bayesian Networks and its Applications A Factor Tree Infere Algorithm for Bayesian Nets and its Applications Wenhui Liao, Weihong Zhang and Qiang Ji Department of Electrical, Computer and System Engineering Rensselaer Polytechnic Institute,

More information

Distributed Multi-agent Probabilistic Reasoning With Bayesian Networks

Distributed Multi-agent Probabilistic Reasoning With Bayesian Networks Distributed Multi-agent Probabilistic Reasoning With Bayesian Networks Yang Xiang Department of Computer Science, University of Regina Regina, Saskatchewan, Canada S4S 0A2, yxiang@cs.uregina.ca Abstract.

More information

Introduction to Graphical Models

Introduction to Graphical Models Robert Collins CSE586 Introduction to Graphical Models Readings in Prince textbook: Chapters 10 and 11 but mainly only on directed graphs at this time Credits: Several slides are from: Review: Probability

More information

Parameter Control of Genetic Algorithms by Learning and Simulation of Bayesian Networks

Parameter Control of Genetic Algorithms by Learning and Simulation of Bayesian Networks Submitted Soft Computing Parameter Control of Genetic Algorithms by Learning and Simulation of Bayesian Networks C. Bielza,*, J.A. Fernández del Pozo, P. Larrañaga Universidad Politécnica de Madrid, Departamento

More information

Graphical Models. David M. Blei Columbia University. September 17, 2014

Graphical Models. David M. Blei Columbia University. September 17, 2014 Graphical Models David M. Blei Columbia University September 17, 2014 These lecture notes follow the ideas in Chapter 2 of An Introduction to Probabilistic Graphical Models by Michael Jordan. In addition,

More information

Markov Logic: Representation

Markov Logic: Representation Markov Logic: Representation Overview Statistical relational learning Markov logic Basic inference Basic learning Statistical Relational Learning Goals: Combine (subsets of) logic and probability into

More information