INFERENCE IN BAYESIAN NETWORKS
1 INFERENCE IN BAYESIAN NETWORKS Concha Bielza, Pedro Larrañaga Computational Intelligence Group Departamento de Inteligencia Artificial Universidad Politécnica de Madrid Master Universitario en Inteligencia Artificial
Outline: Types of queries; Brute-force computation; Variable elimination algorithm; Message passing algorithm; Approximate inference: logic sampling, likelihood weighting, Markov chain Monte Carlo (MCMC)
Types of queries. Queries: posterior probabilities. Given some evidence e (observations), compute the posterior probability of the target variable(s) X, P(X | e). Other names: probability propagation, belief updating or revision. [Figure: the Burglary/Earthquake network, with Alarm, News and WCalls.]
Types of queries, for any kind of reasoning. Predictive (deductive, causal) reasoning: predict effects from causes (Disease -> Symptoms?); the target variable is usually a descendant of the evidence. Diagnostic reasoning (diagnostic inference): diagnose the causes from the effects (Symptoms -> Disease?); the target variable is usually an ancestor of the evidence.
Types of queries, for any kind of reasoning. Intercausal reasoning: between causes of a common effect. B (Burglary) and E (Earthquake) are independent of each other. Suppose that A = yes: this raises the probability of both possible causes B and E. Suppose then that B = yes: this explains the observed A, which in turn lowers the probability that E = yes. Two causes are initially independent; once the effect is known, the presence of one explanatory cause renders the alternative cause less likely (it is "explained away").
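The explaining-away effect can be checked numerically. A minimal sketch, assuming made-up CPT values for the Burglary/Earthquake/Alarm fragment (these numbers are illustrative, not from the slides): the posterior of E rises when the alarm is observed, and drops back once a burglary is also observed.

```python
from itertools import product

# Hypothetical CPTs (illustrative numbers only).
p_b = {1: 0.01, 0: 0.99}                 # P(B)
p_e = {1: 0.02, 0: 0.98}                 # P(E)
p_a = {(1, 1): 0.95, (1, 0): 0.94,       # P(A=1 | B, E)
       (0, 1): 0.29, (0, 0): 0.001}

def joint(b, e, a):
    """P(B=b, E=e, A=a) from the chain-rule factorization."""
    pa1 = p_a[(b, e)]
    return p_b[b] * p_e[e] * (pa1 if a == 1 else 1 - pa1)

def posterior_e(evidence):
    """P(E=1 | evidence) by brute-force enumeration over the joint."""
    num = den = 0.0
    for b, e, a in product([0, 1], repeat=3):
        point = {'B': b, 'E': e, 'A': a}
        if all(point[k] == v for k, v in evidence.items()):
            p = joint(b, e, a)
            den += p
            if e == 1:
                num += p
    return num / den

p_e_given_a = posterior_e({'A': 1})            # alarm observed
p_e_given_ab = posterior_e({'A': 1, 'B': 1})   # alarm and burglary observed
print(p_e_given_a, p_e_given_ab)  # the second value is much lower: explaining away
```

With these numbers, observing the alarm raises P(E=1) well above its prior 0.02, while additionally observing B = 1 pulls it back down close to the prior.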
Types of queries, for any kind of reasoning. Bidirectional reasoning (mixed inference): combines two or more of the above, e.g., diagnostic and predictive reasoning, or diagnostic and intercausal reasoning. The arc directions between variables do not restrict the type of query that can be asked: probabilistic inference can combine evidence from all parts of the network.
Types of queries. More queries: joint and likelihood. Posterior joint: the conditional probability of several variables given the evidence; the size of the answer to this query is exponential in the number of variables in the joint. Likelihood of the evidence: the simplest query, i.e., the probability P(e) of the evidence.
Types of queries. More queries: maximum a posteriori (MAP). Most likely configurations (abductive inference): the event that best explains the evidence. Total abduction: search over all the unobserved variables; in general it cannot be computed component-wise with max P(x_i | e). Partial abduction: search over a subset of the unobserved variables (the explanation set). A generalization asks for the K most likely explanations.
Types of queries. More queries: maximum a posteriori (MAP). MAP is equivalent to maximizing the joint: arg max_x P(x | e) = arg max_x P(x, e), since P(e) is a constant. Use MAP for: classification (find the most likely label, given the evidence) and explanation (what is the most likely scenario, given the evidence).
Types of queries. More queries: decision-making. Optimal decisions (of maximum expected utility), computed with influence diagrams.
[Pearl 88; Lauritzen & Spiegelhalter 88] Brute-force computation of P(X | e). First, consider P(X_i), without observed evidence e: conceptually simple but computationally complex. For a BN with n variables, each with its CPT P(X_j | Pa(X_j)), the brute-force approach sums the product of all CPTs over all other variables. But this amounts to computing the JPD, which is often very inefficient and even computationally intractable. CHALLENGE: without computing the JPD, exploit the factorization encoded by the BN and the distributive law (local computations).
Easy inference cases: simple forward inference. Computing a prior requires simple forward propagation of probabilities:
P(J) = sum_{M,E} P(J | M, E) P(M, E)   (marginalization)
     = sum_{M,E} P(J | M) P(M | E) P(E)   (chain rule and conditional independence)
     = sum_M P(J | M) sum_E P(M | E) P(E)   (distributive law)
All terms used are CPTs in the BN; only ancestors of J are involved.
Easy inference cases: simple forward inference. The same idea applies when we have upstream evidence: P(J | E) = sum_M P(J | M, E) P(M | E) = sum_M P(J | M) P(M | E).
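The forward computation above can be sketched in a few lines. A minimal example for a chain E -> M -> J with hypothetical binary CPTs (illustrative numbers): the inner sum over E is performed first, then the outer sum over M, exactly as the distributive law suggests.

```python
# Hypothetical binary CPTs for the chain E -> M -> J (illustrative numbers).
p_E = [0.7, 0.3]                         # P(E)
p_M_given_E = [[0.9, 0.1], [0.2, 0.8]]   # rows indexed by E, columns by M
p_J_given_M = [[0.8, 0.2], [0.4, 0.6]]   # rows indexed by M, columns by J

# Inner sum: P(M) = sum_E P(M | E) P(E)
p_M = [sum(p_M_given_E[e][m] * p_E[e] for e in range(2)) for m in range(2)]

# Outer sum: P(J) = sum_M P(J | M) P(M)
p_J = [sum(p_J_given_M[m][j] * p_M[m] for m in range(2)) for j in range(2)]
print(p_J)  # a valid distribution: the two entries sum to 1
```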
Improving brute force. Use the JPD factorization and the distributive law. Computed naively, the JPD table has 32 entries (five binary variables).
Improving brute force. Arrange the computations effectively, moving some additions inward: summing first over X_5 and X_3, then over X_4, the biggest table created has 8 entries (like the largest CPT in the BN).
Improving brute force. Iteratively: move all irrelevant terms outside of the innermost sum; perform the innermost sum, getting a new term; insert the new term into the product.
Improving brute force. Comparing both approaches: (1) Brute force: a table with 32 entries; 52 multiplications (combining the tables in a suitable way) and 30 additions (marginalizations: 16 + 8 + 4 + 2). (2) Factorization and the distributive law: one table with 8 entries and three with 4 entries; 14 multiplications and 14 additions (marginalizations).
Complexity of exact inference in BNs. In BNs without loops (cycles in the underlying undirected graph), i.e. polytrees, inference is easy: the additions can be arranged so as not to create tables bigger than those included in the BN. The complexity of the previous method is exponential in the width (number of variables) of the biggest CPT used in the process. Otherwise, in general BNs, inference is NP-complete [Cooper 1990]. This does not mean we cannot solve inference; it implies that we cannot find a general procedure that works efficiently for all networks. The key to efficient inference lies in finding a good summation order (elimination/deletion order).
Recall: a cycle is a closed directed path; a loop is a closed path in the underlying undirected graph.
Recall: a polytree is a DAG without loops; there is only one path between any pair of nodes (a singly connected graph). A tree is a polytree in which each node has one parent, except the root node.
Recall: types of directed graphs (figure).
Variable elimination algorithm (to compute P(X_i)). Input: a list with all functions (CPTs) of the problem. Select an elimination order of all variables (except X_i). For each X_k taken from that order, if F is the set of functions that involve X_k: delete F from the list; eliminate X_k, i.e. combine (multiply) all the functions that contain this variable and marginalize out X_k, obtaining a new function f; add f to the list. Output: the combination (multiplication) of all functions in the current list.
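The listing above can be sketched with factors represented as tables. This is an illustrative implementation for binary variables, not the authors' code; multiply, marginalize and eliminate correspond to the combine and sum-out steps.

```python
from itertools import product

class Factor:
    """A discrete factor: a table over a tuple of binary variables."""
    def __init__(self, vars_, table):
        self.vars = tuple(vars_)   # variable names
        self.table = table         # dict: assignment tuple -> value

def multiply(f, g):
    """Pointwise product of two factors over the union of their variables."""
    vars_ = f.vars + tuple(v for v in g.vars if v not in f.vars)
    table = {}
    for asg in product([0, 1], repeat=len(vars_)):
        point = dict(zip(vars_, asg))
        fv = f.table[tuple(point[v] for v in f.vars)]
        gv = g.table[tuple(point[v] for v in g.vars)]
        table[asg] = fv * gv
    return Factor(vars_, table)

def marginalize(f, var):
    """Sum out one variable from a factor."""
    i = f.vars.index(var)
    table = {}
    for asg, val in f.table.items():
        key = asg[:i] + asg[i + 1:]
        table[key] = table.get(key, 0.0) + val
    return Factor(f.vars[:i] + f.vars[i + 1:], table)

def eliminate(factors, order):
    """Sum-product variable elimination over the given elimination order."""
    factors = list(factors)
    for var in order:
        related = [f for f in factors if var in f.vars]   # the set F
        factors = [f for f in factors if var not in f.vars]
        prod = related[0]
        for f in related[1:]:
            prod = multiply(prod, f)
        factors.append(marginalize(prod, var))            # new term back in the list
    result = factors[0]
    for f in factors[1:]:
        result = multiply(result, f)
    return result

# Tiny chain A -> B with hypothetical CPTs: eliminating A yields P(B).
f_A = Factor(('A',), {(0,): 0.6, (1,): 0.4})
f_BA = Factor(('B', 'A'), {(0, 0): 0.9, (0, 1): 0.3, (1, 0): 0.1, (1, 1): 0.7})
p_B = eliminate([f_A, f_BA], ['A'])
print(p_B.table)
```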
Variable elimination algorithm. Repeat the algorithm for each target variable.
Example with the Asia network: Visit to Asia (A), Smoking (S), Tuberculosis (T), Lung Cancer (L), Tuberculosis or Lung Cancer (E), Bronchitis (B), X-Ray (X), Dyspnea (D).
Brute-force approach. Compute P(d) by brute force: P(d) = sum_x sum_b sum_e sum_l sum_t sum_s sum_a P(a, s, t, l, e, b, x, d). Complexity is exponential in the size of the graph (number of variables x number of states for each variable).
Note: the intermediate factors created during elimination are not necessarily probability terms.
Variable elimination algorithm. Largest factor size = 8. Local computations (due to moving the additions inward). The elimination ordering matters, but finding an optimal (minimum cost) ordering is NP-hard [Arnborg et al. 87]; heuristics are used to find good sequences. Discard parts of the network that are irrelevant for the query: we can prune all variables that, given e, are conditionally independent of the target variable (Bayes-Ball algorithm, Shachter 98). Complexity is exponential in the maximum number of variables appearing in the factors of the summation.
Now, with evidence e. We have observed E = e. For each E_i, identify the functions f in which it appears and restrict them to the observed value.
VE algorithm: dealing with evidence. Network nodes: V, S, L, T, A, B, X, D. Suppose we get evidence e (an instantiation to observed values): V = t, S = f, D = t. We want to compute P(L, V = t, S = f, D = t). The JPD factorizes as P(V) P(S) P(T | V) P(L | S) P(B | S) P(A | T, L) P(X | A) P(D | A, B); once the evidence is set, what remains is a function of T, L, B, A, X only.
VE algorithm: dealing with evidence. Since we know that V = t, we don't need to eliminate V. Instead, we can replace the factors P(V) and P(T | V) with f_P(V) = P(V = t) and f_P(T|V)(T) = P(T | V = t). These select the appropriate parts of the original factors given the evidence. Note that f_P(V) is a constant, and thus does not take part in the elimination of other variables.
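The restriction step can be sketched as a small table operation. Hypothetical numbers: restricting a table for P(T | V) to the evidence V = 1 yields the one-variable factor f(T).

```python
def restrict(factor_vars, table, var, value):
    """Drop 'var' from a factor by fixing it to the observed value."""
    i = factor_vars.index(var)
    new_vars = factor_vars[:i] + factor_vars[i + 1:]
    new_table = {k[:i] + k[i + 1:]: v for k, v in table.items() if k[i] == value}
    return new_vars, new_table

# P(T | V) as a table over (T, V), hypothetical numbers; evidence V = 1.
vars_, tab = restrict(('T', 'V'),
                      {(0, 0): 0.99, (0, 1): 0.95, (1, 0): 0.01, (1, 1): 0.05},
                      'V', 1)
print(vars_, tab)  # ('T',) {(0,): 0.95, (1,): 0.05}
```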
VE algorithm: dealing with evidence. Initial factors, after setting the evidence: f_P(v), f_P(s), f_P(t|v)(T), f_P(l|s)(L), f_P(b|s)(B), P(A | T, L), P(X | A), f_P(d|a,b)(A, B). Eliminating X, we get: f_P(v), f_P(s), f_P(t|v)(T), f_P(l|s)(L), f_P(b|s)(B), P(A | T, L), f_X(A), f_P(d|a,b)(A, B).
VE algorithm: dealing with evidence. Eliminating A, we get: f_P(v), f_P(s), f_P(t|v)(T), f_P(l|s)(L), f_P(b|s)(B), f_A(T, L, B). Eliminating T, we get: f_P(v), f_P(s), f_P(l|s)(L), f_P(b|s)(B), f_T(L, B). Eliminating B, we get: f_P(v), f_P(s), f_P(l|s)(L), f_B(L).
Message passing algorithm. It operates by passing messages among the nodes of the network; nodes act as processors that receive, calculate and send information. Such methods are called propagation algorithms. Clique tree propagation is based on the same principle as VE but uses a sophisticated caching strategy that: enables computing the posterior probability distributions of all variables in twice the time it takes to compute that of one single variable; works in an intuitively appealing fashion, namely message propagation.
Basic operations for a node. Ask-info(i, j): target node i asks node j for information; it does so for all its neighbors j, and they do the same until there are no nodes left to ask. Send-message(i, j): each node sends a message to the node that asked it for information, until the target node is reached. A message is defined over the intersection of the domains of f_i and f_j: node j combines its own function with the messages received from its other neighbors and marginalizes onto the shared variables. Finally, we calculate locally at each node i: the target combines all received information with its own function and marginalizes over the target variable.
CollectEvidence procedure for X_2: the target asks its neighbors for information (figure).
P(X_2) computed as a message passing algorithm (figure).
Correspondence between VE and the message passing algorithm: each message corresponds directly to a VE elimination step.
Computing the probabilities P(X_i | e) of all (unobserved) variables at a time. We could perform the previous process for each node, but many messages would be repeated! Instead, we can use two rounds of messages: select a node as root (or pivot); collect evidence from the leaves toward the root (as in VE); distribute evidence from the root toward the leaves; calculate the marginal distribution at each node by local computation, using its incoming messages. This algorithm never constructs tables larger than those in the BN.
Message passing algorithm. First sweep: CollectEvidence toward the root node X_4. Second sweep: DistributeEvidence.
Networks with loops. If the network is not a polytree, the algorithm does not work: requests/messages go around a cycle indefinitely (information travels along two paths and is counted twice), and the independence assumptions used by the algorithm no longer hold (it is no longer true that any node separates the graph into two unconnected parts, i.e. polytrees). Alternatives?
Alternative 1: conditioning method. Cut the multiple paths between nodes by instantiating some variables included in the loops; we then have a polytree and its algorithms can be applied. Alternative 2: clustering methods. Group variables in an auxiliary, simpler representation, and structure the clusters so that we finally have a polytree over this secondary structure, usually a clique tree or junction tree.
Complexity. The complexity of propagation algorithms in polytrees is linear in the size (nodes + arcs) of the network [brute force is exponential]. In multiply connected BNs it is an NP-complete problem (both alternatives have this complexity, and neither is uniformly best; they are complementary, and mixed algorithms exist).
Alternative 1: conditioning method. Without loops, any node D separates the graph into two unconnected parts: (1) its parents and the nodes reached through its parents; (2) its children and the nodes reached through its children. Both sets of nodes are conditionally independent given D. This idea is used in the message passing algorithm.
Alternative 1: conditioning method. With loops, we can't; but we can cut the loops. (1) Fix the (arbitrary) state of some nodes, called the cutset, e.g. {C} with C = c; the graph becomes a polytree. (2) Absorb the evidence; the topology changes: the arc from C is removed and F gets P(F | C = c, D). (3) Run a polytree algorithm for each value c and combine the results. Minimize the cutset size (heuristics; finding a minimum cutset is NP-complete).
Alternative 1: conditioning method. Another example of a cutset, on a loop through A, B and C: we can cut the loop by taking {A} as the cutset. Option 1: replace P(B) by P(B | A = a). Option 2: replace P(C) by P(C | A = a). Only one arc is absorbed (otherwise the graph would become unconnected).
Alternative 2: clustering methods [Lauritzen & Spiegelhalter 88]. This is the method implemented in the main BN software packages. Transform the BN into a probabilistically equivalent polytree by merging nodes, removing the multiple paths between two nodes. Example: metastatic cancer (M) is a possible cause of brain tumors (B) and an explanation for increased total serum calcium (S); in turn, either of these could explain a patient falling into a coma (C); severe headache (H) is also associated with brain tumors. Create a new node Z = (S, B) that combines S and B, with states {tt, tf, ft, ff}. P(Z | M) = P(S | M) P(B | M), since S and B are conditionally independent given M; P(H | Z) = P(H | B), since H is conditionally independent of S given B.
Alternative 2: clustering methods. Steps of the JUNCTION TREE CLUSTERING ALGORITHM. Compilation (slow, needing much memory if the graph is dense, but done only once): 1. Moralize the BN. 2. Triangulate the moral graph and obtain the cliques. 3. Create the junction tree and its separators. 4. Compute new parameters. Belief updating (fast): 5. Message passing algorithm.
Alternative 2: clustering methods. 1. MORALIZE the BN: connect ("marry") all parents with a common child and remove the arrows, obtaining the moral graph. This keeps the dependencies that would otherwise be lost when transforming the DAG into an undirected graph (independence in the moral graph implies independence in the BN).
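The moralization step can be sketched directly on an adjacency structure. A minimal version, run on the metastatic-cancer example from the previous slide:

```python
def moralize(parents):
    """Moral graph of a DAG given as {node: list of parents}:
    marry all co-parents, then drop arc directions."""
    edges = set()
    for child, pars in parents.items():
        for p in pars:
            edges.add(frozenset((p, child)))      # undirected version of each arc
        for i, p in enumerate(pars):              # marry every pair of co-parents
            for q in pars[i + 1:]:
                edges.add(frozenset((p, q)))
    return edges

# M -> S, M -> B, S -> C, B -> C, B -> H
dag = {'S': ['M'], 'B': ['M'], 'C': ['S', 'B'], 'H': ['B'], 'M': []}
moral = moralize(dag)
print(frozenset(('S', 'B')) in moral)  # True: S and B get married
```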
Alternative 2: clustering methods. 2. TRIANGULATE the moral graph (needed to build a junction tree): add edges so that every cycle of length > 3 contains a chord, an edge between 2 nonconsecutive nodes (i.e. there is a subcycle composed of exactly 3 of its nodes). This produces a triangulated or chordal graph; we never create functions defined over non-joined groups of nodes. (Not necessary in the example: it is already triangulated.) Different triangulations produce different clusters (and different CPT sizes at the compound nodes). Finding an optimal triangulation is NP-complete, so heuristics are used; preserve the original topology as much as possible, adding few edges.
Alternative 2: clustering methods. 2. TRIANGULATE the moral graph: the added edges are called fill-ins, obtained by the fill-in process guided by a deletion sequence: before deleting a node X and all its edges, we add the edges needed to make the subgraph given by X and its neighbors complete. Example with ordering {1, 2, 3, 4, 5, 6}: moral graph -> triangulation-via-elimination -> triangulated graph.
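The fill-in process can be sketched as elimination-based triangulation. A minimal version, illustrated on a hypothetical 4-cycle rather than the slide's 6-node graph; it also records the completed neighborhoods, whose maximal elements are the cliques used in the next step.

```python
def triangulate(adj, order):
    """Elimination-based triangulation: before removing each node, complete its
    current neighborhood (adding fill-ins). Returns the chordal graph's edges
    and the maximal complete subgraphs recorded along the way (the cliques)."""
    adj = {v: set(ns) for v, ns in adj.items()}          # work on a copy
    edges = {frozenset((u, v)) for u in adj for v in adj[u]}
    recorded = []
    for v in order:
        nbrs = adj[v]
        recorded.append(frozenset(nbrs | {v}))
        for a in nbrs:                                   # complete the neighborhood
            for b in nbrs:
                if a != b and b not in adj[a]:
                    adj[a].add(b); adj[b].add(a)
                    edges.add(frozenset((a, b)))         # a fill-in edge
        for a in nbrs:                                   # delete v from the graph
            adj[a].discard(v)
        del adj[v]
    cliques = [c for c in recorded if not any(c < d for d in recorded)]
    return edges, cliques

# A 4-cycle A-B-C-D needs one chord; eliminating A first adds fill-in B-D.
cycle = {'A': {'B', 'D'}, 'B': {'A', 'C'}, 'C': {'B', 'D'}, 'D': {'A', 'C'}}
edges, cliques = triangulate(cycle, ['A', 'B', 'C', 'D'])
print(sorted(sorted(c) for c in cliques))  # cliques {A,B,D} and {B,C,D}
```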
Alternative 2: clustering methods. Triangulating a graph does not mean dividing it into triangles.
(Figures: a graph that is not triangulated vs. its triangulated version.)
Alternative 2: clustering methods. 2. Triangulate the moral graph and obtain the cliques. Clique = maximal complete subgraph (all nodes are pairwise linked and it is not a subset of another complete set). Identify them during the fill-in process (complete subgraphs that are maximal). In the examples: {M,S,B}, {S,B,C}, {B,H}; and {1,2,3}, {2,3,4}, {3,4,5}, {4,5,6}.
Alternative 2: clustering methods. 3. Create the JUNCTION TREE and its separators. The JT is an undirected tree that has the cliques as its nodes. It must satisfy the following (running intersection) property: given two nodes X and Y, X intersected with Y must be contained in every node on the path between X and Y. Separator: the intersection of two adjacent nodes. A hypergraph violating this property is not a JT.
Alternative 2: clustering methods. In the examples: the chain {1,2,3} - {2,3,4} - {3,4,5} - {4,5,6} with separators {2,3}, {3,4}, {4,5}; and {M,S,B} - {S,B,C} with separator {S,B}, plus {M,S,B} - {B,H} with separator {B}. Order the cliques and try to link them so as to create the biggest separators.
Alternative 2: clustering methods. 4. Compute NEW PARAMETERS (new potentials): each potential is attached to a node (clique) containing its domain. If a node has no function attached, attach the identity function to it. Whenever more than one potential is attached, the potential at the node is the product of all of them. Result: the product of all the node potentials in the junction tree equals the product of all the CPTs in the original BN (the same information, the JPD, in a different representation).
Alternative 2: clustering methods. In the examples: C_1 = {M,S,B}, C_2 = {S,B,C}, C_3 = {B,H} with separators {S,B} and {B}; and C_1 = {1,2,3}, C_2 = {2,3,4}, C_3 = {3,4,5}, C_4 = {4,5,6} with separators {2,3}, {3,4}, {4,5}.
Alternative 2: clustering methods. Another example (figure).
Alternative 2: clustering methods. 5. MESSAGE passing algorithm over the JT. Applying the propagation algorithm over the JT gives the Shenoy-Shafer architecture: store two messages at each separator S_ij (one for each direction). A message is computed by combining the clique potential with the messages from its other neighbors and marginalizing out the residual set (the clique variables not in the separator). After a full propagation (collect + distribute) all the separators are full; the posterior of a clique is its potential combined with all incoming messages, from which we marginalize to obtain single-variable posteriors.
Alternative 2: clustering methods. With evidence, as always: suppose A = y, X = y (figure).
Alternative 2: clustering methods. If there is only one query variable Q, find a clique C_Q that contains Q and use it as a pivot in the inference. E.g.: compute P(L | A = y, X = y); any clique containing L is a possible pivot.
Alternative 2: clustering methods. Message passing from the leaves to the pivot (Shenoy-Shafer): 1. CollectEvidence; the pivot combines the incoming messages and computes the answer.
Message passing from the pivot to the leaves (Shenoy-Shafer): 2. DistributeEvidence. Complexity is exponential in the maximum clique size.
Alternative 2: clustering methods. Summary: DAG -> moral graph -> triangulated graph -> identifying cliques -> junction tree -> message passing.
Approximate inference. Why? Because exact inference is intractable (NP-complete) in large (more than about 40 nodes) and densely connected BNs: the cliques for the junction tree algorithm, or the intermediate factors in the VE algorithm, grow in size, generating an exponential blowup in the number of computations performed. Both deterministic methods and stochastic simulation are used to find approximate answers.
Approximate inference. Deterministic algorithms that simplify the model: eliminate arcs that encode almost-independent nodes (weak dependences, measured using the Kullback-Leibler divergence) [Engelen 97]; eliminate nodes that are far away from the target node (localized partial evaluation algorithm) [Draper 95]; replace low probabilities by zeros [Jensen and Andersen 90]; reduce the cardinality of CPTs (state-space abstraction) [Wellman & Liu 94]; use alternative representations of CPTs that join similar probabilities, using rules [Poole 98] or probability trees [Cano et al 03].
Approximate inference. Stochastic simulation: use the network to generate a large number of cases (full instantiations) from the network distribution; P(X_i | e) is estimated from these cases by counting observed frequencies in the samples. By the Law of Large Numbers, the estimate converges to the exact probability as more cases are generated. Approximate propagation in BNs within an arbitrary tolerance or accuracy is itself an NP-complete problem. In practice, if e is not too unlikely, convergence is quick.
Approximate inference. Probabilistic logic sampling [Henrion 88]. Given an ancestral ordering of the nodes (parents before children), generate a value for each X once its parents' values have been generated (i.e., from the root nodes down to the leaves), using the conditional probability given the known values of the parents. When all the nodes have been visited, we have a case: an instantiation of all the nodes in the BN. This is a forward sampling algorithm. Repeat, and use the observed frequencies to estimate P(X_i | e).
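Forward sampling can be sketched on a toy two-node network A -> B with made-up CPTs: sampling in ancestral order and counting frequencies approximates the exact marginal P(B = 1) = 0.7 * 0.2 + 0.3 * 0.9 = 0.41.

```python
import random

# Hypothetical two-node network A -> B (illustrative CPTs).
p_A = 0.3                        # P(A = 1)
p_B_given_A = {0: 0.2, 1: 0.9}   # P(B = 1 | A)

def logic_sample():
    """One forward sample in ancestral order: parents before children."""
    a = 1 if random.random() < p_A else 0
    b = 1 if random.random() < p_B_given_A[a] else 0
    return a, b

random.seed(0)
samples = [logic_sample() for _ in range(50000)]
est_p_b = sum(b for _, b in samples) / len(samples)
print(est_p_b)  # should be close to the exact P(B = 1) = 0.41
```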
Approximate inference. Probabilistic logic sampling. Suppose we obtain the following samples: (0,1,1,1,1,1), (0,1,0,1,1,1), (1,0,0,1,1,1), (0,0,1,1,1,0), (1,1,1,1,0,0). With evidence, e.g. X_2 = 1, we discard the third and fourth samples (inconsistent with the evidence) and keep sampling until we have a sample of the desired size 5: (0,1,1,1,1,1), (0,1,0,1,1,1), (1,1,0,0,1,1), (1,1,1,1,1,0), (1,1,1,1,0,0).
Approximate inference. Probabilistic logic sampling works because of the following general scheme to simulate from (X_1, ..., X_r): if each factor of the chain-rule factorization is simple to sample from, then for i = 1 to r, generate x_i ~ X_i | x_1, ..., x_{i-1}, and return (x_1, ..., x_r).
Approximate inference. Likelihood weighting [Fung & Chang 90; Shachter & Peot 90]. PLS is easily generalized to more than one query node. When approximating P(X_i | e), PLS rejects all the samples not consistent with e. Problem: if e is unlikely, most of the cases are discarded (they don't contribute to the frequency counts), which is inefficient. Example: if we observe X_2 = 1 and P(X_2 = 1) = 0.0064, we need around 10,000 trials to get 64 valid samples; obtaining a significant number of samples becomes intractable. Likelihood weighting avoids the many rejections of PLS.
Approximate inference. Likelihood weighting: don't sample the evidence nodes E; fix their values E = e. Generate the rest as in PLS. Instead of adding 1 to the run count, the CPTs of the evidence nodes are used to determine how likely that evidence combination is: for a sample i, assign a weight w_i given by the likelihood of the evidence given its parents. In PLS, w_i = 1 for samples consistent with e and w_i = 0 otherwise.
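Likelihood weighting can be sketched on the same kind of toy network A -> B (hypothetical CPTs): the evidence node B = 1 is fixed rather than sampled, and each sample is weighted by P(B = 1 | a).

```python
import random

# Hypothetical network A -> B; evidence B = 1; estimate P(A = 1 | B = 1).
p_A = 0.3                        # P(A = 1)
p_B_given_A = {0: 0.2, 1: 0.9}   # P(B = 1 | A)

def likelihood_weighting(n):
    num = den = 0.0
    for _ in range(n):
        a = 1 if random.random() < p_A else 0   # sample only non-evidence nodes
        w = p_B_given_A[a]                       # weight = P(evidence | parents)
        den += w
        if a == 1:
            num += w
    return num / den

random.seed(0)
est = likelihood_weighting(50000)
print(est)  # exact value: 0.3*0.9 / (0.3*0.9 + 0.7*0.2) = 0.27/0.41, about 0.659
```

No sample is ever rejected: every iteration contributes its weight, which is what makes the method efficient when the evidence is unlikely.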
Approximate inference. Likelihood weighting example: P(C = n | B = y, E = y)? Network with P(A = y) = 0.2, P(B = y | A = n) = 0.4, P(C = y | A = n) = 0.7, P(E = y | C = n, B = y) = 0.8, P(D = y | B = y) = 0.7; evidence B = y, E = y. Sampled cases: (A = n, B = y, C = n, D = y, E = y) with w_1 = 0.4 * 0.8 = 0.32; (A = n, B = y, C = y, D = n, E = y) with w_2 = 0.88; (A = y, B = y, C = y, D = y, E = y) with w_3 = 0.80.
Approximate inference. Markov chain Monte Carlo (MCMC): basics. Designed for cases in which sampling from a distribution pi is not easy; with MCMC we simulate draws from complex probability distributions. General description: select a Markov chain with stationary distribution pi; start at a point theta_0 and generate theta_1, ..., theta_n from the chain until convergence; eliminate an initial transient theta_1, ..., theta_k and use theta_{k+1}, ..., theta_n as an approximate sample from pi. Two issues: how to design a Markov chain with stationary distribution pi (the Metropolis-Hastings algorithm and its special cases; here we only see the Gibbs sampler), and how to judge the convergence of the Markov chain (a number of criteria exist).
Approximate inference. MCMC: Gibbs sampler. By repeatedly sampling from the full conditional distributions, we end up sampling from the JPD. The chain moves from theta_i to theta_{i+1} one coordinate at a time (or one group of coordinates at a time, which reduces correlation among parameters). Bivariate example: theta = (theta_1, theta_2), alternately sampling theta_1 given theta_2 and theta_2 given theta_1.
Approximate inference. MCMC in BNs. In BNs, Gibbs sampling means, for each unobserved X_i not in E, sampling from its conditional distribution given all other variables; by a theorem [Pearl 97], only its Markov blanket is involved: P(x_i | all other variables) is proportional to P(x_i | pa(X_i)) times the product of P(y_j | pa(Y_j)) over the children Y_j of X_i. Example: a patient with severe headache and not in a coma; P(B = b | H = h, C = not-c)?
Approximate inference. Markov chain Monte Carlo (MCMC). Analytically, P(B = b | H = h, C = not-c) can be computed exactly from the JPD. Gibbs sampling over the network M, S, B, C, H: only the unobserved nodes (M, S, B) are visited, and the normalizing constants of each full conditional are computed only once. E.g., one cycle samples M, then S, then B from their full conditionals. The estimate is about 0.032: after a burn-in of 500 iterations, accumulate 1000 values.
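A Gibbs sampler of this kind can be sketched on the Burglary/Earthquake/Alarm fragment (illustrative, made-up CPTs; evidence A = 1): each unobserved node is resampled from its conditional given its Markov blanket.

```python
import random

# Hypothetical CPTs; evidence A = 1; estimate P(B = 1 | A = 1).
p_b, p_e = 0.01, 0.02
p_a = {(1, 1): 0.95, (1, 0): 0.94, (0, 1): 0.29, (0, 0): 0.001}  # P(A=1 | B, E)

def gibbs(n_iter, burn_in):
    """Resample each unobserved node (B, then E) from its full conditional,
    which involves only its Markov blanket; discard the burn-in."""
    b, e = 0, 0
    count = total = 0
    for t in range(n_iter):
        # P(B=1 | e, A=1)  proportional to  P(B=1) * P(A=1 | B=1, e)
        w1 = p_b * p_a[(1, e)]
        w0 = (1 - p_b) * p_a[(0, e)]
        b = 1 if random.random() < w1 / (w1 + w0) else 0
        # P(E=1 | b, A=1)  proportional to  P(E=1) * P(A=1 | b, E=1)
        w1 = p_e * p_a[(b, 1)]
        w0 = (1 - p_e) * p_a[(b, 0)]
        e = 1 if random.random() < w1 / (w1 + w0) else 0
        if t >= burn_in:
            count += b
            total += 1
    return count / total

random.seed(0)
est = gibbs(200000, 1000)
print(est)  # should approach the exact posterior P(B=1 | A=1), about 0.58 here
```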
Approximate inference. Assessing approximate inference algorithms: measure the quality of different approximations (compare algorithms) with the Kullback-Leibler divergence between the true distribution P and the estimated distribution P' of a node with states i: KL(P, P') = sum_i P(i) log(P(i) / P'(i)); KL = 0 if P = P'. For several query nodes X and Y, and evidence Z, we should use KL(P(X, Y | Z), P'(X, Y | Z)).
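The KL measure can be sketched directly (the two distributions below are hypothetical):

```python
import math

def kl(p, q):
    """Kullback-Leibler divergence KL(P || Q) for discrete distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

exact = [0.41, 0.59]      # hypothetical true posterior
approx = [0.45, 0.55]     # hypothetical estimate from simulation
print(kl(exact, approx))  # small positive number; 0 iff the distributions match
```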
Software: GeNIe (genie.sis.pitt.edu); Kevin Murphy's Bayes Net Toolbox (http.cs.berkeley.edu/~murphyk/); Elvira (leo.ugr.es/elvira). [Screenshots]
Examples: [screenshots of evidence propagation in BN software; entering evidence such as S = yes increases the posterior probabilities of related variables].
Texts and readings: general.
T. Verma, J. Pearl (1990) Causal networks: Semantics and expressiveness, UAI-4.
S. Lauritzen, D. Spiegelhalter (1988) Local computations with probabilities on graphical structures and their applications to expert systems, Journal of the Royal Statistical Society, Series B.
Texts and readings: deterministic algorithms for approximate inference.
Texts and readings: stochastic simulation for approximate inference.
M. Henrion (1988) Propagating uncertainty in BNs by logic sampling, UAI.
R. Fung, K. Chang (1990) Weighing and integrating evidence for stochastic simulation in Bayesian networks, UAI-5.
R. Shachter, M. Peot (1990) Simulation approaches to general probabilistic inference on belief networks, UAI-5.
D. Gamerman (1997) Markov Chain Monte Carlo, Chapman & Hall.
G. Casella, E. George (1992) Explaining the Gibbs sampler, The American Statistician 46.
M. K. Cowles, B. P. Carlin (1996) MCMC convergence diagnostics: A comparative review, Journal of the American Statistical Association 91.
G. O. Roberts, A. F. M. Smith (1994) Simple conditions for the convergence of the Gibbs sampler and M-H algorithms, Stochastic Processes and their Applications 49.
Possible projects/readings.
1. Canonical models for the CPTs: noisy-OR models. S. Srinivas (1993) A generalization of the noisy OR model, UAI-93. F. J. Díez (1993) Parameter adjustment in Bayes networks: The generalized noisy OR-gate, UAI-93. Also in Neapolitan's book.
2. Context-specific independence: X and Y are c.i. given Z in context C = c if P(X | Y, Z, C = c) = P(X | Z, C = c). C. Boutilier, N. Friedman, M. Goldszmidt, D. Koller (1996) Context-specific independence in Bayesian networks, UAI-96.
3. Modeling tricks: parent divorcing, time-stamped models, expert disagreements, interventions. Section 2.3 in Jensen's book.
4. Abductive inference. J. A. Gámez (2004) Abductive inference in Bayesian networks: A review. In Gámez, Moral, Salmerón (eds.): Advances in Bayesian Networks, Springer.
5. Partial abduction. L. M. de Campos, J. A. Gámez, S. Moral (2002) Partial abductive inference in Bayesian belief networks: An evolutionary computation approach by using problem-specific genetic operators, IEEE Transactions on Evolutionary Computation 6(2). R. Marinescu, R. Dechter (2009) AND/OR branch-and-bound search for combinatorial optimization in graphical models, Artificial Intelligence 173.
6. Approximate inference. L. Hernández, S. Moral, A. Salmerón (1998) A Monte Carlo algorithm for probabilistic propagation in belief networks based on importance sampling and stratified simulation techniques, International Journal of Approximate Reasoning 18. C. Yuan, M. Druzdzel (2005) Importance sampling algorithms for Bayesian networks: Principles and performance, Mathematical and Computer Modelling 43. S. Moral, A. Salmerón (2005) Dynamic importance sampling in BNs based on probability trees, International Journal of Approximate Reasoning 38(3). A. Cano, M. Gómez, S. Moral, C. Pérez-Ariza (2009) Recursive probability trees for Bayesian networks, Proceedings XIII CAEPIA, 1-10 (decomposition of potentials). T. Heskes, O. Zoeter (2002) Expectation propagation for approximate inference in dynamic BNs, Proc. 18th Conf. UAI. A. Cano, M. Gómez, S. Moral (2011) Approximate inference in Bayesian networks using binary probability trees, International Journal of Approximate Reasoning 52.
7. Hugin architecture: potentials in the cliques are changed dynamically and there is a division in the separators. Lauritzen and Spiegelhalter (1988). F. Jensen, S. Lauritzen, K. Olesen (1990) Bayesian updating in causal probabilistic networks by local computations, Computational Statistics Quarterly 4.
8. Lazy propagation: dissolves the differences between Shenoy-Shafer and Hugin propagation. A. Madsen, F. Jensen (1999) Lazy evaluation of symmetric Bayesian decision problems, UAI-99.
9. More on graph theory: properties of conditional independence; equivalence between graphs, lists of c.i. statements and factorizations of the JPD. Chapters 5 (5.3, 5.4, 5.6) and 6 of Castillo et al.'s book; Chapter 5 of Jensen's book.
10. Inference in hybrid networks (discrete and continuous variables). T. Heskes, O. Zoeter (2003) Generalized belief propagation for approximate inference in hybrid Bayesian networks, Proc. 9th International Workshop on AI and Statistics. R. Rumí, A. Salmerón (2007) Approximate probability propagation with mixtures of truncated exponentials, International Journal of Approximate Reasoning 45.
104 INFERENCE IN BAYESIAN NETWORKS Concha Bielza, Pedro Larrañaga Computational Intelligence Group Departamento de Inteligencia Artificial Universidad Politécnica de Madrid Master Universitario en Inteligencia Artificial C.Bielza, P.Larrañaga -UPM-