Probabilistic Graphical Models


Probabilistic Graphical Models
Raquel Urtasun and Tamir Hazan, TTI Chicago. April 22, 2011.

This week we saw...
The variable elimination (VE) algorithm can be used to compute P(Y), P(Y, e) and P(Y | e). Finding the optimal elimination ordering for VE is NP-hard. For chordal graphs, we can construct an optimal ordering via the clique tree or the max-cardinality algorithm. If the graph is non-chordal, we can use heuristics (e.g., width, weight).

Conditioning Algorithm
The conditioning algorithm uses the fact that observing certain variables can simplify the elimination process. If a variable was not observed, we can use a case analysis to enumerate all of its possible values, perform VE on each of the simplified graphs, and aggregate the results over the different values. This offers no advantage over the VE algorithm in terms of the number of operations, but it does in terms of memory: a time-space tradeoff.

More formally...
Let's consider the case of a Markov network. Let Φ be the set of factors over X and P_Φ the associated distribution, with all observations already assimilated into Φ. The goal is to compute P_Φ(Y) for some query Y. Let U ⊆ X be a set of variables. Then

    P̂_Φ(Y) = Σ_{u ∈ Val(U)} P̂_Φ(Y, u)

Each term P̂_Φ(Y, u) can be computed by marginalizing out X − U − Y in the unnormalized measure P̂_Φ[u]. The reduced measure is obtained by reducing the factors to the context u, and this reduced computation has a smaller cost in general.
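To make the reduction step concrete, here is a minimal sketch in Python (my own illustration, not code from the lecture): factors are stored as a scope plus a table of values, and reduce_factor (a name I introduce) restricts a factor to the context u, which is exactly the reduced-measure step described above.

```python
# A minimal sketch, assuming factors are stored as a scope (tuple of variable
# names) plus a table {assignment tuple: value}. reduce_factor keeps only the
# rows consistent with the context u and drops the conditioned variables from
# the scope.

def reduce_factor(scope, table, context):
    keep = [i for i, v in enumerate(scope) if v not in context]
    new_scope = tuple(scope[i] for i in keep)
    new_table = {}
    for assignment, value in table.items():
        consistent = all(assignment[i] == context[scope[i]]
                         for i in range(len(scope)) if scope[i] in context)
        if consistent:
            new_table[tuple(assignment[i] for i in keep)] = value
    return new_scope, new_table

# Example: a factor phi(A, B) reduced to the context A = 1.
scope, table = ("A", "B"), {(0, 0): 0.2, (0, 1): 0.8, (1, 0): 0.5, (1, 1): 0.5}
print(reduce_factor(scope, table, {"A": 1}))  # (('B',), {(0,): 0.5, (1,): 0.5})
```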

Conditioning Algorithm
We construct a network H_Φ[u] for each assignment u. Note that these networks all have the same structure but different parameters. We run sum-product VE on each of the networks and sum the results to obtain P̂_Φ(Y).
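The whole loop can be sketched in a few lines. The function names (conditioning_query, marginalize) and the dict-based factor representation below are my own; for brevity the inner marginalization is done by brute-force enumeration, standing in for the sum-product VE pass that the algorithm actually runs on each reduced network.

```python
# A minimal, self-contained sketch of the conditioning loop (names and factor
# representation are my own, not from the lecture). The inner routine below
# marginalizes by brute force; in the algorithm described above it would be a
# sum-product VE run on the reduced network.

from itertools import product

def marginalize(factors, domains, query):
    """Brute-force unnormalized marginal over `query` (stand-in for sum-product VE)."""
    variables = sorted({v for scope, _ in factors for v in scope} | set(query))
    out = {}
    for full in product(*(domains[v] for v in variables)):
        assign = dict(zip(variables, full))
        value = 1.0
        for scope, table in factors:
            value *= table[tuple(assign[v] for v in scope)]
        key = tuple(assign[v] for v in query)
        out[key] = out.get(key, 0.0) + value
    return out

def conditioning_query(factors, domains, query, cond_vars):
    """Unnormalized P_Phi(query): enumerate u in Val(U), reduce, marginalize, sum."""
    result = {}
    for u in product(*(domains[v] for v in cond_vars)):
        context = dict(zip(cond_vars, u))
        reduced = []
        for scope, table in factors:  # reduce every factor to the context u
            keep = [i for i, v in enumerate(scope) if v not in context]
            new_table = {}
            for a, val in table.items():
                if all(a[i] == context[scope[i]] for i in range(len(scope)) if scope[i] in context):
                    new_table[tuple(a[i] for i in keep)] = val
            reduced.append((tuple(scope[i] for i in keep), new_table))
        for q, val in marginalize(reduced, domains, query).items():
            result[q] = result.get(q, 0.0) + val  # sum the per-assignment results
    return result

# Toy usage: a single factor over A - B, query A, conditioning on B.
domains = {"A": [0, 1], "B": [0, 1]}
factors = [(("A", "B"), {(0, 0): 1.0, (0, 1): 2.0, (1, 0): 3.0, (1, 1): 4.0})]
print(conditioning_query(factors, domains, ("A",), ("B",)))  # {(0,): 3.0, (1,): 7.0}
```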

Sum-product VE for conditional distributions


Normalization and an Example
To get P_Φ(Y) we have to normalize. The partition function is the sum of the per-assignment partition functions: Z_Φ = Σ_u Z_Φ[u]. As an example, suppose we want to obtain P(J) with evidence G = g1 and apply the conditioning algorithm to S.
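As a tiny follow-on to the sketch above (with invented numbers, purely for illustration), normalization amounts to dividing the accumulated table by its own total, which equals Z_Φ:

```python
# Tiny illustration (invented numbers): normalize the accumulated unnormalized
# marginal; its total is exactly Z_Phi = sum_u Z_Phi[u].
unnormalized = {(0,): 3.0, (1,): 7.0}   # e.g., output of conditioning_query above
Z = sum(unnormalized.values())          # Z_Phi
posterior = {y: v / Z for y, v in unnormalized.items()}
print(posterior)                        # {(0,): 0.3, (1,): 0.7}
```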

Complexity of conditioning
Conditioning seems more efficient than VE, since each run works on a smaller network. However, we need to run VE multiple times, once for each u ∈ Val(U). We know that P(J) = Σ_{C,D,I,S,G,L,H} P(C, D, I, S, G, L, H, J). Reordering, P(J) = Σ_g [ Σ_{C,D,I,S,L,H} P(C, D, I, S, g, L, H, J) ]. The expression inside the brackets is P(J) in the network H_Φ[G=g].

Conditioning vs Variable Elimination
The conditioning algorithm executes part of the summation explicitly, enumerating all possible values of the conditioning variables. The VE algorithm does the same summation from the inside out, using dynamic programming to reuse computation. Theorem: Let Φ be a set of factors, Y a query, and U a set of conditioning variables, with Z = X − Y − U. Let ≺ be the elimination ordering over Z used by VE over the networks H_Φ[u] in the conditioning algorithm. Let ≺⁺ be an ordering consistent with ≺ over the variables in Z and such that for each U ∈ U we have Z ≺⁺ U (i.e., the conditioning variables come last). Then the number of operations performed by conditioning is no less than the number of operations performed by VE with ordering ≺⁺. Conditioning never performs fewer operations than VE: it uses the same sums and products, but it does not reuse the computation across values of the conditioning variables. If ≺ and ≺⁺ are not consistent, we cannot say anything.

Example
The elimination ordering is {C, D, I, H, S, L}, and we condition on G. This corresponds to a run of VE for P(J) with G eliminated last, where φ⁺ denotes the augmented factor that contains G in its scope.

More examples
If we condition on A_k in order to cut the loop, we will perform the entire elimination of the chain A_1, ..., A_{k−1} once for each value of A_k.

More examples
Suppose we want to cut every other A_i, e.g., A_2, A_4, .... The cost of conditioning is then exponential in k, whereas the induced width of the network (the number of nodes in the largest clique minus 1) is 2, so the cost of VE is linear in k.
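A back-of-the-envelope illustration of that gap, with my own crude operation counts (not figures from the slides), assuming binary variables:

```python
# Rough cost comparison on a chain A_1 - ... - A_k with binary variables (v = 2).
# The constants are crude, but the scaling is the point: VE is linear in k while
# conditioning on every other variable is exponential in k.
v, k = 2, 20
ve_cost = k * v ** 3                      # k eliminations over cliques of size <= 3 (induced width 2)
cond_cost = v ** (k // 2) * (k * v ** 2)  # one cheap elimination per assignment of the k/2 cut variables
print(ve_cost, cond_cost)                 # 160 vs 81920; the gap roughly doubles each time k grows by 2
```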

Conditioning is still useful
When is conditioning still useful? (1) In large networks, VE can create very large factors that consume too much memory; conditioning offers a continuous time-space tradeoff. (2) Conditioning is the basis for useful approximate inference algorithms in which, for example, we only enumerate a small set of the possible assignments u ∈ Val(U).
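A tiny sketch of the second point (the name partial_conditioning and the callback interface are assumptions of mine, showing only the shape of such an approximation): enumerate or sample a small budget of assignments u, run the reduced-network inference for those only, and renormalize.

```python
# Hedged sketch: instead of summing over all of Val(U), sample a small budget of
# assignments u, run the (reduced-network) inference only for those, renormalize.
import random
from itertools import product

def partial_conditioning(run_ve_for, domains, cond_vars, budget, seed=0):
    """run_ve_for(context) -> unnormalized marginal table for one assignment u."""
    rng = random.Random(seed)
    all_u = list(product(*(domains[v] for v in cond_vars)))
    chosen = rng.sample(all_u, min(budget, len(all_u)))
    result = {}
    for u in chosen:
        for y, val in run_ve_for(dict(zip(cond_vars, u))).items():
            result[y] = result.get(y, 0.0) + val
    total = sum(result.values())
    return {y: v / total for y, v in result.items()}
```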

Effects in the graph
Conditioning on U introduces U into every factor in the graph, so we should connect U to every node in the graph. When eliminating U we remove it from the graph. We can define an induced graph for the conditioning algorithm, which has two types of filled edges.

Induced Graph for Conditioning
Definition: Let Φ be a set of factors over X = {X_1, ..., X_n} with conditioning variables U ⊆ X, and let ≺ be an elimination ordering over the set X − U. The induced graph I_{Φ,≺,U} is an undirected graph over X with the following edges: (1) a conditioning edge between every U ∈ U and every other variable; (2) a factor edge between every pair of variables X_i, X_j ∈ X that both appear in some intermediate factor ψ generated by VE using ≺ as the elimination ordering.
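As a sketch of this construction (a helper of my own, following the definition above), the factor edges can be obtained by simulating the fill-in that VE would create on the factor scopes with the conditioning variables removed, and the conditioning edges simply join each U ∈ U to everything else:

```python
# Sketch of the induced graph I_{Phi, <, U}: conditioning edges join each U in U
# to every other variable; factor edges connect variables that would co-occur in
# some intermediate factor of a VE run on the reduced scopes.
from itertools import combinations

def induced_graph_for_conditioning(scopes, cond_vars, order):
    variables = set().union(*scopes)
    cond_vars = set(cond_vars)
    factor_edges = set()
    work = [set(s) - cond_vars for s in scopes]
    for var in order:                                  # eliminate the non-conditioned variables
        involved = [s for s in work if var in s]
        merged = set().union(*involved) if involved else {var}
        factor_edges |= {frozenset(p) for p in combinations(sorted(merged), 2)}
        work = [s for s in work if var not in s] + [merged - {var}]
    cond_edges = {frozenset((u, x)) for u in cond_vars for x in variables if x != u}
    return factor_edges, cond_edges

# Example: factors of a chain A - U - B, conditioning on U, eliminating A then B.
# -> no factor edges (conditioning on U splits the chain), plus conditioning
#    edges U-A and U-B.
print(induced_graph_for_conditioning([{"A", "U"}, {"U", "B"}], {"U"}, ["A", "B"]))
```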

Example (BN and Induced Graph)
Query: P(J); we condition on L and eliminate {C, D, I, H, G, S}. In the induced graph, conditioning edges are shown dashed and factor edges as regular edges.

Complexity Analysis I
Theorem: Consider applying the conditioning algorithm to a set of factors Φ, with conditioning variables U ⊆ X and elimination ordering ≺. The running time of the algorithm is O(n v^m), where v is a bound on the domain size of any variable and m is the size of the largest clique in the induced graph. Conditioning adds edges between the conditioning variables and all other variables in the graph, whereas VE only adds edges between the neighbors of the variable being eliminated.

Complexity Analysis II
Theorem: Consider applying the conditioning algorithm to a set of factors Φ, with conditioning variables U ⊆ X and elimination ordering ≺ over X − U. The space complexity of the algorithm is O(n v^{m_f}), where v is a bound on the domain size of any variable and m_f is the size of the largest clique in the graph using only factor edges. For VE, by contrast, the space and time complexity are both exponential in the size of the largest clique of the induced graph.

Improving conditioning: alternating VE and conditioning
In terms of operations, conditioning is always equal to or worse than VE. The main problem is that computations are repeated for every value of the conditioning variables, sometimes even when they are identical. Suppose we are interested in P(D). It is better to first eliminate A_1, ..., A_{k−1} and only then condition on A_k; we are left with a network involving only A_k, B, C, D. We can then condition on any of these variables, e.g., A_k, and eliminate the others, B and C. In total we first did elimination, then conditioning, then elimination again.

Improving conditioning: network decomposition
This is efficient when conditioning splits the graph into independent pieces. If we condition on A_2, the network splits. If we condition on A_3, there is no need to take the top part of the network, i.e., {A_1, B_1, C_1}, into account, since we would simply repeat the same computation. Once the network is partitioned into individual pieces, we can perform the computation on each of them separately and then combine the results; the conditioning variables used in one part are not used in the other. This suggests an algorithm that checks whether the graph becomes disconnected after conditioning and, if it does, splits the computation over the disjoint sets recursively.
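A rough sketch of that check (my own helper, not the lecture's pseudocode): drop the conditioning variables from the factor scopes and compute the connected components of what remains; each component can then be processed recursively on its own.

```python
# Sketch: connected components of the variables that remain after conditioning.
def components_after_conditioning(scopes, cond_vars):
    cond_vars = set(cond_vars)
    adjacency = {}
    for scope in scopes:
        rest = set(scope) - cond_vars
        for v in rest:
            adjacency.setdefault(v, set()).update(rest - {v})
    seen, components = set(), []
    for start in adjacency:
        if start in seen:
            continue
        stack, comp = [start], set()
        while stack:
            v = stack.pop()
            if v not in comp:
                comp.add(v)
                stack.extend(adjacency[v] - comp)
        seen |= comp
        components.append(comp)
    return components

# Example: conditioning on A2 in the chain A1 - A2 - A3 leaves two pieces,
# {A1} and {A3}, which can be eliminated (or conditioned further) independently.
print(components_after_conditioning([{"A1", "A2"}, {"A2", "A3"}], {"A2"}))
```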

Summary of this week
The variable elimination algorithm can be used to compute P(Y), P(Y, e) and P(Y | e). Finding the optimal ordering for VE is NP-hard. For chordal graphs, we can construct an optimal ordering via the clique tree or the max-cardinality algorithm. If the graph is non-chordal, we can use heuristics (e.g., width, weight) in a deterministic or stochastic fashion. Today we saw the conditioning algorithm, and how to improve it by alternating VE and conditioning and by network decomposition. Next week: clique trees and message passing.