# FMA901F: Machine Learning Lecture 6: Graphical Models. Cristian Sminchisescu

Size: px
Start display at page:

## Transcription

1 FMA901F: Machine Learning Lecture 6: Graphical Models Cristian Sminchisescu

2 Graphical Models Provide a simple way to visualize the structure of a probabilistic model and can be used to design and motivate new models Insights into the properties of the model, including conditional independence properties, can be obtained by inspection of the graph Complex computations, required to perform inference and learning in sophisticated models, can be expressed in terms of graphical manipulations, in which underlying mathematical expressions are carried along implicitly

3 Choice of Graphical Models In a probabilistic graphical model, each node represents a random variable (or a group of variables), and the links express probabilistic relationships between these variables. The graph captures the way in which the joint distribution over all of the random variables can be decomposed into a product of factors, each depending only on a subset of the variables In Bayesian networks, also known as directed graphical models, the links of the graphs have a particular directionality indicated by arrows The other major class of graphical models are Markov random fields, also known as undirected graphical models, in which the links do not carry arrows and have no directional significance Directed graphs are useful for expressing causal relationships between random variables, whereas undirected graphs are better suited at expressing soft constraints between random variables For the purposes of solving inference problems, it is often convenient to convert both directed and undirected graphs into a different representation called a factor graph

4 Bayesian Networks Directed Acyclic Graph (DAG) parent child l.h.s. symmetric w.r.t a, b, c Holds for any choice of joint distribution r.h.s. not symmetric w.r.t a, b, c Slides adapted from textbook (Bishop)

5 Representation For a given choice of K, we can again represent this as a directed graph having K nodes, one for each conditional distribution on the right hand side, with each node having incoming links from all lower numbered nodes. We say that this graph is fully connected because there is a link between every pair of nodes It s however the absence of links in the graph that conveys interesting information about the properties of the class of distributions that the graph represents

6 Bayesian Networks General Factorization

7 Factorization The equation expresses the factorization properties of the joint distribution for a directed graphical model. This is general we can equally associate sets of variables and vector valued variables with the nodes of a graph It is easy to show that the representation on the right hand side of is always correctly normalized provided the individual conditional distributions are normalized Restriction: there must be no directed cycles (directed acyclic graphs, DAGs). This is equivalent to the statement that there exists an ordering of the nodes such that there are no links that go from any node to any lower numbered node

8 Bayesian Curve Fitting (1) Polynomial

9 Bayesian Curve Fitting (2) Plate

10 Bayesian Curve Fitting (3) Input variables and explicit hyperparameters

11 Bayesian Curve Fitting Learning Condition on data

12 Bayesian Curve Fitting Prediction Predictive distribution: where

13 Generative Models Causal process for generating images

14 Ancestral Sampling

15 Causal, Generative Models The graphical model captures the causal process by which the observed data was generated. For this reason, such models are often called generative models By contrast, the polynomial regression model is not generative because there is no probability distribution associated with the input variable x, and so it is not possible to generate synthetic data points from this model. We could make it generative by introducing a suitable prior distribution p(x), at the expense of a more complex model

16 Discrete Variables (1) General joint distribution: 2 parameters Independent joint distribution: parameters

17 Discrete Variables (2) General joint distribution over parameters variables: node Markov chain: parameters

18 Discrete Variables: Bayesian Parameters (1)

19 Discrete Variables: Bayesian Parameters (2) Shared prior

20 Parameterized Conditional Distributions If are discrete, state variables, in general has parameters. The parameterized form requires only 1 parameters

21 Linear Gaussian Models Directed Graph Each node is Gaussian, the mean is a linear function of the parents. Vector valued Gaussian Nodes

22 Conditional Independence Suppose that is independent of given Equivalently Notation

23 Verifying Conditional Independence Conditional independence properties play an important role in using probabilistic models for pattern recognition, simplifying both the structure of a model and the computations needed to perform inference and learning under that model If we are given an expression for the joint distribution over a set of variables in terms of a product of conditional distributions, then we could test whether any potential conditional independence property holds by repeated application of the sum and product rules of probability. Such an approach would be very time consuming An important feature of graphical models is that conditional independence properties of the joint distribution can be read directly from the graph without the need for any analytical manipulations. The general framework for achieving this is called d separation

24 Conditional Independence: Example 1 node c is said to be tail-to-tail with respect to the path from node a to node b Make a distinction: In this case the conditional independence property does not hold in general. It may, however, hold for a particular distribution by virtue of the specific numerical values associated with the various conditional probabilities, but it does not follow in general from the structure of the graph.

25 Conditional Independence: Example 1

26 Conditional Independence: Example 2 node c is said to be head-to-tail with respect to the path from node a to node b

27 Conditional Independence: Example 2

28 Conditional Independence: Example 3 node c is head-to-head with respect to the path from a to b Note: this is the opposite of Example 1, with unobserved.

29 Conditional Independence: Example 3 There is one more subtlety associated with this third example that we need to consider. We say that node y is a descendant of node x if there is a path from x to y in which each step of the path follows the directions of the arrows. Then it can be shown that a head-to-head path will become unblocked if either the node, or any of its descendants, is observed.

30 Summary A tail to tail node or a head to tail node leaves a path unblocked unless it is observed in which case it blocks the path By contrast, a head to head node blocks a path if it is unobserved, but once the node, and/or at least one of its descendants, is observed the path becomes unblocked

31 Am I out of fuel? and hence B = Battery (0=flat, 1=fully charged) F = Fuel Tank (0=empty, 1=full) G = Fuel Gauge Reading (0=empty, 1=full)

32 Am I out of fuel? Probability of an empty tank increased by observing 0.

33 Am I out of fuel? Probability of an empty tank reduced by observing 0. This referred to as explaining away.

34 D separation A, B, and C are non intersecting subsets of nodes in a directed graph. A path from A to B is blocked if it contains a node such that either a) the arrows on the path meet either head to tail or tailto tail at the node, and the node is in the set C, or b) the arrows meet head to head at the node, and neither the node, nor any of its descendants, are in the set C. If all paths from A to B are blocked, A is said to be d separated from B by C. If A is d separated from B by C, the joint distribution over all variables in the graph satisfies.

35 D separation: Example

36 Bayes Ball Algorithm To check if x A x C x B, we need to check if every variable in A is d-separated from every variable in C conditioned on all variables in B Otherwise said, given that all nodes in x B are clamped, when we change nodes x A, can we change any of the node x C? The Bayes-Ball Algorithm is a such a d-separation test We shade all nodes x B, place balls at each node in x A (or x C ), let them bounce around according to some rules, and then ask if any of the balls reach any of the nodes in x C (or x A ) So we need to know what happens when a ball arrives at a node Y on its way from X to Z

37 Bayes-Ball Rules

38 Bayes-Ball Boundary Rules We also need the boundary conditions Explaining away case: If y or any of its descendants is shaded, the ball passes through Balls can travel opposite to edge directions

39 D separation: I.I.D. Data

40 Directed Graphs as Distribution Filters We can view a graphical model (in this case a directed graph) as a filter in which a probability distribution p(x) is allowed through the filter if, and only if, it satisfies the directed factorization property. The set of all possible probability distributions p(x) that pass through the filter is denoted DF. We can alternatively use the graph to filter distributions according to whether they respect all of the conditional independencies implied by the d- separation properties of the graph. The d-separation theorem says that it is the same set of distributions DF that will be allowed through this second kind of filter.

41 Directed Graphs as Distribution Filters At one extreme we have a fully connected graph that exhibits no conditional independence properties at all, and which can represent any possible joint distribution over the given variables. The set DF will contain all possible distributions p(x) At the other extreme, we have the fully disconnected graph. This corresponds to joint distributions which factorize into the product of the marginal distributions over the variables comprising the nodes of the graph. For any given graph, the set of distributions DF will include any distributions that have additional independence properties beyond those described by the graph. For instance, a fully factorized distribution will always be passed through the filter implied by any graph over the corresponding set of variables.

42 The Markov Blanket Factors independent of x i cancel between numerator and denominator.

43 Markov Random Fields A Markov random field, also known as a Markov network or an undirected graphical model has a set of nodes each of which corresponds to a variable or group of variables, as well as a set of links each of which connects a pair of nodes The links are undirected, that is they do not carry arrows. By removing directionality from the links of the graph, the asymmetry between parent and child nodes is removed, and so the subtleties associated with head to head nodes (explaining away) no longer arise

44 Conditional Independence a vertex cutset is a subset whose removal breaks the graph into two or more pieces We consider all possible paths that connect nodes in set A to nodes in set B. If all such paths pass through one or more nodes in set C, then all such paths are blocked and so the conditional independence property holds. However, if there is at least one such path that is not blocked, then the property does not necessarily hold, or there will exist at least some distributions corresponding to the graph that do not satisfy this conditional independence relation. Note that this is exactly the same as the d separation criterion except that there is no explaining away phenomenon. Testing for conditional independence in undirected graphs is therefore simpler than in directed graphs An alternative way to view the conditional independence test is to remove all nodes in set C from the graph together with any links that connect to those nodes. We then ask if there exists a path that connects any node in A to any node in B. If there are no such paths, then the conditional independence property must hold

45 Markov Blanket Markov Blanket

46 Cliques and Maximal Cliques Clique Maximal Clique

47 Cliques and Maximal Cliques Clique Maximal Clique This leads us to consider a graphical concept called a clique, which is defined as a subset of the nodes in a graph such that there exists a link between all pairs of nodes in the subset. In other words, the set of nodes in a clique is fully connected. We can therefore define the factors in the decomposition of the joint distribution to be functions of the variables in the cliques. In fact, we can consider functions of the maximal cliques, without loss of generality, because other cliques must be subsets of maximal cliques.

48 Joint Distribution where is the positive potential over clique C and is the normalization coefficient; note: state variables terms in. Energies and the Boltzmann distribution Mostly necessary for learning Note that we do not restrict the choice of potential functions to those that have a specific probabilistic interpretation as marginal or conditional distributions. This is in contrast to directed graphs in which each factor represents the conditional distribution of the corresponding variable, conditioned on the state of its parents. However, when the undirected graph is constructed by starting with a directed graph, the potential functions may indeed have such an interpretation

49 Hammersley Clifford s Theorem (I) In plain words Consider the set of all possible distributions defined over a fixed set of variables corresponding to the nodes of a particular undirected graph. We can define UI to be the set of such distributions that are consistent with the set of conditional independence statements that can be read from the graph using graph separation Similarly, we can define UF to be the set of such distributions that can be expressed as a factorization with respect to the maximal cliques of the graph. The Hammersley Clifford theorem (Clifford, 1990) states that the sets UI and UF are identical.

50 Hammersley Clifford s Theorem (II)

51 Illustration: Image De Noising (1) Original Image Noisy Image

52 Illustration: Image De Noising (2)

53 Illustration: Image De Noising (3) Noisy Image Restored Image (ICM)

54 Illustration: Image De Noising (4) Restored Image (ICM) Restored Image (Graph cuts)

55 Converting Directed to Undirected Graphs (1)

56 How can we do this in general? We wish to convert any distribution specified by a factorization over a directed graph into one specified by a factorization over an undirected graph. This can be achieved if the clique potentials of the undirected graph are given by the conditional distributions of the directed graph. In order for this to be valid, we must ensure that the set of variables that appears in each of the conditional distributions is a member of at least one clique of the undirected graph. For nodes on the directed graph having just one parent, this is achieved simply by replacing the directed link with an undirected link. However, for nodes in the directed graph having more than one parent, this is not sufficient. These are nodes that have head to head paths encountered in our discussion of conditional independence.

58 Moralization (I)

59 Moralization (II) We saw that in going from a directed to an undirected representation we had to discard some conditional independence properties from the graph. We could always trivially convert any distribution over a directed graph into one over an undirected graph by simply using a fully connected undirected graph. This would, however, discard all conditional independence properties and so would be vacuous. The process of moralization adds the fewest extra links and so retains the maximum number of independence properties.

60 D-map and I-map A graph is said to be a D map (for dependency map ) of a distribution if every conditional independence statement satisfied by the distribution is reflected in the graph. Thus a completely disconnected graph (no links) will be a trivial D map for any distribution. Alternatively, we can consider a specific distribution and ask which graphs have the appropriate conditional independence properties. If every conditional independence statement implied by a graph is satisfied by a specific distribution, then the graph is said to be an I map (for independence map ) of that distribution. A fully connected graph will be a trivial I map for any distribution. If it is the case that every conditional independence property of the distribution is reflected in the graph, and vice versa, then the graph is said to be a perfect map for that distribution. A perfect map is therefore both an I map and a D map.

61 Directed vs. Undirected Graphs (1)

62 Directed vs. Undirected Graphs (2) Cannot be expressed using undirected graph over the same 3 variables Cannot be expressed using a directed graph over the same 4 variables

63 Inference We will study the problem of inference in graphical models, in which some of the nodes in a graph are clamped to observed values, and we wish to compute the posterior distributions of one or more subsets of other nodes. We will be able to exploit the graphical structure both to find efficient algorithms for inference, and to make the structure of those algorithms transparent. Specifically, we shall see that many algorithms can be expressed in terms of the propagation of local messages around the graph.

64 Inference in Graphical Models

65 Inference on a Chain

66 Inference on a Chain Because each summation effectively removes a variable from the distribution, this can be viewed as the removal of a node from the graph.

67 Inference on a Chain

68 Inference on a Chain

69 Inference on a Chain To compute local marginals: Compute and store all forward messages,. Compute and store all backward messages,. Compute Z at any node x n Compute for all variables required.

70 Readings Bishop, Ch. 8 (up to 8.4.2)

### PATTERN RECOGNITION AND MACHINE LEARNING CHAPTER 8: GRAPHICAL MODELS

PATTERN RECOGNITION AND MACHINE LEARNING CHAPTER 8: GRAPHICAL MODELS Bayesian Networks Directed Acyclic Graph (DAG) Bayesian Networks General Factorization Bayesian Curve Fitting (1) Polynomial Bayesian

### Machine Learning. Sourangshu Bhattacharya

Machine Learning Sourangshu Bhattacharya Bayesian Networks Directed Acyclic Graph (DAG) Bayesian Networks General Factorization Curve Fitting Re-visited Maximum Likelihood Determine by minimizing sum-of-squares

### Computer Vision Group Prof. Daniel Cremers. 4. Probabilistic Graphical Models Directed Models

Prof. Daniel Cremers 4. Probabilistic Graphical Models Directed Models The Bayes Filter (Rep.) (Bayes) (Markov) (Tot. prob.) (Markov) (Markov) 2 Graphical Representation (Rep.) We can describe the overall

### Computer Vision Group Prof. Daniel Cremers. 4. Probabilistic Graphical Models Directed Models

Prof. Daniel Cremers 4. Probabilistic Graphical Models Directed Models The Bayes Filter (Rep.) (Bayes) (Markov) (Tot. prob.) (Markov) (Markov) 2 Graphical Representation (Rep.) We can describe the overall

### D-Separation. b) the arrows meet head-to-head at the node, and neither the node, nor any of its descendants, are in the set C.

D-Separation Say: A, B, and C are non-intersecting subsets of nodes in a directed graph. A path from A to B is blocked by C if it contains a node such that either a) the arrows on the path meet either

### Probabilistic Graphical Models

Overview of Part One Probabilistic Graphical Models Part One: Graphs and Markov Properties Christopher M. Bishop Graphs and probabilities Directed graphs Markov properties Undirected graphs Examples Microsoft

### Workshop report 1. Daniels report is on website 2. Don t expect to write it based on listening to one project (we had 6 only 2 was sufficient

Workshop report 1. Daniels report is on website 2. Don t expect to write it based on listening to one project (we had 6 only 2 was sufficient quality) 3. I suggest writing it on one presentation. 4. Include

### Cheng Soon Ong & Christian Walder. Canberra February June 2018

Cheng Soon Ong & Christian Walder Research Group and College of Engineering and Computer Science Canberra February June 2018 Outlines Overview Introduction Linear Algebra Probability Linear Regression

### Graphical Models Part 1-2 (Reading Notes)

Graphical Models Part 1-2 (Reading Notes) Wednesday, August 3 2011, 2:35 PM Notes for the Reading of Chapter 8 Graphical Models of the book Pattern Recognition and Machine Learning (PRML) by Chris Bishop

### Part II. C. M. Bishop PATTERN RECOGNITION AND MACHINE LEARNING CHAPTER 8: GRAPHICAL MODELS

Part II C. M. Bishop PATTERN RECOGNITION AND MACHINE LEARNING CHAPTER 8: GRAPHICAL MODELS Converting Directed to Undirected Graphs (1) Converting Directed to Undirected Graphs (2) Add extra links between

### Graphical Models. Dmitrij Lagutin, T Machine Learning: Basic Principles

Graphical Models Dmitrij Lagutin, dlagutin@cc.hut.fi T-61.6020 - Machine Learning: Basic Principles 12.3.2007 Contents Introduction to graphical models Bayesian networks Conditional independence Markov

### Bayesian Machine Learning - Lecture 6

Bayesian Machine Learning - Lecture 6 Guido Sanguinetti Institute for Adaptive and Neural Computation School of Informatics University of Edinburgh gsanguin@inf.ed.ac.uk March 2, 2015 Today s lecture 1

### Graphical Models. David M. Blei Columbia University. September 17, 2014

Graphical Models David M. Blei Columbia University September 17, 2014 These lecture notes follow the ideas in Chapter 2 of An Introduction to Probabilistic Graphical Models by Michael Jordan. In addition,

### Lecture 4: Undirected Graphical Models

Lecture 4: Undirected Graphical Models Department of Biostatistics University of Michigan zhenkewu@umich.edu http://zhenkewu.com/teaching/graphical_model 15 September, 2016 Zhenke Wu BIOSTAT830 Graphical

### Computer vision: models, learning and inference. Chapter 10 Graphical Models

Computer vision: models, learning and inference Chapter 10 Graphical Models Independence Two variables x 1 and x 2 are independent if their joint probability distribution factorizes as Pr(x 1, x 2 )=Pr(x

### Lecture 3: Conditional Independence - Undirected

CS598: Graphical Models, Fall 2016 Lecture 3: Conditional Independence - Undirected Lecturer: Sanmi Koyejo Scribe: Nate Bowman and Erin Carrier, Aug. 30, 2016 1 Review for the Bayes-Ball Algorithm Recall

### STA 4273H: Statistical Machine Learning

STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Statistics! rsalakhu@utstat.toronto.edu! http://www.utstat.utoronto.ca/~rsalakhu/ Sidney Smith Hall, Room 6002 Lecture 5 Inference

### The Basics of Graphical Models

The Basics of Graphical Models David M. Blei Columbia University September 30, 2016 1 Introduction (These notes follow Chapter 2 of An Introduction to Probabilistic Graphical Models by Michael Jordan.

### Lecture 5: Exact inference. Queries. Complexity of inference. Queries (continued) Bayesian networks can answer questions about the underlying

given that Maximum a posteriori (MAP query: given evidence 2 which has the highest probability: instantiation of all other variables in the network,, Most probable evidence (MPE: given evidence, find an

### FMA901F: Machine Learning Lecture 3: Linear Models for Regression. Cristian Sminchisescu

FMA901F: Machine Learning Lecture 3: Linear Models for Regression Cristian Sminchisescu Machine Learning: Frequentist vs. Bayesian In the frequentist setting, we seek a fixed parameter (vector), with value(s)

### 3 : Representation of Undirected GMs

0-708: Probabilistic Graphical Models 0-708, Spring 202 3 : Representation of Undirected GMs Lecturer: Eric P. Xing Scribes: Nicole Rafidi, Kirstin Early Last Time In the last lecture, we discussed directed

### These notes present some properties of chordal graphs, a set of undirected graphs that are important for undirected graphical models.

Undirected Graphical Models: Chordal Graphs, Decomposable Graphs, Junction Trees, and Factorizations Peter Bartlett. October 2003. These notes present some properties of chordal graphs, a set of undirected

### Lecture 5: Exact inference

Lecture 5: Exact inference Queries Inference in chains Variable elimination Without evidence With evidence Complexity of variable elimination which has the highest probability: instantiation of all other

### Directed Graphical Models (Bayes Nets) (9/4/13)

STA561: Probabilistic machine learning Directed Graphical Models (Bayes Nets) (9/4/13) Lecturer: Barbara Engelhardt Scribes: Richard (Fangjian) Guo, Yan Chen, Siyang Wang, Huayang Cui 1 Introduction For

### 4 Factor graphs and Comparing Graphical Model Types

Massachusetts Institute of Technology Department of Electrical Engineering and Computer Science 6.438 Algorithms for Inference Fall 2014 4 Factor graphs and Comparing Graphical Model Types We now introduce

### ECE521 Lecture 21 HMM cont. Message Passing Algorithms

ECE521 Lecture 21 HMM cont Message Passing Algorithms Outline Hidden Markov models Numerical example of figuring out marginal of the observed sequence Numerical example of figuring out the most probable

### Graphical Models & HMMs

Graphical Models & HMMs Henrik I. Christensen Robotics & Intelligent Machines @ GT Georgia Institute of Technology, Atlanta, GA 30332-0280 hic@cc.gatech.edu Henrik I. Christensen (RIM@GT) Graphical Models

### Probabilistic Graphical Models

Overview of Part Two Probabilistic Graphical Models Part Two: Inference and Learning Christopher M. Bishop Exact inference and the junction tree MCMC Variational methods and EM Example General variational

### Machine Learning

Machine Learning 10-701 Tom M. Mitchell Machine Learning Department Carnegie Mellon University February 15, 2011 Today: Graphical models Inference Conditional independence and D-separation Learning from

### Machine Learning Lecture 16

ourse Outline Machine Learning Lecture 16 undamentals (2 weeks) ayes ecision Theory Probability ensity stimation Undirected raphical Models & Inference 28.06.2016 iscriminative pproaches (5 weeks) Linear

### COMP90051 Statistical Machine Learning

COMP90051 Statistical Machine Learning Semester 2, 2016 Lecturer: Trevor Cohn 21. Independence in PGMs; Example PGMs Independence PGMs encode assumption of statistical independence between variables. Critical

### Recall from last time. Lecture 4: Wrap-up of Bayes net representation. Markov networks. Markov blanket. Isolating a node

Recall from last time Lecture 4: Wrap-up of Bayes net representation. Markov networks Markov blanket, moral graph Independence maps and perfect maps Undirected graphical models (Markov networks) A Bayes

### ECE521 W17 Tutorial 10

ECE521 W17 Tutorial 10 Shenlong Wang and Renjie Liao *Some of materials are credited to Jimmy Ba, Eric Sudderth, Chris Bishop Introduction to A4 1, Graphical Models 2, Message Passing 3, HMM Introduction

### 1 : Introduction to GM and Directed GMs: Bayesian Networks. 3 Multivariate Distributions and Graphical Models

10-708: Probabilistic Graphical Models, Spring 2015 1 : Introduction to GM and Directed GMs: Bayesian Networks Lecturer: Eric P. Xing Scribes: Wenbo Liu, Venkata Krishna Pillutla 1 Overview This lecture

### Chapter 8 of Bishop's Book: Graphical Models

Chapter 8 of Bishop's Book: Graphical Models Review of Probability Probability density over possible values of x Used to find probability of x falling in some range For continuous variables, the probability

### Statistical and Learning Techniques in Computer Vision Lecture 1: Markov Random Fields Jens Rittscher and Chuck Stewart

Statistical and Learning Techniques in Computer Vision Lecture 1: Markov Random Fields Jens Rittscher and Chuck Stewart 1 Motivation Up to now we have considered distributions of a single random variable

### CS242: Probabilistic Graphical Models Lecture 3: Factor Graphs & Variable Elimination

CS242: Probabilistic Graphical Models Lecture 3: Factor Graphs & Variable Elimination Instructor: Erik Sudderth Brown University Computer Science September 11, 2014 Some figures and materials courtesy

### Graphical Models. Pradeep Ravikumar Department of Computer Science The University of Texas at Austin

Graphical Models Pradeep Ravikumar Department of Computer Science The University of Texas at Austin Useful References Graphical models, exponential families, and variational inference. M. J. Wainwright

### Massachusetts Institute of Technology Department of Electrical Engineering and Computer Science Algorithms for Inference Fall 2014

Massachusetts Institute of Technology Department of Electrical Engineering and Computer Science 6.438 Algorithms for Inference Fall 2014 1 Course Overview This course is about performing inference in complex

### 5 Minimal I-Maps, Chordal Graphs, Trees, and Markov Chains

Massachusetts Institute of Technology Department of Electrical Engineering and Computer Science 6.438 Algorithms for Inference Fall 2014 5 Minimal I-Maps, Chordal Graphs, Trees, and Markov Chains Recall

### Conditional Random Fields and beyond D A N I E L K H A S H A B I C S U I U C,

Conditional Random Fields and beyond D A N I E L K H A S H A B I C S 5 4 6 U I U C, 2 0 1 3 Outline Modeling Inference Training Applications Outline Modeling Problem definition Discriminative vs. Generative

### Approximation Algorithms

Approximation Algorithms Prof. Tapio Elomaa tapio.elomaa@tut.fi Course Basics A 4 credit unit course Part of Theoretical Computer Science courses at the Laboratory of Mathematics There will be 4 hours

### Introduction to Graphical Models

Robert Collins CSE586 Introduction to Graphical Models Readings in Prince textbook: Chapters 10 and 11 but mainly only on directed graphs at this time Credits: Several slides are from: Review: Probability

### Probabilistic Graphical Models

School of Computer Science Probabilistic Graphical Models Theory of Variational Inference: Inner and Outer Approximation Eric Xing Lecture 14, February 29, 2016 Reading: W & J Book Chapters Eric Xing @

### Massachusetts Institute of Technology Department of Electrical Engineering and Computer Science Algorithms For Inference Fall 2014

Massachusetts Institute of Technology Department of Electrical Engineering and Computer Science 6.438 Algorithms For Inference Fall 2014 Recitation-6: Hardness of Inference Contents 1 NP-Hardness Part-II

### Machine Learning. Lecture Slides for. ETHEM ALPAYDIN The MIT Press, h1p://

Lecture Slides for INTRODUCTION TO Machine Learning ETHEM ALPAYDIN The MIT Press, 2010 alpaydin@boun.edu.tr h1p://www.cmpe.boun.edu.tr/~ethem/i2ml2e CHAPTER 16: Graphical Models Graphical Models Aka Bayesian

### Chapter 2 PRELIMINARIES. 1. Random variables and conditional independence

Chapter 2 PRELIMINARIES In this chapter the notation is presented and the basic concepts related to the Bayesian network formalism are treated. Towards the end of the chapter, we introduce the Bayesian

### Graphical models and message-passing algorithms: Some introductory lectures

Graphical models and message-passing algorithms: Some introductory lectures Martin J. Wainwright 1 Introduction Graphical models provide a framework for describing statistical dependencies in (possibly

### Lecture 2 - Graph Theory Fundamentals - Reachability and Exploration 1

CME 305: Discrete Mathematics and Algorithms Instructor: Professor Aaron Sidford (sidford@stanford.edu) January 11, 2018 Lecture 2 - Graph Theory Fundamentals - Reachability and Exploration 1 In this lecture

### A Transformational Characterization of Markov Equivalence for Directed Maximal Ancestral Graphs

A Transformational Characterization of Markov Equivalence for Directed Maximal Ancestral Graphs Jiji Zhang Philosophy Department Carnegie Mellon University Pittsburgh, PA 15213 jiji@andrew.cmu.edu Abstract

### 6 : Factor Graphs, Message Passing and Junction Trees

10-708: Probabilistic Graphical Models 10-708, Spring 2018 6 : Factor Graphs, Message Passing and Junction Trees Lecturer: Kayhan Batmanghelich Scribes: Sarthak Garg 1 Factor Graphs Factor Graphs are graphical

### Lecture 9: Undirected Graphical Models Machine Learning

Lecture 9: Undirected Graphical Models Machine Learning Andrew Rosenberg March 5, 2010 1/1 Today Graphical Models Probabilities in Undirected Graphs 2/1 Undirected Graphs What if we allow undirected graphs?

### Undirected Graphical Models. Raul Queiroz Feitosa

Undirected Graphical Models Raul Queiroz Feitosa Pros and Cons Advantages of UGMs over DGMs UGMs are more natural for some domains (e.g. context-dependent entities) Discriminative UGMs (CRF) are better

### Computer Vision Group Prof. Daniel Cremers. 4a. Inference in Graphical Models

Group Prof. Daniel Cremers 4a. Inference in Graphical Models Inference on a Chain (Rep.) The first values of µ α and µ β are: The partition function can be computed at any node: Overall, we have O(NK 2

### Markov Equivalence in Bayesian Networks

Markov Equivalence in Bayesian Networks Ildikó Flesch and eter Lucas Institute for Computing and Information Sciences University of Nijmegen Email: {ildiko,peterl}@cs.kun.nl Abstract robabilistic graphical

### Machine Learning

Machine Learning 10-601 Tom M. Mitchell Machine Learning Department Carnegie Mellon University October 2, 2012 Today: Graphical models Bayes Nets: Representing distributions Conditional independencies

### Machine Learning

Machine Learning 10-601 Tom M. Mitchell Machine Learning Department Carnegie Mellon University February 18, 2015 Today: Graphical models Bayes Nets: Representing distributions Conditional independencies

### Av. Prof. Mello Moraes, 2231, , São Paulo, SP - Brazil

" Generalizing Variable Elimination in Bayesian Networks FABIO GAGLIARDI COZMAN Escola Politécnica, University of São Paulo Av Prof Mello Moraes, 31, 05508-900, São Paulo, SP - Brazil fgcozman@uspbr Abstract

### A Well-Behaved Algorithm for Simulating Dependence Structures of Bayesian Networks

A Well-Behaved Algorithm for Simulating Dependence Structures of Bayesian Networks Yang Xiang and Tristan Miller Department of Computer Science University of Regina Regina, Saskatchewan, Canada S4S 0A2

### Bayesian Classification Using Probabilistic Graphical Models

San Jose State University SJSU ScholarWorks Master's Projects Master's Theses and Graduate Research Spring 2014 Bayesian Classification Using Probabilistic Graphical Models Mehal Patel San Jose State University

### ECE521: Week 11, Lecture March 2017: HMM learning/inference. With thanks to Russ Salakhutdinov

ECE521: Week 11, Lecture 20 27 March 2017: HMM learning/inference With thanks to Russ Salakhutdinov Examples of other perspectives Murphy 17.4 End of Russell & Norvig 15.2 (Artificial Intelligence: A Modern

### Introduction to information theory and coding - Lecture 1 on Graphical models

Introduction to information theory and coding - Lecture 1 on Graphical models Louis Wehenkel Department of Electrical Engineering and Computer Science University of Liège Montefiore - Liège - October,

### On the Relationships between Zero Forcing Numbers and Certain Graph Coverings

On the Relationships between Zero Forcing Numbers and Certain Graph Coverings Fatemeh Alinaghipour Taklimi, Shaun Fallat 1,, Karen Meagher 2 Department of Mathematics and Statistics, University of Regina,

### COS 513: Foundations of Probabilistic Modeling. Lecture 5

COS 513: Foundations of Probabilistic Modeling Young-suk Lee 1 Administrative Midterm report is due Oct. 29 th. Recitation is at 4:26pm in Friend 108. Lecture 5 R is a computer language for statistical

### Linear Methods for Regression and Shrinkage Methods

Linear Methods for Regression and Shrinkage Methods Reference: The Elements of Statistical Learning, by T. Hastie, R. Tibshirani, J. Friedman, Springer 1 Linear Regression Models Least Squares Input vectors

### Massachusetts Institute of Technology Department of Electrical Engineering and Computer Science Algorithms For Inference Fall 2014

Suggested Reading: Massachusetts Institute of Technology Department of Electrical Engineering and Computer Science 6.438 Algorithms For Inference Fall 2014 Probabilistic Modelling and Reasoning: The Junction

### A Discovery Algorithm for Directed Cyclic Graphs

A Discovery Algorithm for Directed Cyclic Graphs Thomas Richardson 1 Logic and Computation Programme CMU, Pittsburgh PA 15213 1. Introduction Directed acyclic graphs have been used fruitfully to represent

### Recitation 4: Elimination algorithm, reconstituted graph, triangulation

Massachusetts Institute of Technology Department of Electrical Engineering and Computer Science 6.438 Algorithms For Inference Fall 2014 Recitation 4: Elimination algorithm, reconstituted graph, triangulation

### Loopy Belief Propagation

Loopy Belief Propagation Research Exam Kristin Branson September 29, 2003 Loopy Belief Propagation p.1/73 Problem Formalization Reasoning about any real-world problem requires assumptions about the structure

### Graph Theory. Probabilistic Graphical Models. L. Enrique Sucar, INAOE. Definitions. Types of Graphs. Trajectories and Circuits.

Theory Probabilistic ical Models L. Enrique Sucar, INAOE and (INAOE) 1 / 32 Outline and 1 2 3 4 5 6 7 8 and 9 (INAOE) 2 / 32 A graph provides a compact way to represent binary relations between a set of

### Inference for loglinear models (contd):

Stat 504, Lecture 25 1 Inference for loglinear models (contd): Loglinear/Logit connection Intro to Graphical Models Stat 504, Lecture 25 2 Loglinear Models no distinction between response and explanatory

### Expectation Propagation

Expectation Propagation Erik Sudderth 6.975 Week 11 Presentation November 20, 2002 Introduction Goal: Efficiently approximate intractable distributions Features of Expectation Propagation (EP): Deterministic,

### A Parallel Algorithm for Exact Structure Learning of Bayesian Networks

A Parallel Algorithm for Exact Structure Learning of Bayesian Networks Olga Nikolova, Jaroslaw Zola, and Srinivas Aluru Department of Computer Engineering Iowa State University Ames, IA 0010 {olia,zola,aluru}@iastate.edu

### Exact Inference: Elimination and Sum Product (and hidden Markov models)

Exact Inference: Elimination and Sum Product (and hidden Markov models) David M. Blei Columbia University October 13, 2015 The first sections of these lecture notes follow the ideas in Chapters 3 and 4

### 2. Graphical Models. Undirected graphical models. Factor graphs. Bayesian networks. Conversion between graphical models. Graphical Models 2-1

Graphical Models 2-1 2. Graphical Models Undirected graphical models Factor graphs Bayesian networks Conversion between graphical models Graphical Models 2-2 Graphical models There are three families of

### Building Classifiers using Bayesian Networks

Building Classifiers using Bayesian Networks Nir Friedman and Moises Goldszmidt 1997 Presented by Brian Collins and Lukas Seitlinger Paper Summary The Naive Bayes classifier has reasonable performance

### WUCT121. Discrete Mathematics. Graphs

WUCT121 Discrete Mathematics Graphs WUCT121 Graphs 1 Section 1. Graphs 1.1. Introduction Graphs are used in many fields that require analysis of routes between locations. These areas include communications,

### AN ANALYSIS ON MARKOV RANDOM FIELDS (MRFs) USING CYCLE GRAPHS

Volume 8 No. 0 208, -20 ISSN: 3-8080 (printed version); ISSN: 34-3395 (on-line version) url: http://www.ijpam.eu doi: 0.2732/ijpam.v8i0.54 ijpam.eu AN ANALYSIS ON MARKOV RANDOM FIELDS (MRFs) USING CYCLE

### 18 October, 2013 MVA ENS Cachan. Lecture 6: Introduction to graphical models Iasonas Kokkinos

Machine Learning for Computer Vision 1 18 October, 2013 MVA ENS Cachan Lecture 6: Introduction to graphical models Iasonas Kokkinos Iasonas.kokkinos@ecp.fr Center for Visual Computing Ecole Centrale Paris

### Escola Politécnica, University of São Paulo Av. Prof. Mello Moraes, 2231, , São Paulo, SP - Brazil

Generalizing Variable Elimination in Bayesian Networks FABIO GAGLIARDI COZMAN Escola Politécnica, University of São Paulo Av. Prof. Mello Moraes, 2231, 05508-900, São Paulo, SP - Brazil fgcozman@usp.br

### Causality with Gates

John Winn Microsoft Research, Cambridge, UK Abstract An intervention on a variable removes the influences that usually have a causal effect on that variable. Gates [] are a general-purpose graphical modelling

### Directed Graphical Models

Copyright c 2008 2010 John Lafferty, Han Liu, and Larry Wasserman Do Not Distribute Chapter 18 Directed Graphical Models Graphs give a powerful way of representing independence relations and computing

### LECTURE 8: SETS. Software Engineering Mike Wooldridge

LECTURE 8: SETS Mike Wooldridge 1 What is a Set? The concept of a set is used throughout mathematics; its formal definition matches closely our intuitive understanding of the word. Definition: A set is

### IEEE TRANSACTIONS ON IMAGE PROCESSING, VOL. 20, NO. 9, SEPTEMBER

IEEE TRANSACTIONS ON IMAGE PROCESSING, VOL. 20, NO. 9, SEPTEMBER 2011 2401 Probabilistic Image Modeling With an Extended Chain Graph for Human Activity Recognition and Image Segmentation Lei Zhang, Member,

### Homework. Gaussian, Bishop 2.3 Non-parametric, Bishop 2.5 Linear regression Pod-cast lecture on-line. Next lectures:

Homework Gaussian, Bishop 2.3 Non-parametric, Bishop 2.5 Linear regression 3.0-3.2 Pod-cast lecture on-line Next lectures: I posted a rough plan. It is flexible though so please come with suggestions Bayes

Reasoning About Uncertainty Graphical representation of causal relations (examples) Graphical models Inference in graphical models (introduction) 1 Jensen, 1996 Example 1: Icy Roads 2 1 Jensen, 1996 Example

### Introduction to Graph Theory

Introduction to Graph Theory Tandy Warnow January 20, 2017 Graphs Tandy Warnow Graphs A graph G = (V, E) is an object that contains a vertex set V and an edge set E. We also write V (G) to denote the vertex

### CS839: Probabilistic Graphical Models. Lecture 10: Learning with Partially Observed Data. Theo Rekatsinas

CS839: Probabilistic Graphical Models Lecture 10: Learning with Partially Observed Data Theo Rekatsinas 1 Partially Observed GMs Speech recognition 2 Partially Observed GMs Evolution 3 Partially Observed

### Treewidth and graph minors

Treewidth and graph minors Lectures 9 and 10, December 29, 2011, January 5, 2012 We shall touch upon the theory of Graph Minors by Robertson and Seymour. This theory gives a very general condition under

### STAT 598L Probabilistic Graphical Models. Instructor: Sergey Kirshner. Exact Inference

STAT 598L Probabilistic Graphical Models Instructor: Sergey Kirshner Exact Inference What To Do With Bayesian/Markov Network? Compact representation of a complex model, but Goal: efficient extraction of

### The Structure of Bull-Free Perfect Graphs

The Structure of Bull-Free Perfect Graphs Maria Chudnovsky and Irena Penev Columbia University, New York, NY 10027 USA May 18, 2012 Abstract The bull is a graph consisting of a triangle and two vertex-disjoint

### Modeling and Reasoning with Bayesian Networks. Adnan Darwiche University of California Los Angeles, CA

Modeling and Reasoning with Bayesian Networks Adnan Darwiche University of California Los Angeles, CA darwiche@cs.ucla.edu June 24, 2008 Contents Preface 1 1 Introduction 1 1.1 Automated Reasoning........................

### Advanced Algorithms Class Notes for Monday, October 23, 2012 Min Ye, Mingfu Shao, and Bernard Moret

Advanced Algorithms Class Notes for Monday, October 23, 2012 Min Ye, Mingfu Shao, and Bernard Moret Greedy Algorithms (continued) The best known application where the greedy algorithm is optimal is surely

### Ch9: Exact Inference: Variable Elimination. Shimi Salant, Barak Sternberg

Ch9: Exact Inference: Variable Elimination Shimi Salant Barak Sternberg Part 1 Reminder introduction (1/3) We saw two ways to represent (finite discrete) distributions via graphical data structures: Bayesian

### CS 343: Artificial Intelligence

CS 343: Artificial Intelligence Bayes Nets: Inference Prof. Scott Niekum The University of Texas at Austin [These slides based on those of Dan Klein and Pieter Abbeel for CS188 Intro to AI at UC Berkeley.

### What is machine learning?

Machine learning, pattern recognition and statistical data modelling Lecture 12. The last lecture Coryn Bailer-Jones 1 What is machine learning? Data description and interpretation finding simpler relationship

### Chapter 3. Set Theory. 3.1 What is a Set?

Chapter 3 Set Theory 3.1 What is a Set? A set is a well-defined collection of objects called elements or members of the set. Here, well-defined means accurately and unambiguously stated or described. Any

### Junction tree propagation - BNDG 4-4.6

Junction tree propagation - BNDG 4-4. Finn V. Jensen and Thomas D. Nielsen Junction tree propagation p. 1/2 Exact Inference Message Passing in Join Trees More sophisticated inference technique; used in