Cheng Soon Ong & Christian Walder. Canberra, February to June 2018
1 Cheng Soon Ong & Christian Walder
Research Group and College of Engineering and Computer Science
Canberra, February to June 2018
Outline: Overview; Introduction; Linear Algebra; Probability; Linear Regression 1; Linear Regression 2; Linear Classification 1; Linear Classification 2; Kernel Methods; Sparse Kernel Methods; Mixture Models and EM 1; Mixture Models and EM 2; Neural Networks 1; Neural Networks 2; Principal Component Analysis; Autoencoders; Graphical Models 1; Graphical Models 2; Graphical Models 3; Sampling; Sequential Data 1; Sequential Data 2.
(Many figures from C. M. Bishop, "Pattern Recognition and Machine Learning")
2 Part XIII: Probabilistic Graphical Models 1
3 Motivation via a Picture: Image Segmentation
Clustering the grey-value representation of an image loses all neighbourhood information. We need to use the structure of the image.
4 Motivation via an Example (1/4)
Why is the grass wet?
Introduce four Boolean variables, C(loudy), S(prinkler), R(ain), W(etGrass), each taking values in {F(alse), T(rue)}.
[Graph: Cloudy → Sprinkler, Cloudy → Rain; Sprinkler, Rain → WetGrass.]
5 Motivation via an Example (2/4)
Model the conditional probabilities with one table per node: p(c); p(s | c) for each value of C; p(r | c) for each value of C; p(w | s, r) for each combination of S and R.
[Graph: Cloudy → Sprinkler, Cloudy → Rain; Sprinkler, Rain → WetGrass, annotated with the four tables.]
6 Motivation via an Example (3/4)
If everything depends on everything, the joint table p(C, S, R, W) needs an entry for each of the 2^4 configurations. By the product rule,
p(W, S, R, C) = p(W | S, R, C) p(S, R, C)
             = p(W | S, R, C) p(S | R, C) p(R, C)
             = p(W | S, R, C) p(S | R, C) p(R | C) p(C).
7 Motivation via an Example (4/4)
Fully general model:
p(w) = Σ_{S,R,C} p(w | S, R, C) p(S | R, C) p(R | C) p(C).
Using the graph structure (Sprinkler and Rain depend only on Cloudy; WetGrass depends only on Sprinkler and Rain):
p(w) = Σ_{S,R} p(w | S, R) Σ_C p(S | C) p(R | C) p(C).
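The two summation orders can be checked against each other numerically. A minimal Python sketch; the CPT values below are made-up illustrations, not the values from the slides:

```python
import itertools

# Wet-grass network, Boolean variables; CPT numbers are made up for illustration.
p_c = 0.5                                    # p(C = T)
p_s_c = {True: 0.1, False: 0.5}              # p(S = T | C)
p_r_c = {True: 0.8, False: 0.2}              # p(R = T | C)
p_w_sr = {(True, True): 0.99, (True, False): 0.90,
          (False, True): 0.90, (False, False): 0.00}   # p(W = T | S, R)

def pr(val, p_true):
    """Probability that a Boolean variable equals `val`, given p(T) = p_true."""
    return p_true if val else 1.0 - p_true

# Flat sum over S, R, C of p(w | S, R) p(S | C) p(R | C) p(C).
naive = sum(pr(True, p_w_sr[s, r]) * pr(s, p_s_c[c]) * pr(r, p_r_c[c]) * pr(c, p_c)
            for s, r, c in itertools.product((True, False), repeat=3))

# Structured sum: eliminate C first, as in the second expression on the slide.
inner = {(s, r): sum(pr(s, p_s_c[c]) * pr(r, p_r_c[c]) * pr(c, p_c)
                     for c in (True, False))
         for s, r in itertools.product((True, False), repeat=2)}
factored = sum(pr(True, p_w_sr[s, r]) * inner[s, r] for s, r in inner)

assert abs(naive - factored) < 1e-12
```

Both orders yield the same p(W = T); only the number of terms touched differs.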
8 Motivation via the Distributive Law
Two key observations when dealing with probabilities:
1 The distributive law can save operations: a(b + c) [2 operations] = ab + ac [3 operations].
2 If some probabilities do not depend on all random variables, we may be able to factor them out. For example, assume p(x_1, x_3 | x_2) = p(x_1 | x_2) p(x_3 | x_2). Then, using Σ_{x_3} p(x_3 | x_2) = 1,
p(x_1) = Σ_{x_2, x_3} p(x_1, x_2, x_3)
       = Σ_{x_2, x_3} p(x_1, x_3 | x_2) p(x_2)
       = Σ_{x_2, x_3} p(x_1 | x_2) p(x_3 | x_2) p(x_2)
       = Σ_{x_2} p(x_1 | x_2) p(x_2),
reducing the cost from O(|X_1| |X_2| |X_3|) to O(|X_1| |X_2|).
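The factorisation argument can be instantiated directly. The sketch below uses arbitrary domain sizes and probabilities (my own choices, not from the slides) and counts the terms each evaluation order touches:

```python
from itertools import product

# Domains: binary x1, illustrative sizes |X2| = 10, |X3| = 20.
X1, X2, X3 = range(2), range(10), range(20)

p2 = [1.0 / len(X2)] * len(X2)                   # p(x2), uniform
p1g2 = [[0.3, 0.7] for _ in X2]                  # p(x1 | x2), same row for each x2
p3g2 = [[1.0 / len(X3)] * len(X3) for _ in X2]   # p(x3 | x2), uniform

# Naive marginal: sum over x2 AND x3 -> O(|X1||X2||X3|) terms.
naive_terms, naive = 0, [0.0, 0.0]
for x1, x2, x3 in product(X1, X2, X3):
    naive[x1] += p1g2[x2][x1] * p3g2[x2][x3] * p2[x2]
    naive_terms += 1

# Factored marginal: sum_x3 p(x3 | x2) = 1 drops out -> O(|X1||X2|) terms.
factored_terms, factored = 0, [0.0, 0.0]
for x1, x2 in product(X1, X2):
    factored[x1] += p1g2[x2][x1] * p2[x2]
    factored_terms += 1

assert naive_terms == 2 * 10 * 20 and factored_terms == 2 * 10
assert all(abs(a - b) < 1e-9 for a, b in zip(naive, factored))
```

Same marginal, 400 terms versus 20.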
9-10 Motivation via Complexity Reduction
How to deal with a more complex expression?
p(x_1) p(x_2) p(x_3) p(x_4 | x_1, x_2, x_3) p(x_5 | x_1, x_3) p(x_6 | x_4) p(x_7 | x_4, x_5)
Answer: graphical models.
[Graph over x_1, ..., x_7 corresponding to this factorisation.]
11 Probabilistic Graphical Models: Aims
Graphical models
visualise the structure of a probabilistic model,
replace complex computations with formulas by manipulations on graphs,
give insights into model properties by inspection,
help develop and motivate new models.
12 Probabilistic Graphical Models: Main Flavours
A graph consists of:
1 Nodes (vertices): each represents a random variable.
2 Edges (links, arcs; directed or undirected): each represents a probabilistic relationship.
Directed graph: Bayesian Network (also called Directed Graphical Model), expressing causal relationships between variables.
Undirected graph: Markov Random Field, expressing soft constraints between variables.
Factor graph: convenient for solving inference problems (derived from Bayesian Networks or Markov Random Fields).
[Figures: a directed graph over x_1, ..., x_7; a factor graph over x_1, x_2, x_3 with factors f_a, f_b, f_c, f_d; a Markov Random Field over nodes x_i, y_i.]
13-14 Bayesian Networks: Arbitrary Joint Distribution
p(a, b, c) = p(c | a, b) p(a, b) = p(c | a, b) p(b | a) p(a)
1 Draw a node for each conditional distribution associated with a random variable.
2 Draw an edge from each node to every node whose conditional distribution is conditioned on its variable.
[Graph: a → b, a → c, b → c.]
Note that we have chosen a particular ordering of the variables!
15 Bayesian Networks: Arbitrary Joint Distribution
General case for K variables:
p(x_1, ..., x_K) = p(x_K | x_1, ..., x_{K-1}) ... p(x_2 | x_1) p(x_1)
The graph of this distribution is fully connected. (Prove it.)
What happens if we deal with a distribution represented by a graph which is not fully connected? It can no longer be the most general distribution: the absence of edges carries important information.
16-18 Joint Factorisation → Graph
p(x_1) p(x_2) p(x_3) p(x_4 | x_1, x_2, x_3) p(x_5 | x_1, x_3) p(x_6 | x_4) p(x_7 | x_4, x_5)
1 Draw a node for each conditional distribution associated with a random variable.
2 Draw an edge from each node to every node whose conditional distribution is conditioned on its variable.
[Resulting graph: x_1, x_2, x_3 → x_4; x_1, x_3 → x_5; x_4 → x_6; x_4, x_5 → x_7.]
19-20 Graph → Joint Factorisation
Can we get the expression back from the graph?
1 Write a product of probability distributions, one for each random variable (node).
2 Add all random variables associated with parent nodes to the list of conditioning variables.
Result: p(x_1) p(x_2) p(x_3) p(x_4 | x_1, x_2, x_3) p(x_5 | x_1, x_3) p(x_6 | x_4) p(x_7 | x_4, x_5)
21 Graph → Joint Factorisation
The joint distribution defined by a graph is given by the product, over all nodes of the graph, of a conditional distribution for each node conditioned on the variables corresponding to the parents of that node:
p(x) = ∏_{k=1}^{K} p(x_k | pa(x_k)),
where pa(x_k) denotes the set of parents of x_k and x = (x_1, ..., x_K).
The absence of edges (compared to the fully connected case) is the point.
22 Graph → Joint Factorisation
Restriction: the graph must be a directed acyclic graph (DAG): there are no closed paths in the graph when moving along the directed edges. Equivalently, there exists an ordering of the nodes such that no edge goes from any node to a lower-numbered node.
[Graph: the seven-variable example again; it satisfies this restriction.]
Extension: a node can also hold a set of variables, or a vector.
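The equivalent ordering condition suggests a simple DAG check: repeatedly emit nodes all of whose parents have already been emitted. A sketch; the helper name and the cyclic counterexample are my own, with the parent sets of the seven-variable example:

```python
def topological_order(parents):
    """Return an ordering in which every node appears after its parents,
    or None if the directed graph has a cycle (i.e. it is not a DAG)."""
    order, placed = [], set()
    pending = dict(parents)
    while pending:
        ready = [n for n, ps in pending.items() if all(p in placed for p in ps)]
        if not ready:
            return None          # every remaining node waits on another: a cycle
        for n in sorted(ready):
            order.append(n)
            placed.add(n)
            del pending[n]
    return order

# Parent sets of the seven-variable example graph: it is a DAG.
dag = {1: [], 2: [], 3: [], 4: [1, 2, 3], 5: [1, 3], 6: [4], 7: [4, 5]}
assert topological_order(dag) == [1, 2, 3, 4, 5, 6, 7]

# Adding an edge x7 -> x1 would create a closed directed path.
cyclic = dict(dag)
cyclic[1] = [7]
assert topological_order(cyclic) is None
```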
23-24 Bayesian Networks: Normalisation
Given p(x) = ∏_{k=1}^{K} p(x_k | pa(x_k)), is p(x) normalised, i.e. Σ_x p(x) = 1?
As the graph is a DAG, there always exists a node with no outgoing edges, say x_i. Then
Σ_x p(x) = Σ_{x_1,...,x_{i-1},x_{i+1},...,x_K} [ ∏_{k≠i} p(x_k | pa(x_k)) ] Σ_{x_i} p(x_i | pa(x_i)),
and the innermost sum equals 1 because
Σ_{x_i} p(x_i | pa(x_i)) = Σ_{x_i} p(x_i, pa(x_i)) / p(pa(x_i)) = p(pa(x_i)) / p(pa(x_i)) = 1.
Repeat until no node is left.
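The argument can be confirmed numerically: for any locally normalised CPTs on the seven-variable example graph (random values, binary variables, my own construction), the product sums to exactly one:

```python
import itertools
import random

random.seed(3)

# Parent sets of the seven-variable example graph; all variables binary.
parents = {1: (), 2: (), 3: (), 4: (1, 2, 3), 5: (1, 3), 6: (4,), 7: (4, 5)}

def random_cpt(n_parents):
    """One normalised row p(x_k | pa) for every parent configuration."""
    cpt = {}
    for pa in itertools.product((0, 1), repeat=n_parents):
        p1 = random.random()
        cpt[pa] = (1.0 - p1, p1)
    return cpt

cpts = {k: random_cpt(len(ps)) for k, ps in parents.items()}

def joint(x):
    """p(x) = prod_k p(x_k | pa(x_k)); x maps variable index -> value."""
    prod = 1.0
    for k, ps in parents.items():
        prod *= cpts[k][tuple(x[p] for p in ps)][x[k]]
    return prod

# Summing the product of locally normalised CPTs over all 2^7 states gives 1.
total = sum(joint(dict(zip(parents, vals)))
            for vals in itertools.product((0, 1), repeat=7))
assert abs(total - 1.0) < 1e-9
```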
25-26 Bayesian Polynomial Regression
Bayesian polynomial regression: observed inputs x, observed targets t, noise variance σ², hyperparameter α controlling the prior over w.
Focusing on t and w only:
p(t, w) = p(w) ∏_{n=1}^{N} p(t_n | w)
[Graph: w → t_1, ..., w → t_N; drawn compactly as w → t_n with t_n inside a plate labelled N.]
27 Bayesian Polynomial Regression
Include the parameters in the graphical model:
p(t, w | x, α, σ²) = p(w | α) ∏_{n=1}^{N} p(t_n | w, x_n, σ²)
Random variables: open circles. Deterministic variables: smaller solid circles.
[Graph: x_n and σ² → t_n inside a plate labelled N; α → w → t_n.]
28 Bayesian Polynomial Regression
Random variables are either observed, e.g. t, or unobserved, e.g. w (also called latent or hidden random variables).
Observed random variables are shaded in the graphical model.
[Graph: as before, with the t_n nodes shaded.]
29 Bayesian Polynomial Regression: Prediction
New data point x̂; we want to predict t̂:
p(t̂, t, w | x̂, x, α, σ²) = [ ∏_{n=1}^{N} p(t_n | x_n, w, σ²) ] p(w | α) p(t̂ | x̂, w, σ²)
[Graph: the polynomial regression model including the prediction nodes x̂ and t̂.]
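One direct use of this directed factorisation is ancestral sampling: draw w from its prior, then each t_n given w. The degree, α, σ and inputs below are illustrative choices, not values from the slides:

```python
import math
import random

random.seed(1)

# Ancestral sampling from the polynomial regression model: follow the arrows.
M, alpha, sigma = 3, 2.0, 0.3     # illustrative degree and hyperparameters

# w ~ p(w | alpha) = N(0, alpha^{-1} I)
w = [random.gauss(0.0, 1.0 / math.sqrt(alpha)) for _ in range(M + 1)]

def poly(w, x):
    """Evaluate the polynomial sum_j w_j x^j."""
    return sum(wj * x ** j for j, wj in enumerate(w))

xs = [i / 10 for i in range(11)]                    # deterministic inputs x_n
ts = [random.gauss(poly(w, x), sigma) for x in xs]  # t_n ~ p(t_n | w, x_n, sigma^2)

assert len(ts) == len(xs)
```

Predicting t̂ at a new x̂ would condition on the sampled t instead of sampling them.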
30 Conditional Independence
Any p(x) may be written
p(x) = ∏_{k=1}^{K} p(x_k | x_1, x_2, ..., x_{k-1}) = ∏_{k=1}^{K} p(x_k | pa(x_k)).
What does it mean if pa(x_k) ⊊ {x_1, x_2, ..., x_{k-1}} in the above equation?
A sparse graphical model: p(x) is no longer fully general. An assumption has been made, e.g. p(w | C, S, R) = p(w | S, R): conditional independence.
31 Conditional Independence
Definition (Conditional Independence)
If for three random variables a, b, and c the following holds:
p(a | b, c) = p(a | c),
then a is conditionally independent of b given c. Notation: a ⊥ b | c.
The above equation must hold for all possible values of c.
Consequence: p(a, b | c) = p(a | b, c) p(b | c) = p(a | c) p(b | c).
Conditional independence simplifies both the structure of the model and the computations needed to perform inference and learning.
NB: conditional dependence is a related but different concept which we are not concerned with in this course.
32 Rules for Conditional Independence
Symmetry: X ⊥ Y | Z ⇒ Y ⊥ X | Z
Decomposition: Y, W ⊥ X | Z ⇒ Y ⊥ X | Z and W ⊥ X | Z
Weak Union: X ⊥ Y, W | Z ⇒ X ⊥ Y | Z, W
Contraction: X ⊥ W | Z, Y and X ⊥ Y | Z ⇒ X ⊥ W, Y | Z
Intersection: X ⊥ Y | Z, W and X ⊥ W | Z, Y ⇒ X ⊥ Y, W | Z
Note: Intersection is only valid for strictly positive distributions, p(x), p(y), p(z), p(w) > 0.
33 Conditional Independence: TT (1/3)
Can we work with the graphical model directly? Check the simplest examples, containing only three nodes.
The first example has joint distribution p(a, b, c) = p(a | c) p(b | c) p(c).
Marginalise both sides over c:
p(a, b) = Σ_c p(a | c) p(b | c) p(c) ≠ p(a) p(b) in general.
So a ⊥ b | ∅ does not hold (where ∅ is the empty set).
[Graph: c → a, c → b.]
34 Conditional Independence: TT (2/3)
Now condition on c:
p(a, b | c) = p(a, b, c) / p(c) = p(a | c) p(b | c).
Therefore a ⊥ b | c.
[Graph: c → a, c → b, with c observed.]
35 Conditional Independence: TT (3/3)
Graphical interpretation: in both graphical models there is a path from a to b. The node c is called tail-to-tail (TT) with respect to this path because c is connected to the tails of the arrows in the path.
The presence of the unobserved TT-node c in the path renders a dependent on b (and b dependent on a). Conditioning on c blocks the path from a to b and causes a and b to become conditionally independent given c.
[Figures: left, c unobserved: a ⊥ b does not hold; right, c observed: a ⊥ b | c.]
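A small numerical check of the TT case, with made-up binary tables:

```python
# Tail-to-tail example: p(a, b, c) = p(a|c) p(b|c) p(c); numbers are made up.
pc = [0.5, 0.5]                       # p(c)
pa_c = [[0.9, 0.1], [0.2, 0.8]]       # p(a | c), rows indexed by c
pb_c = [[0.7, 0.3], [0.4, 0.6]]       # p(b | c)

def p(a, b, c):
    return pa_c[c][a] * pb_c[c][b] * pc[c]

# Marginally (c unobserved), a and b are dependent here:
pab00 = sum(p(0, 0, c) for c in (0, 1))                 # p(a=0, b=0)
pa0 = sum(p(0, b, c) for b in (0, 1) for c in (0, 1))   # p(a=0)
pb0 = sum(p(a, 0, c) for a in (0, 1) for c in (0, 1))   # p(b=0)
assert abs(pab00 - pa0 * pb0) > 1e-3                    # a ⊥ b does NOT hold

# Conditioning on c blocks the path: p(a, b | c) = p(a|c) p(b|c) exactly.
for c in (0, 1):
    for a in (0, 1):
        for b in (0, 1):
            assert abs(p(a, b, c) / pc[c] - pa_c[c][a] * pb_c[c][b]) < 1e-12
```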
36 Conditional Independence: HT (1/3)
Second example: p(a, b, c) = p(a) p(c | a) p(b | c).
Marginalise over c to test for independence:
p(a, b) = p(a) Σ_c p(c | a) p(b | c) = p(a) p(b | a) ≠ p(a) p(b) in general.
So a ⊥ b | ∅ does not hold.
[Graph: a → c → b.]
37 Conditional Independence: HT (2/3)
Now condition on c:
p(a, b | c) = p(a, b, c) / p(c) = p(a) p(c | a) p(b | c) / p(c) = p(a | c) p(b | c),
where we used Bayes' theorem, p(c | a) = p(a | c) p(c) / p(a).
Therefore a ⊥ b | c.
38 Conditional Independence: HT (3/3)
Graphical interpretation: the node c is head-to-tail (HT) with respect to the path from a to b. Observing c blocks the path.
[Figures: left, a → c → b with c unobserved: a ⊥ b does not hold; right, c observed: a ⊥ b | c.]
39 Conditional Independence: HH (1/4)
Third example (a little more subtle): p(a, b, c) = p(a) p(b) p(c | a, b).
Marginalise over c to test for independence:
p(a, b) = Σ_c p(a) p(b) p(c | a, b) = p(a) p(b) Σ_c p(c | a, b) = p(a) p(b).
a and b are independent if NO variable is observed: a ⊥ b | ∅.
[Graph: a → c ← b.]
40 Conditional Independence: HH (2/4)
Now condition on c:
p(a, b | c) = p(a, b, c) / p(c) = p(a) p(b) p(c | a, b) / p(c) ≠ p(a | c) p(b | c) in general.
So a ⊥ b | c does not hold.
41 Conditional Independence: HH (3/4)
Graphical interpretation: in both graphical models there is a path from a to b. The node c is called head-to-head (HH) with respect to this path because c is connected to the heads of the arrows in the path.
The presence of the unobserved HH-node c in the path makes a independent of b (and b independent of a): the unobserved c blocks the path from a to b.
[Figures: left, c unobserved: a ⊥ b; right, c observed: a ⊥ b | c does not hold.]
42 Conditional Independence: HH (4/4)
Graphical interpretation: conditioning on c unblocks the path from a to b, and renders a conditionally dependent on b given c.
Some more terminology: node y is a descendant of node x if there is a path from x to y in which each step follows the direction of the arrows.
An HH-path becomes unblocked if either the HH-node itself, or any of its descendants, is observed.
[Figures: a → c ← b with c observed (a ⊥ b | c does not hold), and a larger graph where observing a descendant of the HH-node likewise unblocks the path.]
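And the corresponding check for the HH case, the "explaining away" effect; the numbers are my own, chosen so that c behaves like an effect with two independent causes:

```python
# Head-to-head example: p(a, b, c) = p(a) p(b) p(c | a, b); numbers are made up.
pa, pb = [0.5, 0.5], [0.5, 0.5]
pc1 = [[0.1, 0.9], [0.9, 0.99]]        # p(c=1 | a, b), indexed [a][b]

def p(a, b, c):
    p_c = pc1[a][b] if c == 1 else 1.0 - pc1[a][b]
    return pa[a] * pb[b] * p_c

# With c unobserved, marginalising gives p(a, b) = p(a) p(b) exactly.
for a in (0, 1):
    for b in (0, 1):
        assert abs(sum(p(a, b, c) for c in (0, 1)) - pa[a] * pb[b]) < 1e-12

# Observing c unblocks the path: given c = 1, a and b become dependent.
pc_1 = sum(p(a, b, 1) for a in (0, 1) for b in (0, 1))   # p(c=1)
pa1 = sum(p(1, b, 1) for b in (0, 1)) / pc_1             # p(a=1 | c=1)
pb1 = sum(p(a, 1, 1) for a in (0, 1)) / pc_1             # p(b=1 | c=1)
assert abs(p(1, 1, 1) / pc_1 - pa1 * pb1) > 1e-3
```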
43 Conditional Independence: General Graphs
Conditional independence has now been established for all configurations of three variables: (HH, HT, TT) × (observed, unobserved), plus the subtle case of an HH junction with an observed descendant.
We can generalise these results to arbitrary Bayesian Networks. Roughly: the graph connectivity implies conditional independence for those sets of nodes which are directionally separated.
44 D-Separation
Definition (Blocked Path)
A blocked path is a path which contains an observed TT- or HT-node, or an unobserved HH-node whose descendants are all unobserved.
[Figures: the three canonical three-node graphs (TT, HT, HH).]
45 D-Separation
Consider a general directed graph in which A, B, and C are arbitrary non-intersecting sets of nodes. (There may be other nodes in the graph which are not contained in the union of A, B, and C.)
Consider all possible paths from any node in A to any node in B. Any such path is blocked if it includes a node such that either
1 the node is HT or TT with respect to the path, and the node is in set C, or
2 the node is HH with respect to the path, and neither the node nor any of its descendants is in set C.
If all paths are blocked, then A is d-separated from B by C, and the joint distribution over all the variables in the graph will satisfy A ⊥ B | C.
(Note: d-separation stands for directional separation.)
46 D-Separation: Example
The path from a to b is not blocked by f, because f is a TT-node and unobserved.
The path from a to b is not blocked by e, because e is an HH-node and, although unobserved itself, one of its descendants (node c) is observed.
Hence the path is unblocked, and a ⊥ b | c does not follow from this graph.
[Graph: a → e, f → e, f → b, e → c; c observed.]
47 D-Separation: Another Example
The path from a to b is blocked by f, because f is a TT-node and observed. Therefore a ⊥ b | f.
Furthermore, the path from a to b is also blocked by e, because e is an HH-node and neither it nor its descendants are observed. This again gives a ⊥ b | f.
[Same graph, now with f observed instead of c.]
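The blocked-path rules can be coded directly by enumerating paths in the undirected skeleton, which is enough for small graphs. This sketch assumes the example graph has edges a→e, f→e, f→b and e→c, consistent with the conclusions stated on the two example slides:

```python
from itertools import chain

# Example graph from the d-separation slides (assumed edge set).
edges = {('a', 'e'), ('f', 'e'), ('f', 'b'), ('e', 'c')}
nodes = set(chain.from_iterable(edges))

children = {n: {v for (u, v) in edges if u == n} for n in nodes}
neigh = {n: {v for (u, v) in edges if u == n} | {u for (u, v) in edges if v == n}
         for n in nodes}

def descendants(n):
    out, stack = set(), [n]
    while stack:
        for c in children[stack.pop()]:
            if c not in out:
                out.add(c)
                stack.append(c)
    return out

def paths(a, b, seen=()):
    """All simple paths from a to b in the undirected skeleton."""
    if a == b:
        yield seen + (b,)
        return
    for n in neigh[a]:
        if n not in seen:
            yield from paths(n, b, seen + (a,))

def blocked(path, C):
    for i in range(1, len(path) - 1):
        prev, n, nxt = path[i - 1], path[i], path[i + 1]
        head_prev = (prev, n) in edges    # arrow points into n from prev
        head_next = (nxt, n) in edges     # arrow points into n from nxt
        if head_prev and head_next:       # head-to-head node
            if n not in C and not (descendants(n) & C):
                return True
        elif n in C:                      # observed TT- or HT-node
            return True
    return False

def d_separated(a, b, C):
    return all(blocked(p, C) for p in paths(a, b))

assert not d_separated('a', 'b', {'c'})   # the path a-e-f-b stays unblocked
assert d_separated('a', 'b', {'f'})       # observing f blocks it: a ⊥ b | f
```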
48 Conditional Independence ⇔ Factorisation (1/4)
Theorem (Factorisation ⇒ Conditional Independence)
If a probability distribution factorises according to a directed acyclic graph, and if A, B and C are disjoint subsets of nodes such that A is d-separated from B by C in the graph, then the distribution satisfies A ⊥ B | C.
Theorem (Conditional Independence ⇒ Factorisation)
If a probability distribution satisfies the conditional independence statements implied by d-separation over a particular directed graph, then it also factorises according to the graph.
49 Conditional Independence ⇔ Factorisation (2/4)
Why is this equivalence relevant?
Conditional independence statements are usually what a domain expert knows about the problem at hand; what is needed is a model p(x) for computation.
The factorisation theorem provides p(x) from the conditional independence statements: one can build a global model for computation from local conditional independence statements.
50 Conditional Independence ⇔ Factorisation (3/4)
Given a set of conditional independence statements, adding another statement will in general produce further statements.
Conditional independence statements can be read as d-separation in a DAG. However, there are sets of conditional independence statements which cannot be satisfied by any Bayesian Network.
51 Conditional Independence ⇔ Factorisation (4/4)
The broader picture: a directed graphical model can be viewed as a filter which accepts a probability distribution p(x) only if it satisfies the factorisation property. The set of all distributions p(x) that pass through the filter is denoted DF.
Alternatively, only those distributions p(x) pass through the filter (graph) which respect the conditional independencies implied by the d-separation properties of the graph.
The d-separation theorem says that the resulting set DF is the same in both cases.
[Figure: distributions p(x) passing through a filter into the set DF.]
More informationCOUNTING AND PROBABILITY
CHAPTER 9 COUNTING AND PROBABILITY Copyright Cengage Learning. All rights reserved. SECTION 9.3 Counting Elements of Disjoint Sets: The Addition Rule Copyright Cengage Learning. All rights reserved. Counting
More informationBayesian Classification Using Probabilistic Graphical Models
San Jose State University SJSU ScholarWorks Master's Projects Master's Theses and Graduate Research Spring 2014 Bayesian Classification Using Probabilistic Graphical Models Mehal Patel San Jose State University
More informationIntroduction to Machine Learning CMU-10701
Introduction to Machine Learning CMU-10701 Clustering and EM Barnabás Póczos & Aarti Singh Contents Clustering K-means Mixture of Gaussians Expectation Maximization Variational Methods 2 Clustering 3 K-
More information2. Graphical Models. Undirected graphical models. Factor graphs. Bayesian networks. Conversion between graphical models. Graphical Models 2-1
Graphical Models 2-1 2. Graphical Models Undirected graphical models Factor graphs Bayesian networks Conversion between graphical models Graphical Models 2-2 Graphical models There are three families of
More informationThese notes present some properties of chordal graphs, a set of undirected graphs that are important for undirected graphical models.
Undirected Graphical Models: Chordal Graphs, Decomposable Graphs, Junction Trees, and Factorizations Peter Bartlett. October 2003. These notes present some properties of chordal graphs, a set of undirected
More informationRNNs as Directed Graphical Models
RNNs as Directed Graphical Models Sargur Srihari srihari@buffalo.edu This is part of lecture slides on Deep Learning: http://www.cedar.buffalo.edu/~srihari/cse676 1 10. Topics in Sequence Modeling Overview
More informationCSC Discrete Math I, Spring Sets
CSC 125 - Discrete Math I, Spring 2017 Sets Sets A set is well-defined, unordered collection of objects The objects in a set are called the elements, or members, of the set A set is said to contain its
More informationSection 6.3: Further Rules for Counting Sets
Section 6.3: Further Rules for Counting Sets Often when we are considering the probability of an event, that event is itself a union of other events. For example, suppose there is a horse race with three
More informationA Brief Introduction to Bayesian Networks. adapted from slides by Mitch Marcus
A Brief Introduction to Bayesian Networks adapted from slides by Mitch Marcus Bayesian Networks A simple, graphical notation for conditional independence assertions and hence for compact specification
More informationProbabilistic Graphical Models
Overview of Part Two Probabilistic Graphical Models Part Two: Inference and Learning Christopher M. Bishop Exact inference and the junction tree MCMC Variational methods and EM Example General variational
More informationMachine Learning A WS15/16 1sst KU Version: January 11, b) [1 P] For the probability distribution P (A, B, C, D) with the factorization
Machine Learning A 708.064 WS15/16 1sst KU Version: January 11, 2016 Exercises Problems marked with * are optional. 1 Conditional Independence I [3 P] a) [1 P] For the probability distribution P (A, B,
More information10708 Graphical Models: Homework 2
10708 Graphical Models: Homework 2 Due October 15th, beginning of class October 1, 2008 Instructions: There are six questions on this assignment. Each question has the name of one of the TAs beside it,
More informationGraphical Models. Pradeep Ravikumar Department of Computer Science The University of Texas at Austin
Graphical Models Pradeep Ravikumar Department of Computer Science The University of Texas at Austin Useful References Graphical models, exponential families, and variational inference. M. J. Wainwright
More informationLecture 9: Undirected Graphical Models Machine Learning
Lecture 9: Undirected Graphical Models Machine Learning Andrew Rosenberg March 5, 2010 1/1 Today Graphical Models Probabilities in Undirected Graphs 2/1 Undirected Graphs What if we allow undirected graphs?
More informationUsing a Model of Human Cognition of Causality to Orient Arcs in Structural Learning
Using a Model of Human Cognition of Causality to Orient Arcs in Structural Learning A dissertation submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy at George
More informationCAUSAL NETWORKS: SEMANTICS AND EXPRESSIVENESS*
Uncertainty in Artificial Intelligence 4 R.D. Shachter, T.S. Levitt, L.N. Kanal, J.F. Lemmer (Editors) Elsevier Science Publishers B.V. (North-Holland), 1990 69 CAUSAL NETWORKS: SEMANTICS AND EXPRESSIVENESS*
More informationMachine Learning. Chao Lan
Machine Learning Chao Lan Machine Learning Prediction Models Regression Model - linear regression (least square, ridge regression, Lasso) Classification Model - naive Bayes, logistic regression, Gaussian
More informationBioinformatics - Lecture 07
Bioinformatics - Lecture 07 Bioinformatics Clusters and networks Martin Saturka http://www.bioplexity.org/lectures/ EBI version 0.4 Creative Commons Attribution-Share Alike 2.5 License Learning on profiles
More informationMachine Learning Lecture 16
ourse Outline Machine Learning Lecture 16 undamentals (2 weeks) ayes ecision Theory Probability ensity stimation Undirected raphical Models & Inference 28.06.2016 iscriminative pproaches (5 weeks) Linear
More informationMachine Learning. B. Unsupervised Learning B.1 Cluster Analysis. Lars Schmidt-Thieme
Machine Learning B. Unsupervised Learning B.1 Cluster Analysis Lars Schmidt-Thieme Information Systems and Machine Learning Lab (ISMLL) Institute for Computer Science University of Hildesheim, Germany
More informationTransitivity and Triads
1 / 32 Tom A.B. Snijders University of Oxford May 14, 2012 2 / 32 Outline 1 Local Structure Transitivity 2 3 / 32 Local Structure in Social Networks From the standpoint of structural individualism, one
More informationSampling PCA, enhancing recovered missing values in large scale matrices. Luis Gabriel De Alba Rivera 80555S
Sampling PCA, enhancing recovered missing values in large scale matrices. Luis Gabriel De Alba Rivera 80555S May 2, 2009 Introduction Human preferences (the quality tags we put on things) are language
More informationJoint Entity Resolution
Joint Entity Resolution Steven Euijong Whang, Hector Garcia-Molina Computer Science Department, Stanford University 353 Serra Mall, Stanford, CA 94305, USA {swhang, hector}@cs.stanford.edu No Institute
More informationTHREE LECTURES ON BASIC TOPOLOGY. 1. Basic notions.
THREE LECTURES ON BASIC TOPOLOGY PHILIP FOTH 1. Basic notions. Let X be a set. To make a topological space out of X, one must specify a collection T of subsets of X, which are said to be open subsets of
More informationWarm-up as you walk in
arm-up as you walk in Given these N=10 observations of the world: hat is the approximate value for P c a, +b? A. 1/10 B. 5/10. 1/4 D. 1/5 E. I m not sure a, b, +c +a, b, +c a, b, +c a, +b, +c +a, b, +c
More informationLearning Tractable Probabilistic Models Pedro Domingos
Learning Tractable Probabilistic Models Pedro Domingos Dept. Computer Science & Eng. University of Washington 1 Outline Motivation Probabilistic models Standard tractable models The sum-product theorem
More information7. Boosting and Bagging Bagging
Group Prof. Daniel Cremers 7. Boosting and Bagging Bagging Bagging So far: Boosting as an ensemble learning method, i.e.: a combination of (weak) learners A different way to combine classifiers is known
More informationProbabilistic Learning Classification using Naïve Bayes
Probabilistic Learning Classification using Naïve Bayes Weather forecasts are usually provided in terms such as 70 percent chance of rain. These forecasts are known as probabilities of precipitation reports.
More information2.2 Set Operations. Introduction DEFINITION 1. EXAMPLE 1 The union of the sets {1, 3, 5} and {1, 2, 3} is the set {1, 2, 3, 5}; that is, EXAMPLE 2
2.2 Set Operations 127 2.2 Set Operations Introduction Two, or more, sets can be combined in many different ways. For instance, starting with the set of mathematics majors at your school and the set of
More informationBayesian Networks. A Bayesian network is a directed acyclic graph that represents causal relationships between random variables. Earthquake.
Bayes Nets Independence With joint probability distributions we can compute many useful things, but working with joint PD's is often intractable. The naïve Bayes' approach represents one (boneheaded?)
More informationSequence Labeling: The Problem
Sequence Labeling: The Problem Given a sequence (in NLP, words), assign appropriate labels to each word. For example, POS tagging: DT NN VBD IN DT NN. The cat sat on the mat. 36 part-of-speech tags used
More informationCausal Networks: Semantics and Expressiveness*
Causal Networks: Semantics and Expressiveness* Thomas Verina & Judea Pearl Cognitive Systems Laboratory, Computer Science Department University of California Los Angeles, CA 90024
More informationAI Programming CS S-15 Probability Theory
AI Programming CS662-2013S-15 Probability Theory David Galles Department of Computer Science University of San Francisco 15-0: Uncertainty In many interesting agent environments, uncertainty plays a central
More information2. Graphical Models. Undirected pairwise graphical models. Factor graphs. Bayesian networks. Conversion between graphical models. Graphical Models 2-1
Graphical Models 2-1 2. Graphical Models Undirected pairwise graphical models Factor graphs Bayesian networks Conversion between graphical models Graphical Models 2-2 Graphical models Families of graphical
More information