Machine Learning 10-601
Tom M. Mitchell, Machine Learning Department, Carnegie Mellon University
October 2, 2012

Today:
- Graphical models
- Bayes Nets:
  - Representing distributions
  - Conditional independencies
  - Simple inference
  - Simple learning
Readings: Required: Bishop, Chapter 8 through 8.2

Graphical Models
Key idea:
- Conditional independence assumptions are useful, but Naïve Bayes is extreme!
- Graphical models express sets of conditional independence assumptions via graph structure.
- Graph structure plus associated parameters define a joint probability distribution over a set of variables.
Two types of graphical models:
- Directed graphs (aka Bayesian Networks): today
- Undirected graphs (aka Markov Random Fields)

Graphical Models: Why Care?
- Among the most important ML developments of the decade.
- Graphical models allow combining:
  - prior knowledge in the form of dependencies/independencies
  - prior knowledge in the form of priors over parameters
  - observed training data
- Principled and fairly general methods for probabilistic inference and learning.
- Useful in practice: diagnosis, help systems, text analysis, time series models, ...

Conditional Independence
Definition: X is conditionally independent of Y given Z if the probability distribution governing X is independent of the value of Y, given the value of Z:
  P(X = x_i | Y = y_j, Z = z_k) = P(X = x_i | Z = z_k)  for all i, j, k
which we often write P(X | Y, Z) = P(X | Z).
E.g., P(Thunder | Rain, Lightning) = P(Thunder | Lightning).

Marginal Independence
Definition: X is marginally independent of Y if
  P(X = x_i | Y = y_j) = P(X = x_i)  for all i, j
equivalently, if
  P(Y = y_j | X = x_i) = P(Y = y_j)  for all i, j
equivalently, if
  P(X = x_i, Y = y_j) = P(X = x_i) P(Y = y_j)  for all i, j

Bayesian Networks
- Represent a joint probability distribution over a set of variables.
- Describe a network of dependencies.
- Bayes Nets define the joint probability distribution in terms of this graph, plus parameters.
- Benefits of Bayes Nets:
  - Represent the full joint distribution in fewer parameters, using prior knowledge about dependencies.
  - Algorithms for inference and learning.

Bayesian Networks Definition
- A Bayes network represents the joint probability distribution over a collection of random variables.
- A Bayes network is a directed acyclic graph and a set of conditional probability distributions (CPDs):
  - Each node denotes a random variable.
  - Edges denote dependencies.
  - For each node X_i, its CPD defines P(X_i | Pa(X_i)), where Pa(X) = immediate parents of X in the graph.
  - The joint distribution over all variables is defined to be
      P(X_1, ..., X_n) = prod_i P(X_i | Pa(X_i))

Bayesian Network
- Nodes = random variables (figure: StormClouds, Lightning, Rain, Thunder, and a node W whose parents are Lightning and Rain).
- A conditional probability distribution (CPD) is associated with each node N, defining P(N | Parents(N)). For example, the CPD for W:

    Parents (L, R)    P(W | Pa)    P(¬W | Pa)
    L, R              0            1.0
    L, ¬R             0            1.0
    ¬L, R             0.2          0.8
    ¬L, ¬R            0.9          0.1

- The joint distribution over all variables:
    P(S, L, R, T, W) = product over nodes X of P(X | Pa(X))
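A minimal sketch of how this factorization turns CPT lookups into a joint probability, assuming the arrow structure shown in the slide's figure (StormClouds → Lightning, StormClouds → Rain, Lightning → Thunder, and Lightning, Rain → W). Every CPT value except W's table above is an illustrative placeholder, not a number from the lecture:

```python
# Factorization sketch: P(S,L,R,T,W) = P(S) P(L|S) P(R|S) P(T|L) P(W|L,R)
# (structure assumed from the slide's figure; most CPT values are placeholders)

P_S = {1: 0.3, 0: 0.7}                                    # P(StormClouds)      [placeholder]
P_L = {(1,): {1: 0.4, 0: 0.6}, (0,): {1: 0.05, 0: 0.95}}  # P(Lightning | S)    [placeholder]
P_R = {(1,): {1: 0.7, 0: 0.3}, (0,): {1: 0.1, 0: 0.9}}    # P(Rain | S)         [placeholder]
P_T = {(1,): {1: 0.9, 0: 0.1}, (0,): {1: 0.0, 0: 1.0}}    # P(Thunder | L)      [placeholder]
P_W = {(1, 1): {1: 0.0, 0: 1.0},                          # P(W | L, R), from the slide's table
       (1, 0): {1: 0.0, 0: 1.0},
       (0, 1): {1: 0.2, 0: 0.8},
       (0, 0): {1: 0.9, 0: 0.1}}

def joint(s, l, r, t, w):
    """P(S=s, L=l, R=r, T=t, W=w) as the product of one CPT entry per node."""
    return P_S[s] * P_L[(s,)][l] * P_R[(s,)][r] * P_T[(l,)][t] * P_W[(l, r)][w]

print(joint(s=1, l=0, r=1, t=0, w=1))   # the query posed on the inference slide below
```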

Bayesian Network
[StormClouds network figure with the CPD table for W, as above]
What can we say about conditional independencies in a Bayes Net? One thing is this:
  Each node is conditionally independent of its non-descendents, given only its immediate parents.

Some helpful terminology:
- Parents = Pa(X) = immediate parents
- Antecedents = parents, parents of parents, ...
- Children = immediate children
- Descendents = children, children of children, ...

Bayesian Networks
- The CPD for each node X_i describes P(X_i | Pa(X_i)).
- The chain rule of probability says that in general:
    P(X_1, ..., X_n) = P(X_1) P(X_2 | X_1) ... P(X_n | X_1, ..., X_{n-1})
- But in a Bayes net:
    P(X_1, ..., X_n) = prod_i P(X_i | Pa(X_i))

How Many Parameters?
[StormClouds network figure with the CPD table for W, as above]
- To define the joint distribution in general?
- To define the joint distribution for this Bayes Net?
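Worked out under the same assumed structure (S with no parents, L and R each with parent S, T with parent L, W with parents L and R), all variables Boolean: the unconstrained joint over five Boolean variables needs 2^5 - 1 = 31 independent parameters, while this Bayes net needs 1 + 2 + 2 + 2 + 4 = 11 (one parameter per parent configuration for each node).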

Inference in Bayes Nets
[StormClouds network figure with the CPD table for W, as above]
  P(S=1, L=0, R=1, T=0, W=1) = ?

Learning a Bayes Net
[StormClouds network figure with the CPD table for W, as above]
- Consider learning when the graph structure is given, and the data = { <s, l, r, t, w> }.
- What is the MLE solution? MAP?
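A minimal sketch of the MLE answer for the fully observed case: each CPT entry is a conditional count. The variable ordering and the toy data below are illustrative, not from the lecture; MAP estimation would simply add pseudo-counts (e.g., from a Beta/Dirichlet prior) to these counts.

```python
# MLE for one CPT entry, P(W=1 | L=l, R=r), from fully observed data:
#   count(W=1, L=l, R=r) / count(L=l, R=r)
# Records are (s, l, r, t, w) tuples; the values are made up for illustration.

data = [(1, 0, 1, 0, 1), (1, 0, 1, 0, 0), (0, 0, 0, 0, 1), (1, 1, 1, 1, 0)]

def mle_w_given_lr(data, l, r):
    matching = [rec for rec in data if rec[1] == l and rec[2] == r]
    if not matching:
        return None                          # no data for this parent configuration
    return sum(rec[4] for rec in matching) / len(matching)

print(mle_w_given_lr(data, l=0, r=1))        # -> 0.5 on this toy data
```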

Algorithm for Constructing a Bayes Network
- Choose an ordering over the variables, e.g., X_1, X_2, ..., X_n.
- For i = 1 to n:
  - Add X_i to the network.
  - Select parents Pa(X_i) as the minimal subset of X_1 ... X_{i-1} such that
      P(X_i | Pa(X_i)) = P(X_i | X_1, ..., X_{i-1})
- Notice this choice of parents assures
    P(X_1, ..., X_n) = prod_i P(X_i | X_1, ..., X_{i-1})   (by chain rule)
                     = prod_i P(X_i | Pa(X_i))             (by construction)

Example
- Bird flu and Allergies both cause Nasal problems.
- Nasal problems cause Sneezes and Headaches.
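A sketch of that construction loop in code, assuming access to a conditional-independence test; is_indep(x, rest, given=...) is a hypothetical oracle (in practice a statistical test or domain knowledge), not something defined in the slides.

```python
from itertools import combinations

def build_bayes_net(ordering, is_indep):
    """For each X_i, pick a minimal parent set Pa(X_i) from X_1..X_{i-1} such that
    X_i is conditionally independent of its remaining predecessors given Pa(X_i)."""
    parents = {}
    for i, x in enumerate(ordering):
        predecessors = ordering[:i]
        # scan candidate parent sets from smallest to largest; keep the first that works
        for size in range(len(predecessors) + 1):
            for pa in combinations(predecessors, size):
                rest = [v for v in predecessors if v not in pa]
                if not rest or is_indep(x, rest, given=list(pa)):
                    parents[x] = list(pa)
                    break
            if x in parents:
                break
    return parents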

- What is the Bayes Network for X_1, ..., X_4 with NO assumed conditional independencies?
- What is the Bayes Network for Naïve Bayes?
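Worked answers, assuming the standard constructions: with no conditional-independence assumptions, the construction above yields a fully connected DAG in which each X_i has all of X_1, ..., X_{i-1} as parents, so the factorization is just the chain rule. For Naïve Bayes with class Y and features X_1, ..., X_n, Y is the sole parent of every X_i, giving P(Y, X_1, ..., X_n) = P(Y) prod_i P(X_i | Y).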

What do we do if variables are a mix of discrete and real valued?

Bayes Network for a Hidden Markov Model
- Unobserved state: S_{t-2}, S_{t-1}, S_t, S_{t+1}, S_{t+2}
- Observed output: O_{t-2}, O_{t-1}, O_t, O_{t+1}, O_{t+2}
- Implies the future is conditionally independent of the past, given the present.
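For concreteness, the directed structure drawn here (S_{t-1} → S_t and S_t → O_t) gives the usual HMM factorization
  P(S_1, ..., S_T, O_1, ..., O_T) = P(S_1) P(O_1 | S_1) prod_{t=2..T} P(S_t | S_{t-1}) P(O_t | S_t)
which is why, given S_t, the future (S_{t+1}, O_{t+1}, ...) is independent of the past.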

What You Should Know
- Bayes nets are a convenient representation for encoding dependencies / conditional independence.
- BN = graph plus parameters of CPDs:
  - Defines the joint distribution over variables.
  - Can calculate everything else from that, though inference may be intractable.
- Reading conditional independence relations from the graph:
  - Each node is conditionally independent of its non-descendents, given only its parents.
  - Explaining away.
- See Bayes Net applet: http://www.cs.cmu.edu/~javabayes/home/applet.html

Inference in Bayes Nets
- In general, intractable (NP-complete).
- For certain cases, tractable:
  - Assigning probability to a fully observed set of variables
  - Or if just one variable is unobserved
  - Or for singly connected graphs (i.e., no undirected loops): belief propagation
  - For multiply connected graphs: junction tree
- Sometimes use Monte Carlo methods:
  - Generate many samples according to the Bayes Net distribution, then count up the results.
- Variational methods for tractable approximate solutions.

Example
- Bird flu and Allergies both cause Sinus problems.
- Sinus problems cause Headaches and runny Nose.

Prob. of joint assignment: easy
- Suppose we are interested in the joint assignment <F=f, A=a, S=s, H=h, N=n>.
- What is P(f, a, s, h, n)?
- (Let's use p(a, b) as shorthand for p(A=a, B=b).)
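Given the stated structure (Flu and Allergy are parents of Sinus; Sinus is the parent of Headache and Nose), the Bayes-net factorization answers this directly as a product of five CPT lookups:
  P(f, a, s, h, n) = P(f) P(a) P(s | f, a) P(h | s) P(n | s)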

Prob. of marginals: not so easy
- How do we calculate P(N=n)?

Generating a sample from the joint distribution: easy
- How can we generate random samples drawn according to P(F, A, S, H, N)?
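For the first question above: the marginal requires summing that same product over all values of the other variables,
  P(N = n) = sum over f, a, s, h of P(f) P(a) P(s | f, a) P(h | s) P(n | s)
which, done naively, is exponential in the number of summed-out variables.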

Generating a sample from the joint distribution: easy (continued)
- How can we generate random samples drawn according to P(F, A, S, H, N)?
- Note we can estimate marginals like P(N=n) by generating many samples from the joint distribution, then counting the fraction of samples for which N=n.
- Similarly for anything else we care about, e.g., P(F=1 | H=1, N=0).
- → A weak but general method for estimating any probability term.
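A minimal sketch of this ancestral-sampling idea for the Flu/Allergy/Sinus/Headache/Nose network: sample each variable in topological order from its CPT given its already sampled parents, then estimate any probability by counting. All CPT numbers below are placeholders for illustration, not values from the lecture.

```python
import random

# Placeholder CPTs (probability that each variable is 1); structure: F -> S <- A, S -> H, S -> N.
p_f = 0.05
p_a = 0.20
p_s = {(1, 1): 0.9, (1, 0): 0.8, (0, 1): 0.6, (0, 0): 0.05}   # P(S=1 | F, A)
p_h = {1: 0.7, 0: 0.1}                                        # P(H=1 | S)
p_n = {1: 0.8, 0: 0.05}                                       # P(N=1 | S)

def sample_once():
    f = int(random.random() < p_f)           # roots first...
    a = int(random.random() < p_a)
    s = int(random.random() < p_s[(f, a)])   # ...then children, conditioned on sampled parents
    h = int(random.random() < p_h[s])
    n = int(random.random() < p_n[s])
    return f, a, s, h, n

samples = [sample_once() for _ in range(100_000)]
# Estimate a marginal by counting, e.g. P(N=1):
print(sum(n for *_, n in samples) / len(samples))
# Estimate a conditional the same way, e.g. P(F=1 | H=1, N=0):
cond = [f for (f, a, s, h, n) in samples if h == 1 and n == 0]
print(sum(cond) / len(cond) if cond else float('nan'))
```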

Prob. of marginals: not so easy
- But sometimes the structure of the network allows us to be clever → avoid exponential work.
- E.g., the chain A → B → C → D → E.

Inference in Bayes Nets
- In general, intractable (NP-complete).
- For certain cases, tractable:
  - Assigning probability to a fully observed set of variables
  - Or if just one variable is unobserved
  - Or for singly connected graphs (i.e., no undirected loops): variable elimination, belief propagation
  - For multiply connected graphs: junction tree
- Sometimes use Monte Carlo methods:
  - Generate many samples according to the Bayes Net distribution, then count up the results.
- Variational methods for tractable approximate solutions.
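A minimal sketch of the chain trick for A → B → C → D → E with Boolean variables: to get P(E), push the sums inward and pass a small "message" along the chain instead of summing over all 2^4 joint settings of A, B, C, D. The CPT numbers are placeholders for illustration.

```python
# P(E=e) = sum_{a,b,c,d} P(a) P(b|a) P(c|b) P(d|c) P(e|d)
#        = sum_d P(e|d) sum_c P(d|c) sum_b P(c|b) sum_a P(b|a) P(a)
# Each inner sum yields a distribution over a single variable: linear, not exponential, work.

p_a = {1: 0.3, 0: 0.7}                                    # placeholder CPTs
p_b_given_a = {1: {1: 0.8, 0: 0.2}, 0: {1: 0.1, 0: 0.9}}
p_c_given_b = {1: {1: 0.6, 0: 0.4}, 0: {1: 0.3, 0: 0.7}}
p_d_given_c = {1: {1: 0.5, 0: 0.5}, 0: {1: 0.2, 0: 0.8}}
p_e_given_d = {1: {1: 0.9, 0: 0.1}, 0: {1: 0.4, 0: 0.6}}

def push_forward(prior, cpt):
    """Given P(X) and P(Y|X), return P(Y) = sum_x P(Y|x) P(x)."""
    return {y: sum(cpt[x][y] * prior[x] for x in prior) for y in (0, 1)}

p_b = push_forward(p_a, p_b_given_a)
p_c = push_forward(p_b, p_c_given_b)
p_d = push_forward(p_c, p_d_given_c)
p_e = push_forward(p_d, p_e_given_d)
print(p_e)   # marginal P(E), computed with four small sums
```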