Bayesian Networks Inference (continued) Learning


Learning BN tutorial: ftp://ftp.research.microsoft.com/pub/tr/tr-95-06.pdf
TAN paper: http://www.cs.huji.ac.il/~nir/abstracts/frgg1.html
Bayesian Networks Inference (continued) Learning
Machine Learning 10701/15781, Carlos Guestrin, Carnegie Mellon University, March 23rd, 2005

Review
Bayesian networks: a compact representation for probability distributions, giving an exponential reduction in the number of parameters.
Fast probabilistic inference using variable elimination: compute P(X | e) in time exponential in the tree-width, not in the number of variables.
Today: finding the most likely explanation, using sampling for approximate inference, and learning BN structure.
(Running example network over Flu, Allergy, Sinus, Headache, Nose.)

Most likely explanation (MLE)
Query: the most likely joint assignment to Flu, Allergy, Sinus, Headache given the evidence Nose = t, i.e. argmax_{f,a,s,h} P(f, a, s, h | Nose = t).
Using Bayes rule: P(f, a, s, h | Nose = t) = P(f, a, s, h, Nose = t) / P(Nose = t).
Normalization irrelevant: argmax_{f,a,s,h} P(f, a, s, h | Nose = t) = argmax_{f,a,s,h} P(f, a, s, h, Nose = t).

Max-marginalization
Like summing a variable out of a factor, but taking the max instead. For example, with Allergy instantiated to t, eliminate Sinus from a factor over Flu and Sinus by computing g(Flu) = max_{Sinus} f(Flu, Sinus, Allergy = t).

Example of variable elimination for MLE, forward pass, on the network over Flu, Allergy, Sinus, Headache with evidence Nose = t.

Example of variable elimination for MLE, backward pass, on the same network with evidence Nose = t.

MLE variable elimination algorithm (forward pass)
Given a BN and an MLE query max_{x_1,...,x_n} P(x_1,...,x_n, e):
Instantiate the evidence e.
Choose an ordering on the variables, e.g., X_1,...,X_n.
For i = 1 to n, if X_i is not in the evidence e:
  Collect the factors f_1,...,f_k that include X_i.
  Generate a new factor by max-eliminating X_i from these factors.
  Variable X_i has been eliminated!

MLE variable elimination algorithm (backward pass)
{x_1*,...,x_n*} will store the maximizing assignment.
For i = n down to 1, if X_i is not in the evidence e:
  Take the factors f_1,...,f_k that were used when X_i was eliminated.
  Instantiate f_1,...,f_k with {x_{i+1}*,...,x_n*}; now each f_j depends only on X_i.
  Generate the maximizing assignment for X_i: x_i* = argmax_{x_i} prod_j f_j(x_i).
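A minimal Python sketch of the two core operations these passes rely on, assuming factors are represented as (variables, table) pairs with tables keyed by value-tuples over binary variables; the representation and function names are illustrative, not the lecture's:

from itertools import product

def multiply(f1, f2):
    """Pointwise product of two factors, each given as (variables, table)."""
    v1, t1 = f1
    v2, t2 = f2
    vars_out = v1 + tuple(v for v in v2 if v not in v1)
    table = {}
    for vals in product([0, 1], repeat=len(vars_out)):   # binary domains for simplicity
        a = dict(zip(vars_out, vals))
        table[vals] = t1[tuple(a[v] for v in v1)] * t2[tuple(a[v] for v in v2)]
    return vars_out, table

def max_out(factor, var):
    """Forward-pass step: max-eliminate var, recording the argmax for the backward pass."""
    vs, t = factor
    i = vs.index(var)
    best, arg = {}, {}
    for vals, p in t.items():
        key = vals[:i] + vals[i + 1:]
        if key not in best or p > best[key]:
            best[key], arg[key] = p, vals[i]
    return (vs[:i] + vs[i + 1:], best), arg

The forward pass repeatedly multiplies the factors mentioning the next variable and calls max_out on their product; the backward pass then reads the recorded arg tables in reverse elimination order to recover the maximizing assignment.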

Stochastic simulation: obtaining a sample from the joint distribution (example network over Flu, Allergy, Sinus, Headache, Nose).

Using stochastic simulation (sampling) to compute P(X)
Given a BN, a query P(X), and a number of samples m:
Choose a topological ordering on the variables, e.g., X_1,...,X_n.
For j = 1 to m:
  {x_1^j,...,x_n^j} will be the j-th sample.
  For i = 1 to n: sample x_i^j from the distribution P(X_i | Pa_{X_i}), with the parents instantiated to {x_1^j,...,x_{i-1}^j}.
  Add {x_1^j,...,x_n^j} to the dataset.
Use counts to compute P(X).
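A minimal Python sketch of this forward (ancestral) sampling procedure. It assumes the running example's structure (Flu and Allergy as parents of Sinus, Sinus as parent of Headache and Nose) and a hypothetical CPT table cpt[var] mapping parent-value tuples to P(var = t | parents); none of these names come from the lecture:

import random

order   = ["Flu", "Allergy", "Sinus", "Headache", "Nose"]   # a topological ordering
parents = {"Flu": [], "Allergy": [], "Sinus": ["Flu", "Allergy"],
           "Headache": ["Sinus"], "Nose": ["Sinus"]}

def sample_joint(cpt):
    """Draw one sample from the joint: sample each variable given its already-sampled parents."""
    x = {}
    for v in order:
        pa = tuple(x[p] for p in parents[v])
        x[v] = 1 if random.random() < cpt[v][pa] else 0   # 1 = true, 0 = false
    return x

def estimate_marginal(cpt, var, m=10000):
    """Estimate P(var = t) by counting over m forward samples."""
    return sum(sample_joint(cpt)[var] for _ in range(m)) / m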

Example of using rejection sampling to compute P(X | e), on the network over Flu, Allergy, Sinus, Headache with evidence Nose = t.

Using rejection sampling to compute P(X | e)
Given a BN, a query P(X | e), and a number of samples m:
Choose a topological ordering on the variables, e.g., X_1,...,X_n.
Set j = 0; while j < m:
  {x_1^j,...,x_n^j} will be the j-th sample.
  For i = 1 to n: sample x_i^j from P(X_i | Pa_{X_i}), with the parents instantiated to {x_1^j,...,x_{i-1}^j}.
  If {x_1^j,...,x_n^j} is consistent with the evidence, add it to the dataset and set j = j + 1.
Use counts to compute P(X | e).
(Example evidence: Nose = t.)
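A minimal Python sketch of rejection sampling, reusing the hypothetical cpt and sample_joint from the forward-sampling sketch above:

def rejection_sample(cpt, var, evidence, m=10000):
    """Estimate P(var = t | evidence) by keeping only samples consistent with the evidence."""
    kept = []
    while len(kept) < m:
        x = sample_joint(cpt)                      # forward sample as before
        if all(x[e] == val for e, val in evidence.items()):
            kept.append(x[var])                    # accept only consistent samples
    return sum(kept) / m

For example, rejection_sample(cpt, "Flu", {"Nose": 1}) estimates P(Flu = t | Nose = t); when the evidence is unlikely, most samples are rejected and this becomes very slow.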

Example of using importance sampling to compute P(X | e), on the network over Flu, Allergy, Sinus, Headache with evidence Nose = t.

Using importance sampling (likelihood weighting) to compute P(X | e)
For j = 1 to m:
  {x_1^j,...,x_n^j} will be the j-th sample; initialize its weight w_j = 1.
  For i = 1 to n:
    If X_i is not in the evidence e: sample x_i^j from P(X_i | Pa_{X_i}), with the parents instantiated to {x_1^j,...,x_{i-1}^j}.
    Else: set x_i^j to its assignment in the evidence e, and multiply the weight w_j by P(x_i^j | Pa_{X_i}), with the parents instantiated to {x_1^j,...,x_{i-1}^j}.
  Add {x_1^j,...,x_n^j} to the dataset with weight w_j.
Use weighted counts to compute P(X | e).
(Example evidence: Nose = t.)
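A minimal Python sketch of likelihood weighting, under the same assumed order, parents, and cpt as the sketches above (all of which are illustrative assumptions):

import random

def likelihood_weighting(cpt, var, evidence, m=10000):
    """Estimate P(var = t | evidence): clamp evidence variables and weight each sample."""
    num = den = 0.0
    for _ in range(m):
        x, w = {}, 1.0
        for v in order:
            pa = tuple(x[p] for p in parents[v])
            p1 = cpt[v][pa]                        # P(v = t | parents)
            if v in evidence:
                x[v] = evidence[v]                 # set to the evidence value
                w *= p1 if x[v] == 1 else 1 - p1   # multiply the weight by its probability
            else:
                x[v] = 1 if random.random() < p1 else 0
        num += w * x[var]
        den += w
    return num / den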

What you need to know about inference
Bayesian networks: a useful compact representation for large probability distributions.
Inference to compute: the probability of X given evidence e, and the most likely explanation (MLE) given evidence e. Inference is NP-hard.
Variable elimination algorithm: an efficient algorithm, only exponential in the tree-width, not in the number of variables; the elimination order is important! Approximate inference is necessary when the tree-width is too large. The only difference between probabilistic inference and MLE is sum versus max.
Sampling: an example of approximate inference; simulate from the model; likelihood weighting for inference; can be very slow.

Where are we?
Bayesian networks represent exponentially-large probability distributions compactly.
Inference in BNs: exact inference is very fast for problems with low tree-width; many approximate inference algorithms, e.g., sampling.
Learning BNs: given the structure, estimate the parameters using counts (maximum likelihood); MAP is also possible. What about learning the structure?

Maximum likelihood (ML) for learning BN structure
For each possible structure: learn the parameters using ML, then score the structure.
Data: <x_1^{(1)},...,x_n^{(1)}>, ..., <x_1^{(m)},...,x_n^{(m)}>.
(Example network over Flu, Allergy, Sinus, Headache, Nose.)

How many graphs are there? Super-exponentially many: even among undirected graphs on n variables there are 2^{n(n-1)/2}, so exhaustively scoring every structure is infeasible.

How many trees are there? Still exponentially many (by Cayley's formula, n^{n-2} labeled trees on n nodes). Nonetheless, an efficient optimal algorithm finds the best tree.

Scoring a tree
The maximum log-likelihood of the data under a tree-structured BN decomposes into a sum, over the tree's edges, of the empirical mutual information between the connected variables, plus a term that does not depend on the structure; the best tree is therefore the one whose edges maximize the total mutual information.

Chow-Liu tree learning algorithm
For each pair of variables X_i, X_j:
  Compute the empirical distribution: P(x_i, x_j) = Count(x_i, x_j) / m.
  Compute the mutual information: I(X_i, X_j) = sum_{x_i, x_j} P(x_i, x_j) log [ P(x_i, x_j) / (P(x_i) P(x_j)) ], using the empirical distributions.
Define a graph with nodes X_1,...,X_n, where edge (i,j) gets weight I(X_i, X_j).
Optimal tree BN: compute the maximum spanning tree; for directions in the BN, pick any node as the root and let breadth-first search define the edge directions.
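A minimal Python sketch of this algorithm for binary data, assuming data is a list of m rows of n 0/1 values; the function names are illustrative, not the lecture's:

from collections import Counter
from math import log

def mutual_information(data, i, j):
    """Empirical mutual information between columns i and j."""
    m = len(data)
    pij = Counter((row[i], row[j]) for row in data)
    pi  = Counter(row[i] for row in data)
    pj  = Counter(row[j] for row in data)
    return sum((c / m) * log((c / m) / ((pi[a] / m) * (pj[b] / m)))
               for (a, b), c in pij.items())

def chow_liu_edges(data, n):
    """Maximum spanning tree over mutual-information edge weights (Prim's algorithm)."""
    weight = {(i, j): mutual_information(data, i, j)
              for i in range(n) for j in range(i + 1, n)}
    in_tree, edges = {0}, []
    while len(in_tree) < n:
        i, j = max((e for e in weight if (e[0] in in_tree) != (e[1] in in_tree)),
                   key=lambda e: weight[e])
        edges.append((i, j))
        in_tree.update([i, j])
    return edges   # orient edges away from any chosen root (e.g., by BFS) to get the BN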

Can we extend Chow-Liu?
Tree-augmented naive Bayes (TAN) [Friedman et al. 97]: same as Chow-Liu, but score edges with the conditional mutual information given the class variable, I(X_i, X_j | C).
(Approximate) learning of models with tree-width up to k [Narasimhan & Bilmes 04], but O(n^{k+1}).

Scoring general graphical models: a model selection problem
What's the best structure, given data <x_1^{(1)},...,x_n^{(1)}>, ..., <x_1^{(m)},...,x_n^{(m)}>?
The more edges, the fewer independence assumptions and the higher the likelihood of the data, but the model will overfit.

Bayesian score
Given a structure, place a distribution over the parameters and integrate them out. This is a difficult integral, so use the Bayesian information criterion (BIC) approximation.
Note: the BIC penalty acts as regularization and corresponds to the MDL score.
Finding the best BN under BIC is still NP-hard.
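For reference, the BIC approximation mentioned above has the standard form (a reconstruction in LaTeX; the slide's exact notation may differ), where \hat{\theta}_G are the maximum-likelihood parameters and \dim(G) is the number of free parameters of structure G:

\mathrm{score}_{\mathrm{BIC}}(G) \;=\; \log P(\mathcal{D} \mid \hat{\theta}_G, G) \;-\; \frac{\log m}{2}\,\dim(G)

The penalty term, growing with the number of parameters, is what plays the role of the MDL description length.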

Learn BN structure using local search
Starting from the Chow-Liu tree, do local search with possible moves: add an edge, delete an edge, invert (reverse) an edge. Score using BIC.
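A minimal Python sketch of this local search, assuming a hypothetical bic_score(edges, data) function and representing a structure as a set of directed (i, j) edges over nodes 0..n-1; this is not the lecture's implementation:

def is_acyclic(edges, n):
    """Check that the directed graph has no cycle (Kahn's algorithm)."""
    indeg = [0] * n
    for _, j in edges:
        indeg[j] += 1
    frontier = [v for v in range(n) if indeg[v] == 0]
    seen = 0
    while frontier:
        v = frontier.pop()
        seen += 1
        for a, b in edges:
            if a == v:
                indeg[b] -= 1
                if indeg[b] == 0:
                    frontier.append(b)
    return seen == n

def hill_climb(edges, n, data, bic_score):
    """Repeatedly apply the best add/delete/reverse move until no move improves the BIC score."""
    best = bic_score(edges, data)
    while True:
        moves = []
        for i in range(n):
            for j in range(n):
                if i == j:
                    continue
                if (i, j) in edges:
                    moves.append(edges - {(i, j)})               # delete edge
                    moves.append(edges - {(i, j)} | {(j, i)})    # reverse edge
                else:
                    moves.append(edges | {(i, j)})               # add edge
        scored = [(bic_score(m_, data), m_) for m_ in moves if is_acyclic(m_, n)]
        if not scored:
            return edges
        top_score, top = max(scored, key=lambda s: s[0])
        if top_score <= best:
            return edges
        edges, best = top, top_score

The starting structure (e.g., the Chow-Liu tree, encoded as a set of directed edges) is assumed to be acyclic.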

What you need to know about learning BNs
Bayesian networks: a useful compact representation for large probability distributions.
Inference: the variable elimination algorithm; sampling as an example of approximate inference.
Learning BNs: maximum likelihood or MAP learns the parameters; best tree (Chow-Liu); best TAN; for other BNs, usually local search with the BIC score.

Acknowledgements
JavaBayes applet: http://www.pmr.poli.usp.br/ltd/software/javabayes/home/index.html