ECE 6504: Advanced Topics in Machine Learning Probabilistic Graphical Models and Large-Scale Learning


ECE 6504: Advanced Topics in Machine Learning: Probabilistic Graphical Models and Large-Scale Learning. Topics: Bayes Nets: Inference (finish); Variable Elimination; graph view of VE: fill edges, induced width. Readings: KF 9.3, 9.4; Barber 5.2. Dhruv Batra, Virginia Tech

Administrativia. HW1 due in 2 weeks: Feb 17, Feb 19, 11:59pm. Project proposal due: Mar 12, Mar 5, 11:59pm; <= 2 pages, NIPS format. HW2 out soon; due: Mar 5, Mar 12, 11:59pm. (C) Dhruv Batra 2

Project: individual or groups of 2 (we prefer teams of 2). Deliverables: 5%: project proposal (NIPS format, <= 2 pages); 10%: midway presentation (in class); 10%: final report (webpage with results). (C) Dhruv Batra 3

Proposal: 2 pages (NIPS format): http://nips.cc/conferences/2013/paperinformation/stylefiles. Necessary information: Project title. Project idea (approximately two paragraphs). Data set details: ideally an existing dataset; no data-collection projects. Software: which libraries will you use? What will you write? Papers to read: include 1-3 relevant papers; you will probably want to read at least one of them before submitting your proposal. Teammate: will you have a teammate? If so, whom? Maximum team size is two students. Mid-sem milestone: what will you complete by the project milestone due date? Experimental results of some kind are expected here. (C) Dhruv Batra 4

Project. Main categories: Application/Survey: compare a set of existing algorithms on a new application domain of your interest. Formulation/Development: formulate a new model or algorithm for a new or old problem. Theory: theoretically analyze an existing algorithm. Rules: the project should fit in Advanced Machine Learning; you can apply ML to your own research, but the work must be done this semester; it is OK to combine with other class projects, but you must declare this to both course instructors, have explicit permission from BOTH instructors, and keep a sufficient ML component. Using libraries: no need to implement all algorithms; it is OK to use standard MRF, BN, Structured SVM, etc. libraries. More thought + effort => more credit. (C) Dhruv Batra 5

Recap of Last Time (C) Dhruv Batra 6

Main Issues in PGMs. Representation: how do we store P(X_1, X_2, ..., X_n)? What does my model mean/imply/assume? (Semantics.) Learning: how do we learn the parameters and structure of P(X_1, X_2, ..., X_n) from data? Which model is right for my data? Inference: how do I answer questions/queries with my model? Such as Marginal Estimation: P(X_5 | X_1, X_4), or Most Probable Explanation: argmax P(X_1, X_2, ..., X_n). (C) Dhruv Batra 7

Possible Queries. Evidence: E=e (e.g., N=t). Query variables of interest: Y. (Example network on the slide: Flu, Allergy, Sinus, Headache, Nose=t.) Conditional Probability: P(Y | E=e), e.g. P(F, A | N=t); special case: marginals, e.g. P(F). Maximum a Posteriori: argmax P(all variables | E=e), e.g. argmax_{f,a,s,h} P(f, a, s, h | N=t); old-school terminology: MPE. Marginal-MAP: argmax_y P(Y | E=e) = argmax_y Σ_o P(Y=y, O=o | E=e); old-school terminology: MAP. (C) Dhruv Batra 8
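These query types give different answers, and on a small network they can be checked directly by enumerating the joint. Below is a minimal brute-force sketch on the Flu/Allergy/Sinus example; only the graph structure follows the lecture's running example, and every CPT number is hypothetical, chosen purely for illustration.

```python
# Brute-force evaluation of the three query types on the Flu/Allergy/Sinus network.
# CPT values are made up; only the structure (F -> S <- A, S -> H, S -> N) follows
# the lecture's running example.
from itertools import product

pF = {1: 0.1, 0: 0.9}                                        # P(F)
pA = {1: 0.2, 0: 0.8}                                        # P(A)
pS = {(1, 1): 0.9, (1, 0): 0.7, (0, 1): 0.6, (0, 0): 0.05}   # P(S=1 | F, A)
pH = {1: 0.8, 0: 0.1}                                        # P(H=1 | S)
pN = {1: 0.7, 0: 0.05}                                       # P(N=1 | S)

def joint(f, a, s, h, n):
    """P(F=f, A=a, S=s, H=h, N=n) from the BN factorization."""
    ps = pS[(f, a)] if s else 1 - pS[(f, a)]
    ph = pH[s] if h else 1 - pH[s]
    pn = pN[s] if n else 1 - pN[s]
    return pF[f] * pA[a] * ps * ph * pn

states = list(product([0, 1], repeat=4))                     # assignments to (F, A, S, H)

# Conditional probability (marginal): P(F=1 | N=1)
num = sum(joint(f, a, s, h, 1) for f, a, s, h in states if f == 1)
den = sum(joint(f, a, s, h, 1) for f, a, s, h in states)
print("P(F=1 | N=1) =", num / den)

# MAP (old-school MPE): argmax over (f, a, s, h) of P(f, a, s, h | N=1)
print("MAP:", max(states, key=lambda x: joint(x[0], x[1], x[2], x[3], 1)))

# Marginal-MAP for Y = F, summing out O = {A, S, H}
print("Marginal-MAP for F:",
      max([0, 1], key=lambda f: sum(joint(f, a, s, h, 1)
                                    for a, s, h in product([0, 1], repeat=3))))
```

Note that the MAP value for F and the Marginal-MAP value for F need not coincide, which is exactly the point of the next slide.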

Application: Medical Diagnosis (C) Dhruv Batra Image Credit: Erik Sudderth 9

Are MAP and Max of Marginals Consistent? (Example on the slide: two-node network Sinus -> Nose with P(S=f)=0.6, P(S=t)=0.4 and a CPT P(N | S).)

Hardness
Find P(all variables): easy for a BN: O(n).
MAP:
  Find argmax P(all variables | E=e): NP-hard
  Find any assignment with P(all variables | E=e) > p: NP-hard
Conditional Probability / Marginals:
  Is P(Y=y | E=e) > 0: NP-hard
  Find P(Y=y | E=e): #P-hard
  Find p̂ with |P(Y=y | E=e) - p̂| <= ε: NP-hard for any ε < 0.5
Marginal-MAP:
  Find argmax_y Σ_o P(Y=y, O=o | E=e): NP^PP-hard
(C) Dhruv Batra 11

Inference in BNs: hopeless? In general, yes! Even approximate inference! In practice: exploit structure; there are many effective approximation algorithms, some with guarantees. Plan: exact inference; transition to undirected graphical models (MRFs); approximate inference in the unified setting.

Algorithms. Conditional Probability / Marginals: Variable Elimination; Sum-Product Belief Propagation; Sampling (MCMC). MAP: Variable Elimination; Max-Product Belief Propagation; Sampling (MCMC); Integer Programming; Linear Programming Relaxations; Combinatorial Optimization (graph cuts). (C) Dhruv Batra 13

Marginal Inference Example. Evidence: E=e (e.g., N=t). Query variables of interest: Y. (Example network: Flu, Allergy, Sinus, Headache, Nose=t.) Conditional Probability: P(Y | E=e), e.g. P(F | N=t). Derivation on board. (C) Dhruv Batra 14

Variable Elimination algorithm. Given a BN and a query P(Y | e) ∝ P(Y, e):
  Instantiate evidence (IMPORTANT!!!)
  Choose an ordering on the variables, e.g., X_1, ..., X_n
  For i = 1 to n, if X_i ∉ {Y, E}:
    Collect the factors f_1, ..., f_k that include X_i
    Generate a new factor by eliminating (summing out) X_i from these factors
    Variable X_i has been eliminated!
  Normalize P(Y, e) to obtain P(Y | e)
(C) Dhruv Batra Slide Credit: Carlos Guestrin 15
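The following is a minimal Python sketch of sum-product variable elimination over table factors, written to mirror the pseudocode above; it is my own illustration (with my own factor representation), not the course's reference implementation. A factor is a pair (vars, table), where table maps full assignments (tuples ordered as in vars) to nonnegative numbers.

```python
# Sum-product Variable Elimination over table factors (single query variable).
from itertools import product

def restrict(factor, var, val):
    """Instantiate evidence var=val in a factor (the variable is dropped)."""
    vs, table = factor
    if var not in vs:
        return factor
    i = vs.index(var)
    return (vs[:i] + vs[i+1:],
            {k[:i] + k[i+1:]: p for k, p in table.items() if k[i] == val})

def eliminate(factors, var, domains):
    """Multiply all factors that mention `var`, then sum `var` out."""
    scope = sorted({v for vs, _ in factors for v in vs if v != var})
    table = {}
    for assign in product(*(domains[v] for v in scope)):
        ctx = dict(zip(scope, assign))
        total = 0.0
        for val in domains[var]:
            ctx[var] = val
            prod = 1.0
            for vs, t in factors:
                prod *= t[tuple(ctx[v] for v in vs)]
            total += prod
        table[assign] = total
    return (scope, table)

def variable_elimination(factors, query, evidence, order, domains):
    """Return P(query | evidence) for a single query variable."""
    for v, val in evidence.items():                      # 1. instantiate evidence
        factors = [restrict(f, v, val) for f in factors]
    for var in order:                                    # 2. eliminate, one variable at a time
        if var == query or var in evidence:
            continue
        used = [f for f in factors if var in f[0]]
        factors = [f for f in factors if var not in f[0]] + [eliminate(used, var, domains)]
    answer = {val: 1.0 for val in domains[query]}        # 3. multiply what is left...
    for vs, t in factors:
        for val in domains[query]:
            answer[val] *= t[(val,)] if vs else t[()]
    Z = sum(answer.values())                             # 4. ...and normalize
    return {val: p / Z for val, p in answer.items()}
```

For instance, the P(F | N=t) query from the slide could (under this hypothetical representation) be answered by passing in the five CPT factors of the Flu/Allergy network, query="F", evidence={"N": 1}, and any elimination order over the remaining variables.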

Plan for today: BN Inference (finish): Variable Elimination; VE for MAP inference; graph view of VE: moralization, fill edges, induced width, treewidth. (Start) Undirected Graphical Models. (C) Dhruv Batra 16

VE for MAP Inference. Evidence: E=e (e.g., N=t). Query variables of interest: Y. (Example network: Flu, Allergy, Sinus, Headache, Nose=t.) Conditional Probability: P(Y | E=e), e.g. P(F | N=t). Maximum a Posteriori: argmax P(all variables | E=e), e.g. argmax_{f,a,s,h} P(f, a, s, h | N=t). Derivation on board. VE (dynamic programming) extends to arbitrary commutative semirings! (C) Dhruv Batra 17

VE for MAP: Forward Pass. Given a BN and a MAP query max_{x_1, ..., x_n} P(x_1, ..., x_n, e):
  Instantiate evidence
  Choose an ordering on the variables, e.g., X_1, ..., X_n
  For i = 1 to n, if X_i ∉ E:
    Collect the factors f_1, ..., f_k that include X_i
    Generate a new factor by eliminating (maximizing out) X_i from these factors
    Variable X_i has been eliminated!
(C) Dhruv Batra Slide Credit: Carlos Guestrin 18

VE for MAP: Backward Pass. {x_1*, ..., x_n*} will store the maximizing assignment.
For i = n down to 1, if X_i ∉ E:
  Take the factors f_1, ..., f_k that were used when X_i was eliminated
  Instantiate f_1, ..., f_k with {x_{i+1}*, ..., x_n*}; now each f_j depends only on X_i
  Generate the maximizing assignment for X_i
(C) Dhruv Batra Slide Credit: Carlos Guestrin 19
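As a concrete instance of this forward/backward structure, here is a sketch of VE for MAP on a chain of pairwise factors, i.e. the HMM/Viterbi special case: the forward pass maximizes each variable out in turn while recording the argmax, and the backward pass reads the maximizing assignment off the stored back-pointers. The function and the numbers are my own illustration, not course code.

```python
# Max-product VE (Viterbi-style) on a chain x_1 - x_2 - ... - x_n of pairwise factors.
def chain_map(unary, pairwise):
    """
    unary:    list of dicts, unary[i][v]         = f_i(x_i = v)
    pairwise: list of dicts, pairwise[i][(u, v)] = g_i(x_i = u, x_{i+1} = v)
    Returns the maximizing assignment (x_1*, ..., x_n*) and its unnormalized score.
    """
    n = len(unary)
    messages = [unary[0]]        # messages[i][v] = best partial score with x_{i+1} = v
    backptrs = []
    # Forward pass: eliminate x_i by maximizing it out, remembering the maximizer.
    for i in range(n - 1):
        msg, back = {}, {}
        for v in unary[i + 1]:
            best_u = max(messages[i], key=lambda u: messages[i][u] * pairwise[i][(u, v)])
            msg[v] = messages[i][best_u] * pairwise[i][(best_u, v)] * unary[i + 1][v]
            back[v] = best_u
        messages.append(msg)
        backptrs.append(back)
    # Backward pass: pick the best value of the last variable, then follow back-pointers.
    xs = [max(messages[-1], key=messages[-1].get)]
    for back in reversed(backptrs):
        xs.append(back[xs[-1]])
    xs.reverse()
    return tuple(xs), messages[-1][xs[-1]]

# Hypothetical 3-variable chain with binary states.
unary = [{0: 0.6, 1: 0.4}, {0: 1.0, 1: 1.0}, {0: 0.3, 1: 0.7}]
pairwise = [{(0, 0): 0.9, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.8}] * 2
print(chain_map(unary, pairwise))
```

Swapping the `max` in the forward pass for a sum recovers sum-product VE on the same chain, which is the semiring point made on the previous slide.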

Instantiating Evidence. Given a BN and a query P(Y | e) ∝ P(Y, e), instantiating the evidence reduces the size of the factors. (Example on the slide: a Hidden Markov Model (HMM) with hidden states Y_1, ..., Y_5 taking values in {a, ..., z} and observations X_1, ..., X_5.) (C) Dhruv Batra 20

Graph-view of VE. So far: algorithmic / algebraic view of VE. Next: graph-based view of VE, i.e., the modifications to the graph structure as VE runs. (C) Dhruv Batra 21

Moralization ("marry parents"). Moralize the graph: connect the parents of each node into a clique and remove edge directions; equivalently, connect the nodes that appear together in an initial factor. (Example on the slide: the student network with nodes Coherence, Difficulty, Intelligence, Grade, SAT, Letter, Happy, Job.) (C) Dhruv Batra Slide Credit: Carlos Guestrin 22
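A small sketch of moralization using networkx follows; the helper is my own, and the edge list approximates the slide's student-network example from memory, so treat both as illustrative rather than as the course's definition.

```python
# Moralization: connect each node's parents into a clique, then drop edge directions.
import networkx as nx

def moralize(dag: nx.DiGraph) -> nx.Graph:
    moral = nx.Graph()
    moral.add_nodes_from(dag.nodes)
    moral.add_edges_from(dag.edges)                 # keep every edge, now undirected
    for node in dag.nodes:
        parents = list(dag.predecessors(node))
        for i in range(len(parents)):               # "marry" all pairs of parents
            for j in range(i + 1, len(parents)):
                moral.add_edge(parents[i], parents[j])
    return moral

# Approximate student-network example (edges are illustrative).
dag = nx.DiGraph([("Coherence", "Difficulty"), ("Difficulty", "Grade"),
                  ("Intelligence", "Grade"), ("Intelligence", "SAT"),
                  ("Grade", "Letter"), ("Letter", "Job"), ("SAT", "Job"),
                  ("Grade", "Happy"), ("Job", "Happy")])
print(sorted(moralize(dag).edges))
```

The moralized graph gains, for example, the Difficulty-Intelligence edge, because those two parents share the initial factor P(Grade | Difficulty, Intelligence).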

Eliminating a node: fill edges. When a variable is eliminated, add fill edges that connect its neighbors. (Same example graph: Coherence, Difficulty, Intelligence, Grade, SAT, Letter, Happy, Job.) (C) Dhruv Batra Slide Credit: Carlos Guestrin 23

Induced graph. Elimination order: O = {C, D, S, I, L, H, J, G}. The induced graph I_{F,O} for elimination order O has an edge between X_i and X_j if X_i and X_j appear together in a factor generated by VE when run with elimination order O on the factors F. (C) Dhruv Batra Slide Credit: Carlos Guestrin 24

A different elimination order can lead to a different induced graph. Elimination order: O = {G, C, D, S, I, L, H, J}. (C) Dhruv Batra Slide Credit: Carlos Guestrin 25

Induced graph and complexity of VE. Read the complexity off the cliques in the induced graph: the structure of the induced graph encodes the complexity of VE!!! Theorem: every factor generated by VE is a clique in I_{F,O}, and every maximal clique in I_{F,O} corresponds to a factor generated by VE. Induced width: size of the largest clique in I_{F,O} minus 1. Treewidth: induced width of the best order O*. Elimination order: O = {C, D, I, S, L, H, J, G}. (C) Dhruv Batra Slide Credit: Carlos Guestrin 26
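Because the largest clique is formed the moment its center variable is eliminated, the induced width for a given order can be computed by simulating elimination on the moralized graph. The helper below is my own sketch, not course code.

```python
# Induced width of an elimination order: simulate elimination, adding fill edges.
import networkx as nx

def induced_width(graph: nx.Graph, order) -> int:
    g = graph.copy()
    width = 0
    for var in order:
        neighbors = list(g.neighbors(var))
        width = max(width, len(neighbors))          # clique created = var plus its neighbors
        for i in range(len(neighbors)):             # fill edges between remaining neighbors
            for j in range(i + 1, len(neighbors)):
                g.add_edge(neighbors[i], neighbors[j])
        g.remove_node(var)
    return width

# Example: induced width of a 2x3 grid under a row-major elimination order.
grid = nx.grid_2d_graph(2, 3)
print(induced_width(grid, list(grid.nodes)))
```

VE's cost is exponential in this quantity, so comparing `induced_width` across candidate orders is exactly the comparison the two slide examples above are making.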

Example: large induced width with a small number of parents. Compact representation, but inference is not easy. (C) Dhruv Batra 27

Finding the optimal elimination order. Theorem: finding the best elimination order is NP-complete. Decision problem: given a graph, determine whether there exists an elimination order that achieves induced width <= K. Interpretation: the hardness of finding an elimination order comes in addition to the hardness of inference. Actually, an elimination order can be found in time exponential in the size of the largest clique, the same complexity as inference itself. (Example elimination order: {C, D, I, S, L, H, J, G}.) (C) Dhruv Batra Slide Credit: Carlos Guestrin 28

Minimum (weighted) fill heuristic. Often very effective.
  Initialize all unobserved nodes X as unmarked
  For k = 1 to |X|:
    O(next) ← the unmarked variable X whose elimination adds the fewest fill edges
    Mark X
    Add the fill edges introduced by eliminating X
Weighted version: score each candidate by the size of the factor it creates rather than by the number of edges added. (Example elimination order: {C, D, I, S, L, H, J, G}.) (C) Dhruv Batra Slide Credit: Carlos Guestrin 29
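A compact sketch of the (unweighted) min-fill heuristic, again using my own networkx-based helpers rather than course code:

```python
# Min-fill ordering: repeatedly eliminate the variable whose elimination
# would add the fewest fill edges, then actually add those fill edges.
import networkx as nx

def fill_edges_needed(g: nx.Graph, var):
    nbrs = list(g.neighbors(var))
    return [(nbrs[i], nbrs[j])
            for i in range(len(nbrs)) for j in range(i + 1, len(nbrs))
            if not g.has_edge(nbrs[i], nbrs[j])]

def min_fill_order(graph: nx.Graph):
    g = graph.copy()
    order = []
    while g.number_of_nodes() > 0:
        best = min(g.nodes, key=lambda v: len(fill_edges_needed(g, v)))
        g.add_edges_from(fill_edges_needed(g, best))   # add its fill edges
        g.remove_node(best)
        order.append(best)
    return order

# Example: a min-fill order for a 3x3 grid.
print(min_fill_order(nx.grid_2d_graph(3, 3)))
```

The weighted variant would replace `len(fill_edges_needed(g, v))` with the size of the factor created, i.e. the product of the domain sizes of v and its current neighbors.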

Demo http://www.cs.us.es/~cgdiaz/cispace/bayes.html (C) Dhruv Batra 30

BN Exact Inference: What you need to know. Types of queries: conditional probabilities / marginals; maximum a posteriori (MAP); Marginal-MAP. Different queries give different answers. Hardness of inference: exact and approximate inference are NP-hard; MAP is NP-complete; conditional probabilities are #P-complete; Marginal-MAP is much harder (NP^PP-complete). Variable elimination algorithm: to eliminate a variable, combine the factors that include that variable into a single factor, then marginalize/maximize the variable out of the new factor. It is an efficient algorithm (only exponential in the induced width, not in the number of variables). If you hear "exact inference is only efficient in tree graphical models", you say: "No! Any graph with low induced width." The elimination order is important: finding the best one is an NP-complete problem, but there are many good heuristics. (C) Dhruv Batra 31

Main Issues in PGMs. Representation: how do we store P(X_1, X_2, ..., X_n)? What does my model mean/imply/assume? (Semantics.) Learning: how do we learn the parameters and structure of P(X_1, X_2, ..., X_n) from data? Which model is right for my data? Inference: how do I answer questions/queries with my model? Such as Marginal Estimation: P(X_5 | X_1, X_4), or Most Probable Explanation: argmax P(X_1, X_2, ..., X_n). (C) Dhruv Batra 32

New Topic: Markov Nets / MRFs. (Slide shows David Barber's taxonomy of graphical models: directed models such as Bayesian networks, dynamic Bayes nets, HMMs, latent variable models, mixture models, and influence diagrams; undirected models such as Markov networks, pairwise models, CRFs, Boltzmann machines, factor graphs, and clique/junction trees; plus chain graphs connecting the two.) (C) Dhruv Batra Image Credit: David Barber 33

Synonyms: Markov Networks = Markov Random Fields = Gibbs Distributions. In the vision literature, MAP inference in MRFs = energy minimization. (C) Dhruv Batra 34

A general Bayes net: a set of random variables; a directed acyclic graph that encodes independence assumptions; CPTs (conditional probability tables). Joint distribution: P(X_1, ..., X_n) = Π_i P(X_i | Parents(X_i)). (Example network: Flu, Allergy, Sinus, Headache, Nose.) (C) Dhruv Batra 35

Markov Nets: a set of random variables; an undirected graph that encodes independence assumptions; unnormalized factor tables. Joint distribution: a product of factors, P(X_1, ..., X_n) = (1/Z) Π_c φ_c(X_c), where Z is the normalizing constant (partition function). (C) Dhruv Batra 36
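A tiny sketch of that factorization, with hypothetical factor values: two pairwise factors over binary variables A, B, C; the joint is the product of the unnormalized tables divided by the partition function.

```python
# Pairwise Markov net A - B - C: joint = (1/Z) * phi_AB(a,b) * phi_BC(b,c).
from itertools import product

phi_AB = {(0, 0): 10.0, (0, 1): 1.0, (1, 0): 1.0, (1, 1): 10.0}   # unnormalized factor
phi_BC = {(0, 0): 5.0,  (0, 1): 1.0, (1, 0): 1.0, (1, 1): 5.0}

def unnormalized(a, b, c):
    return phi_AB[(a, b)] * phi_BC[(b, c)]

Z = sum(unnormalized(a, b, c) for a, b, c in product([0, 1], repeat=3))  # partition function
joint = {(a, b, c): unnormalized(a, b, c) / Z for a, b, c in product([0, 1], repeat=3)}
print("Z =", Z, " P(0,0,0) =", joint[(0, 0, 0)])
```

Unlike the BN's CPTs, these factor tables are not locally normalized; the global constant Z is what makes the joint a probability distribution, and computing Z is itself an inference problem.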

Local Structures in BNs: Causal trail X → Y → Z; Evidential trail X ← Y ← Z; Common cause X ← Y → Z; Common effect (v-structure) X → Y ← Z. (C) Dhruv Batra 37

Local Structures in MNs On board (C) Dhruv Batra 38

Active Trails and Separation. A path X_1 - ... - X_k is active, when a set of variables Z is observed, if none of the X_i ∈ {X_1, ..., X_k} are observed (i.e., none are in Z). Variables X are separated from Y given Z in the graph if there is no active path between any X ∈ X and any Y ∈ Y given Z. (C) Dhruv Batra 39

Independence Assumptions in MNs. Separation defines the global independencies. Pairwise Markov independence: pairs of non-adjacent variables A, B are independent given all others. Markov blanket: a variable A is independent of the rest given its neighbors. (Example on the slide: a 3x3 grid of variables T_1, ..., T_9.) (C) Dhruv Batra Slide Credit: Carlos Guestrin 40