Learning the Structure of Sum-Product Networks. Robert Gens, Pedro Domingos


1 Learning the Structure of Sum-Product Networks. Robert Gens, Pedro Domingos

2 Outline: Motivation, SPN Review, Structure Learning, Experiments

3 Graphical Models. Representation: compact and expressive, with global independence. Inference: exponential in the treewidth of the graph. Learning: extremely difficult, because learning requires inference, approximate inference is unreliable, and hidden variables mean there is no global optimum.

4 Sum-Product Networks. Representation: compact and expressive, with local independence. Inference: linear in the number of edges. Learning: much easier, because inference is exact; only weight learning to date.

5 This Paper: SPN Structure Learning. A general-purpose SPN structure learner: handles discrete or continuous variables, learns layers of hidden variables, fully leverages context-specific independence, and is simple and intuitive; the result is 1-3 orders of magnitude faster, and more accurate, at query time.

6 Outline (repeated as a section divider): Motivation, SPN Review, Structure Learning, Experiments

7-8 Compactly Representable Probability Distributions. [Venn diagram: graphical models, sum-product networks, and existing tractable models as overlapping families within the space of compactly representable distributions; SPNs and the existing tractable models admit exact inference in time linear in network size.]

9-12 A Univariate Distribution Is an SPN. [Build sequence: a single leaf node over variable X; the leaf can be any univariate distribution, e.g. multinomial, Gaussian, or Poisson.]

13-15 A Product of SPNs over Disjoint Variables Is an SPN. [Build sequence: a product node whose children are SPNs over the disjoint variable sets X and Y.]

16-17 A Weighted Sum of SPNs over the Same Variables Is an SPN. [Build sequence: a sum node with weights w_1, w_2 over two SPNs on the same variables X, Y; the sum node implicitly sums out a hidden mixture variable.]

18 [Figure: composing sums and products recursively yields a deep SPN over variables X1-X6.]

19-28 All Marginals Are Computable in Linear Time. [Animation: to answer a query such as P(X = 0), set each evidence variable's leaf to the probability of its observed value and each unobserved variable's leaf to 1 (marginalizing it out), then evaluate the network bottom-up through the weighted sums and products; a single pass linear in the number of edges yields the marginal.]
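The linear-time evaluation these slides animate is easy to make concrete. Below is a minimal sketch, not the authors' implementation: hypothetical Leaf, Product, and Sum classes for a tree-structured SPN, where an observed variable's leaf returns the probability of its value and a marginalized variable's leaf returns 1.

```python
# Minimal SPN sketch; hypothetical classes, not the authors' implementation.
class Leaf:
    """Univariate distribution over one variable, as a value -> probability table."""
    def __init__(self, var, dist):
        self.var, self.dist = var, dist

    def value(self, evidence):
        # Evidence leaf: probability of the observed value.
        # Marginalized leaf: summing over all values gives 1.
        return self.dist[evidence[self.var]] if self.var in evidence else 1.0

class Product:
    """Product over SPNs with disjoint variable scopes."""
    def __init__(self, children):
        self.children = children

    def value(self, evidence):
        result = 1.0
        for child in self.children:
            result *= child.value(evidence)
        return result

class Sum:
    """Weighted sum over SPNs with identical scopes; weights sum to 1."""
    def __init__(self, weighted_children):           # list of (weight, child)
        self.weighted_children = weighted_children

    def value(self, evidence):
        return sum(w * c.value(evidence) for w, c in self.weighted_children)

# A two-component mixture over {X, Y} in the spirit of the slides (numbers invented).
x1, y1 = Leaf('X', {0: 0.2, 1: 0.8}), Leaf('Y', {0: 0.4, 1: 0.6})
x2, y2 = Leaf('X', {0: 0.7, 1: 0.3}), Leaf('Y', {0: 0.9, 1: 0.1})
root = Sum([(0.6, Product([x1, y1])), (0.4, Product([x2, y2]))])

print(root.value({'X': 0, 'Y': 1}))   # joint P(X=0, Y=1)
print(root.value({'X': 0}))           # marginal P(X=0); Y summed out at its leaf
```

Each node is visited once, so evaluation is linear in the number of edges; on a DAG with shared children, memoizing node values preserves that guarantee.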

29-39 All MAP States Are Computable in Linear Time. [Animation: to compute max_{y,h} P(X = 0, Y = y, H = h), where H stands for the hidden mixture variables, replace each sum with a max, set evidence leaves to their observed values and unobserved leaves to their modes, and evaluate bottom-up (0.12 in the example); following the maximizing children back down reads off the MAP state.]
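The MAP computation is the same structure with sums replaced by maxes. A hedged sketch reusing the classes above; the choices dict keyed by id(node) is an illustrative device for recording each sum's maximizing child, not part of any standard API.

```python
# MAP sketch, reusing Leaf, Product, Sum, and root from the previous block.
def max_value(node, evidence, choices):
    """Upward pass: max instead of sum; mode probability at open leaves."""
    if isinstance(node, Leaf):
        if node.var in evidence:
            return node.dist[evidence[node.var]]
        return max(node.dist.values())                    # mode probability
    if isinstance(node, Product):
        result = 1.0
        for child in node.children:
            result *= max_value(child, evidence, choices)
        return result
    scored = [(w * max_value(c, evidence, choices), c)    # sum node -> max
              for w, c in node.weighted_children]
    best_score, best_child = max(scored, key=lambda t: t[0])
    choices[id(node)] = best_child                        # remember the argmax
    return best_score

def map_state(node, evidence, choices, assignment):
    """Downward pass: follow recorded max choices; take modes at open leaves."""
    if isinstance(node, Leaf):
        assignment[node.var] = (evidence[node.var] if node.var in evidence
                                else max(node.dist, key=node.dist.get))
    elif isinstance(node, Product):
        for child in node.children:
            map_state(child, evidence, choices, assignment)
    else:
        map_state(choices[id(node)], evidence, choices, assignment)
    return assignment

choices = {}
best = max_value(root, {'X': 0}, choices)     # max over Y and the mixture variable
print(best, map_state(root, {'X': 0}, choices, {}))
```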

40-45 What Does an SPN Mean? Products = features, sums = clusterings. [Image example: the root sum is a mixture over scene decompositions (hay, greenery, truck, sign, flowers, gravel); intermediate sums act as pooling over alternatives (calf, cow, bull, person patting, person on bike); products are part decompositions (tail, body, legs, head; arms, body, legs, head).]

46 Special Cases: hierarchical mixture models; thin junction trees (e.g., hidden Markov models); non-recursive probabilistic context-free grammars; etc.

47-48 Weight Learning. Generative (Poon & Domingos, UAI 2011; UAI 2011 Best Paper Award). Discriminative (Gens & Domingos, NIPS 2012; NIPS 2012 Outstanding Student Paper Award). Key limitation: both require a structure to be given.

49 Outline (section divider): Motivation, SPN Review, Structure Learning, Experiments

50-53 LearnSPN: a recursive algorithm over a matrix of training instances (rows) by variables (columns). Base case: if a single variable X_1 remains, return a univariate distribution over it (multinomial, Gaussian, Poisson, ...).

54-57 Recursive cases: first try to establish approximate independence between subsets of the variables; if it is found, return a product node whose children are recursive calls on each subset. If no independence is found, cluster similar instances and return a sum node with weights w_1, w_2, w_3 whose children are recursive calls on each cluster.

58-61 Limiting behavior: if no dependencies are ever detected, the result is a fully factorized distribution; if independence is never found, the result is a kernel density estimate, a weighted sum (w_1, ..., w_7) with one component per training instance.

62-65 [Animation recap: the variable-split and instance-clustering steps are applied recursively inside each subset and cluster, growing the SPN downward.]

66-72 [Animation: alternating instance clustering and variable splits carve the data matrix into nested blocks; introducing a latent cluster variable C for each sum, the corresponding graphical model would have high treewidth, yet the SPN remains tractable.]

73 LearnSPN(training set): if |V| = 1, return a univariate distribution over the remaining variable; otherwise, try to establish approximate independence among the variables and return a product node over recursive calls on the subsets; if no independence is found, cluster similar instances and return a weighted sum node (w_1, w_2, w_3) over recursive calls on the clusters.
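In outline, the recursion reads as the sketch below, assuming the Leaf, Product, and Sum classes sketched earlier. split_variables and cluster_instances are the steps detailed on the following slides (plausible realizations are sketched there), and the single-cluster fallback to a fully factorized model stands in for the degenerate cases on slides 58-61.

```python
# LearnSPN sketch (slide 73). Rows are instances; variables are column indices.
def fit_univariate(instances, var):
    # Base case: maximum-likelihood multinomial over the single variable var.
    counts = {}
    for x in instances:
        counts[x[var]] = counts.get(x[var], 0) + 1
    n = len(instances)
    return Leaf(var, {v: c / n for v, c in counts.items()})

def learn_spn(instances, variables):
    if len(variables) == 1:
        return fit_univariate(instances, variables[0])      # base case: leaf

    groups = split_variables(instances, variables)          # approx. independence
    if len(groups) > 1:
        return Product([learn_spn(instances, g) for g in groups])

    clusters = cluster_instances(instances)                 # similar instances
    if len(clusters) == 1:                                  # fall back: factorize
        return Product([fit_univariate(instances, v) for v in variables])
    n = len(instances)
    # Sum weights are the cluster proportions, i.e. the hard-EM mixture priors.
    return Sum([(len(c) / n, learn_spn(c, variables)) for c in clusters])
```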

74-77 Variable Splits (×). Test pairwise independence between variables, with the p-value threshold tuned on a validation set; the variables X_1, ..., X_7 are grouped into the connected components of the resulting dependency graph, and LearnSPN recurses on each component.
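One plausible realization of the split_variables helper from the sketch above, assuming binary data and variables given as column indices; a chi-square test stands in for whatever pairwise test the authors actually use.

```python
# Variable-split sketch: pairwise independence tests, then connected components.
from itertools import combinations
import numpy as np
from scipy.stats import chi2_contingency

def split_variables(data, variables, alpha=0.001):
    """Group variables into connected components of a dependency graph whose
    edges are pairs where the independence test rejects at level alpha."""
    data = np.asarray(data)
    adj = {v: set() for v in variables}
    for a, b in combinations(variables, 2):
        table = np.ones((2, 2))                  # add-one smoothed 2x2 counts
        for xa, xb in zip(data[:, a], data[:, b]):
            table[int(xa), int(xb)] += 1
        _, p, _, _ = chi2_contingency(table)
        if p < alpha:                            # dependent: connect a and b
            adj[a].add(b)
            adj[b].add(a)
    groups, seen = [], set()                     # depth-first components
    for v in variables:
        if v in seen:
            continue
        stack, comp = [v], []
        while stack:
            u = stack.pop()
            if u not in seen:
                seen.add(u)
                comp.append(u)
                stack.extend(adj[u] - seen)
        groups.append(comp)
    return groups
```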

78-86 Instance Clustering (+). Cluster the instances m_1, ..., m_7 with online hard EM on a naive Bayes mixture model, P(V) = Σ_i P(C_i) Π_j P(X_j | C_i), with a cluster penalty tuned on the validation set; the learned sum-node weights are the mixture priors P(C_i). [Animation: the instances are assigned to clusters C_1, C_2, C_3 and LearnSPN recurses on each cluster.]
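One plausible realization of the cluster_instances helper: a minimal online hard EM sketch in which every constant is illustrative rather than taken from the paper.

```python
# Instance-clustering sketch: online hard EM on a naive Bayes mixture.
# Each instance joins the cluster with the highest (smoothed, unnormalized)
# log P(C_i) + sum_j log P(x_j | C_i); opening a new cluster is discounted
# by an exponential cluster penalty.
import math
from collections import defaultdict

def cluster_instances(instances, num_values=2, log_penalty=-40.0):
    clusters = []   # each: instance count, per-variable value counts, members

    def log_score(c, x):
        s = math.log(c['n'] + 1)                 # unnormalized log prior
        for j, v in enumerate(x):                # Laplace-smoothed likelihoods
            s += math.log((c['counts'][j][v] + 1) / (c['n'] + num_values))
        return s

    for x in instances:
        fresh = {'n': 0, 'counts': defaultdict(lambda: defaultdict(int)),
                 'members': []}
        candidates = clusters + [fresh]
        scores = [log_score(c, x) for c in candidates]
        scores[-1] += log_penalty                # penalize a new cluster
        best = candidates[scores.index(max(scores))]
        best['n'] += 1
        best['members'].append(x)
        for j, v in enumerate(x):
            best['counts'][j][v] += 1
        if best is fresh:
            clusters.append(fresh)
    return [c['members'] for c in clusters]
```

For brevity this sketch clusters on all columns of the rows it is given; inside the recursion, only the current variable subset would be used.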

87-89 LearnSPN Locally Optimizes Likelihood. Splitting variables X_i, X_j, X_k into a product incurs no loss of likelihood if the groups are truly independent; and the sum node returned for an instance clustering achieves at least the naive Bayes mixture likelihood, since each recursive call can only improve on its cluster's fully factorized model.
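The two claims can be written out in a couple of lines. The following is a sketch of the reasoning, with the notation V_k for variable groups and S_i for the SPN learned on cluster i introduced here rather than taken from the deck.

```latex
\begin{align*}
&\text{Variable split: if the groups } V_1,\dots,V_K \text{ are truly independent, then}\\
&\qquad \log P(x) \;=\; \sum_{k=1}^{K} \log P(x_{V_k}) \quad \text{for every instance } x,\\
&\text{so the product node loses no likelihood. Instance clustering: each recursive}\\
&\text{call } S_i \text{ can only improve on cluster } i\text{'s fully factorized model, hence}\\
&\qquad \sum_x \log \sum_i P(C_i) \prod_j P(x_j \mid C_i)
  \;\le\; \sum_x \log \sum_i P(C_i)\, S_i(x).
\end{align*}
```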

90 Outline (section divider): Motivation, SPN Review, Structure Learning, Experiments

91 Experiments: 20 datasets (collaborative filtering, click-through logs, nucleic acid sequences, ...), with 2k-388k instances each. Systems compared, as representation / learning / inference: SPN / LearnSPN / exact; Bayesian network / WinMine / Gibbs and loopy BP; Markov network / Della Pietra and L1 / Gibbs and loopy BP. Total: 1680 experiments.

92-93 Learning Results. [Charts: numbers of SPN wins, ties, and losses (p = 0.05) against WinMine on log-likelihood, and against Della Pietra and L1 on pseudo-log-likelihood.]

94-97 Inference: estimate P(Query | Evidence). Ten (query %, evidence %) proportions of the variables: 10/30, 20/30, 30/30, 40/30, 50/30, 30/0, 30/10, 30/20, 30/40, 30/50; for each proportion, 1000 queries are generated from the test set. Gibbs and loopy BP for each of WinMine, Della Pietra, and L1, times 20 datasets and 10 proportions, gives 1200 experiments (in addition to 400 for SPNs).

98 Inference Accuracy. [Plot: conditional log-likelihood vs. fraction of query variables (10%-50%) on EachMovie (500 variables), for SPN, WinMine, L1, and Della Pietra; the SPN curve is highest.]

99-104 Inference Time. [Scatter plot, log scales: Gibbs ms/query for WinMine, Della Pietra, and L1 vs. SPN ms/query across the datasets.] SPN inference is 1-3 orders of magnitude faster.

105 Inference Time: Loopy Belief Propagation (single-variable marginals). [Scatter plot: loopy BP ms/query for WinMine, Della Pietra, and L1 vs. SPN ms/query, with SPNs 10x-30x faster.] 3 learners x 20 datasets x 10 variable proportions = 600 experiments. SPNs had higher conditional marginal log-likelihood on 78% of experiments (95% vs. Gibbs).

106 Experiment Conclusions. SPN learning accuracy is comparable; SPN exact inference is 1-3 orders of magnitude faster, and more accurate; inference does not involve tuning settings or diagnosing convergence; inference takes a predictable amount of time. SPNs can now be applied to many domains.

107 Future Work: other SPN structure and weight learning algorithms; approximating intractable distributions with SPNs; parallelizing SPN learning and inference. Code and supplemental results are available at spn.cs.washington.edu/learnspn/
