Learning the Structure of Sum-Product Networks. Robert Gens, Pedro Domingos
1 Learning the Structure of Sum-Product Networks. Robert Gens, Pedro Domingos
2 Outline: Motivation, SPN Review, Structure Learning, Experiments. [Icon strip: w, 20, 10x, O(n), X, Y, LL, PLL, CLL, CMLL]
3 Graphical Models. Representation: compact and expressive, with global independence assumptions. Inference: exponential in the treewidth of the graph. Learning: extremely difficult, because learning requires inference, approximate inference is unreliable, and hidden variables mean there is no global optimum.
4 Sum-Product Networks. Representation: compact and expressive, with local independence. Inference: linear in the number of edges. Learning: much easier because inference is exact, but only weight learning to date.
5 This Paper: SPN Structure Learning. A general-purpose SPN structure learner that handles discrete or continuous variables, learns layers of hidden variables, fully leverages context-specific independence, is simple and intuitive, and is 1-3 orders of magnitude faster and more accurate at query time.
6 Outline, revisited; next up: a review of SPNs.
7 Compactly Representable Probability Distributions. [Venn diagram: graphical models, sum-product networks, and existing tractable models as overlapping families within the compactly representable distributions]
8 The same diagram, annotated: tractable here means exact inference in time linear in network size.
9 A Univariate Distribution Is an SPN: any univariate distribution (multinomial, Gaussian, Poisson, ...) is a valid leaf. [Slides 10-12 animate a single leaf node over variable X]
13 A Product of SPNs over Disjoint Variables Is an SPN. [Slides 14-15 animate a product node whose children are SPNs over the disjoint scopes X and Y]
16 A Weighted Sum of SPNs over the Same Variables Is an SPN; the sum node implicitly sums out a hidden mixture variable. [Slide 17 shows a sum node with weights w_1, w_2 over two SPNs, each on X and Y]
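These three cases suggest a minimal node representation. A sketch in Python, with illustrative names (not the authors' code); the asserts encode the two validity conditions, decomposability for products and completeness for sums:

```python
class Leaf:
    """Univariate distribution over one variable (Bernoulli here for brevity)."""
    def __init__(self, var, p_true):
        self.var, self.p_true = var, p_true   # p_true = P(X_var = 1)
        self.scope = frozenset([var])

class Product:
    """Product of child SPNs whose scopes are disjoint (decomposability)."""
    def __init__(self, children):
        scopes = [c.scope for c in children]
        assert all(a.isdisjoint(b) for i, a in enumerate(scopes)
                   for b in scopes[i + 1:]), "product children must be disjoint"
        self.children = children
        self.scope = frozenset().union(*scopes)

class Sum:
    """Weighted sum of child SPNs over the same scope (completeness)."""
    def __init__(self, children, weights):
        assert all(c.scope == children[0].scope for c in children), \
            "sum children must share a scope"
        assert abs(sum(weights) - 1.0) < 1e-9, "weights must be normalized"
        self.children, self.weights = children, weights
        self.scope = children[0].scope
```

Any univariate leaf type would do in place of the Bernoulli; the slides mention multinomial, Gaussian, and Poisson leaves.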
18 [Figure: a complete SPN over X_1-X_6, with alternating layers of sum and product nodes down to univariate leaves]
19 All Marginals Are Computable in Linear Time. To answer a marginal query such as P(X = 0): set the leaves of evidence variables to their observed values, set the leaves of marginalized variables to 1, and evaluate the network bottom-up. Each node is computed once, so the query takes time linear in the number of edges. [Slides 20-29 animate evidence and marginalization propagating through the example SPN over X and Y]
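A sketch of that upward pass over the hypothetical classes above. Evidence is a dict from variable index to observed value; a leaf outside the evidence evaluates to 1, which is exactly what marginalizes it out:

```python
def value(node, evidence):
    """Evaluate P(evidence), marginalizing every unobserved variable.

    Each node is visited once (memoize node values for DAG-shaped SPNs),
    so the cost is linear in the number of edges.
    """
    if isinstance(node, Leaf):
        if node.var not in evidence:        # marginalized: sums to 1 over states
            return 1.0
        return node.p_true if evidence[node.var] == 1 else 1.0 - node.p_true
    if isinstance(node, Product):
        result = 1.0
        for child in node.children:
            result *= value(child, evidence)
        return result
    # Sum node: weighted average of child values
    return sum(w * value(c, evidence) for w, c in zip(node.weights, node.children))

# Tiny example: a two-component mixture over binary X (index 0) and Y (index 1)
spn = Sum([Product([Leaf(0, 0.2), Leaf(1, 0.7)]),
           Product([Leaf(0, 0.6), Leaf(1, 0.4)])], [0.3, 0.7])
print(value(spn, {0: 0}))   # P(X = 0) with Y marginalized out: 0.52
```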
30 All MAP States Are Computable in Linear Time. To answer a MAP query such as $\max_{y,h} P(X{=}0, Y{=}y, H{=}h)$, replace each sum node with a max node, evaluate bottom-up (in the example the maximum is 0.12), and then trace the maximizing children back down to read off the mode of every unobserved variable. [Slides 31-39 animate evidence and modes through the same example SPN]
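The MAP pass differs only in replacing sums with maxes, plus a downward traceback. A sketch continuing the hypothetical classes above:

```python
def max_value(node, evidence):
    """Upward max-product pass: probability of the most likely completion."""
    if isinstance(node, Leaf):
        if node.var in evidence:
            return node.p_true if evidence[node.var] == 1 else 1.0 - node.p_true
        return max(node.p_true, 1.0 - node.p_true)    # free variable: its mode
    if isinstance(node, Product):
        result = 1.0
        for child in node.children:
            result *= max_value(child, evidence)
        return result
    # Sum node acts as a max node in MAP inference
    return max(w * max_value(c, evidence)
               for w, c in zip(node.weights, node.children))

def map_state(node, evidence, state):
    """Downward traceback: fill `state` with the maximizing assignment."""
    if isinstance(node, Leaf):
        state[node.var] = evidence.get(node.var,
                                       1 if node.p_true >= 0.5 else 0)
    elif isinstance(node, Product):
        for child in node.children:
            map_state(child, evidence, state)
    else:  # Sum: descend only into the child that achieved the max
        _, best = max(zip(node.weights, node.children),
                      key=lambda wc: wc[0] * max_value(wc[1], evidence))
        map_state(best, evidence, state)
```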
40 What Does an SPN Mean? Products = features, sums = clusterings. In an image example: the root sum is a mixture over scene decompositions (hay, greenery, truck, sign, flowers, gravel); sums over object categories (calf, cow, bull, person patting, person on bike) act as pooling; products give part decompositions (tail, body, legs, head, arms). [Slides 41-45 build this hierarchy up layer by layer]
46 Special Cases: hierarchical mixture models; thin junction trees (e.g., hidden Markov models); non-recursive probabilistic context-free grammars; etc.
47 Weight Learning. Generative (Poon & Domingos, UAI 2011; UAI 2011 Best Paper Award) and discriminative (Gens & Domingos, NIPS 2012; NIPS 2012 Outstanding Student Paper Award). Key limitation: both require a structure to be given.
49 Outline, revisited; next up: structure learning.
50 LearnSPN is a recursive algorithm over the data matrix of instances (rows) by variables (columns).
52 Base case: when a single variable remains, return a univariate distribution over it (multinomial, Gaussian, Poisson, ...).
54 Recursive step: try to split the variables into approximately independent subsets; if a split is found, return a product node whose children are recursive calls on each subset. If no independence is found, cluster similar instances and return a sum node with weights w_1, w_2, w_3 over recursive calls on each cluster.
58 If no dependencies at all are detectable among the variables, the recursion returns a fully factorized distribution (a product of univariate leaves).
60 If independence is never found, the recursion bottoms out one instance at a time, and the result is a kernel density estimate: a sum with weights w_1, ..., w_7 over fully factorized per-instance components.
62 [Slides 62-65 repeat the recursive step, showing how variable splits and instance clusterings alternate as the recursion descends]
66 [Slides 66-72: the recursion carves the data matrix into ever-smaller blocks of instances and variables; the resulting SPN can represent high-treewidth distributions]
73 LearnSPN on the training set, in full: if |V| = 1, return a univariate distribution; otherwise try to split the variables into approximately independent subsets and return a product of recursive calls; failing that, cluster similar instances and return a weighted sum (weights w_1, w_2, w_3) of recursive calls.
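A sketch of that recursion in Python (assumed binary data in a NumPy matrix; `independent_subsets` and `cluster_instances` are sketched after the next two slides, and all names are illustrative):

```python
import numpy as np

def fit_univariate(data, v):
    """Leaf for a binary variable: Bernoulli with the empirical frequency."""
    return Leaf(v, float(data[:, v].mean()))

def learn_spn(data, var_idx):
    """LearnSPN over rows (instances) of `data`, restricted to columns var_idx."""
    if len(var_idx) == 1:                               # base case: one variable
        return fit_univariate(data, var_idx[0])
    groups = independent_subsets(data, var_idx)         # try a variable split
    if len(groups) > 1:
        return Product([learn_spn(data, g) for g in groups])
    clusters = cluster_instances(data, var_idx)         # else split the instances
    weights = [len(c) / len(data) for c in clusters]    # sum weights = mixture priors
    return Sum([learn_spn(data[c], var_idx) for c in clusters], weights)
```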
74 Variable Splits (product nodes). Run a pairwise independence test on every pair of variables, with the p-value threshold tuned on a validation set; connect each dependent pair by an edge and recurse on the connected components of the resulting graph. [Slides 75-77 animate the graph over X_1-X_7 splitting into components]
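One way to realize this split, sketched with SciPy's chi-square test on 2x2 contingency tables (the paper uses a G-test of pairwise independence; the threshold and smoothing here are illustrative assumptions):

```python
import numpy as np
from scipy.stats import chi2_contingency

def independent_subsets(data, var_idx, alpha=0.001):
    """Group variables so that variables in different groups test independent.

    Edge = dependent pair (p-value below alpha, tuned on a validation set);
    the returned groups are the connected components of that graph.
    """
    adj = {v: set() for v in var_idx}
    for i, a in enumerate(var_idx):
        for b in var_idx[i + 1:]:
            table = np.ones((2, 2))                 # start at 1: smoothing avoids zeros
            for x, y in zip(data[:, a], data[:, b]):
                table[int(x), int(y)] += 1
            _, p, _, _ = chi2_contingency(table)
            if p < alpha:                           # dependent: connect the pair
                adj[a].add(b)
                adj[b].add(a)
    seen, groups = set(), []
    for v in var_idx:                               # connected components by DFS
        if v in seen:
            continue
        comp, stack = [], [v]
        while stack:
            u = stack.pop()
            if u not in seen:
                seen.add(u)
                comp.append(u)
                stack.extend(adj[u] - seen)
        groups.append(comp)
    return groups
```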
78 Instance Clustering (sum nodes). When no variable split is found, cluster instances by online hard EM on a naive Bayes mixture model, $P(V) = \sum_i P(C_i) \prod_j P(X_j \mid C_i)$, with a cluster penalty tuned on the validation set. The learned sum weights are the mixture priors P(C_i), and LearnSPN recurses on the instances of each cluster. [Slides 79-86 animate instances m_1-m_7 being grouped into clusters C_1-C_3]
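A sketch of the clustering step as hard EM on this naive Bayes mixture. The slides describe an online variant with a cluster penalty; this simplified batch version with a fixed number of clusters is an assumption for illustration:

```python
import numpy as np

def cluster_instances(data, var_idx, k=3, iters=20, seed=0):
    """Hard EM over a naive Bayes mixture; returns row-index arrays per cluster."""
    rng = np.random.default_rng(seed)
    assign = rng.integers(k, size=len(data))           # random initial assignment
    cols = list(var_idx)
    for _ in range(iters):
        # M-step: mixture priors P(C_i) and Bernoulli parameters P(X_j = 1 | C_i)
        priors = np.array([(assign == c).mean() + 1e-6 for c in range(k)])
        theta = np.array([[data[assign == c][:, v].mean()
                           if (assign == c).any() else 0.5
                           for v in cols] for c in range(k)])
        theta = np.clip(theta, 1e-3, 1 - 1e-3)
        # E-step (hard): each instance goes to its most probable cluster
        for m in range(len(data)):
            x = data[m, cols]
            loglik = (np.log(priors)
                      + (x * np.log(theta)
                         + (1 - x) * np.log(1 - theta)).sum(axis=1))
            assign[m] = int(np.argmax(loglik))
    return [np.flatnonzero(assign == c) for c in range(k) if (assign == c).any()]
```

The hard-EM cluster fractions are exactly the mixture priors P(C_i), which is why the learn_spn sketch above uses cluster sizes as the sum weights.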
87 LearnSPN Locally Optimizes Likelihood. A variable split loses no likelihood when the subsets are truly independent, and each instance-clustering step starts from the likelihood of its naive Bayes mixture, which the subsequent recursion can only match or improve (naive Bayes likelihood <= LearnSPN likelihood). [Slides 88-89 illustrate with a variable split over X_1, X_2, ..., X_i, X_j, X_k and a two-component sum with weights w_1, w_2]
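Spelled out in LaTeX (notation assumed from the slides): the product step is lossless under true independence, and the sum step can only improve on its naive Bayes starting point.

```latex
% Product step: splitting V into truly independent V_1, V_2 loses nothing:
\log P(V) \;=\; \log P(V_1) + \log P(V_2)
  \qquad \text{when } V_1 \perp V_2 .
% Sum step: recursing inside each cluster starts from the naive Bayes
% mixture and can only match or improve its training likelihood:
\mathcal{L}_{\text{NB}} \;\le\; \mathcal{L}_{\text{LearnSPN}} .
```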
90 Outline, revisited; next up: experiments.
91 Experiments: 20 datasets (collaborative filtering, click-through logs, nucleic acid sequences, ...) with 2k-388k instances each. Systems compared, as representation / learning / inference: SPN / LearnSPN / exact; Bayesian network / WinMine / Gibbs and loopy BP; Markov network / Della Pietra and L1 / Gibbs and loopy BP. Total: 1680 experiments.
92 Learning Results. [Bar charts: SPN wins (p = 0.05), ties, and losses against WinMine, Della Pietra, and L1, by test log-likelihood and pseudo-log-likelihood]
94 Inference experiments: estimate P(Query | Evidence). Query/evidence fractions of the variables: Q in {10, 20, 30, 40, 50}% with E = 30%, and E in {0, 10, 20, 40, 50}% with Q = 30%; for each proportion, 1000 queries are generated from the test set. {WinMine, Della Pietra, L1} x {Gibbs, loopy BP} x 20 datasets x 10 variable proportions = 1200 experiments (in addition to 400 for SPNs).
98 Inference Accuracy (example: EachMovie, 500 variables). [Plot: conditional log-likelihood vs. fraction of query variables, 10-50%; the SPN curve dominates WinMine, Della Pietra, and L1]
99 Inference Time. [Scatter plots, slides 99-104: Gibbs ms/query vs. SPN ms/query for WinMine, Della Pietra, and L1 across datasets] SPN inference is 1-3 orders of magnitude faster.
105 Against Loopy Belief Propagation (single-variable marginals): {WinMine, Della Pietra, L1} x 20 datasets x 10 variable proportions = 600 experiments. SPN inference was 10-30x faster, and SPNs had higher conditional marginal log-likelihood on 78% of experiments (95% vs. Gibbs).
106 Experiment Conclusions: SPN learning accuracy is comparable to the baselines; SPN exact inference is 1-3 orders of magnitude faster and more accurate; inference involves no tuning of settings or diagnosing of convergence; inference takes a predictable amount of time; SPNs can now be applied to many domains.
107 Future Work: other SPN structure and weight learning algorithms; approximating intractable distributions with SPNs; parallelizing SPN learning and inference. Code and supplemental results available at spn.cs.washington.edu/learnspn/