A Co-Clustering approach for Sum-Product Network Structure Learning
|
|
- Christopher Walters
- 5 years ago
- Views:
Transcription
1 Università degli Studi di Bari Dipartimento di Informatica LACAM Machine Learning Group A Co-Clustering approach for Sum-Product Network Antonio Vergari Nicola Di Mauro Floriana Esposito December 8, 2014
2 Outline Introducing (SPNs) definition and properties inference Learning the structure of SPNs current state of the art affinities with hierarchical co-clustering sketching a proposal Experimentation experimental setting results Further Works and Conclusions
3 Probabilistic Graphical Models They can compactly represent joint probability distributions but inference is potentially intractable, exponential in the treewidth One of the bottlenecks is the computation of the partition function: Z = ϕ C (x C ) x X C C
4 X A tractable distribution is an SPN Robert Gens and Pedro Domingos [2013] ``Learning the Structure of '' In: ICML (3), pp
5 X X Y A tractable distribution is an SPN A product of SPNs over different scopes is an SPN (decomposability) Robert Gens and Pedro Domingos [2013] ``Learning the Structure of '' In: ICML (3), pp
6 w 1 w 2 X X Y X Y X Y A tractable distribution is an SPN A product of SPNs over different scopes is an SPN (decomposability) A weighted sum (w i 0) of SPN over the same scope is an SPN (completeness) Nothing else is an SPN Robert Gens and Pedro Domingos [2013] ``Learning the Structure of '' In: ICML (3), pp
7 w 1 w 2 X X Y X Y X Y A tractable distribution is an SPN A product of SPNs over different scopes is an SPN (decomposability) A weighted sum (w i 0) of SPN over the same scope is an SPN (completeness) Nothing else is an SPN They compactly encode the network polynomial over X (multilinear function) Robert Gens and Pedro Domingos [2013] ``Learning the Structure of '' In: ICML (3), pp
8 Inference Given an SPN S over rvs X, these measures are computable in time linear to edges(s) : the partition function of the distribution over X: Z = S( ) exact marginal probabilities for an evidence e: P r(e) = S(e)/S( ) MPE probability for an evidence e and query q (converting S into S max by replacing sum nodes with max nodes) MP E(q, e) = max P r(q, e) = q Smax (e)
9 Inference (example) To compute the marginal probability P r(x) = y Y P r(x, y): X Y X Y
10 Inference (example) To compute the marginal probability P r(x) = y Y P r(x, y): Set the leaf values for X and marginalize over Y by setting their probabilities to X Y X Y 081 1
11 Inference (example) To compute the marginal probability P r(x) = y Y P r(x, y): Set the leaf values for X and marginalize over Y by setting their probabilities to 1 Propagate probabilities (computing sums and products bottom-up) X Y X Y 081 1
12 Inference (example) To compute the marginal probability P r(x) = y Y P r(x, y): Set the leaf values for X and marginalize over Y by setting their probabilities to 1 Propagate probabilities (computing sums and products bottom-up) The exact values is the root value X Y X Y 081 1
13 Deep Architectures Exploit local interactions to save computations by layering a deep architecture X1 X1 X2 X2 X3 X3 X4 X4 X5 X1 X1 X2 X2 X3 X3 X5 X4 Learning a compact structure equals to discover these interactions X5 X5 X4
14 Building the network top down or bottom up by clustering features and/or instances: KMeans on features (k = 2, euclidean distance) to discover similarities, adding layers of sum nodes (fixed length, fully connected)[dennis and Ventura 2012] Merging feature regions bottom-up by a Bayesian-Dirichlet independence test, adding sum layers (fixed length) and reducing edges by maximizing Mutual Information (Information Bottleneck) [Peharz, Geiger, and Pernkopf 2013] LearnSPN: alternating splitting instances (clustering by similarity) and features (independence checking) in a greedy way Builds a tree-like SPN with univariate distributions at the leaves estimated from data [Gens and Domingos 2013]
15 LearnSPN Build a tree-like SPN that maximizes the log-likelihood of the data by alternating splits on instances and features Online Hard-EM with restarts to cluster instances T with an exp prior (P r(c i ) e λ C i X ) on clusters to avoid overfitting: P r(x) = C i C X j X P r(x j C i )P r(c i ) found clusters become sum node children with parameters w i = C i / T Features X are clustered into independent components via a greedy procedure based on a G Test over pairs X i,x j : G(X i, X j ) = 2 x i X i which are associated to product node children c(x i, x j ) log c(x i, x j ) T c(x x j X i )c(x j ) j If T < m consider features independent and create leaves by smooth laplacian frequence estimation (with parameter α)
16 LearnSPN (example) 1 X Y Z W
17 LearnSPN (example) 1 X Y Z W
18 LearnSPN (example) 1 X Y Z W Z W X Y Z W Z 8
19 A Co-Clustering Analogy In the end, LearnSPN builds a network by partitioning the data matrix into blocks by highlighting local interactions One could discover both row and column interactions by the means of a co-clustering technique (co-clusters as blocks in the partitioning) To preserve the decomposability and completeness of the resulting SPN: co-clusters shall be non overlapping the union of the co-clusters shall reconstruct the whole data matrix To have a deep architecture one needs a hierarchy of co-clusters Such several out-of-the box co-clustering algorithms could be used to build SPNs but rows are have not the same meaning of columns! (see caveat)
20 From a Hierarchical Co-Clustering to an SPN Considering a co-cluster hierarchy as represented by two cluster hierarchies (on rows and columns), the construction uses the two hierarchies and the matrix decomposition intuition from LearnSPN Traverse the hierarchies top-down for at most k max levels (to avoid overfitting) and : add a sum node over a split in the row hierarchy estimating the child weights as the proportions of instances in the split add a product node over a split in the column hierarchy if a split contains less than m instances add a product node over single column splits if a split contains only a feature, add a leaf node estimating the univariate distribution with a Laplacian smoothing
21 Hierarchies translation (example) {1, 2, 3, 4} {X, Y, Z, W} {1, 2} {3, 4} {X, Y, Z} {W} {1} {2} {3} {4} {X, Z} {Y} {X} {Z}
22 Hierarchies translation (example) {1, 2, 3, 4} {X, Y, Z, W} {1, 2} {3, 4} {X, Y, Z} {W} {1} {2} {3} {4} {X, Z} {Y} {X} {Z} {1, 2, 3, 4} {X, Y, Z, W}
23 Hierarchies translation (example) {1, 2, 3, 4} {X, Y, Z, W} {1, 2} {3, 4} {X, Y, Z} {W} {1} {2} {3} {4} {X, Z} {Y} {X} {Z} {1, 2} {3, 4} {X, Y, Z, W} {X, Y, Z, W}
24 Hierarchies translation (example) {1, 2, 3, 4} {X, Y, Z, W} {1, 2} {3, 4} {X, Y, Z} {W} {1} {2} {3} {4} {X, Z} {Y} {X} {Z} {1, 2} {1, 2} {3, 4} {3, 4} {X, Y, Z} {W} {X, Y, Z} {W}
25 Co-clustering & SPNs: caveats X Y Z W X Y Z W X Y Z W X Y X Y Z W Z W
26 HiCC-SPN I First co-clustering algorithm used in the experimentation: HiCC Hierarchical Incremental Co-Clustering [Ienco, Pensa, and Meo 2009] almost parameter-free (only the number of iterations to get initial clusters) optimizing an association measure, Goodman-Kruskal's τ divergence: τ CR C C = E C R E CR C C E CR builds an initial co-clustering then recursively splits each partition (yielding two hierarchies) using a Stochastic Local Search approach to find an optimal partitioning: moving an element (row or column) from one cluster to another (cluster creation or deletion are allowed) conditioning on the previous clustering on columns (and vice versa) using restarts for hierarchy levels to differentiate the search
27 HiCC-SPN II Adding a constraint to column splits to preserve independence by checking a conditional mutual information criterion Given a row clustering R, we allow feature x (column) to be moved from cluster C i to C j whether: 1 C i x i C i I(x; x i R) > 1 C j 1 x x j C j I(x; x j R) I(x; y R) = H(x R) H(x y, R) = p(r) p(x i, y j R) p(x i, y j R) log p(x R R x i x y j y i R)p(y j R) where p(r), the prior on the cluster R on the row partitioning R is estimated as the cluster size proportion and p(x i, y j R), p(x i R) and p(y j R) are estimated as the frequencies of the values seen in R
28 Comparing LearnSPN ans HiCC-SPN in the generative framework of graphical models structure learning [Gens and Domingos 2013]: comparing the average log-likelihood on predicting instances from a test set 11 different datasets with binary features, standard in PGMs comparisons [Lowd and Davis 2010] [Haaren and Davis 2012] ranging from classification, recommending, frequent pattern mining 16 to 500 features, 1600 to instances, 001 to 05 density Training 75% Validation 10% Test 15% (no cv) Model selection via grid search in this parameter space: for LearnSPN: λ {02, 06, 08}, ρ {5, 10, 20}, m {1, 100, 400}, α {001, 01, 10} for HiCC-SPN: k {1 : 10}, m {1, 100, 400}, α {0001, 001, 01}
29 Experiment #1: results X T tr T v T te LearnSPN HiCC-SPN NLTCS ± ±311 Plants ± ±970 Audio ± ±1847 Jester ± ±1082 Netflix ± ±590 Accidents ± ±640 Retail ± ±727 Pumsb-star ± ±1269 DNA ± ±67 Book ± ±4733 EachMovie ± ±5635
30 Conclusions and Further Works Translating a co-cluster hierarchy into an SPN could be promising, the exact and tractable inference could be derived given a row and column partitioning but not every co-clustering algorithm is directly usable, plus some representational issues have to be taken into account We are now investigating nested partition models [Rodriguez and Ghosh 2012] that allow for guillotine splits of the data matrix in order to better capture the underlying latent interactions and have deeper insights into SPN structure learning
31 References References Dennis, Aaron and Dan Ventura (2012) ``Learning the Architecture of Using Clustering on Varibles'' In: Advances in Neural Information Processing Systems 25 (NIPS 2012) (cit on p 14) Gens, Robert and Pedro Domingos (2013) ``Learning the Structure of '' In: ICML (3), pp (cit on pp 4 7, 14, 28) Haaren, Jan Van and Jesse Davis (2012) ``Markov Network : A Randomized Feature Generation Approach'' In: AAAI (cit on p 28) Ienco, Dino, Ruggero G Pensa, and Rosa Meo (2009) ``Parameter-Free Hierarchical Co-clustering by n-ary Splits'' In: ECML/PKDD (1) (cit on p 26) Lowd, Daniel and Jesse Davis (2010) ``Learning Markov Network Structure with Decision Trees'' In: Proceedings of the Twenty-seventh International Conference on Machine Learning (cit on p 28) Peharz, Robert, Bernhard Geiger, and Franz Pernkopf (2013) ``Greedy Part-Wise Learning of '' In: European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECMLPKDD 2013) (cit on p 14) Rodriguez, Abel and Kaushik Ghosh (2012) Nested Partition Models (Cit on p 30)
32 References Discussion 1 2 X Y Z W Z W X Y Z W Z {1, 2} {1, 2} {3, 4} {3, 4} {X, Y, Z} {W} {X, Y, Z} {W}
Learning the Structure of Sum-Product Networks. Robert Gens Pedro Domingos
Learning the Structure of Sum-Product Networks Robert Gens Pedro Domingos w 20 10x O(n) X Y LL PLL CLL CMLL Motivation SPN Structure Experiments Review Learning Graphical Models Representation Inference
More informationSimplifying, Regularizing and Strengthening Sum-Product Network Structure Learning
Simplifying, Regularizing and Strengthening Sum-Product Network Structure Learning Antonio Vergari and Nicola Di Mauro and Floriana Esposito University of Bari Aldo Moro, Bari, Italy {antonio.vergari,nicola.dimauro,floriana.esposito}@uniba.it
More informationLearning the Structure of Sum-Product Networks
Robert Gens rcg@cs.washington.edu Pedro Domingos pedrod@cs.washington.edu Department of Computer Science & Engineering, University of Washington, Seattle, WA 98195, USA Abstract Sum-product networks (SPNs)
More informationSum-Product Networks. STAT946 Deep Learning Guest Lecture by Pascal Poupart University of Waterloo October 15, 2015
Sum-Product Networks STAT946 Deep Learning Guest Lecture by Pascal Poupart University of Waterloo October 15, 2015 Introduction Outline What is a Sum-Product Network? Inference Applications In more depth
More informationInference Complexity As Learning Bias. The Goal Outline. Don t use model complexity as your learning bias
Inference Complexity s Learning Bias Daniel Lowd Dept. of Computer and Information Science University of Oregon Don t use model complexity as your learning bias Use inference complexity. Joint work with
More informationGreedy Part-Wise Learning of Sum-Product Networks
Greedy Part-Wise Learning of Sum-Product Networks Robert Peharz, Bernhard C. Geiger and Franz Pernkopf Signal Processing and Speech Communication Laboratory Graz, University of Technology Abstract. Sum-product
More informationAlgorithms for Learning the Structure of Monotone and Nonmonotone Sum-Product Networks
Brigham Young University BYU ScholarsArchive All Theses and Dissertations 2016-12-01 Algorithms for Learning the Structure of Monotone and Nonmonotone Sum-Product Networks Aaron W. Dennis Brigham Young
More informationLearning Ensembles of Cutset Networks
Learning Ensembles of Cutset Networks Tahrima Rahman and Vibhav Gogate Department of Computer Science The University of Texas at Dallas Richardson, TX 75080, USA. {tahrima.rahman,vibhav.gogate}@utdallas.edu
More informationOnline Structure Learning for Feed-Forward and Recurrent Sum-Product Networks
Online Structure Learning for Feed-Forward and Recurrent Sum-Product Networks Agastya Kalra, Abdullah Rashwan, Wilson Hsu, Pascal Poupart Cheriton School of Computer Science, Waterloo AI Institute, University
More informationLearning Tractable Probabilistic Models Pedro Domingos
Learning Tractable Probabilistic Models Pedro Domingos Dept. Computer Science & Eng. University of Washington 1 Outline Motivation Probabilistic models Standard tractable models The sum-product theorem
More informationNon-Parametric Bayesian Sum-Product Networks
Sang-Woo Lee slee@bi.snu.ac.kr School of Computer Sci. and Eng., Seoul National University, Seoul 151-744, Korea Christopher J. Watkins c.j.watkins@rhul.ac.uk Department of Computer Science, Royal Holloway,
More informationDeep Generative Models Variational Autoencoders
Deep Generative Models Variational Autoencoders Sudeshna Sarkar 5 April 2017 Generative Nets Generative models that represent probability distributions over multiple variables in some way. Directed Generative
More informationBayesian Networks Inference (continued) Learning
Learning BN tutorial: ftp://ftp.research.microsoft.com/pub/tr/tr-95-06.pdf TAN paper: http://www.cs.huji.ac.il/~nir/abstracts/frgg1.html Bayesian Networks Inference (continued) Learning Machine Learning
More informationLearning Markov Networks With Arithmetic Circuits
Daniel Lowd and Amirmohammad Rooshenas Department of Computer and Information Science University of Oregon Eugene, OR 97403 {lowd,pedram}@cs.uoregon.edu Abstract Markov networks are an effective way to
More informationComparing Dropout Nets to Sum-Product Networks for Predicting Molecular Activity
000 001 002 003 004 005 006 007 008 009 010 011 012 013 014 015 016 017 018 019 020 021 022 023 024 025 026 027 028 029 030 031 032 033 034 035 036 037 038 039 040 041 042 043 044 045 046 047 048 049 050
More informationLearning Markov Networks With Arithmetic Circuits
Daniel Lowd and Amirmohammad Rooshenas Department of Computer and Information Science University of Oregon Eugene, OR 97403 {lowd,pedram}@cs.uoregon.edu Abstract Markov networks are an effective way to
More informationUsing PageRank in Feature Selection
Using PageRank in Feature Selection Dino Ienco, Rosa Meo, and Marco Botta Dipartimento di Informatica, Università di Torino, Italy fienco,meo,bottag@di.unito.it Abstract. Feature selection is an important
More informationUnsupervised Learning
Outline Unsupervised Learning Basic concepts K-means algorithm Representation of clusters Hierarchical clustering Distance functions Which clustering algorithm to use? NN Supervised learning vs. unsupervised
More informationModeling and Reasoning with Bayesian Networks. Adnan Darwiche University of California Los Angeles, CA
Modeling and Reasoning with Bayesian Networks Adnan Darwiche University of California Los Angeles, CA darwiche@cs.ucla.edu June 24, 2008 Contents Preface 1 1 Introduction 1 1.1 Automated Reasoning........................
More informationBumptrees for Efficient Function, Constraint, and Classification Learning
umptrees for Efficient Function, Constraint, and Classification Learning Stephen M. Omohundro International Computer Science Institute 1947 Center Street, Suite 600 erkeley, California 94704 Abstract A
More informationMedical Image Segmentation Based on Mutual Information Maximization
Medical Image Segmentation Based on Mutual Information Maximization J.Rigau, M.Feixas, M.Sbert, A.Bardera, and I.Boada Institut d Informatica i Aplicacions, Universitat de Girona, Spain {jaume.rigau,miquel.feixas,mateu.sbert,anton.bardera,imma.boada}@udg.es
More informationMachine Learning. Unsupervised Learning. Manfred Huber
Machine Learning Unsupervised Learning Manfred Huber 2015 1 Unsupervised Learning In supervised learning the training data provides desired target output for learning In unsupervised learning the training
More informationLearning the Structure of Probabilistic Sentential Decision Diagrams
Learning the Structure of Probabilistic Sentential Decision Diagrams Yitao Liang Computer Science Department University of California, Los Angeles yliang@cs.ucla.edu Jessa Bekker Computer Science Department
More informationarxiv: v1 [cs.lg] 12 Sep 2018
Learning Deep Mixtures of Gaussian Process Experts Using Sum-Product Networks Martin Trapp 1 2 Robert Peharz 3 Carl E. Rasmussen 3 Franz Pernkopf 1 arxiv:1809.04400v1 [cs.lg] 12 Sep 2018 Abstract While
More informationSTA 4273H: Statistical Machine Learning
STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Statistics! rsalakhu@utstat.toronto.edu! http://www.utstat.utoronto.ca/~rsalakhu/ Sidney Smith Hall, Room 6002 Lecture 5 Inference
More informationMining Data Streams. Outline [Garofalakis, Gehrke & Rastogi 2002] Introduction. Summarization Methods. Clustering Data Streams
Mining Data Streams Outline [Garofalakis, Gehrke & Rastogi 2002] Introduction Summarization Methods Clustering Data Streams Data Stream Classification Temporal Models CMPT 843, SFU, Martin Ester, 1-06
More informationUsing PageRank in Feature Selection
Using PageRank in Feature Selection Dino Ienco, Rosa Meo, and Marco Botta Dipartimento di Informatica, Università di Torino, Italy {ienco,meo,botta}@di.unito.it Abstract. Feature selection is an important
More information3 : Representation of Undirected GMs
0-708: Probabilistic Graphical Models 0-708, Spring 202 3 : Representation of Undirected GMs Lecturer: Eric P. Xing Scribes: Nicole Rafidi, Kirstin Early Last Time In the last lecture, we discussed directed
More informationFinding Non-overlapping Clusters for Generalized Inference Over Graphical Models
1 Finding Non-overlapping Clusters for Generalized Inference Over Graphical Models Divyanshu Vats and José M. F. Moura arxiv:1107.4067v2 [stat.ml] 18 Mar 2012 Abstract Graphical models use graphs to compactly
More informationImage Classification Using Sum-Product Networks for Autonomous Flight of Micro Aerial Vehicles
2016 5th Brazilian Conference on Intelligent Systems Image Classification Using Sum-Product Networks for Autonomous Flight of Micro Aerial Vehicles Bruno Massoni Sguerra Escola Politécnica Universidade
More informationCambridge Interview Technical Talk
Cambridge Interview Technical Talk February 2, 2010 Table of contents Causal Learning 1 Causal Learning Conclusion 2 3 Motivation Recursive Segmentation Learning Causal Learning Conclusion Causal learning
More informationBehavioral Data Mining. Lecture 18 Clustering
Behavioral Data Mining Lecture 18 Clustering Outline Why? Cluster quality K-means Spectral clustering Generative Models Rationale Given a set {X i } for i = 1,,n, a clustering is a partition of the X i
More informationUnsupervised Learning: Clustering
Unsupervised Learning: Clustering Vibhav Gogate The University of Texas at Dallas Slides adapted from Carlos Guestrin, Dan Klein & Luke Zettlemoyer Machine Learning Supervised Learning Unsupervised Learning
More information1 Introduction Constraints Thesis outline Related Works 3
Abstract In order to perform complex actions in human environments, an autonomous robot needs the ability to understand the environment, that is, to gather and maintain spatial knowledge. Topological map
More informationMondrian Forests: Efficient Online Random Forests
Mondrian Forests: Efficient Online Random Forests Balaji Lakshminarayanan Joint work with Daniel M. Roy and Yee Whye Teh 1 Outline Background and Motivation Mondrian Forests Randomization mechanism Online
More informationCS 664 Flexible Templates. Daniel Huttenlocher
CS 664 Flexible Templates Daniel Huttenlocher Flexible Template Matching Pictorial structures Parts connected by springs and appearance models for each part Used for human bodies, faces Fischler&Elschlager,
More informationClustering Documents in Large Text Corpora
Clustering Documents in Large Text Corpora Bin He Faculty of Computer Science Dalhousie University Halifax, Canada B3H 1W5 bhe@cs.dal.ca http://www.cs.dal.ca/ bhe Yongzheng Zhang Faculty of Computer Science
More informationarxiv: v1 [cs.lg] 13 Nov 2015
DYNAMIC SUM PRODUCT NETWORKS FOR TRACTABLE INFERENCE ON SEQUENCE DATA Mazen Melibari 1, Pascal Poupart 1, Prashant Doshi 2 1 David R. Cheriton School of Computer Science, University of Waterloo, Canada
More informationData Mining. Dr. Raed Ibraheem Hamed. University of Human Development, College of Science and Technology Department of Computer Science
Data Mining Dr. Raed Ibraheem Hamed University of Human Development, College of Science and Technology Department of Computer Science 06 07 Department of CS - DM - UHD Road map Cluster Analysis: Basic
More informationGraphical Models for Resource- Constrained Hypothesis Testing and Multi-Modal Data Fusion
Integrated Fusion, Performance Prediction, and Sensor Management for Automatic Target Exploitation Graphical Models for Resource- Constrained Hypothesis Testing and Multi-Modal Data Fusion MURI Annual
More informationScene Grammars, Factor Graphs, and Belief Propagation
Scene Grammars, Factor Graphs, and Belief Propagation Pedro Felzenszwalb Brown University Joint work with Jeroen Chua Probabilistic Scene Grammars General purpose framework for image understanding and
More informationLearning and Recognizing Visual Object Categories Without First Detecting Features
Learning and Recognizing Visual Object Categories Without First Detecting Features Daniel Huttenlocher 2007 Joint work with D. Crandall and P. Felzenszwalb Object Category Recognition Generic classes rather
More informationComputer Vision Group Prof. Daniel Cremers. 4a. Inference in Graphical Models
Group Prof. Daniel Cremers 4a. Inference in Graphical Models Inference on a Chain (Rep.) The first values of µ α and µ β are: The partition function can be computed at any node: Overall, we have O(NK 2
More informationJoint Entity Resolution
Joint Entity Resolution Steven Euijong Whang, Hector Garcia-Molina Computer Science Department, Stanford University 353 Serra Mall, Stanford, CA 94305, USA {swhang, hector}@cs.stanford.edu No Institute
More informationAlgorithms for Markov Random Fields in Computer Vision
Algorithms for Markov Random Fields in Computer Vision Dan Huttenlocher November, 2003 (Joint work with Pedro Felzenszwalb) Random Field Broadly applicable stochastic model Collection of n sites S Hidden
More informationICA mixture models for image processing
I999 6th Joint Sy~nposiurn orz Neural Computation Proceedings ICA mixture models for image processing Te-Won Lee Michael S. Lewicki The Salk Institute, CNL Carnegie Mellon University, CS & CNBC 10010 N.
More informationLearning Semantic Maps with Topological Spatial Relations Using Graph-Structured Sum-Product Networks
Learning Semantic Maps with Topological Spatial Relations Using Graph-Structured Sum-Product Networks Kaiyu Zheng, Andrzej Pronobis and Rajesh P. N. Rao Abstract We introduce Graph-Structured Sum-Product
More informationUnsupervised: no target value to predict
Clustering Unsupervised: no target value to predict Differences between models/algorithms: Exclusive vs. overlapping Deterministic vs. probabilistic Hierarchical vs. flat Incremental vs. batch learning
More informationScene Grammars, Factor Graphs, and Belief Propagation
Scene Grammars, Factor Graphs, and Belief Propagation Pedro Felzenszwalb Brown University Joint work with Jeroen Chua Probabilistic Scene Grammars General purpose framework for image understanding and
More informationCS 229 Midterm Review
CS 229 Midterm Review Course Staff Fall 2018 11/2/2018 Outline Today: SVMs Kernels Tree Ensembles EM Algorithm / Mixture Models [ Focus on building intuition, less so on solving specific problems. Ask
More informationBuilding Classifiers using Bayesian Networks
Building Classifiers using Bayesian Networks Nir Friedman and Moises Goldszmidt 1997 Presented by Brian Collins and Lukas Seitlinger Paper Summary The Naive Bayes classifier has reasonable performance
More informationBayesian Classification Using Probabilistic Graphical Models
San Jose State University SJSU ScholarWorks Master's Projects Master's Theses and Graduate Research Spring 2014 Bayesian Classification Using Probabilistic Graphical Models Mehal Patel San Jose State University
More informationSTAT 598L Probabilistic Graphical Models. Instructor: Sergey Kirshner. Exact Inference
STAT 598L Probabilistic Graphical Models Instructor: Sergey Kirshner Exact Inference What To Do With Bayesian/Markov Network? Compact representation of a complex model, but Goal: efficient extraction of
More informationContents. Preface to the Second Edition
Preface to the Second Edition v 1 Introduction 1 1.1 What Is Data Mining?....................... 4 1.2 Motivating Challenges....................... 5 1.3 The Origins of Data Mining....................
More informationUsing Machine Learning to Optimize Storage Systems
Using Machine Learning to Optimize Storage Systems Dr. Kiran Gunnam 1 Outline 1. Overview 2. Building Flash Models using Logistic Regression. 3. Storage Object classification 4. Storage Allocation recommendation
More informationUnsupervised Learning
Unsupervised Learning Learning without Class Labels (or correct outputs) Density Estimation Learn P(X) given training data for X Clustering Partition data into clusters Dimensionality Reduction Discover
More informationSum-Product Autoencoding: Encoding and Decoding Representations using Sum-Product Networks
Sum-Product Autoencoding: Encoding and Decoding Representations using Sum-Product Networks Antonio Vergari antonio.vergari@uniba.it University of Bari, Italy Robert Peharz rp587@cam.ac.uk University of
More informationLearning Sum-Product Networks
Università degli Studi di Bari Dipartimento di Informatica LACAM Machine Learning Learning Sum-Product Networks Nicola Di Mauro Antonio Vergari International Conference on Probabilistic Graphical Models
More informationGeneralized Inverse Reinforcement Learning
Generalized Inverse Reinforcement Learning James MacGlashan Cogitai, Inc. james@cogitai.com Michael L. Littman mlittman@cs.brown.edu Nakul Gopalan ngopalan@cs.brown.edu Amy Greenwald amy@cs.brown.edu Abstract
More informationMondrian Forests: Efficient Online Random Forests
Mondrian Forests: Efficient Online Random Forests Balaji Lakshminarayanan (Gatsby Unit, UCL) Daniel M. Roy (Cambridge Toronto) Yee Whye Teh (Oxford) September 4, 2014 1 Outline Background and Motivation
More informationPart II. C. M. Bishop PATTERN RECOGNITION AND MACHINE LEARNING CHAPTER 8: GRAPHICAL MODELS
Part II C. M. Bishop PATTERN RECOGNITION AND MACHINE LEARNING CHAPTER 8: GRAPHICAL MODELS Converting Directed to Undirected Graphs (1) Converting Directed to Undirected Graphs (2) Add extra links between
More informationMachine Learning and Data Mining. Clustering (1): Basics. Kalev Kask
Machine Learning and Data Mining Clustering (1): Basics Kalev Kask Unsupervised learning Supervised learning Predict target value ( y ) given features ( x ) Unsupervised learning Understand patterns of
More informationarxiv: v1 [stat.ml] 21 Aug 2017
SUM-PRODUCT GRAPHICAL MODELS Mattia Desana 1 and Christoph Schnörr 1,2. arxiv:1708.06438v1 [stat.ml] 21 Aug 2017 1 Heidelberg Collaboratory for Image Processing (HCI) 2 Image and Pattern Analysis Group
More informationLars Schmidt-Thieme, Information Systems and Machine Learning Lab (ISMLL), University of Hildesheim, Germany
Syllabus Fri. 27.10. (1) 0. Introduction A. Supervised Learning: Linear Models & Fundamentals Fri. 3.11. (2) A.1 Linear Regression Fri. 10.11. (3) A.2 Linear Classification Fri. 17.11. (4) A.3 Regularization
More informationMachine Learning. Supervised Learning. Manfred Huber
Machine Learning Supervised Learning Manfred Huber 2015 1 Supervised Learning Supervised learning is learning where the training data contains the target output of the learning system. Training data D
More informationRandomized Algorithms for Fast Bayesian Hierarchical Clustering
Randomized Algorithms for Fast Bayesian Hierarchical Clustering Katherine A. Heller and Zoubin Ghahramani Gatsby Computational Neuroscience Unit University College ondon, ondon, WC1N 3AR, UK {heller,zoubin}@gatsby.ucl.ac.uk
More informationAn Efficient Model Selection for Gaussian Mixture Model in a Bayesian Framework
IEEE SIGNAL PROCESSING LETTERS, VOL. XX, NO. XX, XXX 23 An Efficient Model Selection for Gaussian Mixture Model in a Bayesian Framework Ji Won Yoon arxiv:37.99v [cs.lg] 3 Jul 23 Abstract In order to cluster
More informationMachine Learning. B. Unsupervised Learning B.1 Cluster Analysis. Lars Schmidt-Thieme
Machine Learning B. Unsupervised Learning B.1 Cluster Analysis Lars Schmidt-Thieme Information Systems and Machine Learning Lab (ISMLL) Institute for Computer Science University of Hildesheim, Germany
More informationBayesian model ensembling using meta-trained recurrent neural networks
Bayesian model ensembling using meta-trained recurrent neural networks Luca Ambrogioni l.ambrogioni@donders.ru.nl Umut Güçlü u.guclu@donders.ru.nl Yağmur Güçlütürk y.gucluturk@donders.ru.nl Julia Berezutskaya
More informationCS242: Probabilistic Graphical Models Lecture 3: Factor Graphs & Variable Elimination
CS242: Probabilistic Graphical Models Lecture 3: Factor Graphs & Variable Elimination Instructor: Erik Sudderth Brown University Computer Science September 11, 2014 Some figures and materials courtesy
More informationSemantic Segmentation. Zhongang Qi
Semantic Segmentation Zhongang Qi qiz@oregonstate.edu Semantic Segmentation "Two men riding on a bike in front of a building on the road. And there is a car." Idea: recognizing, understanding what's in
More informationClassification and Regression Trees
Classification and Regression Trees David S. Rosenberg New York University April 3, 2018 David S. Rosenberg (New York University) DS-GA 1003 / CSCI-GA 2567 April 3, 2018 1 / 51 Contents 1 Trees 2 Regression
More informationClustering algorithms
Clustering algorithms Machine Learning Hamid Beigy Sharif University of Technology Fall 1393 Hamid Beigy (Sharif University of Technology) Clustering algorithms Fall 1393 1 / 22 Table of contents 1 Supervised
More informationApplied Bayesian Nonparametrics 5. Spatial Models via Gaussian Processes, not MRFs Tutorial at CVPR 2012 Erik Sudderth Brown University
Applied Bayesian Nonparametrics 5. Spatial Models via Gaussian Processes, not MRFs Tutorial at CVPR 2012 Erik Sudderth Brown University NIPS 2008: E. Sudderth & M. Jordan, Shared Segmentation of Natural
More informationTree of Latent Mixtures for Bayesian Modelling and Classification of High Dimensional Data
Technical Report No. 2005-06, Department of Computer Science and Engineering, University at Buffalo, SUNY Tree of Latent Mixtures for Bayesian Modelling and Classification of High Dimensional Data Hagai
More informationCollective classification in network data
1 / 50 Collective classification in network data Seminar on graphs, UCSB 2009 Outline 2 / 50 1 Problem 2 Methods Local methods Global methods 3 Experiments Outline 3 / 50 1 Problem 2 Methods Local methods
More informationCREDAL SUM-PRODUCT NETWORKS
CREDAL SUM-PRODUCT NETWORKS Denis Mauá Fabio Cozman Universidade de São Paulo Brazil Diarmaid Conaty Cassio de Campos Queen s University Belfast United Kingdom ISIPTA 2017 DEEP PROBABILISTIC GRAPHICAL
More informationINTRODUCTION TO DATA MINING. Daniel Rodríguez, University of Alcalá
INTRODUCTION TO DATA MINING Daniel Rodríguez, University of Alcalá Outline Knowledge Discovery in Datasets Model Representation Types of models Supervised Unsupervised Evaluation (Acknowledgement: Jesús
More informationText Modeling with the Trace Norm
Text Modeling with the Trace Norm Jason D. M. Rennie jrennie@gmail.com April 14, 2006 1 Introduction We have two goals: (1) to find a low-dimensional representation of text that allows generalization to
More informationConditional Random Fields and beyond D A N I E L K H A S H A B I C S U I U C,
Conditional Random Fields and beyond D A N I E L K H A S H A B I C S 5 4 6 U I U C, 2 0 1 3 Outline Modeling Inference Training Applications Outline Modeling Problem definition Discriminative vs. Generative
More informationDay 3 Lecture 1. Unsupervised Learning
Day 3 Lecture 1 Unsupervised Learning Semi-supervised and transfer learning Myth: you can t do deep learning unless you have a million labelled examples for your problem. Reality You can learn useful representations
More informationComputer Vision Group Prof. Daniel Cremers. 4. Probabilistic Graphical Models Directed Models
Prof. Daniel Cremers 4. Probabilistic Graphical Models Directed Models The Bayes Filter (Rep.) (Bayes) (Markov) (Tot. prob.) (Markov) (Markov) 2 Graphical Representation (Rep.) We can describe the overall
More informationSupplementary Material: The Emergence of. Organizing Structure in Conceptual Representation
Supplementary Material: The Emergence of Organizing Structure in Conceptual Representation Brenden M. Lake, 1,2 Neil D. Lawrence, 3 Joshua B. Tenenbaum, 4,5 1 Center for Data Science, New York University
More informationDecision tree learning
Decision tree learning Andrea Passerini passerini@disi.unitn.it Machine Learning Learning the concept Go to lesson OUTLOOK Rain Overcast Sunny TRANSPORTATION LESSON NO Uncovered Covered Theoretical Practical
More informationD-Separation. b) the arrows meet head-to-head at the node, and neither the node, nor any of its descendants, are in the set C.
D-Separation Say: A, B, and C are non-intersecting subsets of nodes in a directed graph. A path from A to B is blocked by C if it contains a node such that either a) the arrows on the path meet either
More informationClustering. SC4/SM4 Data Mining and Machine Learning, Hilary Term 2017 Dino Sejdinovic
Clustering SC4/SM4 Data Mining and Machine Learning, Hilary Term 2017 Dino Sejdinovic Clustering is one of the fundamental and ubiquitous tasks in exploratory data analysis a first intuition about the
More informationA Systematic Overview of Data Mining Algorithms
A Systematic Overview of Data Mining Algorithms 1 Data Mining Algorithm A well-defined procedure that takes data as input and produces output as models or patterns well-defined: precisely encoded as a
More informationHierarchical Clustering Lecture 9
Hierarchical Clustering Lecture 9 Marina Santini Acknowledgements Slides borrowed and adapted from: Data Mining by I. H. Witten, E. Frank and M. A. Hall 1 Lecture 9: Required Reading Witten et al. (2011:
More informationClustering. CS294 Practical Machine Learning Junming Yin 10/09/06
Clustering CS294 Practical Machine Learning Junming Yin 10/09/06 Outline Introduction Unsupervised learning What is clustering? Application Dissimilarity (similarity) of objects Clustering algorithm K-means,
More informationA Systematic Overview of Data Mining Algorithms. Sargur Srihari University at Buffalo The State University of New York
A Systematic Overview of Data Mining Algorithms Sargur Srihari University at Buffalo The State University of New York 1 Topics Data Mining Algorithm Definition Example of CART Classification Iris, Wine
More informationInternational Journal of Scientific Research & Engineering Trends Volume 4, Issue 6, Nov-Dec-2018, ISSN (Online): X
Analysis about Classification Techniques on Categorical Data in Data Mining Assistant Professor P. Meena Department of Computer Science Adhiyaman Arts and Science College for Women Uthangarai, Krishnagiri,
More informationEVOLUTIONARY DISTANCES INFERRING PHYLOGENIES
EVOLUTIONARY DISTANCES INFERRING PHYLOGENIES Luca Bortolussi 1 1 Dipartimento di Matematica ed Informatica Università degli studi di Trieste luca@dmi.units.it Trieste, 28 th November 2007 OUTLINE 1 INFERRING
More informationProbabilistic Abstraction Lattices: A Computationally Efficient Model for Conditional Probability Estimation
Probabilistic Abstraction Lattices: A Computationally Efficient Model for Conditional Probability Estimation Daniel Lowd January 14, 2004 1 Introduction Probabilistic models have shown increasing popularity
More informationAssociation Rule Mining and Clustering
Association Rule Mining and Clustering Lecture Outline: Classification vs. Association Rule Mining vs. Clustering Association Rule Mining Clustering Types of Clusters Clustering Algorithms Hierarchical:
More informationGradient of the lower bound
Weakly Supervised with Latent PhD advisor: Dr. Ambedkar Dukkipati Department of Computer Science and Automation gaurav.pandey@csa.iisc.ernet.in Objective Given a training set that comprises image and image-level
More informationClustering: Classic Methods and Modern Views
Clustering: Classic Methods and Modern Views Marina Meilă University of Washington mmp@stat.washington.edu June 22, 2015 Lorentz Center Workshop on Clusters, Games and Axioms Outline Paradigms for clustering
More informationUnsupervised naive Bayes for data clustering with mixtures of truncated exponentials
Unsupervised naive Bayes for data clustering with mixtures of truncated exponentials José A. Gámez Computing Systems Department Intelligent Systems and Data Mining Group i 3 A University of Castilla-La
More informationCPSC 340: Machine Learning and Data Mining. Probabilistic Classification Fall 2017
CPSC 340: Machine Learning and Data Mining Probabilistic Classification Fall 2017 Admin Assignment 0 is due tonight: you should be almost done. 1 late day to hand it in Monday, 2 late days for Wednesday.
More informationComputer vision: models, learning and inference. Chapter 10 Graphical Models
Computer vision: models, learning and inference Chapter 10 Graphical Models Independence Two variables x 1 and x 2 are independent if their joint probability distribution factorizes as Pr(x 1, x 2 )=Pr(x
More informationIntegrating Probabilistic Reasoning with Constraint Satisfaction
Integrating Probabilistic Reasoning with Constraint Satisfaction IJCAI Tutorial #7 Instructor: Eric I. Hsu July 17, 2011 http://www.cs.toronto.edu/~eihsu/tutorial7 Getting Started Discursive Remarks. Organizational
More information