Learning Bayesian Networks (part 3)

Mark Craven and David Page
Computer Sciences 760
Spring 2018
www.biostat.wisc.edu/~craven/cs760/

Some of the slides in these lectures have been adapted/borrowed from materials developed by Tom Dietterich, Pedro Domingos, Tom Mitchell, David Page, and Jude Shavlik.

Goals for the lecture

You should understand the following concepts:
- the naïve Bayes classifier
- the Tree Augmented Network (TAN) algorithm

Bayes nets for classification

The learning methods for BNs we've discussed so far can be thought of as unsupervised: the learned models are not constructed to predict the value of a special class variable; instead, they can predict values for arbitrarily selected query variables. Now let's consider BN learning for a standard supervised task: learn a model to predict the class Y given the features X_1 ... X_n.

Naïve Bayes

One very simple BN approach for supervised tasks is naïve Bayes. In naïve Bayes, we assume that all features X_i are conditionally independent given the class Y:

$$P(X_1, \ldots, X_n, Y) = P(Y) \prod_{i=1}^{n} P(X_i \mid Y)$$
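To make the factorization concrete, here is a minimal sketch (not from the lecture) of computing the joint probability of one example; `prior` and `cond` are hypothetical lookup tables holding already-estimated probabilities:

```python
def naive_bayes_joint(x, y_val, prior, cond):
    """P(x_1, ..., x_n, y) under the naive Bayes factorization
    P(y) * prod_i P(x_i | y). `prior[y]` and `cond[(i, y)][x_i]` are
    hypothetical tables of estimated probabilities."""
    p = prior[y_val]
    for i, v in enumerate(x):       # one conditionally independent factor per feature
        p *= cond[(i, y_val)][v]
    return p
```

Estimating those tables from data is the subject of the next slide.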

Naïve Bayes

[Diagram: class node Y with an edge to each feature X_1, X_2, ..., X_{n-1}, X_n]

Learning:
- estimate P(Y = y) for each value of the class variable
- estimate P(X_i = x | Y = y) for each feature X_i, each feature value x, and each class value y

Classification: use Bayes rule

$$P(Y = y \mid \mathbf{x}) = \frac{P(y)\,P(\mathbf{x} \mid y)}{\sum_{y' \in \mathrm{values}(Y)} P(y')\,P(\mathbf{x} \mid y')} = \frac{P(y)\prod_{i=1}^{n} P(x_i \mid y)}{\sum_{y' \in \mathrm{values}(Y)} P(y')\prod_{i=1}^{n} P(x_i \mid y')}$$

Naïve Bayes vs. BNs learned with an unsupervised structure search: test-set error on 25 classification data sets from the UC Irvine Repository. [Figure from Friedman et al., Machine Learning 1997]
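The two estimation steps and the Bayes-rule classification above might look like the following in Python (a hedged sketch, assuming discrete features in a NumPy array; the add-alpha smoothing is a common practical touch, not something the slides specify):

```python
import numpy as np

def train_naive_bayes(X, y, alpha=1.0):
    """Estimate P(Y = y) and P(X_i = x | Y = y) from a discrete data
    matrix X (n_examples x n_features) and a label vector y, with
    add-alpha (Laplace) smoothing on the conditionals."""
    classes = np.unique(y)
    feature_values = [np.unique(X[:, i]) for i in range(X.shape[1])]
    prior = {c: np.mean(y == c) for c in classes}
    cond = {}  # cond[(i, c)][v] estimates P(X_i = v | Y = c)
    for c in classes:
        Xc = X[y == c]
        for i, vals in enumerate(feature_values):
            denom = len(Xc) + alpha * len(vals)
            cond[(i, c)] = {v: (np.sum(Xc[:, i] == v) + alpha) / denom
                            for v in vals}
    return prior, cond

def classify_naive_bayes(x, prior, cond):
    """Return argmax_y P(y | x). The Bayes-rule denominator is the same
    for every class, so it can be dropped; log space avoids underflow."""
    score = {c: np.log(p) + sum(np.log(cond[(i, c)][v])
                                for i, v in enumerate(x))
             for c, p in prior.items()}
    return max(score, key=score.get)
```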

The Tree Augmented Network (TAN) algorithm
[Friedman et al., Machine Learning 1997]

TAN learns a tree structure to augment the edges of a naïve Bayes network.

Algorithm:
1. compute the weight I(X_i, X_j | Y) for each possible edge (X_i, X_j) between features
2. find a maximum weight spanning tree (MST) for the graph over X_1 ... X_n
3. assign edge directions in the MST
4. construct a TAN model by adding a node for Y and an edge from Y to each X_i

Conditional mutual information in the TAN algorithm

Conditional mutual information is used to calculate the edge weights:

$$I(X_i, X_j \mid Y) = \sum_{x_i \in \mathrm{values}(X_i)} \; \sum_{x_j \in \mathrm{values}(X_j)} \; \sum_{y \in \mathrm{values}(Y)} P(x_i, x_j, y) \log_2 \frac{P(x_i, x_j \mid y)}{P(x_i \mid y)\,P(x_j \mid y)}$$

It measures how much information X_i provides about X_j when the value of Y is known.
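Here is a compact sketch of steps 1-3, under stated assumptions rather than as the lecture's own code: the conditional mutual information is estimated from empirical counts, the MST is grown with a Prim-style greedy loop, and edges are directed away from feature 0 as an arbitrary root (the slides do not fix a root):

```python
import numpy as np

def cond_mutual_info(xi, xj, y):
    """Empirical I(X_i; X_j | Y) in bits, computed from parallel
    arrays of discrete values."""
    cmi = 0.0
    for c in np.unique(y):
        p_c = np.mean(y == c)
        for a in np.unique(xi):
            for b in np.unique(xj):
                p_abc = np.mean((xi == a) & (xj == b) & (y == c))
                if p_abc == 0:
                    continue  # convention: 0 * log(...) contributes 0
                p_ab_c = p_abc / p_c
                p_a_c = np.mean((xi == a) & (y == c)) / p_c
                p_b_c = np.mean((xj == b) & (y == c)) / p_c
                cmi += p_abc * np.log2(p_ab_c / (p_a_c * p_b_c))
    return cmi

def tan_tree(X, y):
    """Steps 1-3: weight every feature pair, grow a maximum-weight
    spanning tree, and direct its edges away from the root."""
    n = X.shape[1]
    w = {(i, j): cond_mutual_info(X[:, i], X[:, j], y)
         for i in range(n) for j in range(i + 1, n)}
    in_tree, edges = {0}, []
    while len(in_tree) < n:
        # greedily attach the out-of-tree node with the heaviest edge
        i, j = max(((a, b) for a in in_tree
                    for b in range(n) if b not in in_tree),
                   key=lambda e: w[min(e), max(e)])
        edges.append((i, j))  # directed feature edge X_i -> X_j
        in_tree.add(j)
    return edges  # step 4 then adds Y and an edge Y -> X_i for each X_i
```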

Example TAN network

[Figure: a TAN network over the diabetes features, showing the class variable, the naïve Bayes edges from the class to each feature, and the feature-to-feature edges determined by the MST]

Classification with a TAN network

As before, use Bayes rule:

$$P(Y = y \mid \mathbf{x}) = \frac{P(y)\,P(\mathbf{x} \mid y)}{\sum_{y'} P(y')\,P(\mathbf{x} \mid y')}$$

In the example network, we calculate P(x | y) as:

$$P(\mathbf{x} \mid y) = P(\mathrm{pregnant} \mid y)\,P(\mathrm{age} \mid y, \mathrm{pregnant})\,P(\mathrm{insulin} \mid y, \mathrm{age})\,P(\mathrm{dpf} \mid y, \mathrm{insulin})\,P(\mathrm{mass} \mid y, \mathrm{insulin})\,P(\mathrm{glucose} \mid y, \mathrm{insulin})$$
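In code, this per-class likelihood is just the naïve Bayes product with one extra conditioning variable per feature. A sketch follows; `parents` and `cpts` are hypothetical data structures produced by structure and parameter learning, not names from the lecture:

```python
import math

def tan_log_prob_x_given_y(x, y_val, parents, cpts):
    """log P(x | y) in a TAN model: each feature X_i is conditioned on
    the class and on at most one feature parent from the tree.
    `parents[i]` is X_i's feature parent index (None for the tree root);
    `cpts[i][(x_i, y, x_parent)]` holds P(x_i | y, x_parent)."""
    logp = 0.0
    for i, v in enumerate(x):
        pa = parents[i]
        pa_val = None if pa is None else x[pa]
        logp += math.log(cpts[i][(v, y_val, pa_val)])
    return logp
```

Combining the exponential of this with P(y) in the Bayes-rule numerator, exactly as in naïve Bayes, gives the class posterior.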

TAN vs. Chow-Liu

- TAN is focused on learning a Bayes net specifically for classification problems
- the MST includes only the feature variables (the class variable is used only for calculating edge weights)
- conditional mutual information, rather than mutual information, is used to determine the edge weights in the undirected graph
- the directed graph determined from the MST is added to the Y → X_i edges that form a naïve Bayes network
- note, however, that the parameters are still set to maximize P(y, x) rather than P(y | x)

TAN vs. Naïve Bayes: test-set error on 25 data sets from the UC Irvine Repository. [Figure from Friedman et al., Machine Learning 1997]

Comments on Bayesian networks

The BN representation has many advantages:
- easy to encode domain knowledge (direct dependencies, causality)
- can represent uncertainty
- principled methods for dealing with missing values
- can answer arbitrary queries (in theory; in practice, inference may be intractable)

For supervised tasks, it may be advantageous to use a learning approach (e.g. TAN) that focuses on the dependencies that are most important.

Comments on Bayesian networks (continued)

- although very simplistic, naïve Bayes often learns highly accurate models
- we focused on learning Bayes nets with only discrete variables; they can also have numeric variables (although not as parents)
- BNs are one instance of a more general class of probabilistic graphical models