Machine Learning


Machine Learning 10-701
Tom M. Mitchell, Machine Learning Department, Carnegie Mellon University
February 17, 2011

Today:
- Graphical models
- Learning from fully labeled data
- Learning from partly observed data
- EM

Readings (required): Bishop, Chapter 8 through 8.2

Bayes Network Definition

A Bayes network represents the joint probability distribution over a collection of random variables. A Bayes network is a directed acyclic graph together with a set of conditional probability distributions (CPDs):
- Each node denotes a random variable.
- Edges denote dependencies.
- The CPD for each node X_i defines P(X_i | Pa(X_i)), where Pa(X_i) denotes the immediate parents of X_i in the graph.
- The joint distribution over all variables is the product of these CPDs: P(X_1, ..., X_n) = prod_i P(X_i | Pa(X_i)).
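
A minimal sketch of this factorization in Python. The network structure (F, A -> S -> H, N) is an assumption made only for illustration, loosely matching the Flu/Allergy/Sinus/Headache/Nose variables used later in the lecture; the CPD numbers are made up.

```python
# Sketch: a Bayes net as a dict of CPDs; the joint probability is the
# product of P(X_i | Pa(X_i)) over all nodes.  Structure and numbers are
# illustrative only, not taken from the lecture slides.

# Parents of each variable (directed acyclic graph).
parents = {"F": [], "A": [], "S": ["F", "A"], "H": ["S"], "N": ["S"]}

# CPDs: P(var = 1 | parent assignment), keyed by the tuple of parent values.
cpd = {
    "F": {(): 0.10},
    "A": {(): 0.20},
    "S": {(0, 0): 0.05, (0, 1): 0.40, (1, 0): 0.60, (1, 1): 0.90},
    "H": {(0,): 0.10, (1,): 0.70},
    "N": {(0,): 0.05, (1,): 0.80},
}

def prob(var, value, assignment):
    """P(var = value | parents(var)) under the given full assignment."""
    pa_vals = tuple(assignment[p] for p in parents[var])
    p1 = cpd[var][pa_vals]
    return p1 if value == 1 else 1.0 - p1

def joint(assignment):
    """Joint probability = product over nodes of P(X_i | Pa(X_i))."""
    p = 1.0
    for var, value in assignment.items():
        p *= prob(var, value, assignment)
    return p

print(joint({"F": 1, "A": 0, "S": 1, "H": 1, "N": 0}))
```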

Markov Blanket (from [Bishop, 8.2]): a node is conditionally independent of all other variables given its Markov blanket, that is, its parents, its children, and its children's other parents.

What You Should Know
- Bayes nets are a convenient representation for encoding dependencies / conditional independencies.
- A BN = graph plus the parameters of the CPDs.
- This defines the joint distribution over the variables; everything else can be calculated from it, though inference may be intractable.
- Reading conditional independence relations from the graph: each node is conditionally independent of its non-descendants, given its immediate parents.
- D-separation.
- Explaining away.

Java Bayes Net Applet (by Fabio Gagliardi Cozman): http://www.pmr.poli.usp.br/ltd/software/javabayes/home/applet.html

Learning of Bayes Nets

Four categories of learning problems, depending on whether:
- the graph structure is known or unknown, and
- the variable values are fully observed or partly unobserved.

Easy case: learn the parameters when the graph structure is known and the data is fully observed.
Interesting case: graph known, data partly observed.
Gruesome case: graph structure unknown, data partly unobserved.

Learning CPTs from Fully Observed Data

Example: consider learning a CPT parameter such as theta_{S=1|ij} = P(S = 1 | F = i, A = j). With fully observed data, the maximum likelihood estimate (MLE) is the empirical conditional frequency: the number of training examples k with S = 1, F = i, A = j, divided by the number of training examples with F = i, A = j. (Remember why? Maximizing the likelihood of fully observed data decomposes into an independent count-ratio estimate for each CPT entry.)
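
A short sketch of this counting estimate, assuming boolean variables and the F/A/S names from the running example; the data values below are made up for illustration.

```python
from collections import Counter

# Fully observed training data: one dict per example (illustrative values).
data = [
    {"F": 1, "A": 0, "S": 1}, {"F": 1, "A": 0, "S": 1},
    {"F": 0, "A": 1, "S": 0}, {"F": 0, "A": 0, "S": 0},
    {"F": 1, "A": 1, "S": 1}, {"F": 0, "A": 1, "S": 1},
]

def mle_cpt(data, child, parents):
    """MLE of P(child = 1 | parents): count(child=1, pa=v) / count(pa=v)."""
    num, den = Counter(), Counter()
    for ex in data:
        pa_vals = tuple(ex[p] for p in parents)
        den[pa_vals] += 1
        if ex[child] == 1:
            num[pa_vals] += 1
    return {pa: num[pa] / den[pa] for pa in den}

# Estimate of theta_{S=1 | F=i, A=j} for each observed parent configuration.
print(mle_cpt(data, "S", ["F", "A"]))
```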

Estimating Parameters from Partly Observed Data

What if F, A, H, N are observed, but S is not?
- Let X be all observed variable values (over all examples).
- Let Z be all unobserved variable values.
- We can't calculate the MLE, argmax_theta log P(X, Z | theta), because Z is not observed. What to do?
- EM seeks* an estimate of theta by iteratively maximizing the expected log-likelihood instead (see the EM Algorithm slide below).

* EM is guaranteed only to find a local maximum.

EM seeks this estimate; here, observed X = {F, A, H, N}, unobserved Z = {S}.

The EM Algorithm

EM is a general procedure for learning from partly observed data. Given observed variables X and unobserved variables Z (here X = {F, A, H, N}, Z = {S}), define the expected complete-data log-likelihood Q(theta' | theta), and iterate until convergence:
- E step: use X and the current theta to calculate P(Z | X, theta).
- M step: replace the current theta by the theta' that maximizes Q(theta' | theta).

EM is guaranteed to find a local maximum: each iteration increases (more precisely, never decreases) the marginal likelihood P(X | theta). The standard formal statement is sketched below.
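
The equations on this slide were lost in transcription; the following LaTeX restates the standard EM quantities the text refers to (the Q function, the E and M steps, and the monotonicity guarantee), rather than reproducing the original slide verbatim.

```latex
\begin{align*}
&\text{Define } Q(\theta' \mid \theta)
  \;=\; \mathbb{E}_{Z \sim P(Z \mid X,\theta)}\big[\log P(X, Z \mid \theta')\big]. \\[4pt]
&\text{E step: compute } P(Z \mid X, \theta) \text{ under the current parameters } \theta. \\[2pt]
&\text{M step: } \theta \;\leftarrow\; \arg\max_{\theta'} Q(\theta' \mid \theta). \\[4pt]
&\text{Each iteration never decreases } \log P(X \mid \theta),
  \text{ so EM converges to a local maximum of the marginal likelihood.}
\end{align*}
```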

E Step: Use X and theta to Calculate P(Z | X, theta)

Observed X = {F, A, H, N}, unobserved Z = {S}. How? This is a Bayes net inference problem.

EM and Estimating the CPT Parameters

Observed X = {F, A, H, N}, unobserved Z = {S}.
- E step: calculate P(Z^k | X^k; theta) for each training example k.
- M step: update all relevant parameters. For example, the MLE for P(S = 1 | F = i, A = j) was a ratio of observed counts; now the count of examples with S = 1, F = i, A = j is replaced by the sum, over examples with F = i, A = j, of P(S^k = 1 | X^k; theta).

More generally, given an observed set X and an unobserved set Z of boolean values:
- E step: calculate, for each training example k, the expected value of each unobserved variable.
- M step: calculate parameter estimates just as for the MLE, but replacing each count by its expected count.
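
A minimal sketch of one such EM iteration, assuming the same F/A/S/H/N structure as before with S unobserved. The E step computes P(S^k = 1 | F^k, A^k, H^k, N^k; theta) by Bayes rule over the single hidden variable; the M step re-estimates P(S = 1 | F, A) with expected counts in place of observed counts. (A full M step would also update P(H | S) and P(N | S) the same way.) All numbers are illustrative.

```python
from collections import defaultdict

# Current parameter guesses (illustrative numbers, not from the lecture).
p_s_given_fa = {(0, 0): 0.05, (0, 1): 0.40, (1, 0): 0.60, (1, 1): 0.90}  # P(S=1 | F, A)
p_h_given_s = {0: 0.10, 1: 0.70}   # P(H=1 | S)
p_n_given_s = {0: 0.05, 1: 0.80}   # P(N=1 | S)

# Training data: F, A, H, N observed; S unobserved (illustrative values).
data = [
    {"F": 1, "A": 0, "H": 1, "N": 1},
    {"F": 0, "A": 1, "H": 0, "N": 0},
    {"F": 1, "A": 1, "H": 1, "N": 0},
    {"F": 0, "A": 0, "H": 0, "N": 1},
]

def posterior_s(ex):
    """E step for one example: P(S=1 | F, A, H, N) by Bayes rule."""
    def lik(s):
        p_h = p_h_given_s[s] if ex["H"] == 1 else 1 - p_h_given_s[s]
        p_n = p_n_given_s[s] if ex["N"] == 1 else 1 - p_n_given_s[s]
        p_s = p_s_given_fa[(ex["F"], ex["A"])]
        prior = p_s if s == 1 else 1 - p_s
        return prior * p_h * p_n
    return lik(1) / (lik(1) + lik(0))

# M step: re-estimate P(S=1 | F, A) with expected counts in place of counts.
exp_num, den = defaultdict(float), defaultdict(float)
for ex in data:
    q = posterior_s(ex)                 # expected value of S for this example
    exp_num[(ex["F"], ex["A"])] += q    # expected count of (S=1, F=i, A=j)
    den[(ex["F"], ex["A"])] += 1        # observed count of (F=i, A=j)

new_p_s_given_fa = {fa: exp_num[fa] / den[fa] for fa in den}
print(new_p_s_given_fa)
```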

Using Unlabeled Data to Help Train a Naïve Bayes Classifier

Learn P(Y | X) with the Naïve Bayes structure Y -> X1, X2, X3, X4. Training data (labeled and unlabeled; "?" marks a missing label):

Y   X1  X2  X3  X4
1   0   0   1   1
0   0   1   0   0
0   0   0   1   0
?   0   1   1   0
?   0   1   0   1

E step: calculate, for each training example k, the expected value of each unobserved variable.

EM and Estimating Naïve Bayes Parameters

Given an observed set X and an unobserved set Y of boolean values:
- E step: calculate, for each training example k, the expected value of each unobserved variable Y. Let's use y(k) to indicate the value of Y on the kth example.
- M step: calculate parameter estimates just as for the MLE, but replacing each count by its expected count. The MLE would use the observed counts of y(k); with unlabeled examples, each such count is replaced by the corresponding sum of expected values of Y.
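
A sketch of one EM pass for this semi-supervised Naïve Bayes setting, using the five-example table from the previous slide: labeled examples contribute observed counts, unlabeled ones contribute the expected counts P(Y = 1 | X; theta). The initial parameter values and the tiny smoothing constant are assumptions for illustration only.

```python
# One EM iteration for semi-supervised Naive Bayes with boolean features.
# Data is the 5-example table from the slide; None marks an unobserved label Y.
data = [
    (1, [0, 0, 1, 1]),
    (0, [0, 1, 0, 0]),
    (0, [0, 0, 1, 0]),
    (None, [0, 1, 1, 0]),   # unlabeled
    (None, [0, 1, 0, 1]),   # unlabeled
]

# Current parameter guesses (illustrative): prior P(Y=1) and P(X_i=1 | Y).
p_y1 = 0.5
p_x_given_y = {1: [0.3, 0.3, 0.6, 0.6], 0: [0.2, 0.4, 0.4, 0.2]}

def posterior_y1(x):
    """E step for one example: P(Y=1 | X) under the Naive Bayes factorization."""
    def lik(y):
        p = p_y1 if y == 1 else 1 - p_y1
        for xi, pxi in zip(x, p_x_given_y[y]):
            p *= pxi if xi == 1 else 1 - pxi
        return p
    return lik(1) / (lik(1) + lik(0))

# Expected value of Y for each example: observed label if given, else posterior.
q = [y if y is not None else posterior_y1(x) for y, x in data]

# M step: MLE-style updates with expected counts (tiny smoothing avoids 0/0).
eps = 1e-6
n1 = sum(q)
p_y1 = n1 / len(data)
for i in range(4):
    num1 = sum(qk * x[i] for qk, (_, x) in zip(q, data))
    num0 = sum((1 - qk) * x[i] for qk, (_, x) in zip(q, data))
    p_x_given_y[1][i] = (num1 + eps) / (n1 + 2 * eps)
    p_x_given_y[0][i] = (num0 + eps) / (len(data) - n1 + 2 * eps)

print(p_y1, p_x_given_y)
```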

Experimental Evaluation (from [Nigam et al., 2000])
- Newsgroup postings: 20 newsgroups, 1000 postings per group.
- Web page classification: student, faculty, course, project; 4,199 web pages.
- Reuters newswire articles: 12,902 articles, 90 topic categories.

[Table: words w ranked by P(w | Y = course) / P(w | Y != course)]
[Plot: classification accuracy using one labeled example per class]
[Plot: 20 Newsgroups results]

[Plot: 20 Newsgroups results]

Unsupervised clustering is just the extreme case of EM with zero labeled examples.