Machine Learning A 708.064 13W 1sst KU Exercises

Problems marked with * are optional.

1 Conditional Independence

a) [1 P] For the probability distribution P(A, B, C, D) with the factorization

P(A, B, C, D) = P(A) P(B) P(C | A, B) P(D | C)

show that the following conditional independence assumptions hold.

(i) A ⊥ B
(ii) A ⊥ D | C

b) [1 P] For the probability distribution P(A, B, C, D) with the factorization

P(A, B, C, D) = P(A) P(B | A) P(C | A) P(D | C)

show that the following conditional independence assumptions hold.

(i) A ⊥ D | C
(ii) B ⊥ D | C

c) [1* P] Find a probability distribution P(A, B, C) in the form of a probability table which fulfills A ⊥ B but not A ⊥ B | C. Prove that your probability distribution really fulfills the two criteria.
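As a minimal sketch of the kind of argument expected in 1 a)(i), written in LaTeX notation, one can marginalize the given factorization over C and D:

    \begin{align*}
      P(A, B) &= \sum_{C, D} P(A)\, P(B)\, P(C \mid A, B)\, P(D \mid C) \\
              &= P(A)\, P(B) \sum_{C} P(C \mid A, B) \sum_{D} P(D \mid C) \\
              &= P(A)\, P(B),
    \end{align*}

since both inner sums equal 1; hence A ⊥ B. The conditional statements can be handled analogously by writing out the corresponding conditional distributions.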

2 Bayesian Networks

a) [1 P] Construct a Bayesian network which represents the conditional independence assumptions of the probability distributions from exercises 1 a) and 1 b).

b) [2 P] Construct two different Bayesian networks which encode exactly the following conditional independence assumptions:

A ⊥ C | B
A ⊥ D | B
C ⊥ D | B

c) [2 P] A doctor gives a patient a drug depending on their age and gender. The patient has a certain probability of recovering, depending on whether s/he receives the drug, how old s/he is, and which gender the patient has. Additionally, it is known that age and gender are conditionally independent if nothing else is known about the patient. [1]

(i) Draw the Bayesian network which describes this situation.
(ii) What does the factorized probability distribution look like?
(iii) Write down the formula to compute the probability that a patient recovers given that you know whether s/he gets the drug. Write down the formula using only probabilities which are part of the factorized probability distribution.

3 d-separation

a) [2 P] In Figure 1 you can see a Bayesian network which is used for the diagnosis of lung cancer and tuberculosis. Check whether the following conditional independence assumptions are true or false. [2]

(i) tuberculosis ⊥ smoking | dyspnea
(ii) lung cancer ⊥ bronchitis | smoking
(iii) visit to Asia ⊥ smoking | lung cancer
(iv) visit to Asia ⊥ smoking | lung cancer, dyspnea

[1] Adapted from exercise 3.9 in D. Barber: Bayesian Reasoning and Machine Learning, Cambridge University Press, 2012.
[2] Exercise 3.3 in D. Barber: Bayesian Reasoning and Machine Learning, Cambridge University Press, 2012.

Figure 1: Bayesian network for the diagnosis of lung cancer and tuberculosis.

Figure 2: Factor graph for exercise 4.

4 Inference in Factor Graphs

[2 P] Show how the given marginal probabilities can be computed in the factor graph in Fig. 2 using the sum-product algorithm. Start by drawing arrows into the factor graph which show how the messages are passed. After that, write down how each of the messages can be computed, and finally write down the formula for the marginal distribution.

a) P(A)
b) P(B)
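For reference, the general form of the sum-product messages and of the resulting marginal (independent of the particular structure of the factor graph in Fig. 2), in LaTeX notation, is

    \begin{align*}
      \mu_{x \to f}(x) &= \prod_{g \in \mathrm{ne}(x) \setminus \{f\}} \mu_{g \to x}(x), \\
      \mu_{f \to x}(x) &= \sum_{x_1, \ldots, x_M} f(x, x_1, \ldots, x_M)
                          \prod_{x_m \in \mathrm{ne}(f) \setminus \{x\}} \mu_{x_m \to f}(x_m), \\
      P(x) &\propto \prod_{f \in \mathrm{ne}(x)} \mu_{f \to x}(x),
    \end{align*}

where ne(·) denotes the neighbours of a node in the factor graph; messages from leaf variable nodes are initialized to 1, and messages from leaf factor nodes to the factor itself.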

5 Factor graphs: HMM

[5 P] Implement the sum-product algorithm for factor graphs in MATLAB to infer the hidden states of an HMM for the following problem. An agent is located randomly in a 10 × 10 grid world. In each of T = 20 time steps he either stays at his current location with a probability of 25% or moves up, down, left, or right, where each of these four actions is chosen with a probability of 18.75%. Each cell in the grid world is painted with a certain color that is chosen randomly from a set of k possible colors, as shown in Fig. 3. This color is observed by the agent at each time step and reported to us. Only these color values are available to infer the initial location and the subsequent positions of the agent, resulting in the Hidden Markov model (HMM) illustrated in Fig. 4.

Figure 3: Grid world.

Figure 4: Hidden Markov model (HMM).

Modify the file hmmmessagepassingtask.m available for download on the course homepage [3] and implement the sum-product algorithm to solve this inference problem. Investigate and discuss the dependence of the solution on the number of different colors in the grid world (k). Hand in figures of representative results that show the actual agent trajectories, the most likely trajectories, and the probabilities of the agent positions at each time step as inferred by the sum-product algorithm.

Present your results clearly, structured, and legibly. Document them in such a way that anybody can reproduce them effortlessly. Send the code of your solution to florian.hubner@igi.tugraz.at

[3] http://www.igi.tugraz.at/lehre/intern/mla_ws1314_hw5.zip
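A minimal MATLAB sketch of the inference part, assuming the transition matrix A, the emission matrix E, and the observation sequence obs have already been built from the grid world (including a choice of how to handle moves that would leave the grid). All variable names here are illustrative and need not match the provided skeleton hmmmessagepassingtask.m.

    % Scaled forward-backward recursions, i.e. the sum-product algorithm
    % specialized to the HMM chain.
    S = 100;                                 % number of hidden states (10 x 10 cells)
    T = 20;                                  % number of time steps
    % A   : S-by-S transition matrix, A(i,j) = P(next cell = j | current cell = i)
    % E   : S-by-k emission matrix,  E(i,c) = P(observed color = c | cell = i)
    % obs : 1-by-T vector of observed color indices

    p0 = ones(S, 1) / S;                     % uniform initial state distribution
    alpha = zeros(S, T);                     % scaled forward messages
    beta  = zeros(S, T);                     % scaled backward messages
    c     = zeros(1, T);                     % scaling factors

    alpha(:, 1) = p0 .* E(:, obs(1));
    c(1) = sum(alpha(:, 1));
    alpha(:, 1) = alpha(:, 1) / c(1);
    for t = 2:T
        alpha(:, t) = (A' * alpha(:, t - 1)) .* E(:, obs(t));
        c(t) = sum(alpha(:, t));
        alpha(:, t) = alpha(:, t) / c(t);
    end

    beta(:, T) = ones(S, 1);
    for t = T-1:-1:1
        beta(:, t) = A * (E(:, obs(t + 1)) .* beta(:, t + 1)) / c(t + 1);
    end

    gamma = alpha .* beta;                              % P(cell at time t | all observations)
    gamma = bsxfun(@rdivide, gamma, sum(gamma, 1));     % normalize each column
    [~, mapCell] = max(gamma, [], 1);                   % most probable cell per time step
    % (the single most likely full trajectory would require the max-product /
    %  Viterbi recursion instead of the per-step marginals computed here)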

6 Markov Networks

a) [2 P] For the given joint probability, draw an appropriate Markov network and write down the formula for the given conditional probability. Simplify the formula for the conditional probability as far as possible.

(i) P(A, B, C, D, E) = (1/Z) φ(A, C) φ(B, C) φ(C, D) φ(C, E),   P(A | C) = ?
(ii) P(A, B, C, D, E) = (1/Z) φ(A, B, C) φ(C, D) φ(D, E),   P(A | D) = ?

b) [3 P] In this example you have to write down the joint distribution for the given Markov network and prove that a given independence assumption holds.

(i) B ⊥ C | A
(ii) A ⊥ E | D
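For a), a possible starting point (the simplification itself is the exercise) is the definition of the conditional probability, written here for case (i) in LaTeX notation:

    \[
      P(A \mid C) = \frac{P(A, C)}{P(C)}
                  = \frac{\sum_{B, D, E} P(A, B, C, D, E)}{\sum_{A, B, D, E} P(A, B, C, D, E)}.
    \]

After inserting the given factorization, sums over variables that occur in only a single factor can be carried out independently, which is where the simplification comes from.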

7 Junction Trees

[3 P] Consider the following distribution [4]

P(A, B, C, D, E, F, G, H, I) = P(A) P(B | A) P(C | A) P(D | A) P(E | B) P(F | C) P(G | D) P(H | E, F) P(I | F, G)

a) Draw the Bayesian network for this distribution.
b) Draw the moralised graph.
c) Draw the triangulated graph. Your triangulated graph should contain cliques of the smallest size possible.
d) Draw a junction tree for the above graph and verify that it satisfies the running intersection property.
e) Write down a suitable initialization of the clique potentials.
f) Find an appropriate message updating schedule.

[4] Exercise 6.4 in D. Barber: Bayesian Reasoning and Machine Learning, Cambridge University Press, 2012.

8 Parameter Learning: Beta distribution

[2 P] You want to estimate the bias of a coin using a Bayesian approach. For that you toss the coin ten times and get the following result:

{h, t, t, t, h, t, h, t, t, t}

a) Compute the parameters of the posterior probability using a uniform Beta distribution as prior. Plot the posterior probability density function using the MATLAB function betapdf.

b) You do another experiment with ten tosses and this time you get the following result:

{h, h, t, h, t, t, h, t, t, h}

Again compute the parameters of the posterior probability using (i) a uniform Beta distribution as prior and (ii) the parameters you obtained after the first experiment as prior. Plot the two posterior probability density functions.

c) Compute the probability that the next toss will be h using the results from b).

d) Explain the different results you get in b) and c).
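A minimal MATLAB sketch of the Beta posterior updates and plots for parts a) and b), taking Beta(1, 1) as the uniform prior and using betapdf from the Statistics Toolbox; all variable names are illustrative.

    tosses1 = 'httththttt';                   % first experiment
    a0 = 1;  b0 = 1;                          % uniform Beta(1,1) prior

    nH = sum(tosses1 == 'h');                 % number of heads
    nT = sum(tosses1 == 't');                 % number of tails
    a1 = a0 + nH;                             % posterior parameters after
    b1 = b0 + nT;                             % the first experiment

    mu = linspace(0, 1, 500);
    figure; hold on;
    plot(mu, betapdf(mu, a1, b1), 'DisplayName', 'posterior after experiment 1');

    tosses2 = 'hhthtthtth';                   % second experiment
    nH2 = sum(tosses2 == 'h');
    nT2 = sum(tosses2 == 't');

    % b) (i): uniform prior, second experiment only
    plot(mu, betapdf(mu, a0 + nH2, b0 + nT2), 'DisplayName', 'b(i): uniform prior');
    % b) (ii): posterior from the first experiment used as prior
    plot(mu, betapdf(mu, a1 + nH2, b1 + nT2), 'DisplayName', 'b(ii): sequential update');
    legend show; xlabel('\mu'); ylabel('p(\mu | data)');

    % c): predictive probability of heads = posterior mean a / (a + b)
    pNextHead_i  = (a0 + nH2) / (a0 + nH2 + b0 + nT2);
    pNextHead_ii = (a1 + nH2) / (a1 + nH2 + b1 + nT2);
    fprintf('P(next = h):  b(i): %.3f   b(ii): %.3f\n', pNextHead_i, pNextHead_ii);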

9 Beta distribution

[2* P] Show that the mean, variance, and mode of the beta distribution are given respectively by

E[µ] = a / (a + b),
var[µ] = a b / ((a + b)² (a + b + 1)),
mode[µ] = (a − 1) / (a + b − 2).

10 Parameter Learning: Naive Bayes Classifiers

[5 P] Implement an algorithm for learning a naive Bayes classifier and apply it to a spam email data set. You are required to use MATLAB for this assignment. The spam dataset is available for download on the course homepage. [5]

a) Write a function called nbayes_learn.m that takes a training dataset for a binary classification task with binary attributes and returns the posterior Beta distributions of all model parameters (specified by variables a_i and b_i for the i-th model parameter) of a naive Bayes classifier, given a prior Beta distribution for each of the model parameters (also specified by variables a_i and b_i for the i-th model parameter).

b) Write a function called nbayes_predict.m that takes a set of test data vectors and returns the most likely class label predictions for each input vector based on the posterior parameter distributions obtained in a).

c) Use both functions to conduct the following experiment. For your assignment you will be working with a data set that was created a few years ago at the Hewlett Packard Research Labs as a testbed data set to test different spam email classification algorithms.

(i) Train a naive Bayes model on the first 2500 samples (using Laplace uniform prior distributions) and report the classification error of the trained model on a test data set consisting of the remaining examples that were not used for training.

(ii) Repeat the previous step, now training on the first {10, 50, 100, 200, ..., 500} samples, and again testing on the same test data as used in point (i) (samples 2501 through 4601). Report the classification error on the test dataset as a function of the number of training examples. Hand in a plot of this function.

(iii) Comment on how accurate the classifier would be if it randomly guessed a class label or always picked the most common label in the training data. Compare these performance values to the results obtained for the naive Bayes model.

Present your results clearly, structured, and legibly. Document them in such a way that anybody can reproduce them effortlessly. Send the code of your solution to florian.hubner@igi.tugraz.at

[5] http://www.igi.tugraz.at/lehre/intern/mla_ws1314_hw10.zip
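One possible structure for the two functions, sketched in MATLAB under the assumption that the training data arrive as an N-by-D binary matrix X and a 0/1 label vector y, that the prior parameters are passed as scalars, and that predictions use the posterior-mean parameter estimates; the actual format of the provided spam dataset may differ, and each function would go into its own file.

    function model = nbayes_learn(X, y, a_prior, b_prior)
        % Beta-Bernoulli posterior for every feature within each class,
        % plus a Beta posterior for the class prior itself.
        classes = [0 1];
        for c = 1:2
            Xc = X(y == classes(c), :);
            model.a(c, :) = a_prior + sum(Xc == 1, 1);   % posterior a_i per feature
            model.b(c, :) = b_prior + sum(Xc == 0, 1);   % posterior b_i per feature
            model.Nc(c)   = size(Xc, 1);
        end
        % class prior with a uniform (Laplace) prior as well
        model.pClass = (model.Nc + 1) / (sum(model.Nc) + 2);
    end

    function yhat = nbayes_predict(X, model)
        % Predict with the posterior-mean estimates theta_i = a_i / (a_i + b_i),
        % working in log space for numerical stability.
        N = size(X, 1);
        logp = zeros(N, 2);
        for c = 1:2
            theta = model.a(c, :) ./ (model.a(c, :) + model.b(c, :));   % 1-by-D
            logp(:, c) = log(model.pClass(c)) ...
                + X * log(theta') + (1 - X) * log(1 - theta');
        end
        [~, idx] = max(logp, [], 2);
        yhat = idx - 1;                      % back to labels in {0, 1}
    end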

11 K-means: Image compression

[3 P] Apply the k-means algorithm for lossy image compression by means of vector quantization. Download the 512 × 512 image mandrill.tif. [6] Each pixel represents a point in a three-dimensional (r, g, b) color space. Each color dimension encodes the corresponding intensity with an 8-bit integer. Cluster the pixels in color space using k-means with k ∈ {2, 4, 8, 16, 32, 64, 128} clusters and replace the original color values with the indices of the closest cluster centers. You can use the MATLAB function kmeans to do the clustering. Determine the compression factor for each value of k and relate it to the quality of the image. Apply an appropriate quality measure of your choice.

Present your results clearly, structured, and legibly. Document them in such a way that anybody can reproduce them effortlessly. Send the code of your solution to florian.hubner@igi.tugraz.at

[6] http://www.igi.tugraz.at/lehre/intern/mla_ws1314_hw11.zip
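A minimal MATLAB sketch of the clustering and of one possible compression-factor and quality computation, assuming mandrill.tif is on the MATLAB path and kmeans from the Statistics Toolbox is used; the RMSE reported here is only one possible quality measure.

    img = double(imread('mandrill.tif')) / 255;    % 512 x 512 x 3, values in [0, 1]
    pixels = reshape(img, [], 3);                  % one row per pixel in (r, g, b) space

    ks = [2 4 8 16 32 64 128];
    for k = ks
        [idx, centers] = kmeans(pixels, k, 'MaxIter', 200, 'Replicates', 3);
        quantized = reshape(centers(idx, :), size(img));

        % Compression factor: 24 bit per pixel originally vs. log2(k) bit per
        % pixel for the cluster indices plus the small codebook of k RGB centers.
        nPixels = size(pixels, 1);
        originalBits   = nPixels * 24;
        compressedBits = nPixels * log2(k) + k * 24;
        fprintf('k = %3d: compression factor %.2f, RMSE %.4f\n', k, ...
            originalBits / compressedBits, sqrt(mean((pixels(:) - quantized(:)).^2)));

        figure; image(quantized); axis image off; title(sprintf('k = %d', k));
    end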

12 EM Algorithm for Gaussian Mixture Model

[5 P] In this task you have to implement the EM algorithm for a Mixture of Gaussians. The algorithm can be found in the slides or in Bishop, pp. 438-439; a rough sketch of a single iteration is also given at the end of this sheet.

1. Write the function em which takes the dataset and the number of clusters K as arguments. Make sure that you initialize the parameters (µ_k, Σ_k, π_k) appropriately.

2. Download the dataset provided on the homepage. [7] It contains the complete dataset, where x contains the data and z contains the correct classes. You can use z to check whether the correct clusters are found. Let the algorithm run several times with K = 5 but with different initial cluster centers to see how it performs. Show some plots and describe the behaviour of the algorithm. Does it always find the correct clusters? Does the algorithm have any problems?

3. Now use different values for K and analyze what the algorithm does for K being lower or higher than the correct number of clusters.

Present your results clearly, structured, and legibly. Document them in such a way that anybody can reproduce them effortlessly. Send the code of your solution to florian.hubner@igi.tugraz.at

13 EM Algorithm for Mixture of Lines

Assume that the training examples x_n ∈ R² with n = 1, ..., N were generated from a mixture of K lines, where

P(x_{n,2} | z_{n,k} = 1) = N(x_{n,2} | θ_{k,1} x_{n,1} + θ_{k,2}, σ_k)    (1)

N(x | µ, σ) = 1 / (√(2π) σ) · exp(−(x − µ)² / (2σ²))    (2)

and the hidden variable z_{n,k} = 1 if x_n is generated from line k and 0 otherwise.

1. [1* P] Derive the update equations for the M-step of the EM algorithm for the variables θ_k and σ_k.

2. [2* P] Implement the EM algorithm for Mixture of Lines using the update equations you derived in 1. Use the provided dataset [8] to evaluate your implementation. Show some plots of intermediate steps and describe what is happening.

[7] http://www.igi.tugraz.at/lehre/intern/mla_WS1314_HW12.zip
[8] http://www.igi.tugraz.at/lehre/intern/mla_WS1314_HW13.zip

Present your results clearly, structured, and legibly. Document them in such a way that anybody can reproduce them effortlessly. Send the code of your solution to florian.hubner@igi.tugraz.at
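For exercise 12, a minimal MATLAB sketch of a single EM iteration for the Gaussian mixture model, following the standard update equations (Bishop, ch. 9). Initialization, the convergence check, and all plotting are omitted; the variable names (X, mu, Sigma, pi_k, gamma) are illustrative, and mvnpdf from the Statistics Toolbox is assumed.

    % X is N-by-D, mu is K-by-D, Sigma is D-by-D-by-K, pi_k is 1-by-K.

    % ----- E-step: responsibilities gamma(n, k) -----
    N = size(X, 1);  K = numel(pi_k);
    gamma = zeros(N, K);
    for k = 1:K
        gamma(:, k) = pi_k(k) * mvnpdf(X, mu(k, :), Sigma(:, :, k));
    end
    gamma = bsxfun(@rdivide, gamma, sum(gamma, 2));    % normalize over components

    % ----- M-step: re-estimate the parameters -----
    Nk = sum(gamma, 1);                                % effective number of points
    for k = 1:K
        mu(k, :) = (gamma(:, k)' * X) / Nk(k);
        Xc = bsxfun(@minus, X, mu(k, :));              % centered data
        Sigma(:, :, k) = (Xc' * bsxfun(@times, Xc, gamma(:, k))) / Nk(k);
        Sigma(:, :, k) = Sigma(:, :, k) + 1e-6 * eye(size(X, 2));   % regularization
    end
    pi_k = Nk / N;

    % The log likelihood sum(log(sum_k pi_k * N(x_n | mu_k, Sigma_k))) can be
    % monitored across iterations to decide when the em function has converged.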