Machine Learning A 708.064 WS15/16 1sst KU
Version: January 11, 2016

Exercises

Problems marked with * are optional.

1 Conditional Independence I [3 P]

a) [1 P] For the probability distribution P(A, B, C, D) with the factorization

    P(A, B, C, D) = P(A) P(B) P(C | A, B) P(D | C)

show that the following conditional independence assumptions hold.

(i) A ⊥ B
(ii) A ⊥ D | C

b) [1 P] For the probability distribution P(A, B, C, D) with the factorization

    P(A, B, C, D) = P(A) P(B | A) P(C | A) P(D | C)

show that the following conditional independence assumptions hold.

(i) A ⊥ D | C
(ii) B ⊥ D | C

c) [1* P] Find a probability distribution P(A, B, C) in the form of a probability table which fulfills A ⊥ B but not A ⊥ B | C. Prove that your probability distribution really fulfills these two criteria.
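The independence statements above can be checked directly from the standard definitions,

    \[
      A \perp B \;\Leftrightarrow\; P(A, B) = P(A)\,P(B),
      \qquad
      A \perp B \mid C \;\Leftrightarrow\; P(A, B \mid C) = P(A \mid C)\,P(B \mid C),
    \]

where the required marginals follow from the given factorization by summing out the remaining variables, e.g. P(A, B) = Σ_{C,D} P(A, B, C, D).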

2 Bayesian Networks [5 P]

a) [1 P] Construct a Bayesian network which represents the conditional independence assumptions of the probability distributions from examples 1 and 2.

b) [2 P] Construct two different Bayesian networks which encode exactly the following conditional independence assumptions:

A ⊥ C | B
A ⊥ D | B
C ⊥ D | B

c) [2 P] A doctor gives patients a drug based on their age and gender. A patient's probability of recovery therefore depends on receiving the drug, on age and on gender. Additionally, age and gender are independent if nothing else is known about the patient.

(i) Draw the Bayesian network which describes this situation.
(ii) What does the factorized probability distribution look like?
(iii) Write down the formula to compute the probability that a patient recovers, given that you know whether the drug was given. Use only probabilities which are part of the factorized probability distribution.

3 D-Separation [2 P]

In Figure 1 you can see a Bayesian network which is used for the diagnosis of lung cancer and tuberculosis. Check whether the following conditional independence assumptions are true or false.

(i) tuberculosis ⊥ smoking | dyspnea
(ii) lungcancer ⊥ bronchitis | smoking
(iii) visit to Asia ⊥ smoking | lungcancer
(iv) visit to Asia ⊥ smoking | lungcancer, dyspnea

Figure 1: Bayesian network for the diagnosis of lung cancer and tuberculosis.

4 Inference in Factor Graphs [2 P]

Show how the given marginal probabilities can be computed in the factor graph in Fig. 2 using the sum-product algorithm. Start by drawing arrows on the factor graph that show how the messages are passed. Then write down how each of the messages is computed, and finally write down the formula for the marginal distribution.

a) P(A)
b) P(B)

Figure 2: Factor graph for problem 4
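For reference, the generic sum-product messages and the resulting marginal, in standard factor-graph notation (the concrete factors of Fig. 2 are not reproduced here):

    \[
      \mu_{f \to x}(x) = \sum_{x_1,\dots,x_M} f(x, x_1, \dots, x_M)
        \prod_{m \in \mathrm{ne}(f) \setminus \{x\}} \mu_{x_m \to f}(x_m),
      \qquad
      \mu_{x \to f}(x) = \prod_{l \in \mathrm{ne}(x) \setminus \{f\}} \mu_{f_l \to x}(x),
    \]
    \[
      p(x) \;\propto\; \prod_{s \in \mathrm{ne}(x)} \mu_{f_s \to x}(x).
    \]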

5 Factor graphs: HMM [5 P]

Implement the sum-product algorithm for factor graphs in MATLAB to infer the hidden states of an HMM for the following problem.

Figure 3: Grid world

An agent is located randomly in a 10 × 10 grid world. In each of T = 20 time steps it either stays at its current location with a probability of 25% or moves up, down, left or right, where each of these four actions is chosen with a probability of 18.75%. Each cell in the grid world is painted with a certain color that is chosen randomly from a set of k possible colors, as shown in Fig. 3. This color is observed by the agent at each time step and reported to us. Only these color values are available to infer the initial location and the subsequent positions of the agent, resulting in the hidden Markov model (HMM) illustrated in Fig. 4.

Figure 4: Hidden Markov model (HMM)
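A minimal sketch of how the sum-product messages look on a generic HMM chain (forward-backward form), assuming a transition matrix A, an emission matrix E, a prior p0 and an observation sequence obs; these names are illustrative and do not refer to the provided course files:

    % Minimal sketch of sum-product messages on an HMM chain (illustrative only).
    % Assumed inputs: A (S x S transition matrix, A(i,j) = P(z_{t+1}=j | z_t=i)),
    % E (S x k emission matrix, E(i,c) = P(obs_t = c | z_t = i)),
    % p0 (1 x S prior over the initial state), obs (1 x T observed color indices).
    S = size(A, 1); T = numel(obs);
    alpha = zeros(T, S); beta = ones(T, S);        % forward / backward messages
    alpha(1, :) = p0 .* E(:, obs(1))';             % initialise forward message
    alpha(1, :) = alpha(1, :) / sum(alpha(1, :));  % normalise for numerical stability
    for t = 2:T                                    % forward pass
        alpha(t, :) = (alpha(t-1, :) * A) .* E(:, obs(t))';
        alpha(t, :) = alpha(t, :) / sum(alpha(t, :));
    end
    for t = T-1:-1:1                               % backward pass
        beta(t, :) = (beta(t+1, :) .* E(:, obs(t+1))') * A';
        beta(t, :) = beta(t, :) / sum(beta(t, :));
    end
    gamma = alpha .* beta;                         % unnormalised posterior marginals
    gamma = bsxfun(@rdivide, gamma, sum(gamma, 2));  % row t holds P(z_t | obs_1..T)

The per-step normalization only rescales the messages and leaves the posterior marginals unchanged, but it avoids numerical underflow over T = 20 steps and 100 possible positions.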

Modify the file hmmmessagepassingtask.m available for download on the course homepage and implement the sum-product algorithm to solve this inference problem. Investigate and discuss the dependence of the solution on the number of different colors in the grid world (k). Hand in figures of representative results that show the actual agent trajectories, the most likely trajectories and the probabilities of the agent positions at each time step as inferred by the sum-product algorithm.

Present your results clearly, legibly and in a well structured manner. Document them in such a way that anybody can reproduce them effortlessly. Send the code of your solution to anand@igi.tugraz.at with the subject MLA SS15 HW5 - <your name>.

6 Markov Networks [5 P]

a) [2 P] For each of the given joint probabilities draw an appropriate Markov network and write down the formula for the given conditional probability. Simplify the formula for the conditional probability as far as possible.

(i) P(A, B, C, D, E) = (1/Z) φ(A, C) φ(B, C) φ(C, D) φ(C, E),   P(A | C) = ?
(ii) P(A, B, C, D, E) = (1/Z) φ(A, B, C) φ(C, D) φ(D, E),   P(A | D) = ?

b) [3 P] In this example you have to write down the joint distribution for the given Markov network and prove that the given independence assumptions hold.

(i) B ⊥ C | A
(ii) A ⊥ E | D
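As a generic starting point (not the simplified answer), such conditionals follow from the joint by marginalization, e.g. for a)(i):

    \[
      P(A \mid C) \;=\; \frac{P(A, C)}{P(C)}
      \;=\; \frac{\sum_{B,D,E} P(A, B, C, D, E)}{\sum_{A,B,D,E} P(A, B, C, D, E)},
    \]

with the normalization constant Z cancelling between numerator and denominator.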

7 Junction Trees [3 P]

Consider the following distribution:

    P(A, B, C, D, E, F, G, H, I) = P(A) P(B | A) P(C | A) P(D | A) P(E | B) P(F | C) P(G | D) P(H | E, F) P(I | F, G)

a) Draw the Bayesian network for this distribution.
b) Draw the moralised graph.
c) Draw the triangulated graph. Your triangulated graph should contain cliques of the smallest size possible.
d) Draw a junction tree for the above graph and verify that it satisfies the running intersection property.
e) Write down a suitable initialization of the clique potentials.
f) Find an appropriate message updating schedule.

8 Parameter Learning: Beta distribution [2 P]

You want to estimate the bias of a coin using a Bayesian approach. For that you toss the coin ten times and get the following result:

    {h, t, t, t, h, t, h, t, t, t}

a) Compute the parameters of the posterior probability using a uniform Beta distribution as prior. Plot the posterior probability density function using the MATLAB function betapdf.

b) You do another experiment with ten tosses and this time you get the following result:

    {h, h, t, h, t, t, h, t, t, h}

Again compute the parameters of the posterior probability with (i) a uniform Beta distribution as prior and (ii) the parameters you obtained after the first experiment as prior. Plot the two posterior probability density functions.

c) Compute the probability that the next toss will be h using the results from b).

d) Explain the different results you get in b) and c).
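A minimal sketch of the posterior update in a), assuming a uniform Beta(1, 1) prior; variable names are illustrative:

    % Minimal sketch (illustrative variable names): Beta posterior for the coin bias
    % after the first experiment, starting from a uniform Beta(1,1) prior.
    tosses = [1 0 0 0 1 0 1 0 0 0];   % first experiment, 1 = heads (h), 0 = tails (t)
    a0 = 1; b0 = 1;                   % uniform prior Beta(1,1)
    a  = a0 + sum(tosses);            % posterior parameter: add the number of heads
    b  = b0 + sum(1 - tosses);        % posterior parameter: add the number of tails
    mu = linspace(0, 1, 1000);
    plot(mu, betapdf(mu, a, b));      % posterior density of the heads probability mu
    xlabel('\mu'); ylabel('p(\mu | data)');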

9 Beta distribution [2* P]

Show that the mean, variance, and mode of the Beta distribution are given respectively by

    E[µ] = a / (a + b)

    var[µ] = ab / ((a + b)^2 (a + b + 1))

    mode[µ] = (a − 1) / (a + b − 2)

10 Parameter Learning: Naive Bayes Classifiers [5 P]

Implement an algorithm for learning a naive Bayes classifier and apply it to a spam email data set. You are required to use MATLAB for this assignment. The spam dataset is available for download on the course homepage.

a) Write a function called nbayes_learn.m that takes a training dataset for a binary classification task with binary attributes and returns the posterior Beta distributions of all model parameters (specified by variables a_i and b_i for the i-th model parameter) of a naive Bayes classifier, given a prior Beta distribution for each of the model parameters (specified by variables a_i and b_i for the i-th model parameter).

b) Write a function called nbayes_predict.m that takes a set of test data vectors and returns the most likely class label predictions for each input vector based on the posterior parameter distributions obtained in a).

c) Use both functions to conduct the following experiment. For your assignment you will be working with a data set that was created a few years ago at the Hewlett Packard Research Labs as a testbed data set to test different spam email classification algorithms.

(i) Train a naive Bayes model on the first 2500 samples (using Laplace uniform prior distributions) and report the classification error of the trained model on a test data set consisting of the remaining examples that were not used for training.

(ii) Repeat the previous step, now training on the first 10, 50, 100, 200, ..., 500 samples, and again testing on the same test data as used in point (i) (samples 2501 through 4601). Report the classification error on the test dataset as a function of the number of training examples. Hand in a plot of this function.

(iii) Comment on how accurate the classifier would be if it randomly guessed a class label or if it always picked the most common label in the training data. Compare these performance values to the results obtained for the naive Bayes model.

Present your results clearly, legibly and in a well structured manner. Document them in such a way that anybody can reproduce them effortlessly. Send the code of your solution to griesbacher@tugraz.at with subject [MLA A10] code submission before 19th January 2016, 2pm.
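A minimal sketch of the Beta posterior bookkeeping behind a), assuming a binary feature matrix X (N × D), binary labels y and uniform Beta(1, 1) priors; all variable names are illustrative and not the required interface of nbayes_learn.m:

    % Minimal sketch (illustrative, not the required interface of nbayes_learn.m).
    % Assumed inputs: X (N x D binary attributes), y (N x 1 binary class labels),
    % and a uniform Beta(1,1) prior on every parameter.
    [N, D] = size(X);                              % N samples, D binary attributes
    for c = 0:1
        Xc = X(y == c, :);                         % training vectors of class c
        Nc = size(Xc, 1);
        % Posterior Beta(a, b) of P(x_i = 1 | class = c), one pair per attribute:
        aPost{c+1} = 1 + sum(Xc, 1);               % prior a0 = 1 plus counts of x_i = 1
        bPost{c+1} = 1 + (Nc - sum(Xc, 1));        % prior b0 = 1 plus counts of x_i = 0
        % Posterior mean, usable as the predictive parameter in nbayes_predict.m:
        thetaHat{c+1} = aPost{c+1} ./ (aPost{c+1} + bPost{c+1});
        classPrior(c+1) = (Nc + 1) / (N + 2);      % class prior, also Laplace-smoothed
    end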

11 K-means: Image compression [3 P]

Apply the k-means algorithm for lossy image compression by means of vector quantization.

- Download the 512 × 512 image mandrill.tif (available for download on the course website). Each pixel represents a point in a three-dimensional (r, g, b) color space. Each color dimension encodes the corresponding intensity with an 8-bit integer.

- Cluster the pixels in color space using k-means with k ∈ {2, 4, 8, 16, 32, 64, 128} clusters and replace the original color values with the indices of the closest cluster centers. You can use the MATLAB function kmeans to do the clustering.

Determine the compression factor for each value of k and relate it to the quality of the image. Apply an appropriate quality measure of your choice.

Present your results clearly, legibly and in a well structured manner. Document them in such a way that anybody can reproduce them effortlessly. Send the code of your solution to anand@igi.tugraz.at with the subject MLA SS15 HW11 - <your name>.
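A minimal sketch of the vector-quantization step, assuming mandrill.tif is in the working directory; the chosen k, variable names and the reconstruction step are only illustrative:

    % Minimal sketch (illustrative): vector quantization of an RGB image with k-means.
    img = imread('mandrill.tif');                  % 512 x 512 x 3, uint8
    pixels = double(reshape(img, [], 3));          % one row per pixel in (r,g,b) space
    k = 16;                                        % one of the requested values of k
    [idx, centers] = kmeans(pixels, k, 'MaxIter', 500);
    compressed = reshape(idx, size(img, 1), size(img, 2));       % store only cluster indices
    reconstructed = reshape(uint8(centers(idx, :)), size(img));  % replace pixels by their centers
    imshow(reconstructed);
    % Rough compression estimate: 24 bits per original pixel versus
    % ceil(log2(k)) bits per index plus the small codebook of k RGB centers.

The compression factor can then be related to image quality with, for example, the mean squared error between img and reconstructed.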

12 EM Algorithm for Gaussian Mixture Model [5 P]

In this task you have to implement the EM algorithm for a mixture of Gaussians. The algorithm can be found in the slides or in Bishop, pp. 438-439.

1. Write the function em which takes the dataset and the number of clusters K as arguments. Make sure that you initialize the parameters (µ_k, Σ_k, π_k) appropriately. (A structural sketch of one EM iteration is given at the end of this sheet.)

2. Download the dataset provided on the course website. It contains the complete dataset, where x contains the data and z contains the correct classes. You can use z to check if the correct clusters are found. Let the algorithm run several times with K = 5 but with different initial cluster centers to see how it performs. Show some plots and describe the behaviour of the algorithm. Does it always find the correct clusters? Does the algorithm have any problems?

3. Now use different values for K and analyze what the algorithm does for K being lower or higher than the correct number of clusters.

Present your results clearly, legibly and in a well structured manner. Document them in such a way that anybody can reproduce them effortlessly. Send the code of your solution to anand@igi.tugraz.at with the subject MLA SS15 HW12 - <your name>.

13 EM Algorithm for Mixture of Lines [3* P]

Assume that the training examples x_n ∈ R^2 with n = 1, ..., N were generated from a mixture of K lines, where

    P(x_{n,2} | z_{n,k} = 1) = N(x_{n,2} | θ_{k,1} x_{n,1} + θ_{k,2}, σ_k)    (1)

    N(x | µ, σ) = 1 / (√(2π) σ) · exp(−(x − µ)² / (2σ²))    (2)

and the hidden variable z_{n,k} = 1 if x_n is generated from line k and 0 otherwise.

1. [1* P] Derive the update equations for the M-step of the EM algorithm for the variables θ_k and σ_k.

2. [2* P] Implement the EM algorithm for Mixture of Lines using the update equations you derived in 1. Use the dataset provided on the course website to evaluate your implementation. Show some plots of intermediate steps and describe what is happening.

Present your results clearly, legibly and in a well structured manner. Document them in such a way that anybody can reproduce them effortlessly. Send the code of your solution to anand@igi.tugraz.at with the subject MLA SS15 HW13 - <your name>.
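Referring back to Exercise 12, a minimal structural sketch of one EM iteration for a Gaussian mixture model; variable names are illustrative, and initialization, the surrounding loop and convergence checks are omitted:

    % Minimal structural sketch of one EM iteration for a Gaussian mixture model.
    % Assumed: x is N x d data, mu is K x d, Sigma is d x d x K, pi_k is 1 x K.
    [N, d] = size(x);
    K = numel(pi_k);
    % E-step: responsibilities gamma(n,k) = P(z_k = 1 | x_n, current parameters)
    gamma = zeros(N, K);
    for k = 1:K
        gamma(:, k) = pi_k(k) * mvnpdf(x, mu(k, :), Sigma(:, :, k));
    end
    gamma = bsxfun(@rdivide, gamma, sum(gamma, 2));
    % M-step: re-estimate mu_k, Sigma_k and pi_k from the responsibilities
    Nk = sum(gamma, 1);                            % effective number of points per cluster
    for k = 1:K
        mu(k, :) = gamma(:, k)' * x / Nk(k);
        xc = bsxfun(@minus, x, mu(k, :));          % data centered on the new mean
        Sigma(:, :, k) = (xc' * bsxfun(@times, xc, gamma(:, k))) / Nk(k);
    end
    pi_k = Nk / N;

Repeating the E- and M-steps until the log-likelihood stops improving gives the full algorithm described in Bishop, Sec. 9.2.2.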