CSCI 567, Fall 18 Haipeng Luo

Homework #4 Programming Assignment
Due: 11:59 pm, November 4, 2018

General instructions

Your repository will now have a directory P4/. Please do not change the name of this repository or the names of any files we have added to it. Please perform a git pull to retrieve these files. You will find within it: baboon.tiff, data_loader.py, gmm.py, gmm.sh, gmmtest.py, kmeans.py, kmeans.sh, kmeanstest.py, utils.py.

Depending on your environment, you may need to install the python library named pillow, which is used by matplotlib to process some of the images needed for this assignment. You do not need to import pillow in the files we provide; it only needs to be installed in your environment. You can install it by running pip install pillow in your command line.

High Level Description

0.1 Tasks

In this assignment you are asked to implement K-means clustering to identify the main clusters in the data, use the discovered cluster centroids for classification, and implement Gaussian Mixture Models to learn a generative model of the data. Specifically, you will:

- Implement the K-means clustering algorithm to identify clusters in a two-dimensional toy dataset.
- Implement image compression using the K-means clustering algorithm.
- Implement classification using the centroids identified by clustering.
- Implement Gaussian Mixture Models to learn a generative model and generate samples from a mixture distribution.

0.2 Running the code

We have provided two scripts to run the code. Run kmeans.sh after you finish the implementation of k-means clustering, classification and compression. Run gmm.sh after you finish implementing Gaussian Mixture Models.

0.3 Dataset

Throughout the assignment we will use two datasets (see fig. 1): the Toy Dataset and the Digits Dataset (you do not need to download them). The Toy Dataset is a two-dimensional dataset generated from 4 Gaussian distributions. We will use this dataset to visualize the results of our algorithm in two dimensions. We will use the digits dataset from sklearn [1] to test the K-means based classifier and to generate digits using a Gaussian Mixture model. Each data point is an 8×8 image of a digit. This is similar to MNIST but less complex.

Figure 1: Datasets. (a) Digits; (b) 2-D Toy Dataset.

0.4 Cautions

Please DO NOT import packages that are not listed in the provided code. Follow the instructions in each section strictly to code up your solutions. Do not change the output format. Do not modify the code unless we instruct you to do so. A homework solution that does not match the provided setup, such as format, name, initializations, etc., will not be graded. It is your responsibility to make sure that your code runs with the provided commands and scripts on the VM. Finally, make sure that you git add, commit, and push all the required files, including your code and generated output files.

0.5 Final submission

After you have solved problems 1 and 2, execute the bash kmeans.sh and bash gmm.sh commands. Git add, commit and push the plots and results folders and all the *.py files.

Problem 1 K-means Clustering

Recall that for a dataset x_1, ..., x_N ∈ R^D, the K-means distortion objective is

J(\{\mu_k\}, \{r_{ik}\}) = \frac{1}{N} \sum_{i=1}^{N} \sum_{k=1}^{K} r_{ik} \, \|\mu_k - x_i\|_2^2    (1)

where µ_1, ..., µ_K are the centroids of the K clusters and r_ik ∈ {0, 1} represents whether example i belongs to cluster k.

Clearly, fixing the centroids and minimizing J over the assignments gives

\hat{r}_{ik} = \begin{cases} 1 & \text{if } k = \arg\min_{k'} \|\mu_{k'} - x_i\|_2^2 \\ 0 & \text{otherwise} \end{cases}    (2)

On the other hand, fixing the assignments and minimizing J over the centroids gives

\hat{\mu}_k = \frac{\sum_{i=1}^{N} r_{ik} x_i}{\sum_{i=1}^{N} r_{ik}}    (3)

What the K-means algorithm does is simply alternate between these two steps. See Algorithm 1 for the pseudocode.
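To make the alternation concrete, here is a minimal numpy sketch of Eqs. (1) and (2); the function names distortion and hard_assign are illustrative only and are not part of the required kmeans.py interface.

```python
import numpy as np

def distortion(x, mu, r):
    """Eq. (1): (1/N) * sum_i sum_k r_ik * ||mu_k - x_i||_2^2."""
    diff = mu[None, :, :] - x[:, None, :]            # shape (N, K, D)
    return np.sum(r * np.sum(diff ** 2, axis=2)) / x.shape[0]

def hard_assign(x, mu):
    """Eq. (2): r_ik = 1 iff centroid k is the closest one to x_i."""
    dist = np.sum((mu[None, :, :] - x[:, None, :]) ** 2, axis=2)  # (N, K)
    labels = np.argmin(dist, axis=1)                 # ties go to the smaller index
    r = np.zeros_like(dist)
    r[np.arange(x.shape[0]), labels] = 1.0
    return r, labels

# Toy usage on random data
x = np.random.rand(6, 2)
mu = x[:3].copy()
r, labels = hard_assign(x, mu)
print(distortion(x, mu, r), labels)
```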

Algorithm 1 K-means clustering algorithm

1: Inputs:
   An array of size N × D denoting the training set, x
   Maximum number of iterations, max_iter
   Number of clusters, K
   Error tolerance, e
2: Outputs:
   Array of size K × D of means, {µ_k}_{k=1}^K
   Membership vector R of size N, where R[i] ∈ [K] is the index of the cluster that example i belongs to.
3: Initialize: Set the means {µ_k}_{k=1}^K to be K points selected from x uniformly at random (with replacement), and J to be a large number (e.g. 10^10).
4: repeat
5:   Compute the memberships r_ik using eq. (2)
6:   Compute the distortion measure J_new using eq. (1)
7:   if |J − J_new| ≤ e then STOP
8:   end if
9:   Set J := J_new
10:  Compute the means µ_k using eq. (3)
11: until the maximum iteration is reached

1.1 Implementing the K-means clustering algorithm

Implement Algorithm 1 by filling out the TODO parts in the class KMeans of the file kmeans.py. Note the following:

- Use numpy.random.choice for the initialization step.
- If at some iteration there exists a cluster k with no points assigned to it, then do not update the centroid of this cluster for this round.
- While assigning a sample to a cluster, if there is a tie (i.e. the sample is equidistant from two centroids), you should choose the one with the smaller index (like what numpy.argmin does).

After you complete the implementation, execute the bash kmeans.sh command to run k-means on the toy dataset. You should be able to see three images generated in the plots folder. In particular, you can look at toy_dataset_predicted_labels.png and toy_dataset_real_labels.png and compare the clusters identified by the algorithm against the real clusters. Your implementation should be able to recover the correct clusters sufficiently well. Representative images are shown in fig. 2. Red dots are cluster centroids. Note that the color coding of the recovered clusters may not match that of the correct clusters; this is due to a mismatch between the ordering of the retrieved clusters and the correct clusters (which is fine).

Figure 2: Clustering on toy dataset. (a) Predicted Clusters; (b) Real Clusters.
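The following is a hedged, self-contained sketch of the control flow in Algorithm 1 under the notes above (initialization with numpy.random.choice, the |J − J_new| ≤ e stopping test, and skipping empty clusters); the function name and return values are illustrative and do not match the exact KMeans class interface.

```python
import numpy as np

def kmeans_sketch(x, K, max_iter=100, e=1e-4):
    N, D = x.shape
    # Step 3: pick K points uniformly at random (with replacement) as initial means.
    mu = x[np.random.choice(N, size=K, replace=True)]
    J = 1e10
    for _ in range(max_iter):
        # Membership (Eq. 2): argmin breaks ties toward the smaller index.
        dist = np.sum((x[:, None, :] - mu[None, :, :]) ** 2, axis=2)  # (N, K)
        labels = np.argmin(dist, axis=1)
        # Distortion (Eq. 1) and stopping test.
        J_new = np.sum(np.min(dist, axis=1)) / N
        if abs(J - J_new) <= e:
            break
        J = J_new
        # Means (Eq. 3); a cluster with no assigned points keeps its old centroid.
        for k in range(K):
            members = x[labels == k]
            if len(members) > 0:
                mu[k] = members.mean(axis=0)
    return mu, labels

# Toy usage
mu, labels = kmeans_sketch(np.random.rand(50, 2), K=4)
print(mu.shape, labels[:10])
```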

1.2 Image compression with K-means

In the next part, we will look at lossy image compression as an application of clustering. The idea is simply to treat each pixel of an image as a point x_i, then perform the K-means algorithm to cluster these points, and finally replace each pixel with its centroid.

What you need to implement is to compress an image with the K centroids given. Specifically, complete the function transform_image in the file kmeanstest.py. After your implementation, execute bash kmeans.sh again and you should be able to see an image baboon_compressed.png in the plots folder. You can see that this image is distorted as compared to the original baboon.tiff.

1.3 Classification with k-means

Another application of clustering is to obtain a faster version of the nearest neighbor algorithm. Recall that nearest neighbor evaluates the distance of a test sample from every training point to predict its class, which can be very slow. Instead, we can compress the entire training set to just the K centroids, where each centroid is now labeled as the majority class of the corresponding cluster. After this compression, the prediction time of nearest neighbor is reduced from O(N) to just O(K) (see Algorithm 2 for the pseudocode).

Algorithm 2 Classification with K-means clustering

1: Inputs:
   Training Data: {X, Y}
   Parameters for running K-means clustering
2: Training:
   Run K-means clustering to find the centroids and memberships (reuse your code from Problem 1.1).
   Label each centroid with the majority vote of its members, i.e. arg max_c Σ_i r_ik I{y_i = c}.
3: Prediction:
   Predict the same label as the nearest centroid (that is, 1-NN on the centroids).

Note: 1) break ties in the same way as in previous problems; 2) if some centroid doesn't contain any point, set the label of this centroid as 0.

Complete the fit and predict functions in KMeansClassifier in the file kmeans.py. Once completed, run kmeans.sh to evaluate the classifier on a test set. For comparison, the script will also print the accuracy of a logistic classifier and a nearest neighbor classifier. (Note: a naive K-means classifier may not do well, but it can be an effective unsupervised method in a classification pipeline [2].)
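As an illustration of the two steps in Algorithm 2, here is a hedged numpy sketch; the function names are made up for this example and do not correspond to the fit/predict methods of KMeansClassifier.

```python
import numpy as np

def label_centroids(labels, y, K):
    """Training step of Algorithm 2: give each centroid the majority class of
    its members; a centroid with no members gets label 0."""
    centroid_labels = np.zeros(K, dtype=int)
    for k in range(K):
        members = y[labels == k]
        if len(members) > 0:
            # bincount + argmax breaks ties toward the smaller class index
            centroid_labels[k] = np.argmax(np.bincount(members))
    return centroid_labels

def predict_with_centroids(x_test, centroids, centroid_labels):
    """Prediction step of Algorithm 2: 1-NN on the centroids."""
    dist = np.sum((x_test[:, None, :] - centroids[None, :, :]) ** 2, axis=2)
    return centroid_labels[np.argmin(dist, axis=1)]

# Toy usage with made-up centroids and memberships
y = np.array([0, 0, 1, 1, 1, 2])
labels = np.array([0, 0, 1, 1, 1, 2])      # cluster index of each training point
centroids = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]])
c_labels = label_centroids(labels, y, K=3)
print(c_labels, predict_with_centroids(np.array([[0.9, 1.1]]), centroids, c_labels))
```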

Problem 2 Gaussian Mixture Model

Next you will implement a Gaussian Mixture Model (GMM) for clustering and also generate data after learning the model. Recall the key steps of the EM algorithm for learning GMMs from Slide 52 of Lec 8 (we change the notation ω_k to π_k):

\gamma_{ik} = \frac{\pi_k \, \mathcal{N}(x_i; \mu_k, \Sigma_k)}{\sum_{k'} \pi_{k'} \, \mathcal{N}(x_i; \mu_{k'}, \Sigma_{k'})}    (4)

N_k = \sum_{i=1}^{N} \gamma_{ik}    (5)

\mu_k = \frac{\sum_{i=1}^{N} \gamma_{ik} \, x_i}{N_k}    (6)

\Sigma_k = \frac{\sum_{i=1}^{N} \gamma_{ik} \, (x_i - \mu_k)(x_i - \mu_k)^T}{N_k}    (7)

\pi_k = \frac{N_k}{N}    (8)

Algorithm 3 provides a more detailed pseudocode. Also recall that the incomplete log-likelihood is

\sum_{n=1}^{N} \ln p(x_n) = \sum_{n=1}^{N} \ln \sum_{k=1}^{K} \pi_k \, \mathcal{N}(x_n; \mu_k, \Sigma_k) = \sum_{n=1}^{N} \ln \sum_{k=1}^{K} \frac{\pi_k}{\sqrt{(2\pi)^D |\Sigma_k|}} \exp\!\left( -\frac{1}{2} (x_n - \mu_k)^T \Sigma_k^{-1} (x_n - \mu_k) \right)    (9)

2.1 Implementing EM

Implement the EM algorithm (the class Gaussian_pdf, the function fit and the function compute_log_likelihood) in the file gmm.py to estimate the mixture model parameters. Please note the following:

- For K-means initialization, the inputs of K-means are the same as those of EM.
- When computing the density of a Gaussian with covariance matrix Σ, use Σ = Σ + 10^{-3} I when Σ is not invertible (in case it is still not invertible, keep adding 10^{-3} I until it is invertible).

After implementation, execute the bash gmm.sh command to estimate the mixture parameters for the toy dataset. You should see a Gaussian fitted to each cluster in the data. A representative image is shown in fig. 3. We evaluate both initialization methods, and you should observe that initialization with K-means usually converges faster.

Figure 3: Gaussian Mixture model on toy dataset
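To illustrate the E-step of Eq. (4) together with the covariance regularization described in Section 2.1, here is a minimal numpy sketch (it assumes the 10^{-3} I rule from this section; the function names are illustrative and are not the Gaussian_pdf interface in gmm.py).

```python
import numpy as np

def gaussian_density(x, mu, sigma):
    """N(x; mu, sigma) for every row of x, adding 1e-3 * I until sigma is invertible."""
    D = x.shape[1]
    while True:
        try:
            inv = np.linalg.inv(sigma)
            break
        except np.linalg.LinAlgError:
            sigma = sigma + 1e-3 * np.eye(D)
    norm = 1.0 / np.sqrt(((2 * np.pi) ** D) * np.linalg.det(sigma))
    diff = x - mu                                          # (N, D)
    expo = -0.5 * np.sum(diff @ inv * diff, axis=1)        # quadratic form, (N,)
    return norm * np.exp(expo)

def e_step(x, pi, means, variances):
    """Eq. (4): responsibilities gamma_ik, returned as an (N, K) array."""
    K = len(pi)
    dens = np.stack([pi[k] * gaussian_density(x, means[k], variances[k])
                     for k in range(K)], axis=1)           # (N, K)
    return dens / dens.sum(axis=1, keepdims=True)

# Toy usage: rows of gamma sum to 1
x = np.random.rand(20, 2)
gamma = e_step(x, np.ones(3) / 3, np.random.rand(3, 2), np.stack([np.eye(2)] * 3))
print(gamma.shape, gamma.sum(axis=1)[:3])
```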

Algorithm 3 EM algorithm for estimating GMM parameters

1: Inputs:
   An array of size N × D denoting the training set, x
   Maximum number of iterations, max_iter
   Number of clusters, K
   Error tolerance, e
   Init method: K-means or random
2: Outputs:
   Array of size K × D of means, {µ_k}_{k=1}^K
   Variance matrices Σ_k, stored as an array of size K × D × D
   A vector of size K denoting the mixture weights, pi_k
3: Initialize:
   For the random init method: initialize the means uniformly at random from [0, 1) for each dimension (use numpy.random.rand), initialize the variance to be the identity matrix for each component, and initialize the mixture weights to be uniform.
   For the K-means init method: run K-means, initialize the means as the centroids found by K-means, and initialize the variances and mixture weights according to Eq. (7) and Eq. (8), where γ_ik is the binary membership found by K-means.
4: Compute the log-likelihood l using Eq. (9)
5: repeat
6:   E Step: Compute the responsibilities using Eq. (4)
7:   M Step: Estimate the means using Eq. (6), the variances using Eq. (7), and the mixture weights using Eq. (8)
8:   Compute the new log-likelihood l_new
9:   if |l − l_new| ≤ e then STOP
10:  end if
11:  Set l := l_new
12: until the maximum iteration is reached

2.2 Implementing sampling

We also fit a GMM with K = 30 on the digits dataset. An advantage of a GMM compared to K-means is that we can sample from the learned distribution to generate new synthetic examples which look similar to the actual data.

To do this, implement the sample function in gmm.py, which uses self.means, self.variances and self.pi_k to generate digits. Recall that sampling from a GMM is a two-step process: 1) first sample a component k according to the mixture weights; 2) then sample from a Gaussian distribution with mean µ_k and variance Σ_k. Use numpy.random for these sampling steps. After implementation, execute bash gmm.sh again. This should produce a visualization of the means µ_k and some generated samples for the learned GMM. Representative images are shown in fig. 4.

Figure 4: Results on digits dataset. (a) Means of the GMM learnt on digits; (b) Random digit samples generated from the GMM.
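A minimal sketch of the two-step sampling procedure just described, assuming means of shape (K, D), variances of shape (K, D, D) and mixture weights pi_k; the function name and signature are illustrative and may differ from the sample method required in gmm.py.

```python
import numpy as np

def sample_gmm(n, pi_k, means, variances):
    """Draw n samples: pick a component by its mixture weight, then draw from
    the corresponding Gaussian."""
    K, D = means.shape
    components = np.random.choice(K, size=n, p=pi_k)          # step 1
    samples = np.empty((n, D))
    for i, k in enumerate(components):                        # step 2
        samples[i] = np.random.multivariate_normal(means[k], variances[k])
    return samples

# Toy usage with a 2-component, 2-D mixture
pi_k = np.array([0.3, 0.7])
means = np.array([[0.0, 0.0], [5.0, 5.0]])
variances = np.stack([np.eye(2), 0.5 * np.eye(2)])
print(sample_gmm(5, pi_k, means, variances))
```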

References

[1] sklearn.datasets.load_digits: http://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_digits.html#sklearn.datasets.load_digits

[2] Coates, A., & Ng, A. Y. (2012). Learning feature representations with k-means. In Neural Networks: Tricks of the Trade (pp. 561-580). Springer, Berlin, Heidelberg.