Machine Problem 8 - Mean Field Inference on Boltzmann Machine


CS498: Applied Machine Learning, Spring 2018
Machine Problem 8 - Mean Field Inference on Boltzmann Machine
Professor David A. Forsyth
Auto-graded assignment

Introduction

Mean-field approximation is a useful inference method that originated in statistical physics. The Ising model in its two-dimensional form is difficult to solve exactly, so the mean-field approximation was developed for it. Note the similarity between the Ising model and the probabilistic models you learned about in this chapter. In this assignment you will use mean-field inference to denoise binary images.

The MNIST dataset consists of 60,000 images of handwritten digits, curated by Yann LeCun, Corinna Cortes, and Chris Burges. You can find the dataset at http://yann.lecun.com/exdb/mnist/, together with a collection of recognition statistics. You will be using an autograder for this assignment, so we advise you to follow the steps exactly.

Figure 1: A handful of images in the MNIST dataset

Setting up the machine

1. Obtaining the dataset

Obtain the MNIST training set, and binarize the first 20 images by mapping any value below 0.5 to -1 and any value above to 1.

Hint: The original dataset at http://yann.lecun.com/exdb/mnist/ is in a compressed format. You may use other people's code from the web to convert this data into a usable matrix format. For instance, we found some online code that can take care of this for you, but we do not guarantee that these code pieces match the original dataset; only the original dataset is the reference for this assignment. https://github.com/datapythonista/mnist is a Python package for downloading the dataset and loading it into numpy arrays. https://gist.github.com/brendano/39760 is a piece of code that can be used to process the MNIST data in R.

2. Adding pre-determined noise to the dataset

For each image, create a noisy version by flipping some of the pixels. The exact locations of the pixels you need to flip in each image are given in the supplementary file NoiseCoordinates.csv. All values in this file are 0-based: the images are indexed as Image 0, 1, ..., 19, and the top left pixel of an image is in row 0 and column 0.

Description       Noisy bit 0   Noisy bit 1   Noisy bit 2   Noisy bit 3   Noisy bit 4
Image 0 Row            21             3            10            27            19
Image 0 Column         17            17            22             9            20
Image 1 Row            26             9             4             0            20
Image 1 Column         17             3            20            22             6
Image 2 Row            13            24            24            11             0
Image 2 Column         27             1            21             3            13
Image 3 Row            10            16            25            15            27
Image 3 Column          3            22             6            12            23
Image 4 Row             6            14            23            16             7
Image 4 Column          9            22            17            12            14
Image 5 Row             9            13            14            20            26
Image 5 Column         27             5            20            25            20

Table 1: A sample of the noise location matrix given in the NoiseCoordinates.csv supplementary file. The first flip in image 0 happens at the pixel in row 21 and column 17. Note that all values in this matrix are 0-based; for instance, the top left pixel of an image is indexed as being in row 0 and column 0.
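As a rough guide, the sketch below (Python with numpy) carries out steps 1 and 2. It assumes the mnist package linked in the hint above, and that NoiseCoordinates.csv has a header row and a leading description column as in Table 1; the number of noisy bits per image is read from the file rather than assumed.

```python
import numpy as np
import mnist  # https://github.com/datapythonista/mnist, linked in the hint above

# Load the training images (60000 x 28 x 28, values 0-255) and keep the first 20.
images = mnist.train_images()[:20].astype(np.float64) / 255.0

# Binarize: any value below 0.5 maps to -1, any value above maps to +1.
binarized = np.where(images < 0.5, -1, 1)

# NoiseCoordinates.csv has two rows per image ("Image k Row", "Image k Column");
# we assume a header row and a leading description column, as in Table 1.
coords = np.genfromtxt("NoiseCoordinates.csv", delimiter=",",
                       skip_header=1)[:, 1:].astype(int)

# Flip the prescribed pixels to produce the noisy images.
noisy = binarized.copy()
for k in range(20):
    rows, cols = coords[2 * k], coords[2 * k + 1]
    noisy[k, rows, cols] *= -1  # flipping a +/-1 pixel is just a sign change
```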

3. Building a Boltzmann machine for denoising the images and using mean-field inference

Now denoise each image using a Boltzmann machine model and mean-field inference. For this, use $\theta_{ij} = 0.8$ for the $H_i, H_j$ terms and $\theta_{ij} = 2$ for the $H_i, X_j$ terms.

The order in which you have to update the pixels of each image is given in the supplementary file UpdateOrderCoordinates.csv. Please see Table 2 for a sample of this matrix.

Description       Pixel 0   Pixel 1   Pixel 2   Pixel 3   Pixel 4   Pixel 5   Pixel 6
Image 0 Row            26        15        13         1        22        27         9
Image 0 Column          4        15         5         4        17        17        27
Image 1 Row             0        27         3        25        20        22         5
Image 1 Column         18        13        25        26        22         4         0
Image 2 Row            10         7         8        22        23         1         3
Image 2 Column         22        17        26         0        21        19        12
Image 3 Row             1        17        18        20         7        14         9
Image 3 Column         19        22        17         3        20        11        13

Table 2: A sample of the update order coordinates matrix given in the UpdateOrderCoordinates.csv supplementary file. Note that all values in this matrix are 0-based, which means the top left pixel of an image is indexed as being in row 0 and column 0. As an example, the distribution parameters of image 0 should be updated in the following order: first the pixel at row 26 and column 4, then the pixel at (15, 15), then the pixel at (13, 5), then the pixel at (1, 4), and so on.

The initial parameters of the model can be found in the supplementary file named InitalParametersModel.csv. This file is the initial matrix Q stored as comma-separated values, with dimension 28 x 28. Each entry of the matrix falls in the interval [0, 1], and the entry $Q_{r,c}$ denotes the initial probability $q_{r,c}[H_{r,c} = 1]$. Here there is a slight change of notation with respect to what you have learned in the course: what has been denoted $q_i[H_i = 1]$ throughout the course is now written $q_{r,c}[H_{r,c} = 1]$, since we have a hidden state for each pixel at row index $r$ and column index $c$. Please note that the same initial parameters are used for all the Boltzmann machines, one built per image.

You should run mean-field inference for exactly 10 iterations, where in each iteration the Q model distribution is updated in the same order shown and discussed in Table 2.
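As a rough sketch of step 3, one sweep of mean-field updates over a single image could look like the code below. It assumes a 4-connected grid of hidden nodes, each also linked to its own observed pixel, and uses the update $q_{r,c}[H_{r,c}=1] = e^{a}/(e^{a}+e^{-a})$, where $a$ collects the expected neighbour contributions; order_rows and order_cols are the sequences read from UpdateOrderCoordinates.csv, and the initial Q is the matrix from InitalParametersModel.csv. Verify the constants and conventions against the course formulation and the sample energies before relying on it.

```python
import numpy as np

THETA_HH = 0.8   # theta_ij for the H_i, H_j (hidden-hidden) terms
THETA_HX = 2.0   # theta_ij for the H_i, X_j (hidden-observed) terms

def mean_field_iteration(Q, X, order_rows, order_cols):
    """One iteration: update every pixel's q_{r,c}[H_{r,c} = 1] in the prescribed order.

    Q : 28x28 array with Q[r, c] = q_{r,c}[H_{r,c} = 1] (updated in place)
    X : 28x28 array of observed noisy pixels in {-1, +1}
    """
    H, W = Q.shape
    for r, c in zip(order_rows, order_cols):
        a = 0.0
        # Hidden-hidden neighbours on the 4-connected grid, using E_q[H_j] = 2 q_j - 1.
        for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            rr, cc = r + dr, c + dc
            if 0 <= rr < H and 0 <= cc < W:
                a += THETA_HH * (2.0 * Q[rr, cc] - 1.0)
        # Hidden-observed term: each hidden node is tied to its own observed pixel.
        a += THETA_HX * X[r, c]
        # q_{r,c}[H = 1] = e^a / (e^a + e^-a)
        Q[r, c] = np.exp(a) / (np.exp(a) + np.exp(-a))
    return Q
```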

4. Turning in the energy function values computed initially and after each iteration

You have to compute the variational free energy

$$E_Q = \mathbb{E}_Q[\log Q] - \mathbb{E}_Q[\log P(H, X)]$$

where

$$\log P(H, X) = \sum_{i} \sum_{H_j \in N(i) \cap H} \theta_{ij} H_i H_j + \sum_{i} \sum_{X_j \in N(i) \cap X} \theta_{ij} H_i X_j + K.$$

Remember that we made a model assumption about the distribution

$$Q[H] = \prod_{r=1}^{28} \prod_{c=1}^{28} q_{r,c}[H_{r,c}].$$

Using this independence assumption, you can easily compute the entropy term

$$\mathbb{E}_Q[\log Q] = \sum_{r=1}^{28} \sum_{c=1}^{28} \mathbb{E}_{q_{r,c}}[\log q_{r,c}] = \sum_{r=1}^{28} \sum_{c=1}^{28} \Big[ q_{r,c}[H_{r,c} = 1] \log\big(q_{r,c}[H_{r,c} = 1] + \epsilon\big) + q_{r,c}[H_{r,c} = -1] \log\big(q_{r,c}[H_{r,c} = -1] + \epsilon\big) \Big].$$

In order to avoid computational complications in cases where you may need to compute $0 \log 0$, we have added a very tiny value of $\epsilon = 10^{-10}$ inside the log.

Furthermore, we can simplify the log-likelihood term as well, thanks to the independence assumption made when defining the Q distribution:

$$\begin{aligned}
\mathbb{E}_Q[\log P(H, X)] &= K + \mathbb{E}_Q\Big[ \sum_{i} \sum_{H_j \in N(i) \cap H} \theta_{ij} H_i H_j + \sum_{i} \sum_{X_j \in N(i) \cap X} \theta_{ij} H_i X_j \Big] \\
&= K + \sum_{i} \sum_{H_j \in N(i) \cap H} \theta_{ij} \mathbb{E}_Q[H_i H_j] + \sum_{i} \sum_{X_j \in N(i) \cap X} \theta_{ij} \mathbb{E}_Q[H_i]\, X_j \\
&= K + \sum_{i} \sum_{H_j \in N(i) \cap H} \theta_{ij} \mathbb{E}_{q_i}[H_i]\,\mathbb{E}_{q_j}[H_j] + \sum_{i} \sum_{X_j \in N(i) \cap X} \theta_{ij} \mathbb{E}_{q_i}[H_i]\, X_j
\end{aligned}$$

where

$$\mathbb{E}_{q_k}[H_k] = (q_k[H_k = 1])(+1) + (1 - q_k[H_k = 1])(-1) = 2\, q_k[H_k = 1] - 1.$$

Since the K value is intractable to compute and has no effect on the optimization process of the mean-field approximation, we will ignore it by setting K = 0.

Compute the energy $E_Q$ both initially and after each iteration of updating the Q matrix. You will have to turn in a comma-separated values file which includes the variational free energy values as defined below:

$$E = \begin{array}{l|cc}
                & \text{Initially}  & \text{After one iteration} \\ \hline
\text{Image 10} & E^{(0)}_{Q_{10}}  & E^{(1)}_{Q_{10}} \\
\text{Image 11} & E^{(0)}_{Q_{11}}  & E^{(1)}_{Q_{11}} \\
\vdots          & \vdots            & \vdots
\end{array}$$

where $E^{(n)}_{Q_m}$ denotes the variational free energy of the Boltzmann machine used for the image indexed $m$ after the $n$-th iteration of mean-field inference. Please note that the first column holds the energy of the initial Q matrix given in the supplementary file, with respect to each image. Do not include any headers or row names in the CSV file. You can find a sample of the energy matrix we computed for the first ten images of the dataset in the supplementary file EnergySamples.csv.

As you can follow in the textbook formulation, the $\theta_{ij} H_i H_j$ term appears in the summation twice: once when counting the neighbors of node $i$, and once when counting the neighbors of node $j$. We use the same formulation as the textbook in this assignment. Make sure that you follow this formulation when computing the energy values and the reconstructed images.
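A minimal sketch of the energy computation is given below, again assuming 4-connected hidden-hidden neighbourhoods (so each $\theta_{ij} H_i H_j$ pair is counted twice, once from each endpoint, as discussed above) and one observed neighbour per hidden node; check the output against EnergySamples.csv.

```python
import numpy as np

EPS = 1e-10  # the epsilon added inside the log, as specified above

def variational_free_energy(Q, X, theta_hh=0.8, theta_hx=2.0):
    """E_Q = E_Q[log Q] - E_Q[log P(H, X)], with K set to 0.

    Q : 28x28 matrix of q_{r,c}[H_{r,c} = 1]
    X : 28x28 noisy observed image in {-1, +1}
    """
    H, W = Q.shape

    # Entropy term E_Q[log Q].
    entropy = np.sum(Q * np.log(Q + EPS) + (1.0 - Q) * np.log(1.0 - Q + EPS))

    # Expected log-likelihood term E_Q[log P(H, X)] with K = 0.
    EH = 2.0 * Q - 1.0  # E_{q_{r,c}}[H_{r,c}]
    log_lik = 0.0
    for r in range(H):
        for c in range(W):
            # Hidden-hidden terms; following the textbook formulation each pair
            # is counted twice, once from each endpoint's neighbourhood.
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < H and 0 <= cc < W:
                    log_lik += theta_hh * EH[r, c] * EH[rr, cc]
            # Hidden-observed term: one observed neighbour per hidden node.
            log_lik += theta_hx * EH[r, c] * X[r, c]

    return entropy - log_lik
```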

5. Displaying the reconstructed images

Run the mean-field approximation method for exactly 10 iterations (i.e. the hidden distribution of each pixel is updated exactly 10 times) on the images indexed 10, 11, ..., 19. After that, based on the approximated Q distribution, compute the MAP image and store it in a binary {0, 1} matrix format. You should turn in a prediction matrix with 28 rows and 280 columns, defined as

$$\hat{X} = \begin{bmatrix} \hat{X}_{10} & \hat{X}_{11} & \cdots & \hat{X}_{19} \end{bmatrix}, \qquad \hat{X}_m \in \{0, 1\}^{28 \times 28}.$$

As you can see, $\hat{X} \in \{0, 1\}^{28 \times 280}$. Turn in this matrix as a binary CSV file with exactly 28 rows and 280 columns. You can find the denoised images of the first ten images of the dataset, stored in the proper format, in the SampleDenoised.csv supplementary file.

In order to give you a hint of what sample results look like, see Figure 2, where we provide the result of the method on the first ten images of the dataset. Please note that you have to report your results on the second ten images of the dataset, as mentioned above; those images can be seen in Figure 3.

Figure 2: A sample figure for the first ten digits in the training set of the MNIST dataset. The top row shows the original images, the middle row shows the noisy versions, and the bottom row shows the reconstructed images. The leftmost digit is the image indexed 0 in the dataset, the second leftmost digit is the image indexed 1, and the rightmost image is indexed 9.
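One plausible way to form the MAP images and the 28 x 280 prediction matrix is sketched below. The tie-breaking rule at q = 0.5, the Q_list name, and the output filename denoised.csv are assumptions for illustration, not part of the assignment text.

```python
import numpy as np

def map_image(Q):
    """MAP estimate under the factorised Q: H_{r,c} = +1 iff q_{r,c}[H = 1] >= 0.5.
    Returned as a {0, 1} matrix (1 where the MAP state is +1, 0 where it is -1)."""
    return (Q >= 0.5).astype(int)

# Q_list is assumed to hold the ten final Q matrices for images 10..19, in order.
# Stack the reconstructions side by side into a 28 x 280 matrix and write it out.
# X_hat = np.hstack([map_image(Q) for Q in Q_list])
# np.savetxt("denoised.csv", X_hat, fmt="%d", delimiter=",")
```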

6. Construction of an ROC curve

Assume that $\theta_{ij}$ for the $H_i, H_j$ terms takes a constant value $c$. We will investigate the effect of different values of $c$ on the performance of the denoising algorithm. Think of your algorithm as a noise detection device that accepts an image and, for each pixel, predicts 1 (i.e. positive) whenever the pixel was flipped by noise, and -1 (i.e. negative) otherwise. You can evaluate this in the same way you evaluate a binary classifier, because you know the true value of each pixel. A receiver operating characteristic (ROC) curve plots the true positive rate against the false positive rate of a predictor for different values of some useful parameter; we will use $c$ as our parameter.

Figure 3: The ten images of the dataset for which you have to report your results. The top row shows the original images, and the bottom row shows the noisy images obtained by adding the predetermined noise. The leftmost digit is the image indexed 10 in the dataset, the second leftmost digit is the image indexed 11, and the rightmost image is indexed 19.

Compute the FPR and TPR statistics of this device for $c \in \{5, 0.6, 0.4, 0.35, 0.3, 0.1\}$ using the images indexed 10, 11, ..., 19 (again, this is zero-based indexing, which means that the image indexed 10 is the eleventh actual image of the dataset). The first column must contain the FPR values, and the second column must contain the TPR values. The first row must correspond to c = 5, and the last row must correspond to c = 0.1. Do not include any header or row names in your generated CSV file.
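A sketch of the FPR/TPR computation, under one reasonable reading of the noise-detection framing: a pixel is actually positive if noise flipped it, and predicted positive if the reconstruction disagrees with the noisy input. The helper run_mean_field is hypothetical, standing in for the 10-iteration procedure of step 3 with the hidden-hidden weight set to c.

```python
import numpy as np

def fpr_tpr(original, noisy, reconstructed):
    """FPR and TPR of the denoiser viewed as a flipped-pixel detector.

    original, noisy, reconstructed : arrays in {-1, +1} of matching shape
    (e.g. the ten test images stacked together).  If reconstructions were stored
    as {0, 1}, map them back to {-1, +1} before calling this.
    """
    actual = noisy != original          # pixels that really were flipped
    predicted = reconstructed != noisy  # pixels the denoiser changed back

    tp = np.sum(predicted & actual)
    fp = np.sum(predicted & ~actual)
    fn = np.sum(~predicted & actual)
    tn = np.sum(~predicted & ~actual)

    return fp / (fp + tn), tp / (tp + fn)  # (FPR, TPR)

# Sweep the candidate values of c, rerunning mean-field inference each time.
# rows = []
# for c in (5, 0.6, 0.4, 0.35, 0.3, 0.1):
#     recon = run_mean_field(noisy_images, theta_hh=c, theta_hx=2.0)
#     rows.append(fpr_tpr(original_images, noisy_images, recon))
# np.savetxt("roc.csv", np.array(rows), delimiter=",")
```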