
CSC 2515 Introduction to Machine Learning Assignment 2
Zhongtian Qiu ( )

1. Problem 1

See the attached scan files for question 1.

2. Neural Network

2.1 Examine the statistics and plots of training error and validation error (generalization)

Since this problem is set up to run the training 5 times with 100 epochs each in MATLAB, in Python we either have to run more than 500 epochs straight through or save the models between runs; I prefer the first option. I set eps = 0.1, kept the momentum at 0, and set the number of epochs to 5000. The printout at step 4999 reports the training and validation cross entropy, the mean classification error, and the train/validation/test error and fr_rate values.

From the detailed printouts after 5000 epochs I found the following:
1. Within 5000 iterations, the cross entropy of the training set keeps going down. The cross entropy of the validation set, on the other hand, reaches its lowest point (0.0520) at about 2600 epochs, after which the curve goes back up due to overfitting.
2. The model fits the training set 100% within 5000 iterations, starting at approximately 2400 epochs. However, the lowest point of the validation set's classification error (0.015) is reached at around 800 epochs, after which that curve also goes back up due to overfitting.
3. Since the cross entropy does not change very much, we should focus more on the accuracy of the validation set, which makes 800 epochs a good choice for observing the local optimum of that curve. Also, because the cross-entropy and accuracy optima occur around 2600 and 800 epochs respectively, we should observe the curves over a longer range, probably extending to 5000 epochs, to see how the whole curve behaves.
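To locate these optima it is convenient to record the statistics once per epoch and take the argmin afterwards. Below is a minimal sketch of that bookkeeping, assuming the training loop appends the per-epoch training/validation cross entropy and validation classification error to plain arrays (the function and array names are hypothetical):

```python
import numpy as np

def best_epoch(train_ce, valid_ce, valid_err):
    """Find where the validation curves bottom out, given per-epoch arrays."""
    train_ce = np.asarray(train_ce)
    valid_ce = np.asarray(valid_ce)
    valid_err = np.asarray(valid_err)

    ce_epoch = int(np.argmin(valid_ce))    # lowest validation CE (about epoch 2600 in this run)
    err_epoch = int(np.argmin(valid_err))  # lowest validation error (about epoch 800 in this run)
    print(f"lowest validation CE  {valid_ce[ce_epoch]:.4f} at epoch {ce_epoch}")
    print(f"lowest validation err {valid_err[err_epoch]:.4f} at epoch {err_epoch}")
    print(f"final training CE     {train_ce[-1]:.4f}")
    return ce_epoch, err_epoch
```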

2.2 Classification Error

Let's set eps = 0.1, mom = 0.0 and keep the number of epochs at 3000, because that many epochs is enough to see convergence. The training printout reports, every 100 steps from step 0 to step 2999, the training and validation cross entropy and the mean classification error, followed by the final train/validation/test error and fr_rate values.
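For reference, the mean classification error reported here is just the fraction of misclassified examples. Below is a minimal sketch, assuming a single sigmoid output unit for the 2-vs-3 task and 0/1 targets (the shapes are an assumption about the data layout):

```python
import numpy as np

def mean_classification_error(outputs, targets):
    """Fraction of examples whose predicted class differs from the target.

    outputs: network outputs in [0, 1] (one sigmoid unit for the 2-vs-3 task),
    targets: 0/1 labels of the same shape.  Both are assumptions about the
    assignment's data layout.
    """
    predicted = (np.asarray(outputs) > 0.5).astype(int)
    return float(np.mean(predicted != np.asarray(targets)))
```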

2.3 Learning rate

Since the last experiment showed that some models require over 2000 iterations, I set the number of epochs to 5000 to see the further evolution of the curves, and I record the optimal accuracy and classification error rate at their lowest points. First, I use a controlled-variable approach: keep one of eps and momentum fixed and see how the other affects the curves. Then I try every combination of eps and momentum to look for more patterns and the best parameters.

2.3.1 Controlled-variable experiments (eps and momentum separately)

eps: I keep the momentum at 0.0 and the hidden layer at 10 units, and train for 5000 epochs to see the whole picture of the curves.

Above is when eps = 0.01.

Above is when eps = 0.2.

Above is when eps = 0.5.

Conclusion: 1) The larger the learning rate (eps), the fewer epochs it takes for both the correction rate and the cross entropy to converge. 2) Except for eps = 0.01, whose curves have not converged yet, the lowest points reached before overfitting are pretty close for the different settings; there is no big difference.

momentum: I set eps = 0.5, which converged fastest in the previous experiment, and keep the other parameters unchanged to see more about momentum, as below (since I have already shown eps = 0.5 with momentum = 0 above, I skip that combination):

Above is when momentum = 0.5.

Above is when momentum = 0.9.

Conclusion: 1) Similar to eps, the larger the momentum, the more quickly the curves converge, for both cross entropy and correction rate. 2) At the best point of the curves, there is no big difference in the best cross entropy or correction rate across settings.

2.3.2 Try all combinations

I ran all 9 combinations, i.e. eps = {0.01, 0.2, 0.5} and momentum = {0, 0.5, 0.9}, and summarized the overall results in a table recording, for each pair, the best cross entropy, the best error rate, and the iteration at which each optimum is reached. Two entries are worth noting: with eps = 0.01 everything is still going down at the end of training, and with eps = 0.01 and momentum = 0.9 the curve goes up and down many times in the beginning.

From this table it is very obvious that the larger eps or momentum is, the more quickly the curves converge, for both correction rate and cross entropy. In addition, the best points of the different combinations show no big difference. However, the combination eps = 0.01 with momentum = 0.9 is very interesting; the chart of its correction rate is shown below. You can see some fluctuations at the beginning of the curve, which means the algorithm bounces around early on. In this case, if we do not set

the number of epochs well, we might stop at a fake optimum before the model has really converged. To avoid this, I would suggest setting the two parameters as close to each other as possible.

Conclusion: 1) Larger eps and momentum help the curves converge more quickly. 2) Do not set eps and momentum far apart from each other, which may produce fake optima. 3) If we only care about the speed of convergence, I would suggest setting the parameters as large as possible, because there is no big difference in the resulting correction rate and cross entropy.
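Below is a minimal sketch of the sweep loop over these 9 combinations. The function train_nn is a hypothetical stand-in for the assignment's training routine, assumed here to return the per-epoch validation cross-entropy and classification-error curves:

```python
import itertools
import numpy as np

def sweep(train_nn, num_epochs=5000, num_hidden=10):
    """Run every (eps, momentum) pair and record where each validation curve bottoms out."""
    results = {}
    for eps, momentum in itertools.product([0.01, 0.2, 0.5], [0.0, 0.5, 0.9]):
        # train_nn is assumed to return the per-epoch validation CE and error curves.
        valid_ce, valid_err = train_nn(eps=eps, momentum=momentum,
                                       num_hidden=num_hidden, num_epochs=num_epochs)
        results[(eps, momentum)] = {
            "best_ce": float(np.min(valid_ce)),
            "best_error": float(np.min(valid_err)),
            "epoch_of_best_ce": int(np.argmin(valid_ce)),
            "epoch_of_best_error": int(np.argmin(valid_err)),
        }
    return results
```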

2.4 Number of hidden units

Since we have already seen the influence of the different parameters over 5000 epochs, here I train for 1000 epochs, which is enough to see convergence.

Above is when the number of hidden units = 2.
Above is when the number of hidden units = 5.
Above is when the number of hidden units = 10.

Above is when the number of hidden units = 30.

Above is when the number of hidden units = 100.

The effect of this modification on the convergence properties:
- The larger the number of hidden units, the more quickly the curves converge.
- The larger the number of hidden units, the lower the cross entropy we get after training for the same number of epochs.
- The fraction rate at the end of training does not differ much across different numbers of hidden units.

2.5 Compare k-NN and Neural Networks

when k = 1, the classification rate of the validation set is 0.985
when k = 3, the classification rate of the validation set is 0.990
when k = 5, the classification rate of the validation set is 0.980
when k = 7, the classification rate of the validation set is 0.985
when k = 9, the classification rate of the validation set is 0.985
(the corresponding test-set rates were computed as well)

The results of kNN are generally as above. The error rate of kNN is slightly better than that of the neural network, which is versus 0.99 for the validation set and 0.97 versus 0.99 for the test set. Interestingly, the classification rates of kNN are very close to each other (to within about 0.01) across the different k values, which means the classification of this sample data is insensitive to the choice of k to some extent.
- Efficiency: In this case kNN runs a little more quickly than the neural network (with 10 hidden units). However, if the sample size grows larger, the neural network will clearly perform better on classification speed.
- Non-parametric vs parametric: It is quite clear that kNN has no parameters, while the neural network requires a lot of parameters to implement the algorithm.
- Model sophistication: Since a neural network involves an input layer, an output layer and possibly many hidden layers, and every iteration requires back-propagation, it is inevitably a very complicated model. kNN, on the other hand, is very easy to implement and only takes distances into account.
- Accuracy: In this case kNN is slightly better than the neural network on the test set and reaches exactly the same accuracy on the validation set. However, papers I have read say that neural networks perform better in general.
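For completeness, here is a minimal self-contained k-NN sketch using Euclidean distance and a majority vote; the assignment's own k-NN helper may differ in interface, and the names below are illustrative:

```python
import numpy as np

def knn_classify(train_x, train_y, test_x, k):
    """Plain k-NN: Euclidean distance, majority vote.

    train_x: (N, D) float array, train_y: (N,) non-negative integer labels,
    test_x: (M, D).  Returns (M,) predicted labels.
    """
    # Squared Euclidean distance between every test point and every training point.
    dist = (np.sum(test_x ** 2, axis=1, keepdims=True)
            - 2.0 * test_x @ train_x.T
            + np.sum(train_x ** 2, axis=1))
    nearest = np.argsort(dist, axis=1)[:, :k]   # indices of the k closest training points
    votes = train_y[nearest]                    # their labels, shape (M, k)
    return np.array([np.bincount(row).argmax() for row in votes])

# Example: rates = [np.mean(knn_classify(train_x, train_y, valid_x, k) == valid_y)
#                   for k in (1, 3, 5, 7, 9)]
```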

3 Mixture of Gaussians

3.2 Training

3.2.1 RandConst & iter

I tried randconst = {0.01, 0.1, 1, 2, 5, 10} and found that there is no big difference between the choices. I wrote a new function called q2_1 in which I try each randconst and return the biggest log-likelihood it gets; it turns out that the best randconst is different every time. Therefore I ran 100 times and recorded, for each run, which randconst within {0.01, 0.1, 1, 2, 5, 10} gave the highest log-likelihood. The win counts are: 1: 22, 2: 32, 5: 24, 0.01: 0, 0.1: 4, with the remaining wins going to 10. From these 100 runs we can see there is no big difference among randconst values larger than 1. Since randconst = 2 performs best among these parameters, I generally choose 2 as my randconst.

After running several times, I also found that iter = 10 cannot show the whole picture of the curve reaching convergence. Usually the curves for the 2 s model converge within 6-15 iterations, and the 3 s model also converges within roughly 15 iterations, so I set iter = 20 in this question.
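Below is a minimal sketch of what q2_1 does. The mogEM function is a stand-in for the assignment's EM training routine, assumed here to take a randConst keyword and to return the per-iteration training log-likelihood:

```python
import numpy as np

def count_randconst_wins(mogEM, data, K=2, iters=20, n_runs=100,
                         candidates=(0.01, 0.1, 1, 2, 5, 10)):
    """For each run, record which randConst value reaches the highest final log-likelihood."""
    wins = {c: 0 for c in candidates}
    for _ in range(n_runs):
        # mogEM is assumed to return the per-iteration training log-likelihood;
        # [-1] is the value after the final EM iteration.
        final_loglik = {c: mogEM(data, K=K, iters=iters, randConst=c)[-1]
                        for c in candidates}
        wins[max(final_loglik, key=final_loglik.get)] += 1
    return wins   # e.g. randConst = 2 won 32 of my 100 runs, 5 won 24, 1 won 22
```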

3.2.2 Mean vectors and variance vectors

I extracted the sample data for the 2 s and the 3 s respectively and show the mean and variance of each class after training the mixture of Gaussians, as below.

Above is the mean and variance for the 2 s.

Above is the mean and variance for the 3 s.

3.2.3 Mixing proportions

For the 2 s, the mixing proportions are and . For the 3 s, the mixing proportions are and .

3.2.4 logP(Training Data)

I ran my code for the 2 s and the 3 s respectively, and both models converge within 15 iterations. For the 2 s model the final log probability is , and for the 3 s model the final log probability is . The plots are as below:

Above is the plot for the 2 s.

Above is the plot for the 3 s.

3.3 Initializing a mixture of Gaussians with k-means

3.3.1 Speed of convergence

I tried the usual initialization and k-means initialization respectively and printed the log-probability for each. The plots are as below:

Above is the run using k-means initialization.
Above is the run without k-means initialization.

You can see that the run without k-means initialization takes more than 15 EM iterations to converge, whereas the one with k-means converges quickly, at around 6 to 10 EM iterations. This demonstrates that the mixture of Gaussians can indeed be made to converge faster by using k-means to initialize mu. It makes sense intuitively, since running k-means as a form of preprocessing starts the Gaussians off in a more reasonable and sensible place than initializing them randomly.
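Below is a minimal sketch of that preprocessing step: a tiny self-contained k-means whose centres can be used as the initial means. The mogEM call in the usage comment, and its mu_init argument, are assumptions about the assignment code rather than its actual interface.

```python
import numpy as np

def kmeans(x, K, n_iters=5, seed=0):
    """Tiny k-means: x is (N, D); returns (K, D) cluster centres."""
    rng = np.random.default_rng(seed)
    centres = x[rng.choice(x.shape[0], size=K, replace=False)].astype(float)
    for _ in range(n_iters):
        # Assign every point to its nearest centre, then recompute each centre.
        dist = np.linalg.norm(x[:, None, :] - centres[None, :, :], axis=2)
        assign = np.argmin(dist, axis=1)
        for k in range(K):
            if np.any(assign == k):
                centres[k] = x[assign == k].mean(axis=0)
    return centres

# mu0 = kmeans(train_x, K=2)
# p, mu, vary, logProbX = mogEM(train_x, K=2, iters=20, mu_init=mu0)  # assumed interface
```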

3.3.2 Final log-prob using traditional initialization

(Per-iteration log-probability printout for 20 EM iterations, followed by the final train/validation/test log-probabilities.)

Initialized with k-means and iter = 20:

(Per-iteration log-probability printout for 20 EM iterations, followed by the final train/validation/test log-probabilities.)

From the printed results we can see that the run whose parameters are initialized with k-means ends up with a higher log probability. This means we get a better maximization of our model by using k-means to initialize the parameters: it not only converges more quickly, it also performs better at maximizing the log probability.

3.4 Classification using MoGs

I am assuming that the prior of each model is 0.5. Notice that since both P(x) and the priors are constant across the two classes, we can just use P(x|d) to classify, since P(d|x) is proportional to P(x|d).
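Below is a minimal sketch of this decision rule, assuming each trained model is available as a tuple (p, mu, vary) of mixing proportions, means and diagonal variances (the parameter layout is an assumption):

```python
import numpy as np
from scipy.special import logsumexp

def mog_loglik(x, p, mu, vary):
    """log P(x) under a diagonal-covariance mixture of Gaussians.

    x: (N, D); p: (K,) mixing proportions; mu, vary: (K, D) means and variances.
    """
    log_comp = (np.log(p)[None, :]
                - 0.5 * np.sum(np.log(2.0 * np.pi * vary), axis=1)[None, :]
                - 0.5 * np.sum((x[:, None, :] - mu[None, :, :]) ** 2
                               / vary[None, :, :], axis=2))
    return logsumexp(log_comp, axis=1)           # shape (N,)

def classify(x, model2, model3):
    """Label each row of x as 2 or 3; with equal priors, take argmax of log P(x | d)."""
    ll2 = mog_loglik(x, *model2)
    ll3 = mog_loglik(x, *model3)
    return np.where(ll2 >= ll3, 2, 3)
```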

I tried numbers of mixture components in {2, 3, 5, 10, 15, 20, 25, 30}.

3.4.1 You should find that the error rates on the training sets generally decrease as the number of clusters increases. Explain why.

This is because the more clusters you use, the easier it becomes to assign a new input to the right cluster after training. It is like adding more variables to a logistic regression model to describe the data: with more variables, the model is more likely to fit within a certain number of iterations.

3.4.2 Examine the error rate curve for the test set and discuss its properties. Explain the trends that you observe.

I ran the script several times and the resulting graphs are not exactly the same each time. Sometimes, as in the plot above, k = 25 gives the lowest average error rate, and sometimes, as in the plot below, k = 20 is the optimal point.

I think the reason is that the model fits better as we increase the number of mixture components at first, which gives a more accurate clustering of the data. However, beyond a certain number of mixture components, say 20 or 25 in this problem, the model becomes relatively overfitted, so that while the training classification error keeps going down, the test classification error starts to go up.

3.4.3 If you wanted to choose a particular model from your experiments as the best, how would you choose it? If your aim is to achieve the lowest error rate possible on the new images your system will receive, which model (number of clusters) would you select? Why?

I would choose it by observing and comparing the error rates. If I could pick the number of mixture components freely beyond the settings I tried, I would probably choose 20 as my best choice, based on running my code more than 10 times. However, if I could only choose from the settings given in 3.4, I would pick 15. When the number of mixture components is too small, the training set still has room (by making more clusters) to improve its accuracy, but when the number is too big, the model overfits and hurts the validation and test sets. K = 15 sits in between: it fits the training data almost perfectly without overfitting the model. Here is a graph I plotted that explains a lot:

3.5 Bonus Question: Mixture of Gaussians vs Neural Network

3.5.1 Visualize and compare

Above are the last 3 of the 30 subplots of the neural network's input-layer weights. Since 15 is our best choice for the number of components for the 2 s and the 3 s respectively, I suppose we should use 30 hidden units, i.e. K = 30, for the neural network, because it trains directly on the whole 600-example dataset. With 30 units each subplot is quite small (so I only plot the last three to see them more clearly), but it is enough to see that the pixels inside each subplot are quite blurry and rough compared with the MoG ones. I suppose there are several reasons for this phenomenon:
- In MoG, we trained the 2 s and the 3 s separately, which should be more accurate than the neural network training them together.
- In MoG, we plotted the mean and variance, which describe the properties of the trained model. In the neural network, however, the input-to-hidden weights are only part of the model, and the rest of the properties live in the hidden layers, so the plot of W1 tells only a small part of the story.
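Below is a minimal sketch of how such weight images can be plotted, assuming the digits are 16 x 16 pixels so that each column of W1 is a 256-dimensional vector (the (256, num_hidden) layout of W1 is an assumption):

```python
import matplotlib.pyplot as plt

def show_last_hidden_weights(W1, last_n=3):
    """Plot the last few columns of W1 as 16 x 16 images."""
    num_hidden = W1.shape[1]
    plt.figure(figsize=(2 * last_n, 2))
    for i, j in enumerate(range(num_hidden - last_n, num_hidden)):
        plt.subplot(1, last_n, i + 1)
        plt.imshow(W1[:, j].reshape(16, 16), cmap="gray")
        plt.title(f"hidden unit {j}")
        plt.axis("off")
    plt.show()
```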

As for the classification rate, the results are as below. The classification error rates of the network on the validation and test sets are slightly poorer than those of the MoG, but still good enough. (The printout reports the train/validation/test CE and fr_rate values.)

                   Neural Network (30 hidden units)   MoG (K = 15)
  Training set     1 [0/600]                          1 [0/600]
  Validation set   [5/200]                            [1/200]
  Test set         [13/400]                           [6/400]

3.5.2 Visualize the input-to-hidden weights as images to see what your network has learned

I only picked the last three subplots so we can see them better.

This is the random initialization of the input-layer weights.
This is after 30 iterations.
This is at the end of 60 iterations.

After 90 iterations the curves have converged, and the pixel plots likewise stop changing much from then on. As shown above, the plots get clearer as training runs for more iterations: in the beginning the pixels are dispersed randomly, while by the end every plot has some obviously bright places and dark places.

3.5.3 Compare hidden unit weights versus mixture components

Intuitively, both the mixing proportions in the mixture of Gaussians and the hidden-unit weights in the neural network are supposed to represent features. In MoG, the mixing proportions can never be negative, so weights close to zero push the result towards zero, which corresponds to a 2, and the other way around for a 3. In the neural network, the hidden-layer weights can be negative, which tends to push the final output towards 0, or positive, which tends to push the final output towards 1, meaning a 3. I printed the last three entries of w2 and plotted the corresponding last three subplots, as below:

The corresponding values I got from w2 are: [ ], [ ], [ ].
- You can see it clearly from the pictures. If we take the bright part of each picture to represent the feature of either a 2 or a 3, the first picture mostly resembles a 2 and the other two apparently resemble 3 s. I also examined all the other subplots and found that a negative w2 value corresponds to a 2 s feature while a positive value corresponds to a 3 s feature.
- This makes sense: if the weight is negative, the final result is more likely to be pushed towards 0, which makes the subplot resemble a 2, and vice versa.
- It is in some ways similar to the mixing proportion, i.e. the probability p in the code: when it is close to 0 it pushes the final result towards 0, which makes it resemble a 2, and vice versa as well.
- I would say the hidden units are more like features than clusters. In MoG, K (the number of mixture components) is the number of clusters and p (the mixing proportion) is the probability of belonging to each cluster. In the neural network, by contrast, each weight vector tries to stand for a part of the 2 or 3 model, as a feature or pattern that recognizes a particular part of the digit.
- The way the neural network works is to let it learn those features part by part.
