CSC 2515 Introduction to Machine Learning Assignment 2
Zhongtian Qiu
Problem 1
See the attached scanned files for Question 1.
2. Neural Network
2.1 Examine the statistics and plots of training error and validation error (generalization)
In the Matlab version, this problem runs the training 5 times with 100 epochs each. In Python we therefore either run 500+ epochs in one go or save and reload the model between runs; I prefer the first option. I set eps = 0.1, kept momentum at 0 and set the number of epochs to 5000. The cross entropy and classification rate are as below:
Step 4999: Train CE [ ], Validation CE [ ], mean_classification_error [ ]; Error: Train [ ], Validation [ ], Test [ ]; fr_rate: Train [ ], Validation [ ], Test [ ]
Step 4999: Train CE [ ], Validation CE [ ], mean_classification_error [ ]; Error: Train [ ], Validation [ ], Test [ ]; fr_rate: Train [ ], Validation [ ], Test [ ]
Above are the final results after 5000 epochs. From the detailed print-out, I found that:
1. Within 5000 epochs, the training cross entropy keeps going down. The validation cross entropy, on the other hand, reaches its lowest point (0.0520) at about 2600 epochs, after which it starts to go up due to overfitting.
2. The training set is fit perfectly (100% correct) from roughly epoch 2400 on. However, the validation classification error reaches its lowest point (0.015) at around 800 epochs, after which it starts to go up due to overfitting.
3. Since the cross entropy does not change much, we should focus more on the validation accuracy, which suggests that about 800 epochs is a good stopping point. Also, because cross entropy and accuracy reach their optima at different points (around 2600 and 800 epochs respectively), the curves should be observed over a longer range, probably up to 5000 epochs, to see their overall shape.
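The two optima noted above (lowest validation cross entropy around epoch 2600, lowest validation error around epoch 800) can be read off the per-epoch logs mechanically. A minimal sketch, assuming the validation curves are available as plain Python lists indexed by epoch; `best_epochs` is a hypothetical helper, not part of the assignment code, and the toy numbers are made up:

```python
def best_epochs(valid_ce, valid_err):
    """Return (epoch with lowest validation CE, epoch with lowest validation error).

    min() returns the first minimum it meets, so ties go to the earliest
    epoch, which is the usual early-stopping choice.
    """
    best_ce_epoch = min(range(len(valid_ce)), key=lambda i: valid_ce[i])
    best_err_epoch = min(range(len(valid_err)), key=lambda i: valid_err[i])
    return best_ce_epoch, best_err_epoch


# Toy curves (made-up numbers): the error bottoms out before the cross entropy does.
valid_ce = [0.30, 0.12, 0.08, 0.07, 0.09]
valid_err = [0.10, 0.02, 0.03, 0.03, 0.04]
print(best_epochs(valid_ce, valid_err))  # (3, 1)
```

Tracking both indices makes it explicit that "the best epoch" depends on which validation metric is used to stop.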
2.2 Classification Error
I set eps = 0.1, mom = 0.0 and the number of epochs to 3000, which is enough to see convergence. The results are as below:
Step 0: Train CE [ ], Validation CE [ ], mean_classification_error [ ]
Step 100: Train CE [ ], Validation CE [ ], mean_classification_error [ ]
Step 200: Train CE [ ], Validation CE [ ], mean_classification_error [ ]
Step 2999: Train CE [ ], Validation CE [ ], mean_classification_error [ ]
Error: Train [ ], Validation [ ], Test [ ]; fr_rate: Train [ ], Validation [ ], Test [ ]
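The quantities in these print-outs can be computed as follows. This is a minimal sketch assuming a single sigmoid output with targets coded 0/1 (0 for a 2 and 1 for a 3, the coding described in Section 3.5) and averaging over examples; the assignment's own code may sum instead, which changes the scale but not the shape of the curves. If fr_rate is the fraction correct, it is 1 minus the classification error. The function names are hypothetical.

```python
import math


def mean_cross_entropy(outputs, targets, clip=1e-12):
    """Mean binary cross entropy between sigmoid outputs y and 0/1 targets t."""
    total = 0.0
    for y, t in zip(outputs, targets):
        y = min(max(y, clip), 1.0 - clip)  # keep log() away from 0
        total -= t * math.log(y) + (1 - t) * math.log(1.0 - y)
    return total / len(targets)


def classification_error(outputs, targets, threshold=0.5):
    """Fraction of examples whose thresholded output disagrees with the target."""
    wrong = sum((y >= threshold) != (t == 1) for y, t in zip(outputs, targets))
    return wrong / len(targets)


outputs = [0.9, 0.2, 0.6, 0.4]
targets = [1, 0, 0, 1]
print(classification_error(outputs, targets))  # 0.5
```

Cross entropy also penalises confident correct predictions that are not confident enough, which is why it can keep changing after the error rate has flattened out.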
2.3 Learning rate
Since the last experiment showed that some models need over 2000 iterations, I use 5000 epochs here to see how the curves develop further. I record the best accuracy and classification error rate at the lowest point. First, I vary one parameter at a time (eps or momentum) while holding the other fixed, to isolate its effect on the curves. Then I try all combinations of eps and momentum to look for further patterns and the best parameters.
Varying one parameter at a time (eps and momentum respectively)
eps: I keep momentum at 0.0 and the hidden layer at 10 units, and train for 5000 epochs to see the whole picture of the curves.
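For reference, this is how eps (the learning rate) and momentum typically enter a gradient-descent weight update. It is a minimal sketch assuming the standard heavy-ball form v <- mom * v - eps * grad, w <- w + v, applied to a one-dimensional toy loss 0.5 * a * w**2 rather than the network; the assignment's starter code may scale or order the terms differently, and `momentum_step` and `iterations_to_converge` are hypothetical helpers.

```python
def momentum_step(w, v, grad, eps, mom):
    """One update of gradient descent with (heavy-ball) momentum.

    v <- mom * v - eps * grad
    w <- w + v
    """
    v = mom * v - eps * grad
    return w + v, v


def iterations_to_converge(eps, mom, a=1.0, w0=1.0, tol=1e-3, max_iter=100_000):
    """Minimise the toy loss 0.5 * a * w**2 (gradient a * w) and count the steps
    until both the weight and the velocity are below tol.

    Checking the velocity as well avoids declaring convergence at a moment
    when an oscillating weight merely passes through zero.
    """
    w, v = w0, 0.0
    for it in range(1, max_iter + 1):
        w, v = momentum_step(w, v, a * w, eps, mom)
        if abs(w) < tol and abs(v) < tol:
            return it
    return max_iter


print(iterations_to_converge(0.5, 0.0))  # 10
```

On this toy loss, raising eps or momentum cuts the number of steps, in line with the trends reported in this section. With eps = 0.01 and mom = 0.9 the weight overshoots zero and oscillates before settling, which is one way early up-and-down behaviour can arise.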
Above: eps = 0.01.
Above: eps = 0.2.
Above: eps = 0.5.
Conclusion:
1) The larger the learning rate (eps), the fewer epochs it takes to converge, for both classification rate and cross entropy.
2) Except for eps = 0.01, where the curves have not converged yet, the lowest points before overfitting are very close for the different values; there is no big difference.
momentum: I set eps = 0.5, which converged fastest in the previous case, and keep the other parameters unchanged to see the effect of momentum (since eps = 0.5 with momentum = 0 was already shown in the previous case, I skip it here):
Above: momentum = 0.5.
Above: momentum = 0.9.
Conclusion:
1) As with eps, the larger the momentum, the faster the curves converge for both cross entropy and classification rate.
2) At their best points, the curves show no big difference in cross entropy or classification rate.
Trying all combinations
I ran all 9 combinations of eps = {0.01, 0.2, 0.5} and momentum = {0, 0.5, 0.9}. The overall results are as below:
eps | momentum | Cross entropy | Error rate | Optimal iter for CE | Optimal iter for error rate
[ ] | [ ] | [ ] | [ ] | [ ] | [ ]
Notes in the table: "everything still goes down"; "it goes up and down many times in the beginning".
From this table, it is clear that as eps or momentum increases, the curves converge faster for both classification rate and cross entropy. In addition, the best points of the different combinations do not differ much. However, one combination is interesting: eps = 0.01 with momentum = 0.9. Its classification-rate curve is shown below. There are fluctuations at the beginning of the curve, which means the algorithm bounces around early on. In this case, if the number of epochs is not chosen well, we may stop at a spurious optimum before the curve has really converged. To avoid this, I suggest not pairing a very small eps with a very large momentum.
Conclusion:
1) Larger eps and momentum help the curves converge more quickly.
2) Avoid setting eps and momentum far apart (e.g. eps = 0.01 with momentum = 0.9), which may produce spurious early optima.
3) If we only care about the speed of convergence, I would set both parameters as large as possible (within the range tested), because there is no big difference in the final classification rate or cross entropy.
2.4 Number of hidden units
Since we have already seen the influence of the different parameters over 5000 epochs, I use 1000 epochs here, which is enough to see convergence.
Above: number of hidden units = 2.
Above: number of hidden units = 5.
Above: number of hidden units = 10.
Above: number of hidden units = 30.
Above: number of hidden units = 100.
The effect of this modification on the convergence properties:
- The more hidden units, the faster the curves converge.
- The more hidden units, the lower the cross entropy after training for the same number of epochs.
- The fraction correct (fr_rate) at the end of training does not differ much across the different numbers of hidden units.
2.5 Compare k-NN and Neural Networks
- k = 1: validation classification rate 0.985, test [ ]
- k = 3: validation classification rate 0.990, test [ ]
- k = 5: validation classification rate 0.980, test [ ]
- k = 7: validation classification rate 0.985, test [ ]
- k = 9: validation classification rate 0.985, test [ ]
These are typical kNN results. kNN's classification rate is slightly better than the neural network's: [ ] versus 0.99 on the validation set and 0.97 versus 0.99 on the test set. Interestingly, the classification rates for the different k are within about 0.01 of each other, which suggests that classification on this data is fairly insensitive to k.
- Efficiency: here kNN runs a little faster than the neural network (10 hidden units). However, kNN must compare each new example with every stored training example, so its prediction cost grows with the size of the training set, whereas a trained network's prediction cost does not. With larger datasets the neural network should therefore classify faster.
- Non-parametric vs parametric: kNN learns no parameters (it stores the training data and has essentially one hyperparameter, k), while a neural network has to learn many weights.
- Model sophistication: a neural network has an input layer, an output layer and possibly many hidden layers, and every training iteration requires back-propagation, so it is a much more complicated model. kNN, on the other hand, is very easy to implement and only takes distances into account.
- Accuracy: in this case, kNN is slightly better than the neural network on the test set and reaches the same accuracy on the validation set. However, the papers I have read report that neural networks generally perform better.
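A minimal kNN classifier in plain Python, assuming each image is a flat list of pixel values and the labels are the digits 2 and 3; `knn_predict` and `classification_rate` are illustrative helpers, not the assignment's own code. Each prediction scans every training example, which is where kNN's growing prediction cost in the efficiency point comes from.

```python
from collections import Counter


def knn_predict(train_x, train_y, x, k):
    """Label x by majority vote among its k nearest training examples.

    Distances are squared Euclidean. Counter.most_common keeps first-seen
    order for equal counts, so a tied vote goes to the label that appears
    first among the (distance-sorted) neighbours.
    """
    neighbours = sorted(
        (sum((a - b) ** 2 for a, b in zip(xi, x)), yi)
        for xi, yi in zip(train_x, train_y)
    )
    votes = Counter(label for _, label in neighbours[:k])
    return votes.most_common(1)[0][0]


def classification_rate(train_x, train_y, eval_x, eval_y, k):
    """Fraction of evaluation examples that kNN labels correctly."""
    correct = sum(
        knn_predict(train_x, train_y, x, k) == y for x, y in zip(eval_x, eval_y)
    )
    return correct / len(eval_y)


train_x = [[0, 0], [0, 1], [5, 5], [6, 5]]
train_y = [2, 2, 3, 3]
print(knn_predict(train_x, train_y, [5.5, 5], 3))  # 3
```

Odd values of k (1, 3, 5, 7, 9, as in the experiments) avoid ties in a two-class vote.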
3 Mixture of Gaussians
3.2 Training
3.2.1 randConst and iter
I tried randConst = {0.01, 0.1, 1, 2, 5, 10} and found no big difference between the choices. I wrote a new function, q2_1, that tries each randConst more than 100 times and returns the highest log-likelihood obtained. It turned out that the best randConst differs from run to run. Therefore, I ran it 100 times and counted how often each randConst in {0.01, 0.1, 1, 2, 5, 10} gave the highest log-likelihood. The results are as follows:
1: 22, 2: 32, 5: 24, 10: [ ], 0.01: 0, 0.1: 4
Over these 100 runs, there is no big difference among the randConst values of 1 and above. Since randConst = 2 performed best, I choose 2 as randConst.
After several runs, I found that iter = 10 is not enough to show the whole curve converging. Usually, the model for 2s converges within 6 to 15 iterations, while the model for 3s converges within [ ] iterations. Therefore, I set iter = 20 in this question.
Mean vectors and variance vectors
I trained a mixture of Gaussians on the 2s and on the 3s separately and show the mean and variance vectors of both classes after training below:
Above: mean and variance for the 2s.
Above: mean and variance for the 3s.
Mixing proportions
For the 2s, the mixing proportions are [ ] and [ ]; for the 3s, they are [ ] and [ ].
LogP(Training Data)
I ran my code for the 2s and the 3s separately, and both converged within 15 iterations. For the 2s model, the final log probability is [ ]; for the 3s model, it is [ ]. The plots are as below:
Above: log probability plot for the 2s.
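The log P(training data) tracked in these plots is the sum over examples of the log of the mixture density. A minimal sketch, assuming diagonal covariances (one variance per pixel, matching the mean and variance vectors shown above), data given as lists of pixel vectors and all mixing proportions strictly positive; the function names are hypothetical. The log-sum-exp step keeps the sum over components from underflowing when each Gaussian density is tiny in high dimensions.

```python
import math


def log_gaussian_diag(x, mean, var):
    """log N(x | mean, diag(var)) for one example x (all lists of floats)."""
    return -0.5 * sum(
        math.log(2 * math.pi * s) + (xi - m) ** 2 / s
        for xi, m, s in zip(x, mean, var)
    )


def log_sum_exp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values))


def mog_log_likelihood(data, mix, means, variances):
    """log P(data) = sum_n log sum_k pi_k N(x_n | mu_k, diag(sigma_k^2))."""
    return sum(
        log_sum_exp([
            math.log(p) + log_gaussian_diag(x, m, v)
            for p, m, v in zip(mix, means, variances)
        ])
        for x in data
    )


# One 1-D point at 0 under a standard normal: -0.5 * log(2 * pi).
print(mog_log_likelihood([[0.0]], [1.0], [[0.0]], [[1.0]]))
```

EM never decreases this quantity from one iteration to the next, so a curve of it against iteration number is a convenient convergence check.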
Above: log probability plot for the 3s.
3.3 Initializing a mixture of Gaussians with k-means
Speed of convergence
I tried the usual initialization and k-means initialization separately and printed the log probability for each. The plots are as below:
Above: with k-means initialization.
Above: without k-means initialization.
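The k-means initialization compared in these plots can be sketched as follows: a few rounds of k-means produce cluster centres that serve as the starting MoG means instead of random values. This is a plain-Python sketch assuming each example is a list of pixel values; `kmeans_init_means` is a hypothetical helper, and setting the initial variances and mixing proportions from the clusters (which the assignment code may also do) is omitted.

```python
import random


def kmeans_init_means(data, k, iters=10, seed=0):
    """Run a few rounds of k-means and return the cluster centres, to be used
    as the initial MoG means in place of random initialization."""
    rng = random.Random(seed)
    centres = [list(x) for x in rng.sample(data, k)]  # start from k data points
    for _ in range(iters):
        # Assignment step: each example joins its nearest centre.
        clusters = [[] for _ in range(k)]
        for x in data:
            nearest = min(
                range(k),
                key=lambda j: sum((a - c) ** 2 for a, c in zip(x, centres[j])),
            )
            clusters[nearest].append(x)
        # Update step: move each centre to the mean of its members.
        for j, members in enumerate(clusters):
            if members:  # keep the old centre if a cluster empties out
                centres[j] = [sum(col) / len(members) for col in zip(*members)]
    return centres


data = [[0, 0], [0, 1], [10, 10], [10, 11]]
print(sorted(kmeans_init_means(data, 2)))  # [[0.0, 0.5], [10.0, 10.5]]
```

Because the centres already sit near dense groups of examples, EM starts close to a good solution and needs fewer iterations to settle.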
You can see that the run without k-means initialization needs more than 15 EM iterations to converge, whereas the run with k-means converges quickly, at around 6 to 10 EM iterations. This shows that initializing the means (mu) with k-means can speed up the convergence of the mixture of Gaussians. This makes sense intuitively: running k-means as a preprocessing step starts the Gaussians off in more sensible places than random initialization does.
Final log-prob using traditional initialization
Iter 0 to Iter 19: logprob [ ]
Logprob: Train [ ], Valid [ ], Test [ ]
Initialize with k-means and iter = 20
Iter 0 to Iter 19: logprob [ ]
Logprob: Train [ ], Valid [ ], Test [ ]
From the print-out, we can see that the run initialized with k-means ends up with a higher log probability. So, in this run, initializing the parameters with k-means not only converges more quickly but also reaches a better maximum of the log probability.
3.4 Classification using MoGs
I assume that the prior of each class model is 0.5. By Bayes' rule, P(d|x) = P(x|d)P(d)/P(x). Since P(x) and the priors are the same for both classes, we can classify using P(x|d) alone, because P(d|x) ∝ P(x|d).
I tried the number of mixture components in {2, 3, 5, 10, 15, 20, 25, 30}.
You should find that the error rates on the training sets generally decrease as the number of clusters increases. Explain why.
The more mixture components, the more flexible each class's density model is, so it can fit the training images more closely and classify them correctly more often. This is similar to adding more variables to a logistic regression model: a model with more parameters is more likely to fit the training data within a given number of iterations.
Examine the error rate curve for the test set and discuss its properties. Explain the trends that you observe.
I ran the script several times, and the resulting graphs are not exactly the same each time. Sometimes the result is as above, with K = 25 giving the lowest average error rate, and sometimes it is as below, with K = 20 as the best point.
I think the reason is that, at first, the model fits the data better as we increase the number of mixture components, which gives more accurate clustering of the data. However, beyond a certain number of components (say 20 or 25 in this problem), the model starts to overfit, so the training error keeps going down while the test error starts to go up.
If you wanted to choose a particular model from your experiments as the best, how would you choose it? If your aim is to achieve the lowest error rate possible on the new images your system will receive, which model (number of clusters) would you select? Why?
I would choose it by comparing error rates on held-out data (the validation set). If I could choose the number of mixture components freely from the values I tried, I would probably choose 20, based on running my code more than 10 times. However, if I could only choose from the values given in 3.4, I might choose 15. When the number of components is too small, there is still room to improve accuracy on the training set (by adding clusters); when it is too big, the model overfits, which hurts the validation and test sets. K = 15 lies in between: it fits the training data perfectly and does not overfit.
Here is a graph I plotted that explains this:
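The decision rule and the model choice described in this section can be written down in a few lines. This sketch assumes the per-class log-likelihoods log p(x | d) have already been computed by the two trained mixture models; the names `classify` and `pick_num_components` are hypothetical, and the validation errors in the usage line are illustrative, not results from this report.

```python
import math


def classify(log_lik_2, log_lik_3, prior_2=0.5):
    """Return "2" or "3" for one image from its log-likelihood under each class model.

    Bayes' rule: P(d | x) is proportional to p(x | d) P(d); P(x) is shared by
    both classes, so it drops out. With equal priors only the likelihoods matter.
    """
    score_2 = log_lik_2 + math.log(prior_2)
    score_3 = log_lik_3 + math.log(1.0 - prior_2)
    return "2" if score_2 >= score_3 else "3"


def pick_num_components(valid_error_by_k):
    """Choose the number of mixture components with the lowest validation error,
    breaking ties in favour of the smaller (simpler) model."""
    return min(valid_error_by_k, key=lambda k: (valid_error_by_k[k], k))


print(classify(-100.0, -120.0))                                  # 2
print(pick_num_components({2: 0.05, 15: 0.01, 25: 0.01, 30: 0.02}))  # 15
```

Selecting K on the validation set keeps the test set untouched, so the test error remains an honest estimate for new images.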
3.5 Bonus Question: Mixture of Gaussians vs Neural Network
Visualize and compare
Above: the last 3 of the 30 subplots of the neural network's input-to-hidden weights. Since 15 components was our best choice for the 2s and for the 3s separately, I use 30 hidden units for the neural network (matching K = 30 clusters in total), because it is trained directly on the whole 600-example training set. With 30 subplots, each image is quite small (so I only plot the last three to see them more clearly), but it is enough to see that the pixels in each subplot are blurrier and rougher than in the MoG plots. I see several reasons for this:
- In MoG, we trained the 2s and the 3s separately, which likely gives sharper class-specific templates than training one network on both together.
- In MoG, we plotted the means and variances, which summarise the trained model. In the neural network, however, the input-to-hidden weights capture only part of the model; the rest is in the hidden-to-output weights. So the plot of W1 tells only part of the story.
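To turn the weights into the subplots discussed here, each hidden unit's incoming weights have to be reshaped into the image grid and rescaled for display. A minimal sketch, assuming one hidden unit's weights are a flat list in row-major order and that the image width is known (for example 16 if the digits are 16 x 16, which is an assumption here); arrays exported from Matlab are column-major, so the result may need transposing. `weights_to_image` is a hypothetical helper.

```python
def weights_to_image(weights, width):
    """Reshape one hidden unit's incoming weights (a flat list over pixels)
    into rows of an image, min-max scaled to [0, 1] for display.

    Bright pixels (near 1) are the most positive weights and dark pixels
    (near 0) the most negative.
    """
    if len(weights) % width:
        raise ValueError("number of weights is not a multiple of the image width")
    lo, hi = min(weights), max(weights)
    span = (hi - lo) or 1.0  # constant weights -> all zeros instead of dividing by 0
    scaled = [(w - lo) / span for w in weights]
    return [scaled[i:i + width] for i in range(0, len(scaled), width)]


print(weights_to_image([-1, 0, 1, 3], 2))  # [[0.0, 0.25], [0.5, 1.0]]
```

Scaling each unit separately makes its own pattern visible but means brightness is not comparable across subplots.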
As for the classification rate, the results are below. The classification error rates on the validation and test sets are 2.5% (5/200) and 3.25% (13/400) respectively, slightly worse than MoG but still good.
CE: Train [ ], Validation [ ], Test [ ]; fr_rate: Train [ ], Validation [ ], Test [ ]
                  Neural network (30 hidden units)   MoG (K = 15)
Training set      1 [0/600]                          1 [0/600]
Validation set    [5/200]                            [1/200]
Test set          [13/400]                           [6/400]
(bracketed values: misclassified / total)
Visualize the input to hidden weights as images to see what your network has learned
I only show the last three subplots so they are easier to see.
Above: the randomly initialized input-layer weights.
Above: after 30 iterations.
Above: after 60 iterations.
After 90 iterations the curves have converged, and the pixel plots hardly change after that. As shown above, the plots get clearer as training proceeds: at the beginning the pixels are scattered randomly, while at the end every plot has clearly bright and dark regions.
Compare hidden unit weights versus mixture component
Intuitively, both the probabilities in a mixture of Gaussians and the hidden unit weights in a neural network represent features. In MoG, probabilities can never be negative, so a weight close to zero pushes the result towards zero, which makes it resemble a 2, and the other way around for a 3. In the neural network, the hidden-to-output weights can be negative, which tends to push the final output towards 0 (a 2), or positive, which tends to push it towards 1 (a 3). I printed the last three entries of w2 and plotted the corresponding subplots below:
What I got from w2 for these units: [ ] [ ] [ ]
- The pictures show this clearly. If the bright parts of each picture represent features of a 2 or a 3, the first picture mostly resembles a 2 and the other two resemble a 3. I also examined all the other subplots and found that negative weights correspond to features of 2s and positive weights to features of 3s.
- This makes sense: if the weight is negative, the final output is pushed towards 0, so the subplot resembles a 2, and vice versa.
- This is in some ways similar to the mixing proportion (the probability in the code): when it is close to 0, it pushes the final result towards 0, which resembles a 2, and vice versa.
- I would say hidden units are more like features than clusters. In MoG, K (the number of mixture components) is the number of clusters, and p (the mixing proportion) is the probability of belonging to each cluster. In the neural network, by contrast, each hidden unit's weights stand for a part of the 2 or 3 model, a feature or pattern that recognizes a particular part of the digit.
- The neural network works by learning these features part by part.
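The sign argument above can be checked with a toy forward pass. This sketch assumes logistic hidden and output units (the report does not state the activation functions) and the coding used in the report, output 0 for a 2 and output 1 for a 3; `network_output` is a hypothetical helper and the activations and weights are made-up numbers.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def network_output(hidden_activations, w2, b2=0.0):
    """Logistic output unit: y = sigmoid(b2 + sum_j w2_j * h_j).

    With logistic hidden units every h_j lies in (0, 1), so a negative w2_j
    can only pull y towards 0 and a positive w2_j can only pull it towards 1.
    """
    z = b2 + sum(w * h for w, h in zip(w2, hidden_activations))
    return sigmoid(z)


h = [0.9, 0.1, 0.1]  # hidden unit 0 is strongly active
print(network_output(h, [-3.0, 1.0, 1.0]) < 0.5)  # True: pushed towards 0 ("2")
print(network_output(h, [3.0, 1.0, 1.0]) > 0.5)   # True: pushed towards 1 ("3")
```

The argument relies on the hidden activations being non-negative; with tanh hidden units a negative weight times a negative activation would push the output up instead.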
CS231A Course Project Final Report Sign Language Recognition with Unsupervised Feature Learning Justin Chen Stanford University justinkchen@stanford.edu Abstract This paper focuses on experimenting with
More informationCS787: Assignment 3, Robust and Mixture Models for Optic Flow Due: 3:30pm, Mon. Mar. 12, 2007.
CS787: Assignment 3, Robust and Mixture Models for Optic Flow Due: 3:30pm, Mon. Mar. 12, 2007. Many image features, such as image lines, curves, local image velocity, and local stereo disparity, can be
More informationFall 09, Homework 5
5-38 Fall 09, Homework 5 Due: Wednesday, November 8th, beginning of the class You can work in a group of up to two people. This group does not need to be the same group as for the other homeworks. You
More informationK-Means Clustering 3/3/17
K-Means Clustering 3/3/17 Unsupervised Learning We have a collection of unlabeled data points. We want to find underlying structure in the data. Examples: Identify groups of similar data points. Clustering
More informationProblems 1 and 5 were graded by Amin Sorkhei, Problems 2 and 3 by Johannes Verwijnen and Problem 4 by Jyrki Kivinen. Entropy(D) = Gini(D) = 1
Problems and were graded by Amin Sorkhei, Problems and 3 by Johannes Verwijnen and Problem by Jyrki Kivinen.. [ points] (a) Gini index and Entropy are impurity measures which can be used in order to measure
More informationIntroduction to Machine Learning. Xiaojin Zhu
Introduction to Machine Learning Xiaojin Zhu jerryzhu@cs.wisc.edu Read Chapter 1 of this book: Xiaojin Zhu and Andrew B. Goldberg. Introduction to Semi- Supervised Learning. http://www.morganclaypool.com/doi/abs/10.2200/s00196ed1v01y200906aim006
More informationPredicting Diabetes using Neural Networks and Randomized Optimization
Predicting Diabetes using Neural Networks and Randomized Optimization Kunal Sharma GTID: ksharma74 CS 4641 Machine Learning Abstract This paper analysis the following randomized optimization techniques
More informationCS 229 Midterm Review
CS 229 Midterm Review Course Staff Fall 2018 11/2/2018 Outline Today: SVMs Kernels Tree Ensembles EM Algorithm / Mixture Models [ Focus on building intuition, less so on solving specific problems. Ask
More informationWeka ( )
Weka ( http://www.cs.waikato.ac.nz/ml/weka/ ) The phases in which classifier s design can be divided are reflected in WEKA s Explorer structure: Data pre-processing (filtering) and representation Supervised
More informationClustering Images. John Burkardt (ARC/ICAM) Virginia Tech... Math/CS 4414:
John (ARC/ICAM) Virginia Tech... Math/CS 4414: http://people.sc.fsu.edu/ jburkardt/presentations/ clustering images.pdf... ARC: Advanced Research Computing ICAM: Interdisciplinary Center for Applied Mathematics
More informationMachine Learning A W 1sst KU. b) [1 P] Give an example for a probability distributions P (A, B, C) that disproves
Machine Learning A 708.064 11W 1sst KU Exercises Problems marked with * are optional. 1 Conditional Independence I [2 P] a) [1 P] Give an example for a probability distribution P (A, B, C) that disproves
More informationA Formal Approach to Score Normalization for Meta-search
A Formal Approach to Score Normalization for Meta-search R. Manmatha and H. Sever Center for Intelligent Information Retrieval Computer Science Department University of Massachusetts Amherst, MA 01003
More informationUnsupervised Learning: Clustering
Unsupervised Learning: Clustering Vibhav Gogate The University of Texas at Dallas Slides adapted from Carlos Guestrin, Dan Klein & Luke Zettlemoyer Machine Learning Supervised Learning Unsupervised Learning
More informationClassification of Subject Motion for Improved Reconstruction of Dynamic Magnetic Resonance Imaging
1 CS 9 Final Project Classification of Subject Motion for Improved Reconstruction of Dynamic Magnetic Resonance Imaging Feiyu Chen Department of Electrical Engineering ABSTRACT Subject motion is a significant
More informationCOMP 551 Applied Machine Learning Lecture 13: Unsupervised learning
COMP 551 Applied Machine Learning Lecture 13: Unsupervised learning Associate Instructor: Herke van Hoof (herke.vanhoof@mail.mcgill.ca) Slides mostly by: (jpineau@cs.mcgill.ca) Class web page: www.cs.mcgill.ca/~jpineau/comp551
More informationCPSC 340: Machine Learning and Data Mining. Deep Learning Fall 2018
CPSC 340: Machine Learning and Data Mining Deep Learning Fall 2018 Last Time: Multi-Dimensional Scaling Multi-dimensional scaling (MDS): Non-parametric visualization: directly optimize the z i locations.
More informationThe problem we have now is called variable selection or perhaps model selection. There are several objectives.
STAT-UB.0103 NOTES for Wednesday 01.APR.04 One of the clues on the library data comes through the VIF values. These VIFs tell you to what extent a predictor is linearly dependent on other predictors. We
More informationLouis Fourrier Fabien Gaie Thomas Rolf
CS 229 Stay Alert! The Ford Challenge Louis Fourrier Fabien Gaie Thomas Rolf Louis Fourrier Fabien Gaie Thomas Rolf 1. Problem description a. Goal Our final project is a recent Kaggle competition submitted
More informationProgramming Exercise 4: Neural Networks Learning
Programming Exercise 4: Neural Networks Learning Machine Learning Introduction In this exercise, you will implement the backpropagation algorithm for neural networks and apply it to the task of hand-written
More informationGraph Structure Over Time
Graph Structure Over Time Observing how time alters the structure of the IEEE data set Priti Kumar Computer Science Rensselaer Polytechnic Institute Troy, NY Kumarp3@rpi.edu Abstract This paper examines
More informationDerivatives and Graphs of Functions
Derivatives and Graphs of Functions September 8, 2014 2.2 Second Derivatives, Concavity, and Graphs In the previous section, we discussed how our derivatives can be used to obtain useful information about
More informationClassification Algorithms in Data Mining
August 9th, 2016 Suhas Mallesh Yash Thakkar Ashok Choudhary CIS660 Data Mining and Big Data Processing -Dr. Sunnie S. Chung Classification Algorithms in Data Mining Deciding on the classification algorithms
More information7. Decision or classification trees
7. Decision or classification trees Next we are going to consider a rather different approach from those presented so far to machine learning that use one of the most common and important data structure,
More informationTree-based methods for classification and regression
Tree-based methods for classification and regression Ryan Tibshirani Data Mining: 36-462/36-662 April 11 2013 Optional reading: ISL 8.1, ESL 9.2 1 Tree-based methods Tree-based based methods for predicting
More information10/14/2017. Dejan Sarka. Anomaly Detection. Sponsors
Dejan Sarka Anomaly Detection Sponsors About me SQL Server MVP (17 years) and MCT (20 years) 25 years working with SQL Server Authoring 16 th book Authoring many courses, articles Agenda Introduction Simple
More informationMissing variable problems
Missing variable problems In many vision problems, if some variables were known the maximum likelihood inference problem would be easy fitting; if we knew which line each token came from, it would be easy
More informationPong in Unity a basic Intro
This tutorial recreates the classic game Pong, for those unfamiliar with the game, shame on you what have you been doing, living under a rock?! Go google it. Go on. For those that now know the game, this
More informationMini-project 2 CMPSCI 689 Spring 2015 Due: Tuesday, April 07, in class
Mini-project 2 CMPSCI 689 Spring 2015 Due: Tuesday, April 07, in class Guidelines Submission. Submit a hardcopy of the report containing all the figures and printouts of code in class. For readability
More information1 Machine Learning System Design
Machine Learning System Design Prioritizing what to work on: Spam classification example Say you want to build a spam classifier Spam messages often have misspelled words We ll have a labeled training
More informationClustering K-means. Machine Learning CSEP546 Carlos Guestrin University of Washington February 18, Carlos Guestrin
Clustering K-means Machine Learning CSEP546 Carlos Guestrin University of Washington February 18, 2014 Carlos Guestrin 2005-2014 1 Clustering images Set of Images [Goldberger et al.] Carlos Guestrin 2005-2014
More informationOVERVIEW & RECAP COLE OTT MILESTONE WRITEUP GENERALIZABLE IMAGE ANALOGIES FOCUS
COLE OTT MILESTONE WRITEUP GENERALIZABLE IMAGE ANALOGIES OVERVIEW & RECAP FOCUS The goal of my project is to use existing image analogies research in order to learn filters between images. SIMPLIFYING
More informationPartitioning Data. IRDS: Evaluation, Debugging, and Diagnostics. Cross-Validation. Cross-Validation for parameter tuning
Partitioning Data IRDS: Evaluation, Debugging, and Diagnostics Charles Sutton University of Edinburgh Training Validation Test Training : Running learning algorithms Validation : Tuning parameters of learning
More informationIf Statements, For Loops, Functions
Fundamentals of Programming If Statements, For Loops, Functions Table of Contents Hello World Types of Variables Integers and Floats String Boolean Relational Operators Lists Conditionals If and Else Statements
More informationLecture on Modeling Tools for Clustering & Regression
Lecture on Modeling Tools for Clustering & Regression CS 590.21 Analysis and Modeling of Brain Networks Department of Computer Science University of Crete Data Clustering Overview Organizing data into
More informationSD 372 Pattern Recognition
SD 372 Pattern Recognition Lab 2: Model Estimation and Discriminant Functions 1 Purpose This lab examines the areas of statistical model estimation and classifier aggregation. Model estimation will be
More informationReading on the Accumulation Buffer: Motion Blur, Anti-Aliasing, and Depth of Field
Reading on the Accumulation Buffer: Motion Blur, Anti-Aliasing, and Depth of Field 1 The Accumulation Buffer There are a number of effects that can be achieved if you can draw a scene more than once. You
More informationFitting D.A. Forsyth, CS 543
Fitting D.A. Forsyth, CS 543 Fitting Choose a parametric object/some objects to represent a set of tokens Most interesting case is when criterion is not local can t tell whether a set of points lies on
More informationSection 4 General Factorial Tutorials
Section 4 General Factorial Tutorials General Factorial Part One: Categorical Introduction Design-Ease software version 6 offers a General Factorial option on the Factorial tab. If you completed the One
More informationCPSC 340: Machine Learning and Data Mining. Regularization Fall 2016
CPSC 340: Machine Learning and Data Mining Regularization Fall 2016 Assignment 2: Admin 2 late days to hand it in Friday, 3 for Monday. Assignment 3 is out. Due next Wednesday (so we can release solutions
More informationActivity A 1-D Free-Fall with v i = 0
Physics 151 Practical 3 Python Programming Freefall Kinematics Department of Physics 60 St. George St. NOTE: Today's activities must be done in teams of one or two. There are now twice as many computers
More informationUnsupervised Learning
Unsupervised Learning Learning without Class Labels (or correct outputs) Density Estimation Learn P(X) given training data for X Clustering Partition data into clusters Dimensionality Reduction Discover
More informationEnsemble Methods, Decision Trees
CS 1675: Intro to Machine Learning Ensemble Methods, Decision Trees Prof. Adriana Kovashka University of Pittsburgh November 13, 2018 Plan for This Lecture Ensemble methods: introduction Boosting Algorithm
More informationLearning from Data Mixture Models
Learning from Data Mixture Models Copyright David Barber 2001-2004. Course lecturer: Amos Storkey a.storkey@ed.ac.uk Course page : http://www.anc.ed.ac.uk/ amos/lfd/ 1 It is not uncommon for data to come
More information