Cross-validation for detecting and preventing overfitting


Note to other teachers and users of these slides: Andrew would be delighted if you found this source material useful in giving your own lectures. Feel free to use these slides verbatim, or to modify them to fit your own needs. PowerPoint originals are available. If you make use of a significant portion of these slides in your own lecture, please include this message, or the following link to the source repository of Andrew's tutorials: http:// Comments and corrections gratefully received.

Andrew W. Moore, Associate Professor, School of Computer Science, Carnegie Mellon University, awm@cs.cmu.edu
Copyright 2001, Andrew W. Moore. Oct 15th, 2001.

A Regression Problem
y = f(x) + noise. Can we learn f from this data? Let's consider three methods.

Linear Regression
(The slide shows the linear fit to the data.)

Linear Regression
Univariate linear regression with a constant term. The data are a column of X values and a column of Y values; write them as input/output pairs (x_1, y_1), (x_2, y_2), ... Originally discussed in the previous Andrew Lecture: Neural Nets.

Linear Regression
Univariate linear regression with a constant term. Stack the inputs into a design matrix Z whose k-th row is z_k = (1, x_k), and the outputs into a vector y with entries y_1, ..., y_R.

Linear Regression
With Z and y in hand, the least-squares coefficients are β = (Z^T Z)^-1 (Z^T y), and the fitted curve is y_est = β_0 + β_1 x.
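The formula above translates almost directly into code. Here is a minimal NumPy sketch of that closed-form fit; the toy x and y arrays are made up for illustration and are not the dataset on the slides.

    import numpy as np

    # Toy data, invented for illustration.
    x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
    y = np.array([1.2, 1.9, 3.2, 3.8, 5.1])

    # Design matrix Z: each row is z_k = (1, x_k).
    Z = np.column_stack([np.ones_like(x), x])

    # beta = (Z^T Z)^-1 (Z^T y); lstsq solves the same least-squares problem more stably.
    beta, *_ = np.linalg.lstsq(Z, y, rcond=None)

    y_est = Z @ beta   # predictions beta_0 + beta_1 * x
    print(beta, y_est)

For the quadratic regression on the next slide, the only change is an extra x^2 column in Z.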

Quadratic Regression
(The slide shows the quadratic fit to the data.)

Quadratic Regression
Same data, but now each row of the design matrix is z_k = (1, x_k, x_k^2). The coefficients are again β = (Z^T Z)^-1 (Z^T y), and the fitted curve is y_est = β_0 + β_1 x + β_2 x^2. Much more about this in the future Andrew Lecture: Favorite Regression Algorithms.

Join-the-dots
Also known as piecewise linear nonparametric regression, if that makes you feel better.

Which is best?
Why not choose the method with the best fit to the data?

What do we really want?
Why not choose the method with the best fit to the data? Because what we really care about is: how well are you going to predict future data drawn from the same distribution?

The test set method
1. Randomly choose 30% of the data to be in a test set. 2. The remainder is a training set.

The test set method (Linear regression example)
1. Randomly choose 30% of the data to be in a test set. 2. The remainder is a training set. 3. Perform your regression on the training set.

The test set method (Linear regression example)
1. Randomly choose 30% of the data to be in a test set. 2. The remainder is a training set. 3. Perform your regression on the training set. 4. Estimate your future performance with the test set. (The resulting test-set Mean Squared Error is reported on the slide.)

The test set method (Quadratic regression example)
1. Randomly choose 30% of the data to be in a test set. 2. The remainder is a training set. 3. Perform your regression on the training set. 4. Estimate your future performance with the test set. Mean Squared Error = 0.9.

The test set method (Join-the-dots example)
1. Randomly choose 30% of the data to be in a test set. 2. The remainder is a training set. 3. Perform your regression on the training set. 4. Estimate your future performance with the test set. (The resulting test-set Mean Squared Error is reported on the slide.)
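As a concrete sketch of that four-step recipe (using simple polynomial regressors as stand-ins for the slide's methods, and a made-up dataset):

    import numpy as np

    rng = np.random.default_rng(0)
    x = np.linspace(0, 5, 40)
    y = np.sin(x) + rng.normal(scale=0.3, size=x.size)   # y = f(x) + noise

    # 1-2. Randomly choose ~30% of the data to be a test set; the rest is the training set.
    idx = rng.permutation(x.size)
    test, train = idx[:12], idx[12:]

    def fit_poly(deg):
        # 3. Perform the regression on the training set only.
        return np.polyfit(x[train], y[train], deg)

    def test_mse(coeffs):
        # 4. Estimate future performance with the test set.
        pred = np.polyval(coeffs, x[test])
        return np.mean((pred - y[test]) ** 2)

    for deg in (1, 2):   # linear vs. quadratic
        print(deg, test_mse(fit_poly(deg)))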

The test set method
Good news: Very very simple. Can then simply choose the method with the best test-set score. Bad news: What's the downside?

The test set method
Good news: Very very simple. Can then simply choose the method with the best test-set score. Bad news: Wastes data: we get an estimate of the best method to apply to 30% less data. If we don't have much data, our test set might just be lucky or unlucky. We say the test-set estimator of performance has high variance.

LOOCV (Leave-one-out Cross Validation)
For k = 1 to R: 1. Let (x_k, y_k) be the k-th record.

LOOCV (Leave-one-out Cross Validation)
For k = 1 to R: 1. Let (x_k, y_k) be the k-th record. 2. Temporarily remove (x_k, y_k) from the dataset.

LOOCV (Leave-one-out Cross Validation)
For k = 1 to R: 1. Let (x_k, y_k) be the k-th record. 2. Temporarily remove (x_k, y_k) from the dataset. 3. Train on the remaining R-1 datapoints.

LOOCV (Leave-one-out Cross Validation)
For k = 1 to R: 1. Let (x_k, y_k) be the k-th record. 2. Temporarily remove (x_k, y_k) from the dataset. 3. Train on the remaining R-1 datapoints. 4. Note your error on (x_k, y_k).

LOOCV (Leave-one-out Cross Validation)
For k = 1 to R: 1. Let (x_k, y_k) be the k-th record. 2. Temporarily remove (x_k, y_k) from the dataset. 3. Train on the remaining R-1 datapoints. 4. Note your error on (x_k, y_k). When you've done all points, report the mean error.

LOOCV (Leave-one-out Cross Validation)
For k = 1 to R: 1. Let (x_k, y_k) be the k-th record. 2. Temporarily remove (x_k, y_k) from the dataset. 3. Train on the remaining R-1 datapoints. 4. Note your error on (x_k, y_k). When you've done all points, report the mean error. MSE_LOOCV = 2.12.
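A minimal LOOCV sketch for polynomial regressors of the kind used above (the model choice is a placeholder; it reuses the toy x, y from the earlier test-set sketch):

    import numpy as np

    def loocv_mse(x, y, deg=1):
        # Leave-one-out CV: for k = 1..R, train on the other R-1 points, test on the k-th.
        errs = []
        for k in range(len(x)):
            keep = np.arange(len(x)) != k                # temporarily remove record k
            coeffs = np.polyfit(x[keep], y[keep], deg)   # train on the remaining R-1 points
            pred_k = np.polyval(coeffs, x[k])
            errs.append((pred_k - y[k]) ** 2)            # note the error on (x_k, y_k)
        return np.mean(errs)                             # report the mean error

    print(loocv_mse(x, y, deg=1), loocv_mse(x, y, deg=2))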

LOOCV for Quadratic Regression
For k = 1 to R: 1. Let (x_k, y_k) be the k-th record. 2. Temporarily remove (x_k, y_k) from the dataset. 3. Train on the remaining R-1 datapoints. 4. Note your error on (x_k, y_k). When you've done all points, report the mean error. MSE_LOOCV = 0.962.

LOOCV for Join The Dots
For k = 1 to R: 1. Let (x_k, y_k) be the k-th record. 2. Temporarily remove (x_k, y_k) from the dataset. 3. Train on the remaining R-1 datapoints. 4. Note your error on (x_k, y_k). When you've done all points, report the mean error. (The MSE_LOOCV value is reported on the slide.)

Which kind of Cross Validation?
Test-set — Downside: Variance: unreliable estimate of future performance. Upside: Cheap.
Leave-one-out — Downside: Expensive. Has some weird behavior. Upside: Doesn't waste data.
..can we get the best of both worlds?

k-fold Cross Validation
Randomly break the dataset into k partitions (in our example we'll have k=3 partitions, colored Red, Green and Blue).

k-fold Cross Validation
Randomly break the dataset into k partitions (in our example we'll have k=3 partitions, colored Red, Green and Blue). For the red partition: train on all the points not in the red partition, and find the test-set sum of errors on the red points.

k-fold Cross Validation
Randomly break the dataset into k partitions (in our example we'll have k=3 partitions, colored Red, Green and Blue). For the red partition: train on all the points not in the red partition, and find the test-set sum of errors on the red points. For the green partition: train on all the points not in the green partition, and find the test-set sum of errors on the green points.

k-fold Cross Validation
Randomly break the dataset into k partitions (in our example we'll have k=3 partitions, colored Red, Green and Blue). For the red partition: train on all the points not in the red partition, and find the test-set sum of errors on the red points. For the green partition: train on all the points not in the green partition, and find the test-set sum of errors on the green points. For the blue partition: train on all the points not in the blue partition, and find the test-set sum of errors on the blue points.

k-fold Cross Validation (Linear Regression)
Same procedure, then report the mean error. MSE_3FOLD = 2.05.

k-fold Cross Validation (Quadratic Regression)
Same k=3 procedure as above; report the mean error. MSE_3FOLD = 1.11.

k-fold Cross Validation (Join-the-dots)
Same k=3 procedure as above; report the mean error. MSE_3FOLD = 2.9.
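A sketch of the 3-fold procedure in code (random fold assignment as on the slide; the regressor is again a placeholder polynomial, and the toy x, y come from the earlier sketches):

    import numpy as np

    def kfold_mse(x, y, deg=1, k=3, seed=0):
        # Randomly break the data into k partitions.
        rng = np.random.default_rng(seed)
        folds = rng.permutation(len(x)) % k            # partition label 0..k-1 for each point
        total = 0.0
        for f in range(k):
            held_out = folds == f
            # Train on all the points not in this partition...
            coeffs = np.polyfit(x[~held_out], y[~held_out], deg)
            # ...and accumulate the test-set sum of errors on the held-out points.
            pred = np.polyval(coeffs, x[held_out])
            total += np.sum((pred - y[held_out]) ** 2)
        return total / len(x)                          # then report the mean error

    print(kfold_mse(x, y, deg=2, k=3))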

Which kind of Cross Validation?
Test-set — Downside: Variance: unreliable estimate of future performance. Upside: Cheap.
Leave-one-out — Downside: Expensive. Has some weird behavior. Upside: Doesn't waste data.
10-fold — Downside: Wastes 10% of the data. 10 times more expensive than test-set. Upside: Only wastes 10%. Only 10 times more expensive instead of R times.
3-fold — Downside: Wastier than 10-fold. Expensivier than test-set. Upside: Slightly better than test-set.
R-fold — Identical to Leave-one-out.
(But note: one of Andrew's joys in life is algorithmic tricks for making these cheap.)

CV-based Model Selection
We're trying to decide which algorithm to use. We train each machine and make a table with one row per candidate f_1, f_2, f_3, f_4, f_5, f_6 and columns: i, f_i, TRAINERR, 10-FOLD-CV-ERR, Choice.

Alternatives to CV-based model selection
Model selection methods: 1. Cross-validation. 2. AIC (Akaike Information Criterion). 3. BIC (Bayesian Information Criterion). 4. VC-dimension (Vapnik-Chervonenkis Dimension) — only directly applicable to choosing classifiers; described in a future Andrew Lecture.
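Filling in such a table programmatically might look like the sketch below, where the candidates f_1..f_6 are stand-in polynomial regressors of degree 1..6 (the slide does not say what the actual machines are); it reuses the toy x, y and the kfold_mse helper from the earlier sketches:

    # Hypothetical candidates f_1..f_6: polynomial regressors of degree 1..6.
    rows = []
    for deg in range(1, 7):
        coeffs = np.polyfit(x, y, deg)
        train_err = np.mean((np.polyval(coeffs, x) - y) ** 2)   # TRAINERR
        cv_err = kfold_mse(x, y, deg=deg, k=10)                 # 10-FOLD-CV-ERR
        rows.append((deg, train_err, cv_err))

    # Choice: pick the candidate with the lowest CV error, not the lowest TRAINERR.
    best_deg = min(rows, key=lambda r: r[2])[0]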

Which model selection method is best?
1. (CV) Cross-validation. 2. AIC (Akaike Information Criterion). 3. BIC (Bayesian Information Criterion). 4. (SRMVC) Structural Risk Minimization with VC-dimension.
AIC, BIC and SRMVC have the advantage that you only need the training error. CV error might have more variance. SRMVC is wildly conservative. Asymptotically AIC and Leave-one-out CV should be the same. Asymptotically BIC and a carefully chosen k-fold should be the same. You want BIC if you want the best structure instead of the best predictor (e.g. for clustering or Bayes Net structure finding). Many alternatives---including proper Bayesian approaches. It's an emotional issue.

Other Cross-validation issues
Can do "leave all pairs out" or "leave-all-ntuples-out" if feeling resourceful. Some folks do k-folds in which each fold is an independently-chosen subset of the data. Do you know what AIC and BIC are? If so... LOOCV behaves like AIC asymptotically; k-fold behaves like BIC if you choose k carefully. If not... Nyardely nyardely noo noo.

Cross-Validation for regression
Uses: Choosing the number of hidden units in a neural net. Feature selection (see later). Choosing a polynomial degree. Choosing which regressor to use.

Supervising Gradient Descent
This is a weird but common use of test-set validation. Suppose you have a neural net with too many hidden units. It will overfit. As gradient descent progresses, maintain a graph of MSE-testset-error vs. iteration. (Plot: Mean Squared Error vs. Iteration of Gradient Descent, with a Training Set curve and a Test Set curve.) Use the weights you found at the iteration where the test-set curve is at its minimum.
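A minimal early-stopping sketch of this idea, with a deliberately over-flexible polynomial model standing in for the over-sized neural net (data, model and step size are all made up for illustration):

    import numpy as np

    rng = np.random.default_rng(1)
    x2 = np.linspace(0, 1, 60)
    y2 = np.sin(4 * x2) + rng.normal(scale=0.2, size=x2.size)
    idx = rng.permutation(x2.size)
    test, train = idx[:20], idx[20:]

    Z = np.vander(x2, 6, increasing=True)        # degree-5 polynomial features
    w = np.zeros(Z.shape[1])
    best_w, best_mse = w.copy(), np.inf

    for it in range(2000):
        grad = 2.0 * Z[train].T @ (Z[train] @ w - y2[train]) / len(train)
        w -= 0.05 * grad                                    # one gradient-descent step
        test_mse = np.mean((Z[test] @ w - y2[test]) ** 2)   # a point on the test-set curve
        if test_mse < best_mse:                             # remember the weights at its minimum
            best_w, best_mse = w.copy(), test_mse

    # Use best_w (the weights from the best test-set iteration), not the final w.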

Supervising Gradient Descent
This is a weird but common use of test-set validation. Suppose you have a neural net with too many hidden units. It will overfit. As gradient descent progresses, maintain a graph of MSE-testset-error vs. iteration, and use the weights you found at the iteration where the test-set error is lowest. This relies on an intuition that a not-fully-minimized set of weights is somewhat like having fewer parameters. Works pretty well in practice, apparently.

Cross-validation for classification
Instead of computing the sum squared errors on a test set, you should compute...

Cross-validation for classification
Instead of computing the sum squared errors on a test set, you should compute: the total number of misclassifications on a test set.

Cross-validation for classification
Instead of computing the sum squared errors on a test set, you should compute the total number of misclassifications on a test set. But there's a more sensitive alternative: compute log P(all test outputs | all test inputs, your model).
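A sketch of both test-set scores for a probabilistic classifier; p_test here is whatever your model assigns as P(y=1 | x) for each test input (the classifier itself is not specified by the slide):

    import numpy as np

    def classification_scores(p_test, y_test):
        # p_test: predicted P(y=1 | x) for each test input, assumed strictly in (0, 1)
        # y_test: true 0/1 labels
        y_hat = (p_test >= 0.5).astype(int)
        n_misclassified = int(np.sum(y_hat != y_test))    # total number of misclassifications
        # The more sensitive alternative: log P(all test outputs | all test inputs, model)
        log_lik = np.sum(y_test * np.log(p_test) + (1 - y_test) * np.log(1 - p_test))
        return n_misclassified, log_lik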

Cross-Validation for classification
Uses: Choosing the pruning parameter for decision trees. Feature selection (see later). What kind of Gaussian to use in a Gaussian-based Bayes Classifier. Choosing which classifier to use.

Cross-Validation for density estimation
Compute the sum of log-likelihoods of test points. Example uses: Choosing what kind of Gaussian assumption to use. Choosing the density estimator. NOT feature selection (test-set density will almost always look better with fewer features).
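For density estimation the score is simply the summed test log-likelihood. A small sketch for comparing two Gaussian assumptions (spherical vs. full covariance) fit on a training split; train_X and test_X are assumed to be (n, d) NumPy arrays:

    import numpy as np
    from scipy.stats import multivariate_normal

    def test_log_likelihood(train_X, test_X, spherical=True):
        # Fit a Gaussian to train_X, then sum log-densities of the test points.
        mu = train_X.mean(axis=0)
        if spherical:
            cov = train_X.var() * np.eye(train_X.shape[1])   # one shared variance
        else:
            cov = np.cov(train_X, rowvar=False)              # full covariance
        return multivariate_normal(mu, cov).logpdf(test_X).sum()

    # Pick whichever assumption gives the higher held-out log-likelihood.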

Feature Selection
Suppose you have a learning algorithm LA and a set of input attributes { X_1, X_2, ..., X_m }. You expect that LA will only find some subset of the attributes useful. Question: How can we use cross-validation to find a useful subset? Four ideas: Forward selection (see the sketch below), Backward elimination, Hill Climbing, Stochastic search (Simulated Annealing or GAs). (Another fun area in which Andrew has spent a lot of his wild youth.)

Very serious warning
Intensive use of cross validation can overfit. How? What can be done about it?
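A rough sketch of the first of those four ideas, CV-driven forward selection; cv_score stands for any of the CV error estimators from the earlier sketches, and X is assumed to be an (n, m) NumPy attribute matrix:

    def forward_selection(X, y, cv_score, max_features=None):
        # Greedily add the single attribute that most improves the CV score.
        chosen, remaining = [], list(range(X.shape[1]))
        best_err = float("inf")
        while remaining and (max_features is None or len(chosen) < max_features):
            trials = [(cv_score(X[:, chosen + [j]], y), j) for j in remaining]
            err, j = min(trials)
            if err >= best_err:          # no remaining attribute helps: stop
                break
            chosen.append(j)
            remaining.remove(j)
            best_err = err
        return chosen

Backward elimination is the mirror image: start with all attributes and greedily drop the one whose removal most improves the CV score.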

Very serious warning
Intensive use of cross validation can overfit. How? Imagine a dataset with 50 records and 1000 attributes. You try 1000 linear regression models, each one using one of the attributes. What can be done about it?

Very serious warning
Intensive use of cross validation can overfit. How? Imagine a dataset with 50 records and 1000 attributes. You try 1000 linear regression models, each one using one of the attributes. The best of those 1000 looks good! What can be done about it?

Very serious warning
Intensive use of cross validation can overfit. How? Imagine a dataset with 50 records and 1000 attributes. You try 1000 linear regression models, each one using one of the attributes. The best of those 1000 looks good! But you realize it would have looked good even if the output had been purely random! What can be done about it? Hold out an additional test set before doing any model selection. Check that the best model performs well even on the additional test set.

What you should know
Why you can't use training-set error to estimate the quality of your learning algorithm on your data. Why you can't use training-set error to choose the learning algorithm. Test-set cross-validation. Leave-one-out cross-validation. k-fold cross-validation. Feature selection methods. CV for classification, regression & densities.
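The warning is easy to reproduce in a few lines. With 50 records, 1000 random attributes, and a purely random output, the winner of a 1000-way search tends to get an optimistic CV score, and only the additional held-out set (kept aside before any selection) reveals that it is noise. A minimal sketch, all data made up:

    import numpy as np

    rng = np.random.default_rng(2)
    X = rng.normal(size=(50, 1000))                # 50 records, 1000 attributes
    y = rng.normal(size=50)                        # the output is purely random

    select, hold = np.arange(35), np.arange(35, 50)   # additional test set kept aside

    def loocv_1d(xcol, yv):
        # Leave-one-out MSE for a one-attribute linear regression.
        errs = []
        for k in range(len(yv)):
            m = np.arange(len(yv)) != k
            b = np.polyfit(xcol[m], yv[m], 1)
            errs.append((np.polyval(b, xcol[k]) - yv[k]) ** 2)
        return np.mean(errs)

    scores = [loocv_1d(X[select, j], y[select]) for j in range(X.shape[1])]
    best_j = int(np.argmin(scores))                # best of 1000: its CV score looks flattering...
    b = np.polyfit(X[select, best_j], y[select], 1)
    hold_mse = np.mean((np.polyval(b, X[hold, best_j]) - y[hold]) ** 2)
    print(scores[best_j], hold_mse)                # ...but the held-out error shows it was luck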
