Bias-Variance Decomposition Error Estimators Cross-Validation

Size: px
Start display at page:

Download "Bias-Variance Decomposition Error Estimators Cross-Validation"

Transcription

1 Bias-Variance Decomposition Error Estimators Cross-Validation

2 Bias-Variance tradeoff Intuition Model too simple does not fit the data well a biased solution. Model too comple small changes to the data, solution changes a lot high-variance solution.

3 Epected Loss Let h = E[t ] be the optimal predictor and our actual predictor, which will incur the following epected loss ddt p t h d p h t h t h h h E ddt t p t h h t E, 0 ] [ since dt t p -t t E dt p t h d p h t E Noise term Source of error 1

4 Sources of error 1 noise What if we have perfect learner, infinite data? Our learning solution satisfies =h Still have remaining, unavoidable error of σ due to noise ε t t p f t p dtd h h, ~ N0, E[ ]

5 Bias-variance decomposition Focus on Let us first eamine epected loss averaging over data sets. For one data set D and one test point : Hence d p h ], [ ], [, ], [ ], [, ], [ ], [,, h D E D E D h D E D E D h D E D E D h D D D D D D D variance bias ], [, ], [, D E D E h D E h D E D D D D of this cancels to zero E D

6 Bias Suppose ou are given a dataset D with m samples from some distribution. You learn function from data D If ou sample a different datasets, ou will learn different Epected hpothesis: E D [,D] Bias: difference between what ou epect to learn and the truth. Measures how well ou epect to represent true solution Decreases with more comple model bias E [, D] h p d D Epected to learn True model

7 Variance Variance: difference between what ou epect to learn and what ou learn from a particular dataset Measures how sensitive learner is to a specific dataset Decreases with simpler model var iance, D E [, D] p d E D D Learned for D Epected to learn, averaged over datasets

8 Bias-Variance Tradeoff Choice of hpothesis class introduces learning bias More comple class less bias More comple class more variance

9 The Epected Prediction Error The epected prediction squared error over fied size training sets D drawn from PX,T can be epressed as sum of three components: unavoidabl e error bias variance where unavoidabl e error bias E [ ] h p d D var iance E E [ ] p d D D

10 Higher variance Lower bias

11 Training Error Given a training data, choose a loss function. e.g., squared error L for regression Training set error: For a particular set of parameters, loss function on training data: Err train N 1 train M w t i Ntrain i1 j1 w i j

12 Error Training Error vs. Model Compleit Training error low Model compleit high

13 Prediction Generalization Error Training set error can be poor measure of qualit" of solution Prediction error: We reall care about error over all possible input points, not just training data: d w t w t E w Err M j j i M j j i true X 1 1 X

14 Error Prediction Error vs. Model Compleit True Error Training error low want Model compleit high

15 Wh training set error doesn t approimate prediction error? Sampling approimation of prediction error: Err Training error Err train true w Ver similar equations!!! Wh is training set a bad measure of prediction error??? 1 p p i1 t M j1 w N 1 train M w t i Ntrain i1 j1 i j w i j

16 Wh is training set a bad measure of prediction error??? Training error is a good estimate for a single w, But we optimized w with respect to the training error, and found w that is good for this set of samples. Training error is a optimisticall biased estimate of prediction error

17 Test Error Given a dataset, randoml split it into two parts: Training data { 1,, Ntrain } Test data { 1,, Ntest } Use training data to optimize parameters w Test set error: For the final solution w*, evaluate the error using: Err 1 N test M * test w t i Ntest i1 j1 Unbiased but has variance w * i Test set onl unbiased if ou never do an learning on the test data For eample, if ou use the test set to select the degree of the polnomial no longer unbiased!!! j

18 Error Test Set Error vs. Model Compleit Test Error True Error Training error low want Model comleit high

19 How man points to use for training/testing? Ver hard question to answer! Too few training points, learned w is bad Too few test points, ou never know if ou reached a good solution Theor proposes error bounds advanced course Tpicall: if ou have a reasonable amount of data, pick test set large enough for a reasonable estimate of error, and use the rest for learning if ou have little data, then ou need to use some special techniques e.g., bootstrapping

20 Error Error as a function of number of training eamples for a fied model compleit Test error Training error Little data Infinite data

21 Overfitting With too few training eamples our model ma achieve zero training error but never the less has a large generalization error When the training error no longer bears an relation to the generalization error the model overfits the data

22 Cross-validation for detecting and preventing overfitting Note to other teachers and users of these slides. Andrew would be delighted if ou found this source material useful in giving our own lectures. Feel free to use these slides verbatim, or to modif them to fit our own needs. PowerPoint originals are available. If ou make use of a significant portion of these slides in our own lecture, please include this message, or the following link to the source repositor of Andrew s tutorials: Comments and corrections gratefull received. Andrew W. Moore Professor School of Computer Science Carnegie Mellon Universit awm@cs.cmu.edu Copright Andrew W. Moore Slide

23 A Regression Problem = f + noise Can we learn f from this data? Let s consider three methods Copright Andrew W. Moore Slide 3

24 Linear Regression Copright Andrew W. Moore Slide 4

25 Quadratic Regression Copright Andrew W. Moore Slide 5

26 Join-the-dots Also known as piecewise linear nonparametric regression if that makes ou feel better Copright Andrew W. Moore Slide 6

27 Which is best? Wh not choose the method with the best fit to the data? Copright Andrew W. Moore Slide 7

28 What do we reall want? Wh not choose the method with the best fit to the data? How well are ou going to predict future data drawn from the same distribution? Copright Andrew W. Moore Slide 8

29 The test set method 1. Randoml choose 30% of the data to be in a test set. The remainder is a training set Copright Andrew W. Moore Slide 9

30 The test set method 1. Randoml choose 30% of the data to be in a test set. The remainder is a training set 3. Perform our regression on the training set Linear regression eample Copright Andrew W. Moore Slide 30

31 The test set method 1. Randoml choose 30% of the data to be in a test set. The remainder is a training set Linear regression eample Mean Squared Error =.4 3. Perform our regression on the training set 4. Estimate our future performance with the test set Copright Andrew W. Moore Slide 31

32 The test set method 1. Randoml choose 30% of the data to be in a test set. The remainder is a training set Quadratic regression eample Mean Squared Error = Perform our regression on the training set 4. Estimate our future performance with the test set Copright Andrew W. Moore Slide 3

33 The test set method 1. Randoml choose 30% of the data to be in a test set. The remainder is a training set Join the dots eample Mean Squared Error =. 3. Perform our regression on the training set 4. Estimate our future performance with the test set Copright Andrew W. Moore Slide 33

34 Good news: Ver ver simple The test set method Can then simpl choose the method with the best test-set score Bad news: What s the downside? Copright Andrew W. Moore Slide 34

35 Good news: Ver ver simple The test set method Can then simpl choose the method with the best test-set score Bad news: Wastes data: we get an estimate of the best method to appl to 30% less data If we don t have much data, our test-set might just be luck or unluck We sa the test-set estimator of performance has high variance Copright Andrew W. Moore Slide 35

36 LOOCV Leave-one-out Cross Validation For k=1 to R 1. Let k, k be the k th record Copright Andrew W. Moore Slide 36

37 LOOCV Leave-one-out Cross Validation For k=1 to R 1. Let k, k be the k th record. Temporaril remove k, k from the dataset Copright Andrew W. Moore Slide 37

38 LOOCV Leave-one-out Cross Validation For k=1 to R 1. Let k, k be the k th record. Temporaril remove k, k from the dataset 3. Train on the remaining R-1 datapoints Copright Andrew W. Moore Slide 38

39 LOOCV Leave-one-out Cross Validation For k=1 to R 1. Let k, k be the k th record. Temporaril remove k, k from the dataset 3. Train on the remaining R-1 datapoints 4. Note our error k, k Copright Andrew W. Moore Slide 39

40 LOOCV Leave-one-out Cross Validation For k=1 to R 1. Let k, k be the k th record. Temporaril remove k, k from the dataset 3. Train on the remaining R-1 datapoints 4. Note our error k, k When ou ve done all points, report the mean error. Copright Andrew W. Moore Slide 40

41 LOOCV Leave-one-out Cross Validation For k=1 to R 1. Let k, k be the k th record. Temporaril remove k, k from the dataset 3. Train on the remaining R-1 datapoints 4. Note our error k, k When ou ve done all points, report the mean error. MSE LOOCV =.1 Copright Andrew W. Moore Slide 41

42 LOOCV for Quadratic Regression For k=1 to R 1. Let k, k be the k th record. Temporaril remove k, k from the dataset 3. Train on the remaining R-1 datapoints 4. Note our error k, k When ou ve done all points, report the mean error. MSE LOOCV =0.96 Copright Andrew W. Moore Slide 4

43 LOOCV for Join The Dots For k=1 to R 1. Let k, k be the k th record. Temporaril remove k, k from the dataset 3. Train on the remaining R-1 datapoints 4. Note our error k, k When ou ve done all points, report the mean error. MSE LOOCV =3.33 Copright Andrew W. Moore Slide 43

44 Which kind of Cross Validation? Test-set Leaveone-out Downside Variance: unreliable estimate of future performance Epensive. Has some weird behavior Upside Cheap Doesn t waste data..can we get the best of both worlds? Copright Andrew W. Moore Slide 44

45 k-fold Cross Validation Randoml break the dataset into k partitions in our eample we ll have k=3 partitions colored Red Green and Blue Copright Andrew W. Moore Slide 45

46 k-fold Cross Validation Randoml break the dataset into k partitions in our eample we ll have k=3 partitions colored Red Green and Blue For the red partition: Train on all the points not in the red partition. Find the test-set sum of errors on the red points. Copright Andrew W. Moore Slide 46

47 k-fold Cross Validation Randoml break the dataset into k partitions in our eample we ll have k=3 partitions colored Red Green and Blue For the red partition: Train on all the points not in the red partition. Find the test-set sum of errors on the red points. For the green partition: Train on all the points not in the green partition. Find the test-set sum of errors on the green points. Copright Andrew W. Moore Slide 47

48 k-fold Cross Validation Randoml break the dataset into k partitions in our eample we ll have k=3 partitions colored Red Green and Blue For the red partition: Train on all the points not in the red partition. Find the test-set sum of errors on the red points. For the green partition: Train on all the points not in the green partition. Find the test-set sum of errors on the green points. For the blue partition: Train on all the points not in the blue partition. Find the test-set sum of errors on the blue points. Copright Andrew W. Moore Slide 48

49 k-fold Cross Validation Linear Regression MSE 3FOLD =.05 Randoml break the dataset into k partitions in our eample we ll have k=3 partitions colored Red Green and Blue For the red partition: Train on all the points not in the red partition. Find the test-set sum of errors on the red points. For the green partition: Train on all the points not in the green partition. Find the test-set sum of errors on the green points. For the blue partition: Train on all the points not in the blue partition. Find the test-set sum of errors on the blue points. Then report the mean error Copright Andrew W. Moore Slide 49

50 k-fold Cross Validation Quadratic Regression MSE 3FOLD =1.11 Randoml break the dataset into k partitions in our eample we ll have k=3 partitions colored Red Green and Blue For the red partition: Train on all the points not in the red partition. Find the test-set sum of errors on the red points. For the green partition: Train on all the points not in the green partition. Find the test-set sum of errors on the green points. For the blue partition: Train on all the points not in the blue partition. Find the test-set sum of errors on the blue points. Then report the mean error Copright Andrew W. Moore Slide 50

51 k-fold Cross Validation Joint-the-dots MSE 3FOLD =.93 Randoml break the dataset into k partitions in our eample we ll have k=3 partitions colored Red Green and Blue For the red partition: Train on all the points not in the red partition. Find the test-set sum of errors on the red points. For the green partition: Train on all the points not in the green partition. Find the test-set sum of errors on the green points. For the blue partition: Train on all the points not in the blue partition. Find the test-set sum of errors on the blue points. Then report the mean error Copright Andrew W. Moore Slide 51

52 Which kind of Cross Validation? Test-set Leaveone-out 10-fold 3-fold R-fold Downside Variance: unreliable estimate of future performance Epensive. Has some weird behavior Wastes 10% of the data. 10 times more epensive than test set Wastier than 10-fold. Epensivier than test set Identical to Leave-one-out Upside Cheap Doesn t waste data Onl wastes 10%. Onl 10 times more epensive instead of R times. Slightl better than testset Copright Andrew W. Moore Slide 5

Bias-Variance Decomposition Error Estimators

Bias-Variance Decomposition Error Estimators Bias-Variance Decomposition Error Estimators Cross-Validation Bias-Variance tradeoff Intuition Model too simple does not fit the data well a biased solution. Model too comple small changes to the data,

More information

Cross-validation for detecting and preventing overfitting

Cross-validation for detecting and preventing overfitting Cross-validation for detecting and preventing overfitting Note to other teachers and users of these slides. Andrew would be delighted if ou found this source material useful in giving our own lectures.

More information

LECTURE 6: CROSS VALIDATION

LECTURE 6: CROSS VALIDATION LECTURE 6: CROSS VALIDATION CSCI 4352 Machine Learning Dongchul Kim, Ph.D. Department of Computer Science A Regression Problem Given a data set, how can we evaluate our (linear) model? Cross Validation

More information

Cross-validation for detecting and preventing overfitting

Cross-validation for detecting and preventing overfitting Cross-validation for detecting and preventing overfitting Note to other teachers and users of these slides. Andrew would be delighted if ou found this source material useful in giving our own lectures.

More information

Cross-validation for detecting and preventing overfitting

Cross-validation for detecting and preventing overfitting Cross-validation for detecting and preventing overfitting Andrew W. Moore/Anna Goldenberg School of Computer Science Carnegie Mellon Universit Copright 2001, Andrew W. Moore Apr 1st, 2004 Want to learn

More information

Lecture 7. CS4442/9542b: Artificial Intelligence II Prof. Olga Veksler. Outline. Machine Learning: Cross Validation. Performance evaluation methods

Lecture 7. CS4442/9542b: Artificial Intelligence II Prof. Olga Veksler. Outline. Machine Learning: Cross Validation. Performance evaluation methods CS4442/9542b: Artificial Intelligence II Prof. Olga Veksler Lecture 7 Machine Learning: Cross Validation Outline Performance evaluation methods test/train sets cross-validation k-fold Leave-one-out 1 A

More information

Overfitting, Model Selection, Cross Validation, Bias-Variance

Overfitting, Model Selection, Cross Validation, Bias-Variance Statistical Machine Learning Notes 2 Overfitting, Model Selection, Cross Validation, Bias-Variance Instructor: Justin Domke Motivation Suppose we have some data TRAIN = {(, ), ( 2, 2 ),..., ( N, N )} that

More information

Non-linear models. Basis expansion. Overfitting. Regularization.

Non-linear models. Basis expansion. Overfitting. Regularization. Non-linear models. Basis epansion. Overfitting. Regularization. Petr Pošík Czech Technical Universit in Prague Facult of Electrical Engineering Dept. of Cbernetics Non-linear models Basis epansion.....................................................................................................

More information

Performance Evaluation

Performance Evaluation Performance Evaluation Dan Lizotte 7-9-5 Evaluating Performance..5..5..5..5 Which do ou prefer and wh? Evaluating Performance..5..5 Which do ou prefer and wh?..5..5 Evaluating Performance..5..5..5..5 Performance

More information

Overfitting. Machine Learning CSE546 Carlos Guestrin University of Washington. October 2, Bias-Variance Tradeoff

Overfitting. Machine Learning CSE546 Carlos Guestrin University of Washington. October 2, Bias-Variance Tradeoff Overfitting Machine Learning CSE546 Carlos Guestrin University of Washington October 2, 2013 1 Bias-Variance Tradeoff Choice of hypothesis class introduces learning bias More complex class less bias More

More information

Machine Learning. Cross Validation

Machine Learning. Cross Validation Machine Learning Cross Validation Cross Validation Cross validation is a model evaluation method that is better than residuals. The problem with residual evaluations is that they do not give an indication

More information

CSE 446 Bias-Variance & Naïve Bayes

CSE 446 Bias-Variance & Naïve Bayes CSE 446 Bias-Variance & Naïve Bayes Administrative Homework 1 due next week on Friday Good to finish early Homework 2 is out on Monday Check the course calendar Start early (midterm is right before Homework

More information

K-means and Hierarchical Clustering

K-means and Hierarchical Clustering K-means and Hierarchical Clustering Note to other teachers and users of these slides. Andrew would be delighted if you found this source material useful in giving your own lectures. Feel free to use these

More information

CSE446: Linear Regression. Spring 2017

CSE446: Linear Regression. Spring 2017 CSE446: Linear Regression Spring 2017 Ali Farhadi Slides adapted from Carlos Guestrin and Luke Zettlemoyer Prediction of continuous variables Billionaire says: Wait, that s not what I meant! You say: Chill

More information

Cross-validation. Cross-validation is a resampling method.

Cross-validation. Cross-validation is a resampling method. Cross-validation Cross-validation is a resampling method. It refits a model of interest to samples formed from the training set, in order to obtain additional information about the fitted model. For example,

More information

EE 511 Linear Regression

EE 511 Linear Regression EE 511 Linear Regression Instructor: Hanna Hajishirzi hannaneh@washington.edu Slides adapted from Ali Farhadi, Mari Ostendorf, Pedro Domingos, Carlos Guestrin, and Luke Zettelmoyer, Announcements Hw1 due

More information

Model Complexity and Generalization

Model Complexity and Generalization HT2015: SC4 Statistical Data Mining and Machine Learning Dino Sejdinovic Department of Statistics Oxford http://www.stats.ox.ac.uk/~sejdinov/sdmml.html Generalization Learning Curves Underfit Generalization

More information

CSC 411 Lecture 4: Ensembles I

CSC 411 Lecture 4: Ensembles I CSC 411 Lecture 4: Ensembles I Roger Grosse, Amir-massoud Farahmand, and Juan Carrasquilla University of Toronto UofT CSC 411: 04-Ensembles I 1 / 22 Overview We ve seen two particular classification algorithms:

More information

Simple Model Selection Cross Validation Regularization Neural Networks

Simple Model Selection Cross Validation Regularization Neural Networks Neural Nets: Many possible refs e.g., Mitchell Chapter 4 Simple Model Selection Cross Validation Regularization Neural Networks Machine Learning 10701/15781 Carlos Guestrin Carnegie Mellon University February

More information

Clustering Part 2. A Partitional Clustering

Clustering Part 2. A Partitional Clustering Universit of Florida CISE department Gator Engineering Clustering Part Dr. Sanja Ranka Professor Computer and Information Science and Engineering Universit of Florida, Gainesville Universit of Florida

More information

Boosting Simple Model Selection Cross Validation Regularization

Boosting Simple Model Selection Cross Validation Regularization Boosting: (Linked from class website) Schapire 01 Boosting Simple Model Selection Cross Validation Regularization Machine Learning 10701/15781 Carlos Guestrin Carnegie Mellon University February 8 th,

More information

10601 Machine Learning. Model and feature selection

10601 Machine Learning. Model and feature selection 10601 Machine Learning Model and feature selection Model selection issues We have seen some of this before Selecting features (or basis functions) Logistic regression SVMs Selecting parameter value Prior

More information

Instance-based Learning

Instance-based Learning Instance-based Learning Machine Learning 10701/15781 Carlos Guestrin Carnegie Mellon University February 19 th, 2007 2005-2007 Carlos Guestrin 1 Why not just use Linear Regression? 2005-2007 Carlos Guestrin

More information

Perceptron as a graph

Perceptron as a graph Neural Networks Machine Learning 10701/15781 Carlos Guestrin Carnegie Mellon University October 10 th, 2007 2005-2007 Carlos Guestrin 1 Perceptron as a graph 1 0.9 0.8 0.7 0.6 0.5 0.4 0.3 0.2 0.1 0-6 -4-2

More information

FMA901F: Machine Learning Lecture 3: Linear Models for Regression. Cristian Sminchisescu

FMA901F: Machine Learning Lecture 3: Linear Models for Regression. Cristian Sminchisescu FMA901F: Machine Learning Lecture 3: Linear Models for Regression Cristian Sminchisescu Machine Learning: Frequentist vs. Bayesian In the frequentist setting, we seek a fixed parameter (vector), with value(s)

More information

Linear Feature Engineering 22

Linear Feature Engineering 22 Linear Feature Engineering 22 3 Overfitting Remember our dataset from last time. We have a bunch of inputs x i and corresponding outputs y i. In the previous lecture notes, we considered how to fit polynomials

More information

Boosting Simple Model Selection Cross Validation Regularization. October 3 rd, 2007 Carlos Guestrin [Schapire, 1989]

Boosting Simple Model Selection Cross Validation Regularization. October 3 rd, 2007 Carlos Guestrin [Schapire, 1989] Boosting Simple Model Selection Cross Validation Regularization Machine Learning 10701/15781 Carlos Guestrin Carnegie Mellon University October 3 rd, 2007 1 Boosting [Schapire, 1989] Idea: given a weak

More information

Nonparametric Methods Recap

Nonparametric Methods Recap Nonparametric Methods Recap Aarti Singh Machine Learning 10-701/15-781 Oct 4, 2010 Nonparametric Methods Kernel Density estimate (also Histogram) Weighted frequency Classification - K-NN Classifier Majority

More information

6.867 Machine learning

6.867 Machine learning 6.867 Machine learning Final eam December 3, 24 Your name and MIT ID: J. D. (Optional) The grade ou would give to ourself + a brief justification. A... wh not? Problem 5 4.5 4 3.5 3 2.5 2.5 + () + (2)

More information

Graphing square root functions. What would be the base graph for the square root function? What is the table of values?

Graphing square root functions. What would be the base graph for the square root function? What is the table of values? Unit 3 (Chapter 2) Radical Functions (Square Root Functions Sketch graphs of radical functions b appling translations, stretches and reflections to the graph of Analze transformations to identif the of

More information

2.4 Polynomial and Rational Functions

2.4 Polynomial and Rational Functions Polnomial Functions Given a linear function f() = m + b, we can add a square term, and get a quadratic function g() = a 2 + f() = a 2 + m + b. We can continue adding terms of higher degrees, e.g. we can

More information

Introduction to Homogeneous Transformations & Robot Kinematics

Introduction to Homogeneous Transformations & Robot Kinematics Introduction to Homogeneous Transformations & Robot Kinematics Jennifer Ka Rowan Universit Computer Science Department. Drawing Dimensional Frames in 2 Dimensions We will be working in -D coordinates,

More information

Lecture 25: Review I

Lecture 25: Review I Lecture 25: Review I Reading: Up to chapter 5 in ISLR. STATS 202: Data mining and analysis Jonathan Taylor 1 / 18 Unsupervised learning In unsupervised learning, all the variables are on equal standing,

More information

SECTION 3-4 Rational Functions

SECTION 3-4 Rational Functions 20 3 Polnomial and Rational Functions 0. Shipping. A shipping bo is reinforced with steel bands in all three directions (see the figure). A total of 20. feet of steel tape is to be used, with 6 inches

More information

BIOINF 585: Machine Learning for Systems Biology & Clinical Informatics

BIOINF 585: Machine Learning for Systems Biology & Clinical Informatics BIOINF 585: Machine Learning for Systems Biology & Clinical Informatics Lecture 12: Ensemble Learning I Jie Wang Department of Computational Medicine & Bioinformatics University of Michigan 1 Outline Bias

More information

RESAMPLING METHODS. Chapter 05

RESAMPLING METHODS. Chapter 05 1 RESAMPLING METHODS Chapter 05 2 Outline Cross Validation The Validation Set Approach Leave-One-Out Cross Validation K-fold Cross Validation Bias-Variance Trade-off for k-fold Cross Validation Cross Validation

More information

INTRODUCTION TO MACHINE LEARNING. Measuring model performance or error

INTRODUCTION TO MACHINE LEARNING. Measuring model performance or error INTRODUCTION TO MACHINE LEARNING Measuring model performance or error Is our model any good? Context of task Accuracy Computation time Interpretability 3 types of tasks Classification Regression Clustering

More information

6.867 Machine learning

6.867 Machine learning 6.867 Machine learning Final eam December 3, 24 Your name and MIT ID: J. D. (Optional) The grade ou would give to ourself + a brief justification. A... wh not? Cite as: Tommi Jaakkola, course materials

More information

5.2. Exploring Quotients of Polynomial Functions. EXPLORE the Math. Each row shows the graphs of two polynomial functions.

5.2. Exploring Quotients of Polynomial Functions. EXPLORE the Math. Each row shows the graphs of two polynomial functions. YOU WILL NEED graph paper coloured pencils or pens graphing calculator or graphing software Eploring Quotients of Polnomial Functions EXPLORE the Math Each row shows the graphs of two polnomial functions.

More information

Performance Estimation and Regularization. Kasthuri Kannan, PhD. Machine Learning, Spring 2018

Performance Estimation and Regularization. Kasthuri Kannan, PhD. Machine Learning, Spring 2018 Performance Estimation and Regularization Kasthuri Kannan, PhD. Machine Learning, Spring 2018 Bias- Variance Tradeoff Fundamental to machine learning approaches Bias- Variance Tradeoff Error due to Bias:

More information

Partial Fraction Decomposition

Partial Fraction Decomposition Section 7. Partial Fractions 53 Partial Fraction Decomposition Algebraic techniques for determining the constants in the numerators of partial fractions are demonstrated in the eamples that follow. Note

More information

Instance-based Learning

Instance-based Learning Instance-based Learning Machine Learning 10701/15781 Carlos Guestrin Carnegie Mellon University October 15 th, 2007 2005-2007 Carlos Guestrin 1 1-Nearest Neighbor Four things make a memory based learner:

More information

Week 3. Topic 5 Asymptotes

Week 3. Topic 5 Asymptotes Week 3 Topic 5 Asmptotes Week 3 Topic 5 Asmptotes Introduction One of the strangest features of a graph is an asmptote. The come in three flavors: vertical, horizontal, and slant (also called oblique).

More information

CS178: Machine Learning and Data Mining. Complexity & Nearest Neighbor Methods

CS178: Machine Learning and Data Mining. Complexity & Nearest Neighbor Methods + CS78: Machine Learning and Data Mining Complexity & Nearest Neighbor Methods Prof. Erik Sudderth Some materials courtesy Alex Ihler & Sameer Singh Machine Learning Complexity and Overfitting Nearest

More information

Transformations of Functions. 1. Shifting, reflecting, and stretching graphs Symmetry of functions and equations

Transformations of Functions. 1. Shifting, reflecting, and stretching graphs Symmetry of functions and equations Chapter Transformations of Functions TOPICS.5.. Shifting, reflecting, and stretching graphs Smmetr of functions and equations TOPIC Horizontal Shifting/ Translation Horizontal Shifting/ Translation Shifting,

More information

CS6375: Machine Learning Gautam Kunapuli. Mid-Term Review

CS6375: Machine Learning Gautam Kunapuli. Mid-Term Review Gautam Kunapuli Machine Learning Data is identically and independently distributed Goal is to learn a function that maps to Data is generated using an unknown function Learn a hypothesis that minimizes

More information

DS Machine Learning and Data Mining I. Alina Oprea Associate Professor, CCIS Northeastern University

DS Machine Learning and Data Mining I. Alina Oprea Associate Professor, CCIS Northeastern University DS 4400 Machine Learning and Data Mining I Alina Oprea Associate Professor, CCIS Northeastern University January 24 2019 Logistics HW 1 is due on Friday 01/25 Project proposal: due Feb 21 1 page description

More information

DS Machine Learning and Data Mining I. Alina Oprea Associate Professor, CCIS Northeastern University

DS Machine Learning and Data Mining I. Alina Oprea Associate Professor, CCIS Northeastern University DS 4400 Machine Learning and Data Mining I Alina Oprea Associate Professor, CCIS Northeastern University September 20 2018 Review Solution for multiple linear regression can be computed in closed form

More information

Chapter 3. Interpolation. 3.1 Introduction

Chapter 3. Interpolation. 3.1 Introduction Chapter 3 Interpolation 3 Introduction One of the fundamental problems in Numerical Methods is the problem of interpolation, that is given a set of data points ( k, k ) for k =,, n, how do we find a function

More information

Graphing Polynomial Functions

Graphing Polynomial Functions LESSON 7 Graphing Polnomial Functions Graphs of Cubic and Quartic Functions UNDERSTAND A parent function is the most basic function of a famil of functions. It preserves the shape of the entire famil.

More information

Multicollinearity and Validation CIVL 7012/8012

Multicollinearity and Validation CIVL 7012/8012 Multicollinearity and Validation CIVL 7012/8012 2 In Today s Class Recap Multicollinearity Model Validation MULTICOLLINEARITY 1. Perfect Multicollinearity 2. Consequences of Perfect Multicollinearity 3.

More information

It s Not Complex Just Its Solutions Are Complex!

It s Not Complex Just Its Solutions Are Complex! It s Not Comple Just Its Solutions Are Comple! Solving Quadratics with Comple Solutions 15.5 Learning Goals In this lesson, ou will: Calculate comple roots of quadratic equations and comple zeros of quadratic

More information

CSE446: Linear Regression. Spring 2017

CSE446: Linear Regression. Spring 2017 CSE446: Linear Regression Spring 2017 Ali Farhadi Slides adapted from Carlos Guestrin and Luke Zettlemoyer Prediction of continuous variables Billionaire says: Wait, that s not what I meant! You say: Chill

More information

UVA CS 6316/4501 Fall 2016 Machine Learning. Lecture 15: K-nearest-neighbor Classifier / Bias-Variance Tradeoff. Dr. Yanjun Qi. University of Virginia

UVA CS 6316/4501 Fall 2016 Machine Learning. Lecture 15: K-nearest-neighbor Classifier / Bias-Variance Tradeoff. Dr. Yanjun Qi. University of Virginia UVA CS 6316/4501 Fall 2016 Machine Learning Lecture 15: K-nearest-neighbor Classifier / Bias-Variance Tradeoff Dr. Yanjun Qi University of Virginia Department of Computer Science 11/9/16 1 Rough Plan HW5

More information

CSE 573: Artificial Intelligence Autumn 2010

CSE 573: Artificial Intelligence Autumn 2010 CSE 573: Artificial Intelligence Autumn 2010 Lecture 16: Machine Learning Topics 12/7/2010 Luke Zettlemoyer Most slides over the course adapted from Dan Klein. 1 Announcements Syllabus revised Machine

More information

UVA CS 4501: Machine Learning. Lecture 10: K-nearest-neighbor Classifier / Bias-Variance Tradeoff. Dr. Yanjun Qi. University of Virginia

UVA CS 4501: Machine Learning. Lecture 10: K-nearest-neighbor Classifier / Bias-Variance Tradeoff. Dr. Yanjun Qi. University of Virginia UVA CS 4501: Machine Learning Lecture 10: K-nearest-neighbor Classifier / Bias-Variance Tradeoff Dr. Yanjun Qi University of Virginia Department of Computer Science 1 Where are we? è Five major secfons

More information

Answers. Investigation 4. ACE Assignment Choices. Applications

Answers. Investigation 4. ACE Assignment Choices. Applications Answers Investigation ACE Assignment Choices Problem. Core Other Connections, ; Etensions ; unassigned choices from previous problems Problem. Core, 7 Other Applications, ; Connections ; Etensions ; unassigned

More information

Investigation Free Fall

Investigation Free Fall Investigation Free Fall Name Period Date You will need: a motion sensor, a small pillow or other soft object What function models the height of an object falling due to the force of gravit? Use a motion

More information

Lecture 12 Date:

Lecture 12 Date: Information Technolog Delhi Lecture 2 Date: 3.09.204 The Signal Flow Graph Information Technolog Delhi Signal Flow Graph Q: Using individual device scattering parameters to analze a comple microwave network

More information

Hyperparameters and Validation Sets. Sargur N. Srihari

Hyperparameters and Validation Sets. Sargur N. Srihari Hyperparameters and Validation Sets Sargur N. srihari@cedar.buffalo.edu 1 Topics in Machine Learning Basics 1. Learning Algorithms 2. Capacity, Overfitting and Underfitting 3. Hyperparameters and Validation

More information

Section 1.4 Limits involving infinity

Section 1.4 Limits involving infinity Section. Limits involving infinit (/3/08) Overview: In later chapters we will need notation and terminolog to describe the behavior of functions in cases where the variable or the value of the function

More information

Lecture 27: Review. Reading: All chapters in ISLR. STATS 202: Data mining and analysis. December 6, 2017

Lecture 27: Review. Reading: All chapters in ISLR. STATS 202: Data mining and analysis. December 6, 2017 Lecture 27: Review Reading: All chapters in ISLR. STATS 202: Data mining and analysis December 6, 2017 1 / 16 Final exam: Announcements Tuesday, December 12, 8:30-11:30 am, in the following rooms: Last

More information

5.2 Graphing Polynomial Functions

5.2 Graphing Polynomial Functions Locker LESSON 5. Graphing Polnomial Functions Common Core Math Standards The student is epected to: F.IF.7c Graph polnomial functions, identifing zeros when suitable factorizations are available, and showing

More information

Problem 1: Complexity of Update Rules for Logistic Regression

Problem 1: Complexity of Update Rules for Logistic Regression Case Study 1: Estimating Click Probabilities Tackling an Unknown Number of Features with Sketching Machine Learning for Big Data CSE547/STAT548, University of Washington Emily Fox January 16 th, 2014 1

More information

How do we obtain reliable estimates of performance measures?

How do we obtain reliable estimates of performance measures? How do we obtain reliable estimates of performance measures? 1 Estimating Model Performance How do we estimate performance measures? Error on training data? Also called resubstitution error. Not a good

More information

Lecture 26: Missing data

Lecture 26: Missing data Lecture 26: Missing data Reading: ESL 9.6 STATS 202: Data mining and analysis December 1, 2017 1 / 10 Missing data is everywhere Survey data: nonresponse. 2 / 10 Missing data is everywhere Survey data:

More information

3.6 Graphing Piecewise-Defined Functions and Shifting and Reflecting Graphs of Functions

3.6 Graphing Piecewise-Defined Functions and Shifting and Reflecting Graphs of Functions 76 CHAPTER Graphs and Functions Find the equation of each line. Write the equation in the form = a, = b, or = m + b. For Eercises through 7, write the equation in the form f = m + b.. Through (, 6) and

More information

Model Assessment and Selection. Reference: The Elements of Statistical Learning, by T. Hastie, R. Tibshirani, J. Friedman, Springer

Model Assessment and Selection. Reference: The Elements of Statistical Learning, by T. Hastie, R. Tibshirani, J. Friedman, Springer Model Assessment and Selection Reference: The Elements of Statistical Learning, by T. Hastie, R. Tibshirani, J. Friedman, Springer 1 Model Training data Testing data Model Testing error rate Training error

More information

Intermediate Algebra. Gregg Waterman Oregon Institute of Technology

Intermediate Algebra. Gregg Waterman Oregon Institute of Technology Intermediate Algebra Gregg Waterman Oregon Institute of Technolog c 2017 Gregg Waterman This work is licensed under the Creative Commons Attribution 4.0 International license. The essence of the license

More information

Stereo: the graph cut method

Stereo: the graph cut method Stereo: the graph cut method Last lecture we looked at a simple version of the Marr-Poggio algorithm for solving the binocular correspondence problem along epipolar lines in rectified images. The main

More information

Feature Extractors. CS 188: Artificial Intelligence Fall Some (Vague) Biology. The Binary Perceptron. Binary Decision Rule.

Feature Extractors. CS 188: Artificial Intelligence Fall Some (Vague) Biology. The Binary Perceptron. Binary Decision Rule. CS 188: Artificial Intelligence Fall 2008 Lecture 24: Perceptrons II 11/24/2008 Dan Klein UC Berkeley Feature Extractors A feature extractor maps inputs to feature vectors Dear Sir. First, I must solicit

More information

Algorithms: Decision Trees

Algorithms: Decision Trees Algorithms: Decision Trees A small dataset: Miles Per Gallon Suppose we want to predict MPG From the UCI repository A Decision Stump Recursion Step Records in which cylinders = 4 Records in which cylinders

More information

Introduction to Homogeneous Transformations & Robot Kinematics

Introduction to Homogeneous Transformations & Robot Kinematics Introduction to Homogeneous Transformations & Robot Kinematics Jennifer Ka, Rowan Universit Computer Science Department Januar 25. Drawing Dimensional Frames in 2 Dimensions We will be working in -D coordinates,

More information

Overfitting in Neural Nets: Backpropagation, Conjugate Gradient, and Early Stopping

Overfitting in Neural Nets: Backpropagation, Conjugate Gradient, and Early Stopping Overfitting in Neural Nets: Backpropagation, Conjugate Gradient, and Earl Stopping Rich Caruana CALD, CMU 5 Forbes Ave. Pittsburgh, PA 53 caruana@cs.cmu.edu Steve Lawrence NEC Research Institute 4 Independence

More information

CPSC 340: Machine Learning and Data Mining

CPSC 340: Machine Learning and Data Mining CPSC 340: Machine Learning and Data Mining Fundamentals of learning (continued) and the k-nearest neighbours classifier Original version of these slides by Mark Schmidt, with modifications by Mike Gelbart.

More information

2.4. Families of Polynomial Functions

2.4. Families of Polynomial Functions 2. Families of Polnomial Functions Crstal pieces for a large chandelier are to be cut according to the design shown. The graph shows how the design is created using polnomial functions. What do all the

More information

Answers Investigation 4

Answers Investigation 4 Answers Investigation Applications. a. At seconds, the flare will have traveled to a maimum height of 00 ft. b. The flare will hit the water when the height is 0 ft, which will occur at 0 seconds. c. In

More information

CPSC 340: Machine Learning and Data Mining. Kernel Trick Fall 2017

CPSC 340: Machine Learning and Data Mining. Kernel Trick Fall 2017 CPSC 340: Machine Learning and Data Mining Kernel Trick Fall 2017 Admin Assignment 3: Due Friday. Midterm: Can view your exam during instructor office hours or after class this week. Digression: the other

More information

Classification: Feature Vectors

Classification: Feature Vectors Classification: Feature Vectors Hello, Do you want free printr cartriges? Why pay more when you can get them ABSOLUTELY FREE! Just # free YOUR_NAME MISSPELLED FROM_FRIEND... : : : : 2 0 2 0 PIXEL 7,12

More information

Scale Invariant Feature Transform (SIFT) CS 763 Ajit Rajwade

Scale Invariant Feature Transform (SIFT) CS 763 Ajit Rajwade Scale Invariant Feature Transform (SIFT) CS 763 Ajit Rajwade What is SIFT? It is a technique for detecting salient stable feature points in an image. For ever such point it also provides a set of features

More information

Cross-validation and the Bootstrap

Cross-validation and the Bootstrap Cross-validation and the Bootstrap In the section we discuss two resampling methods: cross-validation and the bootstrap. These methods refit a model of interest to samples formed from the training set,

More information

CSE Data Mining Concepts and Techniques STATISTICAL METHODS (REGRESSION) Professor- Anita Wasilewska. Team 13

CSE Data Mining Concepts and Techniques STATISTICAL METHODS (REGRESSION) Professor- Anita Wasilewska. Team 13 CSE 634 - Data Mining Concepts and Techniques STATISTICAL METHODS Professor- Anita Wasilewska (REGRESSION) Team 13 Contents Linear Regression Logistic Regression Bias and Variance in Regression Model Fit

More information

2.3 Polynomial Functions of Higher Degree with Modeling

2.3 Polynomial Functions of Higher Degree with Modeling SECTION 2.3 Polnomial Functions of Higher Degree with Modeling 185 2.3 Polnomial Functions of Higher Degree with Modeling What ou ll learn about Graphs of Polnomial Functions End Behavior of Polnomial

More information

Graphing Review. Math Tutorial Lab Special Topic

Graphing Review. Math Tutorial Lab Special Topic Graphing Review Math Tutorial Lab Special Topic Common Functions and Their Graphs Linear Functions A function f defined b a linear equation of the form = f() = m + b, where m and b are constants, is called

More information

COMP 551 Applied Machine Learning Lecture 13: Unsupervised learning

COMP 551 Applied Machine Learning Lecture 13: Unsupervised learning COMP 551 Applied Machine Learning Lecture 13: Unsupervised learning Associate Instructor: Herke van Hoof (herke.vanhoof@mail.mcgill.ca) Slides mostly by: (jpineau@cs.mcgill.ca) Class web page: www.cs.mcgill.ca/~jpineau/comp551

More information

9. f(x) = x f(x) = x g(x) = 2x g(x) = 5 2x. 13. h(x) = 1 3x. 14. h(x) = 2x f(x) = x x. 16.

9. f(x) = x f(x) = x g(x) = 2x g(x) = 5 2x. 13. h(x) = 1 3x. 14. h(x) = 2x f(x) = x x. 16. Section 4.2 Absolute Value 367 4.2 Eercises For each of the functions in Eercises 1-8, as in Eamples 7 and 8 in the narrative, mark the critical value on a number line, then mark the sign of the epression

More information

2017 ITRON EFG Meeting. Abdul Razack. Specialist, Load Forecasting NV Energy

2017 ITRON EFG Meeting. Abdul Razack. Specialist, Load Forecasting NV Energy 2017 ITRON EFG Meeting Abdul Razack Specialist, Load Forecasting NV Energy Topics 1. Concepts 2. Model (Variable) Selection Methods 3. Cross- Validation 4. Cross-Validation: Time Series 5. Example 1 6.

More information

Announcements. CS 188: Artificial Intelligence Spring Classification: Feature Vectors. Classification: Weights. Learning: Binary Perceptron

Announcements. CS 188: Artificial Intelligence Spring Classification: Feature Vectors. Classification: Weights. Learning: Binary Perceptron CS 188: Artificial Intelligence Spring 2010 Lecture 24: Perceptrons and More! 4/20/2010 Announcements W7 due Thursday [that s your last written for the semester!] Project 5 out Thursday Contest running

More information

model order p weights The solution to this optimization problem is obtained by solving the linear system

model order p weights The solution to this optimization problem is obtained by solving the linear system CS 189 Introduction to Machine Learning Fall 2017 Note 3 1 Regression and hyperparameters Recall the supervised regression setting in which we attempt to learn a mapping f : R d R from labeled examples

More information

Cross- Valida+on & ROC curve. Anna Helena Reali Costa PCS 5024

Cross- Valida+on & ROC curve. Anna Helena Reali Costa PCS 5024 Cross- Valida+on & ROC curve Anna Helena Reali Costa PCS 5024 Resampling Methods Involve repeatedly drawing samples from a training set and refibng a model on each sample. Used in model assessment (evalua+ng

More information

Regularization and model selection

Regularization and model selection CS229 Lecture notes Andrew Ng Part VI Regularization and model selection Suppose we are trying select among several different models for a learning problem. For instance, we might be using a polynomial

More information

IOM 530: Intro. to Statistical Learning 1 RESAMPLING METHODS. Chapter 05

IOM 530: Intro. to Statistical Learning 1 RESAMPLING METHODS. Chapter 05 IOM 530: Intro. to Statistical Learning 1 RESAMPLING METHODS Chapter 05 IOM 530: Intro. to Statistical Learning 2 Outline Cross Validation The Validation Set Approach Leave-One-Out Cross Validation K-fold

More information

Recognition Tools: Support Vector Machines

Recognition Tools: Support Vector Machines CS 2770: Computer Vision Recognition Tools: Support Vector Machines Prof. Adriana Kovashka University of Pittsburgh January 12, 2017 Announcement TA office hours: Tuesday 4pm-6pm Wednesday 10am-12pm Matlab

More information

Lecture 16: High-dimensional regression, non-linear regression

Lecture 16: High-dimensional regression, non-linear regression Lecture 16: High-dimensional regression, non-linear regression Reading: Sections 6.4, 7.1 STATS 202: Data mining and analysis November 3, 2017 1 / 17 High-dimensional regression Most of the methods we

More information

Introduction to Automated Text Analysis. bit.ly/poir599

Introduction to Automated Text Analysis. bit.ly/poir599 Introduction to Automated Text Analysis Pablo Barberá School of International Relations University of Southern California pablobarbera.com Lecture materials: bit.ly/poir599 Today 1. Solutions for last

More information

Going nonparametric: Nearest neighbor methods for regression and classification

Going nonparametric: Nearest neighbor methods for regression and classification Going nonparametric: Nearest neighbor methods for regression and classification STAT/CSE 46: Machine Learning Emily Fox University of Washington May 3, 208 Locality sensitive hashing for approximate NN

More information

Section 4.2 Graphing Lines

Section 4.2 Graphing Lines Section. Graphing Lines Objectives In this section, ou will learn to: To successfull complete this section, ou need to understand: Identif collinear points. The order of operations (1.) Graph the line

More information

Topic 2 Transformations of Functions

Topic 2 Transformations of Functions Week Topic Transformations of Functions Week Topic Transformations of Functions This topic can be a little trick, especiall when one problem has several transformations. We re going to work through each

More information

Kernels + K-Means Introduction to Machine Learning. Matt Gormley Lecture 29 April 25, 2018

Kernels + K-Means Introduction to Machine Learning. Matt Gormley Lecture 29 April 25, 2018 10-601 Introduction to Machine Learning Machine Learning Department School of Computer Science Carnegie Mellon University Kernels + K-Means Matt Gormley Lecture 29 April 25, 2018 1 Reminders Homework 8:

More information

Bias-Variance Analysis of Ensemble Learning

Bias-Variance Analysis of Ensemble Learning Bias-Variance Analysis of Ensemble Learning Thomas G. Dietterich Department of Computer Science Oregon State University Corvallis, Oregon 97331 http://www.cs.orst.edu/~tgd Outline Bias-Variance Decomposition

More information