Bias-Variance Decomposition Error Estimators
1 Bias-Variance Decomposition Error Estimators Cross-Validation
2 Bias-Variance Tradeoff: Intuition
Model too simple: it does not fit the data well (a biased solution). Model too complex: small changes to the data change the solution a lot (a high-variance solution).
3 Expected Loss
Let $h(x) = E[t \mid x]$ be the optimal predictor and $y(x)$ our actual predictor, which will incur the following expected loss:
$$E[L] = \iint \{y(x) - t\}^2 \, p(x, t) \, dx \, dt.$$
Writing $y(x) - t = \{y(x) - h(x)\} + \{h(x) - t\}$ and expanding the square, the cross term
$$2 \iint \{y(x) - h(x)\}\{h(x) - t\} \, p(x, t) \, dx \, dt = 0$$
vanishes, since $h(x) = E[t \mid x] = \int t \, p(t \mid x) \, dt$. Hence
$$E[L] = \underbrace{\int \{y(x) - h(x)\}^2 \, p(x) \, dx}_{\text{source of error 1}} \; + \; \underbrace{\iint \{h(x) - t\}^2 \, p(x, t) \, dx \, dt}_{\text{noise term}}.$$
4 Sources of Error 1: Noise
What if we have a perfect learner and infinite data? Then our learning solution satisfies $y(x) = h(x)$, yet we still have a remaining, unavoidable error of $\sigma^2$ due to noise. The targets are generated as
$$t = f(x) + \varepsilon, \qquad \varepsilon \sim N(0, \sigma^2), \qquad E[\varepsilon^2] = \sigma^2,$$
so $h(x) = E[t \mid x] = f(x)$, and the noise term satisfies $\iint \{h(x) - t\}^2 \, p(x, t) \, dx \, dt = \sigma^2$.
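The claim that even a perfect learner incurs error $\sigma^2$ can be checked numerically. The sketch below is not from the slides: it assumes a synthetic target $t = f(x) + \varepsilon$ with $f(x) = \sin(2\pi x)$ and $\sigma = 0.5$, both arbitrary choices, and verifies that the squared error of the predictor $y(x) = h(x) = f(x)$ is close to $\sigma^2$.

```python
import numpy as np

rng = np.random.default_rng(0)
sigma = 0.5                                 # noise standard deviation (arbitrary)
f = lambda x: np.sin(2 * np.pi * x)         # true function (arbitrary choice)

x = rng.uniform(0.0, 1.0, 100_000)
t = f(x) + rng.normal(0.0, sigma, x.shape)  # t = f(x) + eps, eps ~ N(0, sigma^2)

# A "perfect learner" predicts y(x) = h(x) = f(x); its squared error is pure noise.
residual = np.mean((f(x) - t) ** 2)
print(residual)                             # close to sigma^2 = 0.25
```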
5 Bias-Variance Decomposition
Focus on the first term, $\int \{y(x) - h(x)\}^2 \, p(x) \, dx$. Let us first examine the expected loss averaged over data sets. For one data set $D$ and one test point $x$, add and subtract $E_D[y(x; D)]$:
$$\{y(x; D) - h(x)\}^2 = \{y(x; D) - E_D[y(x; D)] + E_D[y(x; D)] - h(x)\}^2.$$
Taking the expectation over data sets, the cross term vanishes, hence
$$E_D[\{y(x; D) - h(x)\}^2] = \underbrace{\{E_D[y(x; D)] - h(x)\}^2}_{\text{bias}^2} \; + \; \underbrace{E_D[\{y(x; D) - E_D[y(x; D)]\}^2]}_{\text{variance}}.$$
6 Bias
Suppose you are given a dataset $D$ with $m$ samples from some distribution. You learn a function $y(x; D)$ from the data $D$. If you sample different datasets, you will learn different functions. Expected hypothesis: $E_D[y(x; D)]$.
Bias: the difference between what you expect to learn and the truth. It measures how well you can expect to represent the true solution, and it decreases with a more complex model:
$$\text{bias}^2 = \int \{ \underbrace{E_D[y(x; D)]}_{\text{expected to learn}} - \underbrace{h(x)}_{\text{true model}} \}^2 \, p(x) \, dx.$$
7 Variance
Variance: the difference between what you expect to learn and what you learn from a particular dataset. It measures how sensitive the learner is to the specific dataset, and it decreases with a simpler model:
$$\text{variance} = \int E_D[\{ \underbrace{y(x; D)}_{\text{learned from } D} - \underbrace{E_D[y(x; D)]}_{\text{expected to learn, averaged over datasets}} \}^2] \, p(x) \, dx.$$
8 Bias-Variance Tradeoff
The choice of hypothesis class introduces a learning bias: a more complex class means less bias, but a more complex class also means more variance.
9 The Expected Prediction Error
The expected squared prediction error over fixed-size training sets $D$ drawn from $p(X, T)$ can be expressed as the sum of three components:
$$\text{expected error} = \text{unavoidable error} + \text{bias}^2 + \text{variance},$$
where
$$\text{unavoidable error} = \sigma^2,$$
$$\text{bias}^2 = \int \{E_D[y(x; D)] - h(x)\}^2 \, p(x) \, dx,$$
$$\text{variance} = \int E_D[\{y(x; D) - E_D[y(x; D)]\}^2] \, p(x) \, dx.$$
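This decomposition can be estimated empirically by training on many independently drawn datasets. The sketch below is not from the slides; it uses a synthetic sin target, a noise level, and polynomial degrees chosen purely for illustration. It averages predictions over 200 datasets to estimate bias² and variance for a simple (degree-1) and a complex (degree-9) model.

```python
import numpy as np

rng = np.random.default_rng(1)
sigma, n_sets, m = 0.3, 200, 25             # noise, number of datasets, points per dataset
f = lambda x: np.sin(2 * np.pi * x)         # true function (illustrative assumption)
x_test = np.linspace(0.05, 0.95, 50)

def bias_variance(degree):
    # Collect predictions y(x; D) at x_test over many independent datasets D.
    preds = np.empty((n_sets, x_test.size))
    for s in range(n_sets):
        x = rng.uniform(0.0, 1.0, m)
        t = f(x) + rng.normal(0.0, sigma, m)
        preds[s] = np.polyval(np.polyfit(x, t, degree), x_test)
    avg = preds.mean(axis=0)                      # E_D[y(x; D)]
    bias2 = np.mean((avg - f(x_test)) ** 2)       # bias^2, averaged over x
    variance = np.mean(preds.var(axis=0))         # variance, averaged over x
    return bias2, variance

b1, v1 = bias_variance(1)   # simple model: high bias, low variance
b9, v9 = bias_variance(9)   # complex model: low bias, high variance
print(b1, v1)
print(b9, v9)
```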
10 (Figure: as model complexity increases, variance is higher and bias is lower.)
11 Bias-Variance Decomposition Graphical Representation
12 Training Error
Given training data, choose a loss function, e.g. the squared error $L$ for regression. Training set error: for a particular set of parameters $w$, the loss function evaluated on the training data (here for a degree-$M$ polynomial):
$$\mathrm{Err}_{\text{train}}(w) = \frac{1}{N_{\text{train}}} \sum_{i=1}^{N_{\text{train}}} \Big( t_i - \sum_{j=1}^{M} w_j x_i^j \Big)^2.$$
13 (Figure: training error vs. model complexity. Training error decreases as model complexity increases.)
14 Prediction (Generalization) Error
Training set error can be a poor measure of the quality of the solution. Prediction error: we really care about the error over all possible input points, not just the training data:
$$\mathrm{Err}_{\text{true}}(w) = E_X\Big[\Big( t(x) - \sum_{j=1}^{M} w_j x^j \Big)^2\Big] = \int \Big( t(x) - \sum_{j=1}^{M} w_j x^j \Big)^2 p(x) \, dx.$$
15 (Figure: true error and training error vs. model complexity. Training error keeps decreasing with complexity, while the true error first decreases and then increases; we want the complexity at the minimum of the true error.)
16 Why doesn't training set error approximate prediction error?
Sampling approximation of the prediction error: draw $m$ fresh points $x_i \sim p(x)$ and compute
$$\mathrm{Err}_{\text{true}}(w) \approx \frac{1}{m} \sum_{i=1}^{m} \Big( t_i - \sum_{j=1}^{M} w_j x_i^j \Big)^2.$$
Compare with the training error:
$$\mathrm{Err}_{\text{train}}(w) = \frac{1}{N_{\text{train}}} \sum_{i=1}^{N_{\text{train}}} \Big( t_i - \sum_{j=1}^{M} w_j x_i^j \Big)^2.$$
Very similar equations!!! So why is the training set a bad measure of prediction error???
17 Why is the training set a bad measure of prediction error?
Training error is a good estimate for a single, fixed $w$. But we optimized $w$ with respect to the training error, and found a $w$ that is good for this particular set of samples. Training error is therefore an optimistically biased estimate of the prediction error.
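A small simulation makes this optimistic bias visible (a sketch with arbitrary synthetic choices, not part of the slides): over many training sets, the average training error of the fitted $w$ sits below the average prediction error estimated on fresh samples from the same distribution.

```python
import numpy as np

rng = np.random.default_rng(2)
sigma, m, degree = 0.3, 20, 5
f = lambda x: np.sin(2 * np.pi * x)          # true function (illustrative assumption)

train_errs, true_errs = [], []
for _ in range(200):
    x = rng.uniform(0.0, 1.0, m)
    t = f(x) + rng.normal(0.0, sigma, m)
    w = np.polyfit(x, t, degree)             # w is optimized on the training set
    train_errs.append(np.mean((t - np.polyval(w, x)) ** 2))
    # A fresh, large sample from the same distribution approximates the true error.
    xs = rng.uniform(0.0, 1.0, 5000)
    ts = f(xs) + rng.normal(0.0, sigma, xs.size)
    true_errs.append(np.mean((ts - np.polyval(w, xs)) ** 2))

print(np.mean(train_errs), np.mean(true_errs))  # training error is optimistically low
```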
18 Test Error
Given a dataset, randomly split it into two parts: training data $\{x_1, \dots, x_{N_{\text{train}}}\}$ and test data $\{x_1, \dots, x_{N_{\text{test}}}\}$. Use the training data to optimize the parameters $w$. Test set error: for the final solution $w^*$, evaluate
$$\mathrm{Err}_{\text{test}}(w^*) = \frac{1}{N_{\text{test}}} \sum_{i=1}^{N_{\text{test}}} \Big( t_i - \sum_{j=1}^{M} w_j^* x_i^j \Big)^2.$$
This estimate is unbiased, but it has variance. The test set is only unbiased if you never do any learning on the test data: for example, if you use the test set to select the degree of the polynomial, it is no longer unbiased!!!
19 (Figure: test error, training error, and true error vs. model complexity. The test error tracks the true error; we want the complexity at its minimum.)
20 How many points to use for training/testing?
A very hard question to answer! With too few training points, the learned $w$ is bad; with too few test points, you never know whether you reached a good solution. Theory proposes error bounds (advanced course). Typically: if you have a reasonable amount of data, pick a test set large enough for a reasonable estimate of error, and use the rest for learning; if you have little data, then you need to use some special techniques, e.g. bootstrapping.
21 (Figure: error as a function of the number of training examples for a fixed model complexity. With little data the training error is low and the test error high; as the amount of data grows, the two curves converge toward the error attained with infinite data.)
22 Overfitting
With too few training examples, our model may achieve zero training error but nevertheless have a large generalization error. When the training error no longer bears any relation to the generalization error, the model overfits the data.
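A concrete overfitting example, sketched with hypothetical synthetic numbers (not from the slides): fitting a degree-9 polynomial to only 10 noisy points drives the training error to (numerically) zero while the generalization error stays large.

```python
import numpy as np

rng = np.random.default_rng(4)
f = lambda x: np.sin(2 * np.pi * x)         # true function (illustrative assumption)
x = rng.uniform(0.0, 1.0, 10)               # too few training examples
t = f(x) + rng.normal(0.0, 0.3, x.size)

w = np.polyfit(x, t, 9)                     # degree 9 through 10 points: interpolates the data
train_err = np.mean((t - np.polyval(w, x)) ** 2)

xs = np.linspace(0.0, 1.0, 1000)
gen_err = np.mean((f(xs) - np.polyval(w, xs)) ** 2)  # error against the noiseless truth
print(train_err, gen_err)                   # near-zero training error, large generalization error
```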
23 Cross-validation for detecting and preventing overfitting
Note to other teachers and users of these slides: Andrew would be delighted if you found this source material useful in giving your own lectures. Feel free to use these slides verbatim, or to modify them to fit your own needs. PowerPoint originals are available. If you make use of a significant portion of these slides in your own lecture, please include this message, or the following link to the source repository of Andrew's tutorials: Comments and corrections gratefully received.
Andrew W. Moore, Professor, School of Computer Science, Carnegie Mellon University, awm@cs.cmu.edu
Copyright Andrew W. Moore
24 A Regression Problem
$y = f(x) + \text{noise}$. Can we learn $f$ from this data? Let's consider three methods.
25 Linear Regression
26 Quadratic Regression
27 Join-the-dots
Also known as piecewise linear nonparametric regression, if that makes you feel better.
28 Which is best?
Why not choose the method with the best fit to the data?
29 What do we really want?
Why not choose the method with the best fit to the data? Because what actually matters is how well you are going to predict future data drawn from the same distribution.
30-32 The test set method
1. Randomly choose 30% of the data to be in a test set.
2. The remainder is a training set.
3. Perform your regression on the training set.
4. Estimate your future performance with the test set.
Linear regression example: Mean Squared Error = 2.4.
33 The test set method: quadratic regression example. Mean Squared Error = 0.9.
34 The test set method: join-the-dots example. Mean Squared Error = 2.2.
35-36 The test set method
Good news: it is very, very simple, and you can then simply choose the method with the best test-set score.
Bad news: it wastes data (we get an estimate of the best method to apply to 30% less data), and if we don't have much data, our test set might just be lucky or unlucky. We say the test-set estimator of performance has high variance.
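That high variance is easy to see by re-drawing the random 70/30 split many times on the same small dataset. This is a sketch with synthetic data (the sin target, noise level, and degree-3 fit are illustrative assumptions, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(7)
x = rng.uniform(0.0, 1.0, 40)               # a small dataset
t = np.sin(2 * np.pi * x) + rng.normal(0.0, 0.3, x.size)

estimates = []
for _ in range(100):
    idx = rng.permutation(x.size)
    tr, te = idx[:28], idx[28:]              # 70/30 split, re-drawn each time
    w = np.polyfit(x[tr], t[tr], 3)          # train on the training portion only
    estimates.append(np.mean((t[te] - np.polyval(w, x[te])) ** 2))

# The same data yields noticeably different test-set estimates from split to split.
print(min(estimates), max(estimates))
```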
37-42 LOOCV (Leave-one-out Cross Validation)
For k = 1 to R:
1. Let (x_k, y_k) be the k-th record.
2. Temporarily remove (x_k, y_k) from the dataset.
3. Train on the remaining R-1 datapoints.
4. Note your error on (x_k, y_k).
When you've done all points, report the mean error.
Linear regression example: MSE_LOOCV = 2.12.
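The four LOOCV steps above translate directly into code. This is a minimal numpy sketch on synthetic data (the sin target and polynomial degrees are illustrative assumptions, so the printed numbers will not match the slide's MSE):

```python
import numpy as np

rng = np.random.default_rng(5)
x = rng.uniform(0.0, 1.0, 30)
t = np.sin(2 * np.pi * x) + rng.normal(0.0, 0.3, x.size)

def loocv_mse(x, t, degree):
    errs = []
    for k in range(x.size):                        # for k = 1 to R
        mask = np.arange(x.size) != k              # temporarily remove the k-th record
        w = np.polyfit(x[mask], t[mask], degree)   # train on the remaining R-1 points
        errs.append((t[k] - np.polyval(w, x[k])) ** 2)  # note the error on (x_k, t_k)
    return np.mean(errs)                           # mean error over all points

mse_lin = loocv_mse(x, t, 1)
mse_quad = loocv_mse(x, t, 2)
print(mse_lin, mse_quad)
```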
43 LOOCV for Quadratic Regression: the same procedure gives MSE_LOOCV = 0.962.
44 LOOCV for Join-the-dots: the same procedure gives MSE_LOOCV = 3.33.
45 Which kind of Cross Validation?
- Test-set. Downside: variance (an unreliable estimate of future performance). Upside: cheap.
- Leave-one-out. Downside: expensive; has some weird behavior. Upside: doesn't waste data.
Can we get the best of both worlds?
46-50 k-fold Cross Validation
Randomly break the dataset into k partitions (in our example we'll have k = 3 partitions, colored red, green, and blue).
For the red partition: train on all the points not in the red partition, and find the test-set sum of errors on the red points.
For the green partition: train on all the points not in the green partition, and find the test-set sum of errors on the green points.
For the blue partition: train on all the points not in the blue partition, and find the test-set sum of errors on the blue points.
Then report the mean error.
Linear regression example: MSE_3FOLD = 2.05.
51 k-fold Cross Validation: quadratic regression gives MSE_3FOLD = 1.11.
52 k-fold Cross Validation: join-the-dots gives MSE_3FOLD = 2.93.
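The k-fold procedure can be sketched the same way (synthetic data and a degree-3 fit are illustrative assumptions, so the number printed will not match the slides):

```python
import numpy as np

rng = np.random.default_rng(6)
x = rng.uniform(0.0, 1.0, 60)
t = np.sin(2 * np.pi * x) + rng.normal(0.0, 0.3, x.size)

def kfold_mse(x, t, degree, k=3):
    idx = rng.permutation(x.size)            # randomly break the data into k partitions
    folds = np.array_split(idx, k)
    total, n = 0.0, 0
    for fold in folds:
        mask = np.ones(x.size, dtype=bool)
        mask[fold] = False                   # train on all the points not in this partition
        w = np.polyfit(x[mask], t[mask], degree)
        total += np.sum((t[fold] - np.polyval(w, x[fold])) ** 2)  # sum of errors on held-out points
        n += fold.size
    return total / n                         # mean error over all held-out points

mse_3fold = kfold_mse(x, t, 3)
print(mse_3fold)
```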
53 Which kind of Cross Validation?
- Test-set. Downside: variance (an unreliable estimate of future performance). Upside: cheap.
- Leave-one-out. Downside: expensive; has some weird behavior. Upside: doesn't waste data.
- 10-fold. Downside: wastes 10% of the data; 10 times more expensive than the test-set method. Upside: only wastes 10%, and is only 10 times more expensive instead of R times.
- 3-fold. Downside: wastes more data than 10-fold; more expensive than the test-set method. Upside: slightly better than test-set.
- R-fold. Identical to leave-one-out.
More informationOverfitting in Neural Nets: Backpropagation, Conjugate Gradient, and Early Stopping
Overfitting in Neural Nets: Backpropagation, Conjugate Gradient, and Earl Stopping Rich Caruana CALD, CMU 5 Forbes Ave. Pittsburgh, PA 53 caruana@cs.cmu.edu Steve Lawrence NEC Research Institute 4 Independence
More informationCPSC 340: Machine Learning and Data Mining
CPSC 340: Machine Learning and Data Mining Fundamentals of learning (continued) and the k-nearest neighbours classifier Original version of these slides by Mark Schmidt, with modifications by Mike Gelbart.
More informationCPSC 340: Machine Learning and Data Mining. Kernel Trick Fall 2017
CPSC 340: Machine Learning and Data Mining Kernel Trick Fall 2017 Admin Assignment 3: Due Friday. Midterm: Can view your exam during instructor office hours or after class this week. Digression: the other
More information2.4. Families of Polynomial Functions
2. Families of Polnomial Functions Crstal pieces for a large chandelier are to be cut according to the design shown. The graph shows how the design is created using polnomial functions. What do all the
More informationAnswers Investigation 4
Answers Investigation Applications. a. At seconds, the flare will have traveled to a maimum height of 00 ft. b. The flare will hit the water when the height is 0 ft, which will occur at 0 seconds. c. In
More informationScale Invariant Feature Transform (SIFT) CS 763 Ajit Rajwade
Scale Invariant Feature Transform (SIFT) CS 763 Ajit Rajwade What is SIFT? It is a technique for detecting salient stable feature points in an image. For ever such point it also provides a set of features
More informationCOMP 551 Applied Machine Learning Lecture 13: Unsupervised learning
COMP 551 Applied Machine Learning Lecture 13: Unsupervised learning Associate Instructor: Herke van Hoof (herke.vanhoof@mail.mcgill.ca) Slides mostly by: (jpineau@cs.mcgill.ca) Class web page: www.cs.mcgill.ca/~jpineau/comp551
More informationClassification: Feature Vectors
Classification: Feature Vectors Hello, Do you want free printr cartriges? Why pay more when you can get them ABSOLUTELY FREE! Just # free YOUR_NAME MISSPELLED FROM_FRIEND... : : : : 2 0 2 0 PIXEL 7,12
More informationCross-validation and the Bootstrap
Cross-validation and the Bootstrap In the section we discuss two resampling methods: cross-validation and the bootstrap. These methods refit a model of interest to samples formed from the training set,
More informationCSE Data Mining Concepts and Techniques STATISTICAL METHODS (REGRESSION) Professor- Anita Wasilewska. Team 13
CSE 634 - Data Mining Concepts and Techniques STATISTICAL METHODS Professor- Anita Wasilewska (REGRESSION) Team 13 Contents Linear Regression Logistic Regression Bias and Variance in Regression Model Fit
More informationSection 1.4 Limits involving infinity
Section. Limits involving infinit (/3/08) Overview: In later chapters we will need notation and terminolog to describe the behavior of functions in cases where the variable or the value of the function
More information2.3 Polynomial Functions of Higher Degree with Modeling
SECTION 2.3 Polnomial Functions of Higher Degree with Modeling 185 2.3 Polnomial Functions of Higher Degree with Modeling What ou ll learn about Graphs of Polnomial Functions End Behavior of Polnomial
More informationGraphing Review. Math Tutorial Lab Special Topic
Graphing Review Math Tutorial Lab Special Topic Common Functions and Their Graphs Linear Functions A function f defined b a linear equation of the form = f() = m + b, where m and b are constants, is called
More information2017 ITRON EFG Meeting. Abdul Razack. Specialist, Load Forecasting NV Energy
2017 ITRON EFG Meeting Abdul Razack Specialist, Load Forecasting NV Energy Topics 1. Concepts 2. Model (Variable) Selection Methods 3. Cross- Validation 4. Cross-Validation: Time Series 5. Example 1 6.
More information9. f(x) = x f(x) = x g(x) = 2x g(x) = 5 2x. 13. h(x) = 1 3x. 14. h(x) = 2x f(x) = x x. 16.
Section 4.2 Absolute Value 367 4.2 Eercises For each of the functions in Eercises 1-8, as in Eamples 7 and 8 in the narrative, mark the critical value on a number line, then mark the sign of the epression
More informationAnnouncements. CS 188: Artificial Intelligence Spring Classification: Feature Vectors. Classification: Weights. Learning: Binary Perceptron
CS 188: Artificial Intelligence Spring 2010 Lecture 24: Perceptrons and More! 4/20/2010 Announcements W7 due Thursday [that s your last written for the semester!] Project 5 out Thursday Contest running
More informationmodel order p weights The solution to this optimization problem is obtained by solving the linear system
CS 189 Introduction to Machine Learning Fall 2017 Note 3 1 Regression and hyperparameters Recall the supervised regression setting in which we attempt to learn a mapping f : R d R from labeled examples
More informationCross- Valida+on & ROC curve. Anna Helena Reali Costa PCS 5024
Cross- Valida+on & ROC curve Anna Helena Reali Costa PCS 5024 Resampling Methods Involve repeatedly drawing samples from a training set and refibng a model on each sample. Used in model assessment (evalua+ng
More informationRegularization and model selection
CS229 Lecture notes Andrew Ng Part VI Regularization and model selection Suppose we are trying select among several different models for a learning problem. For instance, we might be using a polynomial
More informationLecture 16: High-dimensional regression, non-linear regression
Lecture 16: High-dimensional regression, non-linear regression Reading: Sections 6.4, 7.1 STATS 202: Data mining and analysis November 3, 2017 1 / 17 High-dimensional regression Most of the methods we
More informationIOM 530: Intro. to Statistical Learning 1 RESAMPLING METHODS. Chapter 05
IOM 530: Intro. to Statistical Learning 1 RESAMPLING METHODS Chapter 05 IOM 530: Intro. to Statistical Learning 2 Outline Cross Validation The Validation Set Approach Leave-One-Out Cross Validation K-fold
More informationRecognition Tools: Support Vector Machines
CS 2770: Computer Vision Recognition Tools: Support Vector Machines Prof. Adriana Kovashka University of Pittsburgh January 12, 2017 Announcement TA office hours: Tuesday 4pm-6pm Wednesday 10am-12pm Matlab
More informationIntroduction to Automated Text Analysis. bit.ly/poir599
Introduction to Automated Text Analysis Pablo Barberá School of International Relations University of Southern California pablobarbera.com Lecture materials: bit.ly/poir599 Today 1. Solutions for last
More informationGoing nonparametric: Nearest neighbor methods for regression and classification
Going nonparametric: Nearest neighbor methods for regression and classification STAT/CSE 46: Machine Learning Emily Fox University of Washington May 3, 208 Locality sensitive hashing for approximate NN
More informationTopics in Machine Learning-EE 5359 Model Assessment and Selection
Topics in Machine Learning-EE 5359 Model Assessment and Selection Ioannis D. Schizas Electrical Engineering Department University of Texas at Arlington 1 Training and Generalization Training stage: Utilizing
More informationSection 4.2 Graphing Lines
Section. Graphing Lines Objectives In this section, ou will learn to: To successfull complete this section, ou need to understand: Identif collinear points. The order of operations (1.) Graph the line
More informationTopic 2 Transformations of Functions
Week Topic Transformations of Functions Week Topic Transformations of Functions This topic can be a little trick, especiall when one problem has several transformations. We re going to work through each
More informationKernels + K-Means Introduction to Machine Learning. Matt Gormley Lecture 29 April 25, 2018
10-601 Introduction to Machine Learning Machine Learning Department School of Computer Science Carnegie Mellon University Kernels + K-Means Matt Gormley Lecture 29 April 25, 2018 1 Reminders Homework 8:
More information