LECTURE 6: CROSS VALIDATION


1 LECTURE 6: CROSS VALIDATION CSCI 4352 Machine Learning Dongchul Kim, Ph.D. Department of Computer Science

2 A Regression Problem Given a data set, how can we evaluate our (linear) model?

3 Cross Validation Cross validation is a model evaluation method that is better than residuals. The problem with residual evaluations is that they do not give an indication of how well the learner will do when it is asked to make new predictions for data it has not already seen. One way to overcome this problem is to not use the entire data set when training a learner. Some of the data is removed before training begins. Then, when training is done, the data that was removed can be used to test the performance of the learned model on "new" data. This is the basic idea for a whole class of model evaluation methods called cross validation.

4 Which is best? Why not choose the method with the best fit to the data?

5 What do we really want? Why not choose the method with the best fit to the data? How well are you going to predict future data drawn from the same distribution?

6 The test set method 1. Randomly choose 30% of the data to be in a test set 2. The remainder is a training set

7 The test set method 1. Randomly choose 30% of the data to be in a test set 2. The remainder is a training set 3. Perform your regression on the training set (Linear regression example)

8 The test set method 1. Randomly choose 30% of the data to be in a test set 2. The remainder is a training set 3. Perform your regression on the training set 4. Estimate your future performance with the test set (Linear regression example) Mean Squared Error = 2.4
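
The four steps above map directly onto code. Below is a minimal sketch using scikit-learn; the dataset here is synthetic (the MSE of 2.4 on the slide comes from the lecture's own data, which we don't have), so the printed number will differ.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

# Synthetic stand-in for the lecture's 1-D regression dataset (assumption).
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=(40, 1))
y = 2.0 * x.ravel() + rng.normal(scale=1.5, size=40)

# Steps 1-2: randomly hold out 30% as a test set; the rest is for training.
x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.3, random_state=0)

# Step 3: fit the model on the training set only.
model = LinearRegression().fit(x_train, y_train)

# Step 4: estimate future performance on the held-out test set.
mse = mean_squared_error(y_test, model.predict(x_test))
print(f"Test-set MSE: {mse:.2f}")
```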

9 The test set method 1. Randomly choose 30% of the data to be in a test set 2. The remainder is a training set 3. Perform your regression on the training set 4. Estimate your future performance with the test set (Quadratic regression example) Mean Squared Error = 0.9

10 The test set method 1. Randomly choose 30% of the data to be in a test set 2. The remainder is a training set 3. Perform your regression on the training set 4. Estimate your future performance with the test set (Join the dots example) Mean Squared Error = 2.2

11 The test set method Good news: Very very simple. Can then simply choose the method with the best test-set score. Bad news: Wastes data: we get an estimate of the best method to apply to 30% less data. If we don't have much data, our test set might just be lucky or unlucky. We say the test-set estimator of performance has high variance.
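
The "lucky or unlucky" point is easy to demonstrate. The sketch below (same synthetic data idea as above) repeats the random 70/30 split many times; the spread of the resulting MSE estimates is exactly the high variance the slide warns about.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=(40, 1))
y = 2.0 * x.ravel() + rng.normal(scale=1.5, size=40)

# Repeat the 70/30 split with different seeds and record each estimate.
estimates = []
for seed in range(100):
    x_tr, x_te, y_tr, y_te = train_test_split(
        x, y, test_size=0.3, random_state=seed)
    model = LinearRegression().fit(x_tr, y_tr)
    estimates.append(mean_squared_error(y_te, model.predict(x_te)))

# A wide spread here is the "high variance" of the test-set estimator.
print(f"mean MSE = {np.mean(estimates):.2f}, std = {np.std(estimates):.2f}")
```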

12 LOOCV (Leave-one-out Cross Validation) For k=1 to R 1. Let (x_k, y_k) be the kth record

13 LOOCV (Leave-one-out Cross Validation) For k=1 to R 1. Let (x_k, y_k) be the kth record 2. Temporarily remove (x_k, y_k) from the dataset

14 LOOCV (Leave-one-out Cross Validation) For k=1 to R 1. Let (x_k, y_k) be the kth record 2. Temporarily remove (x_k, y_k) from the dataset 3. Train on the remaining R-1 datapoints

15 LOOCV (Leave-one-out Cross Validation) For k=1 to R 1. Let (x_k, y_k) be the kth record 2. Temporarily remove (x_k, y_k) from the dataset 3. Train on the remaining R-1 datapoints 4. Note your error on (x_k, y_k)

16 LOOCV (Leave-one-out Cross Validation) For k=1 to R 1. Let (x_k, y_k) be the kth record 2. Temporarily remove (x_k, y_k) from the dataset 3. Train on the remaining R-1 datapoints 4. Note your error on (x_k, y_k) When you've done all points, report the mean error.

17 LOOCV (Leave-one-out Cross Validation) For k=1 to R 1. Let (x_k, y_k) be the kth record 2. Temporarily remove (x_k, y_k) from the dataset 3. Train on the remaining R-1 datapoints 4. Note your error on (x_k, y_k) When you've done all points, report the mean error. MSE_LOOCV = 2.12
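
The For k=1 to R loop translates almost line for line into code. A hand-rolled sketch on synthetic data (so the slide's 2.12 will not be reproduced):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=(40, 1))
y = 2.0 * x.ravel() + rng.normal(scale=1.5, size=40)

R = len(x)
errors = []
for k in range(R):
    # Steps 1-2: temporarily remove the k-th record (x_k, y_k).
    mask = np.arange(R) != k
    # Step 3: train on the remaining R-1 datapoints.
    model = LinearRegression().fit(x[mask], y[mask])
    # Step 4: note the squared error on the held-out record.
    pred = model.predict(x[k:k + 1])[0]
    errors.append((y[k] - pred) ** 2)

# When all points are done, report the mean error.
print(f"MSE_LOOCV = {np.mean(errors):.2f}")
```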

18 LOOCV for Quadratic Regression For k=1 to R 1. Let (x_k, y_k) be the kth record 2. Temporarily remove (x_k, y_k) from the dataset 3. Train on the remaining R-1 datapoints 4. Note your error on (x_k, y_k) When you've done all points, report the mean error. MSE_LOOCV = 0.962

19 LOOCV for Join The Dots For k=1 to R 1. Let (x_k, y_k) be the kth record 2. Temporarily remove (x_k, y_k) from the dataset 3. Train on the remaining R-1 datapoints 4. Note your error on (x_k, y_k) When you've done all points, report the mean error. MSE_LOOCV = 3.33
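
Slides 17 through 19 run the same LOOCV loop for three different learners. A compact sketch of that comparison, assuming "join the dots" means piecewise-linear interpolation between the remaining training points (implemented here with np.interp):

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0, 10, size=40))
y = 2.0 * x + rng.normal(scale=1.5, size=40)

def loocv_mse(predict):
    """Leave each point out, fit on the rest, average the squared errors."""
    errors = []
    for k in range(len(x)):
        mask = np.arange(len(x)) != k
        errors.append((y[k] - predict(x[mask], y[mask], x[k])) ** 2)
    return np.mean(errors)

def poly_predict(degree):
    # Fit a degree-d polynomial to the remaining points, evaluate at x_k.
    return lambda xs, ys, xq: np.polyval(np.polyfit(xs, ys, degree), xq)

def join_the_dots(xs, ys, xq):
    # Piecewise-linear interpolation through the remaining (sorted) points.
    return np.interp(xq, xs, ys)

for name, pred in [("linear", poly_predict(1)),
                   ("quadratic", poly_predict(2)),
                   ("join-the-dots", join_the_dots)]:
    print(f"{name:>14}: MSE_LOOCV = {loocv_mse(pred):.2f}")
```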

20 Which kind of Cross Validation?
Test-set: Downside: variance (an unreliable estimate of future performance). Upside: cheap.
Leave-one-out: Downside: expensive. Upside: doesn't waste data.

21 k-fold Cross Validation Randomly break the dataset into k partitions (in our example we'll have k=3 partitions, colored blue, green, and purple)

22 k-fold Cross Validation Randomly break the dataset into k partitions (in our example we'll have k=3 partitions, colored blue, green, and purple) For the blue partition: Train on all the points not in the blue partition. Find the test-set sum of errors on the blue points.

23 k-fold Cross Validation Randomly break the dataset into k partitions (in our example we'll have k=3 partitions, colored blue, green, and purple) For the blue partition: Train on all the points not in the blue partition. Find the test-set sum of errors on the blue points. For the green partition: Train on all the points not in the green partition. Find the test-set sum of errors on the green points.

24 k-fold Cross Validation Randomly break the dataset into k partitions (in our example we'll have k=3 partitions, colored blue, green, and purple) For the blue partition: Train on all the points not in the blue partition. Find the test-set sum of errors on the blue points. For the green partition: Train on all the points not in the green partition. Find the test-set sum of errors on the green points. For the purple partition: Train on all the points not in the purple partition. Find the test-set sum of errors on the purple points.

25 k-fold Cross Validation Randomly break the dataset into k partitions (in our example we'll have k=3 partitions, colored blue, green, and purple) For the blue partition: Train on all the points not in the blue partition. Find the test-set sum of errors on the blue points. For the green partition: Train on all the points not in the green partition. Find the test-set sum of errors on the green points. For the purple partition: Train on all the points not in the purple partition. Find the test-set sum of errors on the purple points. Then report the mean error. Linear Regression: MSE_3FOLD = 2.05

26 k-fold Cross Validation Randomly break the dataset into k partitions (in our example we'll have k=3 partitions, colored blue, green, and purple) For the blue partition: Train on all the points not in the blue partition. Find the test-set sum of errors on the blue points. For the green partition: Train on all the points not in the green partition. Find the test-set sum of errors on the green points. For the purple partition: Train on all the points not in the purple partition. Find the test-set sum of errors on the purple points. Then report the mean error. Quadratic Regression: MSE_3FOLD = 1.11

27 k-fold Cross Validation Randomly break the dataset into k partitions (in our example we'll have k=3 partitions, colored blue, green, and purple) For the blue partition: Train on all the points not in the blue partition. Find the test-set sum of errors on the blue points. For the green partition: Train on all the points not in the green partition. Find the test-set sum of errors on the green points. For the purple partition: Train on all the points not in the purple partition. Find the test-set sum of errors on the purple points. Then report the mean error. Join-the-dots: MSE_3FOLD = 2.93
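
scikit-learn packages the whole k-fold procedure into a single call. A sketch with k=3 on synthetic data (the slide's MSE values again come from the lecture dataset, so the number here will differ):

```python
import numpy as np
from sklearn.model_selection import KFold, cross_val_score
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=(40, 1))
y = 2.0 * x.ravel() + rng.normal(scale=1.5, size=40)

# Randomly break the dataset into k=3 partitions ("folds").
kfold = KFold(n_splits=3, shuffle=True, random_state=0)

# For each fold: train on the other two, score on the held-out fold,
# then report the mean error across folds.
scores = cross_val_score(LinearRegression(), x, y,
                         cv=kfold, scoring="neg_mean_squared_error")
print(f"MSE_3FOLD = {-scores.mean():.2f}")
```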

28 Which kind of Cross Validation?
Test-set: Downside: variance (an unreliable estimate of future performance). Upside: cheap.
Leave-one-out: Downside: expensive. Upside: doesn't waste data.
10-fold: Downside: wastes 10% of the data; 10 times more expensive than the test-set method. Upside: only wastes 10%; only 10 times more expensive instead of R times.
3-fold: Downside: more wasteful than 10-fold; more expensive than the test-set method. Upside: slightly better than test-set.
R-fold: Identical to leave-one-out.

29 CV is useful Preventing overfitting Comparing different learning algorithms Feature selection (see later)

30 Reference Dr. Andrew Moore's homepage
