Lecture 8: Grid Search and Model Validation Continued
Mat Kallada
STAT2450 - Introduction to Data Mining with R

Outline for Today: Model Validation, Grid Search

Some Preliminary Notes: Thank you for submitting Assignment 1! Assignment 2 has been posted.

Validating Predictive Models: We build predictive models using a supervised data mining method such as Decision Trees, KNN, Support Vector Machines, or Neural Networks. There are many supervised data mining techniques out there for building predictive models, so it is important that we validate whether they actually work.

Validating Predictive Models: What were the two approaches we saw to validate predictive models? Before I use a model to predict stock prices, maybe I should test whether it actually works. That is, I want an estimate of how well it will work in the real world.

Validating Predictive Models: Hold-out validation splits the data into training and testing sets and uses the test set to determine performance. K-Fold Cross-Validation splits the data into training and testing sets K different times and averages the results. These are the two main approaches to test whether a data mining method will actually work in the real world.

Hold-out Validation: In a Nutshell. Cut the rows of the data set into two separate parts. Train the predictive model on one part and validate its performance on the other. This method makes sense intuitively. Most people use 25% for testing as a rule of thumb.
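A minimal sketch (not the course's code) of doing such a split manually in base R, assuming the built-in iris data set:

set.seed(1)                                                      # make the split reproducible
test_rows = sample(nrow(iris), size = round(0.25 * nrow(iris)))  # ~25% of rows for testing
test_set  = iris[test_rows, ]                                    # rows used only for validation
train_set = iris[-test_rows, ]                                   # rows used to build the model
nrow(train_set); nrow(test_set)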

Hold-out Validation: "This model should work 99% of the time in the real world. Let's use it and predict stock prices!"

Hold-out Validation: We can compute two statistics here. Training score: how well does the model predict observations it has been trained on? Testing score: how well does the model work on unseen observations, as it would in the real world?

Hold-out Validation: Illustrated
This is our original data set. There is only one feature in this data set.

Height   Species
2.3      Cat
4.65     Dog
6.87     Cat
3.5      Cat
6.3      Dog
7.4      Cat

Hold-out Validation: Illustrated
Split the data into two separate parts. Typically, holding out about 25% of the rows for testing provides a good assessment.

Training Set          Testing Set
Height   Species      Height   Species
6.87     Cat          2.3      Cat
3.5      Cat          4.65     Dog
6.3      Dog
7.4      Cat

Hold-out Validation: Illustrated
Build a predictive model using the training rows only.

Training Set          Testing Set
Height   Species      Height   Species
6.87     Cat          2.3      Cat
3.5      Cat          4.65     Dog
6.3      Dog
7.4      Cat

Training Set -> Predictive model

Hold-out Validation: Illustrated
Validate whether the predictive model works on the test set to get the testing score.

Testing Set
Height   Species   Model prediction
2.3      Cat       correct
4.65     Dog       incorrect (X)

The predictive model got 1 out of 2 predictions correct. The classification accuracy on the test set is 50%.

Hold-out Validation: Illustrated
Validate whether the predictive model works on the training set to get the training score.

Training Set
Height   Species   Model prediction
6.87     Cat       Cat  (correct)
3.5      Cat       Cat  (correct)
6.3      Dog       Dog  (correct)
7.4      Cat       Dog  (incorrect, X)

The predictive model got 3 out of 4 predictions correct. The classification accuracy on the training set is 75%.
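For concreteness, a tiny sketch of how such an accuracy is computed in R; the prediction and label vectors below are hypothetical and simply mirror the 3-out-of-4 example above:

predictions = c("Cat", "Cat", "Dog", "Dog")   # hypothetical model outputs
actual      = c("Cat", "Cat", "Dog", "Cat")   # true species for the same rows
mean(predictions == actual)                   # fraction correct: 3 out of 4 = 0.75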

Hold-out Validation: Testing Score What does a high testing score mean?

Hold-out Validation: Testing Score. What does a high testing score mean? It means the model will probably work well in the real world; it likely did not overfit or underfit the data.

Hold-out Validation: Training Score. What does a high training score mean?

Hold-out Validation: Training Score What does a high training score mean? It might have overfit the data.

Hold-out Validation: Training Score. What does a high training score mean? It might have overfit the data: it simply predicted the observations it was trained on correctly, which says little about unseen data.

Issues with Hold-out Validation: What if the testing rows are easy? What if the testing rows don't truly assess the predictive model? A more robust measure of real-world performance is K-Fold Cross-Validation.

2-Fold Cross Validation

K-Fold Cross-Validation: Wait, you now have K predictive models. Which one do you use in the real world? In hold-out validation, we only had one predictive model.

K-Fold Cross-Validation: K-Fold Cross-Validation gives a real-world estimate over the entire data set. It tells you that this data mining method will perform about X well on the entire data set.
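For intuition, here is a rough sketch (not the course's code) of 2-fold cross-validation written out by hand, using the built-in iris data and the knn() classifier from the class package with an arbitrary k = 3; caret automates all of this, as shown on the next slides:

library(class)                                       # for the knn() classifier
set.seed(1)
fold = sample(rep(1:2, length.out = nrow(iris)))     # assign each row to fold 1 or 2
scores = sapply(1:2, function(f) {
  train = iris[fold != f, ]                          # train on the other fold
  test  = iris[fold == f, ]                          # test on this fold
  pred  = knn(train[, 1:4], test[, 1:4], cl = train$Species, k = 3)
  mean(pred == test$Species)                         # accuracy on this fold
})
mean(scores)                                         # averaged estimate over the whole data set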

K-Fold Cross-Validation: After you compute the K-Fold Cross-Validation score and are happy with it, you should train a final model on the entire data set using that data mining technique.

Model Validation: A Summary. There are two main ways to validate predictive models and answer the question: how likely is this predictive model to work in the real world? There are also two main statistics, the training score and the testing score, that you can compute when validating models.

Hold-out Validation in R
We can use the trControl argument (built with caret's trainControl function) to choose the evaluation technique we want. Here p = 0.75 puts 75% of the rows in the training set and number = 1 asks for a single split, which is exactly hold-out validation.

train_control = trainControl(method = "LGOCV", p = 0.75, number = 1)
...
model = train(iris[, 1:4], iris[, 5], method = "knn",
              trControl = train_control, tuneGrid = knn_options)

Hold-out Validation in R DataJoy Example: https://www.getdatajoy.com/examples/56aa2ce939dc02266e7b0322

K-Fold Cross-Validation in R
To do K-Fold Cross-Validation, all we need to do is change the method and the number!

train_control = trainControl(method = "cv", number = 10)
...
model = train(iris[, 1:4], iris[, 5], method = "knn",
              trControl = train_control, tuneGrid = knn_options)
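After train finishes, printing the fitted object (for example, print(model)) reports the cross-validated accuracy for each candidate value of K in the tuning grid; this is the real-world estimate described above.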

K-Fold Cross-Validation in R DataJoy Example: https://www.getdatajoy.com/examples/56aa2faf1d2486f244a6943f

Outline for Today: Model Validation, Grid Search

Hyperparameters: What are they again? Things you specify before running a data mining method

Hyperparameters: What are they again? Things you specify before running a data mining method. These define how complex your predictive model will be.

Hyperparameters: What are they again? Things you specify before running a data mining method. These define how complex your predictive model will be. This includes things like K in KNN, cp and max_depth in Decision Trees, as well as C and the kernel for Support Vector Machines.

Tuning Hyperparameters Properly: We need to specify hyperparameters before running a data mining technique.

Tuning Hyperparameters Properly: Finding the right combination of hyperparameters is called tuning; in other words, finding the combination of hyperparameters that leads to the best performance. Why do we need to tune hyperparameters?

Why do we need to tune hyperparameters? We need to adapt the model to the noise and nature of the data set. When does K-nearest Neighbours with K=1 give good predictive models? When does K-nearest Neighbours with K=10 give good predictive models?

How do we pick these values (like K in KNN)? We said that we will try all possible combinations and pick the one with the best score.

The Sweet Spot for Hyperparameters: Some combination of hyperparameters will give you the best generalization score.

This is formally called the Grid Search approach: define a grid of all possible combinations, try them all out, and use the one with the best testing score.

Grid Search: Define all possible values in a grid, try all combinations of those values, and pick the combination that has the best real-world generalization.
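As a small illustration (hypothetical hyperparameter values, not the course's code), R's expand.grid() builds exactly this kind of grid of all combinations:

# Enumerate every combination of two Decision Tree hyperparameters mentioned earlier
grid = expand.grid(cp = c(0.001, 0.01, 0.1), max_depth = c(2, 4, 8))
nrow(grid)   # 3 x 3 = 9 combinations to evaluate
head(grid)
# Grid search scores each row of this grid (with hold-out or K-fold cross-validation)
# and keeps the combination with the best score.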

Grid Search in R
The tuneGrid argument specifies the grid that we are searching over. When you run train, caret will try every hyperparameter combination in the grid. So far we've been using tuneGrid with a single constant value.

# Search over K values from 1 to 6
knn_options = data.frame(k = 1:6)
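Putting the pieces together, a minimal end-to-end sketch on the built-in iris data, assuming the caret package is installed (the exact accuracies depend on the random folds):

library(caret)

knn_options   = data.frame(k = 1:6)                       # grid of K values to try
train_control = trainControl(method = "cv", number = 10)  # 10-fold cross-validation

model = train(iris[, 1:4], iris[, 5], method = "knn",
              trControl = train_control, tuneGrid = knn_options)

model$bestTune   # the K with the best cross-validated accuracy
model$results    # the accuracy for every K in the grid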

Grid Searching in R DataJoy Example: https://www.getdatajoy.com/examples/56aa32b91d2486f244a69442

That's all for today. Start Assignment 2. We will look into visualization techniques next week. Have a good weekend!