INTRODUCTION TO MACHINE LEARNING. Measuring model performance or error


Is our model any good? It depends on the context of the task: accuracy, computation time, and interpretability all matter. There are 3 types of tasks: classification, regression, and clustering.

Classification: accuracy and error. The system is either right or wrong, and accuracy goes up when error goes down:

Accuracy = (correctly classified instances) / (total number of classified instances)
Error = 1 - Accuracy

Example: squares with 2 features (small/big and solid/dotted) and a label (colored/not colored). A binary classification problem.

Example: comparing the truth with the predictions, 3 of the 5 squares are classified correctly, so the accuracy is 3/5 = 60%.

Limits of accuracy: consider classifying a very rare heart disease. A model that classifies everyone as negative (not sick) predicts 99 of 100 patients correctly and misses 1. Accuracy: 99%. Bogus: you miss every positive case!

Confusion matrix: the rows and columns contain all available labels, and each cell contains the frequency of instances classified in a certain way.

Confusion matrix for a binary classifier (positive or negative, 1 or 0):

             Prediction
              P    N
  Truth  p   TP   FN
         n   FP   TN

True Positives (TP): prediction P, truth p. True Negatives (TN): prediction N, truth n. False Negatives (FN): prediction N, truth p. False Positives (FP): prediction P, truth n.
Ratios in the confusion matrix: accuracy, precision, and recall.

Accuracy = (TP + TN) / (TP + FP + FN + TN)
Precision = TP / (TP + FP)
Recall = TP / (TP + FN)
Back to the squares. Filling in the confusion matrix for the 5 predictions:

             Prediction
              P    N
  Truth  p    1    1
         n    1    2
Back to the squares:

Accuracy: (TP + TN) / (TP + FP + FN + TN) = (1 + 2) / (1 + 2 + 1 + 1) = 60%
Precision: TP / (TP + FP) = 1 / (1 + 1) = 50%
Recall: TP / (TP + FN) = 1 / (1 + 1) = 50%
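A quick check of these numbers in Python (a sketch; the counts are the ones from the matrix above):

```python
# Accuracy, precision, and recall from the squares confusion matrix.
TP, FN, FP, TN = 1, 1, 1, 2

accuracy  = (TP + TN) / (TP + FP + FN + TN)  # (1+2)/5 = 0.6
precision = TP / (TP + FP)                   # 1/2     = 0.5
recall    = TP / (TP + FN)                   # 1/2     = 0.5
print(accuracy, precision, recall)           # 0.6 0.5 0.5
```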

Rare heart disease:

             Prediction
              P    N
  Truth  p    0    1
         n    0   99

Accuracy: 99 / (99 + 1) = 99%. Recall: 0/1 = 0%. Precision: undefined, since there are no positive predictions.

Regression: Root Mean Squared Error (RMSE), the mean distance between the estimates and the regression line:

RMSE = sqrt( (1/N) * Σ (y_i - ŷ_i)² )

[Figure: scatter plot of X1 vs X2 with a fitted regression line.]
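A minimal RMSE sketch with NumPy, assuming hypothetical true values y_true and model estimates y_pred:

```python
import numpy as np

# RMSE = square root of the mean squared residual; all values are made up.
y_true = np.array([7.0, 8.5, 9.0, 10.5, 12.0])
y_pred = np.array([7.4, 8.1, 9.6, 10.2, 11.5])

rmse = np.sqrt(np.mean((y_true - y_pred) ** 2))
print(rmse)
```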

Clustering: there is no label information, so we need a distance metric between points.

Clustering: the performance measure consists of 2 elements: the similarity within each cluster and the similarity between clusters.

Within-cluster similarity: measured by the within sum of squares (WSS) or the cluster diameter. We want to minimize these. [Figure: scatter plot of X1 vs X2 with clustered points.]

Between-cluster similarity: measured by the between-cluster sum of squares (BSS) or the intercluster distance. We want to maximize these. [Figure: scatter plot of X1 vs X2 with clustered points.]

Dunn's index = (minimal intercluster distance) / (maximal diameter). [Figure: scatter plot of X1 vs X2 with clustered points.]
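A sketch of Dunn's index for a hypothetical two-cluster result, using NumPy for the pairwise distances (the point coordinates are invented):

```python
import numpy as np

# Dunn's index = minimal intercluster distance / maximal cluster diameter.
clusters = [
    np.array([[0.0, 0.0], [1.0, 0.5], [0.5, 1.0]]),  # hypothetical cluster 1
    np.array([[8.0, 8.0], [9.0, 8.5], [8.5, 9.5]]),  # hypothetical cluster 2
]

def pairwise(a, b):
    # Matrix of Euclidean distances between rows of a and rows of b.
    return np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)

max_diameter = max(pairwise(c, c).max() for c in clusters)   # minimize this
min_intercluster = min(
    pairwise(clusters[i], clusters[j]).min()
    for i in range(len(clusters)) for j in range(i + 1, len(clusters))
)                                                            # maximize this

print(min_intercluster / max_diameter)  # higher is better
```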

INTRODUCTION TO MACHINE LEARNING Let's practice!

INTRODUCTION TO MACHINE LEARNING Training set and test set

Machine learning vs. statistics: predictive power vs. descriptive power. Supervised learning: the model must predict unseen observations. Classical statistics: the model must fit the data, to explain or describe it.

Predictive model: train not on the complete dataset but on a training set, and use a test set to evaluate the performance of the model. The sets are disjoint: NO OVERLAP. The model is tested on unseen observations -> generalization!

Split the dataset. N instances (= observations) X, K features F, class labels y:

          f1      f2      ...  fK      y
  x1      x1,1    x1,2    ...  x1,K    y1    \
  x2      x2,1    x2,2    ...  x2,K    y2     | Training set
  ...                                         |
  xr      xr,1    xr,2    ...  xr,K    yr    /
  xr+1    xr+1,1  xr+1,2  ...  xr+1,K  yr+1  \
  xr+2    xr+2,1  xr+2,2  ...  xr+2,K  yr+2   | Test set
  ...                                         |
  xN      xN,1    xN,2    ...  xN,K    yN    /

Train the model on the training set, then use it on the test-set features to predict y: ŷ. Compare ŷ with the real y.
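A sketch of this split with NumPy on invented data, shuffling first and cutting at row r:

```python
import numpy as np

# Hypothetical dataset: N = 100 instances, K = 4 features, binary labels.
rng = np.random.default_rng(42)
X = rng.normal(size=(100, 4))
y = rng.integers(0, 2, size=100)

idx = rng.permutation(len(X))      # shuffle before splitting
r = int(0.75 * len(X))             # roughly a 3/1 train/test ratio
X_train, y_train = X[idx[:r]], y[idx[:r]]
X_test,  y_test  = X[idx[r:]], y[idx[r:]]
print(len(X_train), len(X_test))   # 75 25
```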

When to use a training/test split? For supervised learning. Not for unsupervised learning (clustering), where the data is not labeled.

Predictive power of the model: train the model on the training set, then test it on the test set and compute a performance measure; that measure estimates the model's predictive power.

How to split the sets? Which observations go where? The training set is larger than the test set, typically about a 3/1 ratio, though this is quite arbitrary. Generally, more data gives a better model, but don't make the test set too small.

Distribution of the sets. For classification, the classes must have similar distributions in both sets; avoid a class not being available in one of the sets. For both classification and regression, shuffle the dataset before splitting.
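If scikit-learn is available, train_test_split can shuffle and stratify in one call; a sketch on invented, imbalanced labels:

```python
import numpy as np
from sklearn.model_selection import train_test_split

X = np.random.default_rng(0).normal(size=(100, 4))  # hypothetical features
y = np.array([0] * 80 + [1] * 20)                   # imbalanced labels

# stratify=y keeps the 80/20 class ratio similar in both sets;
# shuffling is on by default.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)
print(y_train.mean(), y_test.mean())  # both close to 0.20
```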

Effect of sampling: the way you sample the split can affect the performance measures. To add robustness to these measures, use cross-validation. Idea: sample multiple times, with different separations.

Cross-validation, e.g. 4-fold cross-validation: split the dataset into 4 folds, and let each fold serve as the test set once while the remaining folds form the training set:

  Fold 1: Test  | Train | Train | Train
  Fold 2: Train | Test  | Train | Train
  Fold 3: Train | Train | Test  | Train
  Fold 4: Train | Train | Train | Test

Aggregate the results for a robust measure.

n-fold cross-validation: fold the test set over the dataset n times; each test set is 1/n the size of the total dataset.
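A sketch of n-fold cross-validation indices in NumPy; evaluate is a hypothetical callback that trains on the training rows and returns a test-set score:

```python
import numpy as np

def n_fold_cv(X, y, n, evaluate, seed=0):
    """Shuffle, split into n folds, and let each fold be the test set once."""
    idx = np.random.default_rng(seed).permutation(len(X))
    folds = np.array_split(idx, n)          # each fold is ~1/n of the data
    scores = []
    for i in range(n):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(n) if j != i])
        scores.append(evaluate(X[train], y[train], X[test], y[test]))
    return np.mean(scores)                  # aggregate for a robust measure
```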

INTRODUCTION TO MACHINE LEARNING Let's practice!

INTRODUCTION TO MACHINE LEARNING Bias and Variance

What you've learned: accuracy and other performance measures; training and test sets.

Knitting it all together: the effect of the dataset split (train/test) on accuracy, and over- and underfitting.

Introducing BIAS VARIANCE

Bias and variance. The main goal of supervised learning is prediction. Prediction error ~ reducible error + irreducible error.

Irreducible vs. reducible error. Irreducible error is noise: don't try to minimize it. Reducible error is error due to an unfit model: minimize it. Reducible error is split into bias and variance.

Bias: error due to wrong assumptions; the difference between the predictions and the truth for models trained by a specific learning algorithm.

Example: quadratic data. Assumption: the data is linear, so we use linear regression. The error due to bias is high: the linear assumption puts more restrictions on the model.

Bias relates to the complexity of the model: more restrictions lead to higher bias.

Variance: error due to the sampling of the training set. A model with high variance fits the training set closely.

Example: quadratic data. With few restrictions, you can fit a polynomial perfectly through the training set. But if you change the training set, the model changes completely: high variance, and it generalizes badly to the test set.

The bias-variance tradeoff: models range from low bias with high variance at one end to low variance with high bias at the other.

Overfitting: accuracy will depend on the dataset split (train/test), and a model with high variance will depend heavily on that split. Overfitting = the model fits the training set a lot better than the test set. The model is too specific.

Underfitting: restricting your model too much gives high bias. The model is too general.
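A small NumPy sketch of both failure modes on invented quadratic data: a degree-1 fit underfits (high bias), and a high-degree fit typically overfits (high variance), showing up as a large gap between training and test error:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(-3, 3, size=60)
y = 2 * x**2 - x + rng.normal(scale=2.0, size=60)  # quadratic data + noise

x_train, y_train = x[:45], y[:45]
x_test,  y_test  = x[45:], y[45:]

def rmse(a, b):
    return np.sqrt(np.mean((a - b) ** 2))

for degree in (1, 2, 10):
    coeffs = np.polyfit(x_train, y_train, degree)      # fit polynomial
    print(degree,
          rmse(np.polyval(coeffs, x_train), y_train),  # training error
          rmse(np.polyval(coeffs, x_test), y_test))    # test error
```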

Example: spam or not? The truth, on a training set of emails with two features, capital letters and exclamation marks:

  A lot of capital letters?
    no  -> no spam
    yes -> A lot of exclamation marks?
             no  -> no spam
             yes -> spam

One exception in the training set: an email with 50 capital letters and 30 exclamation marks is no spam.

Example: spam or not? An overfit model adds branches just to capture that single exception:

  A lot of capital letters?
    no  -> no spam
    yes -> A lot of exclamation marks?
             no  -> no spam
             yes -> 50 capital letters?
                      no  -> spam
                      yes -> 30 exclamation marks?
                               no  -> spam
                               yes -> no spam

Too specific!

Example: spam or not? An underfit model:

  More than 10 capital letters?
    yes -> spam
    no  -> no spam

Too general!
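The three trees written as plain Python rules (a sketch; the "a lot" threshold of 20 is an assumption, and caps/excls are the hypothetical feature counts):

```python
def truth(caps, excls):
    # The underlying pattern; 20 is an assumed threshold for "a lot".
    if caps <= 20:
        return "no spam"
    return "spam" if excls > 20 else "no spam"

def overfit(caps, excls):
    # Memorizes the single training exception (50 caps, 30 excls): too specific.
    if caps <= 20 or excls <= 20:
        return "no spam"
    if caps == 50 and excls == 30:
        return "no spam"
    return "spam"

def underfit(caps, excls):
    # One coarse rule: too general.
    return "spam" if caps > 10 else "no spam"
```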

INTRODUCTION TO MACHINE LEARNING Let's practice!