Performance Measures

1 Performance Measures
Classification F-Measure (careful: similar to, but not the same as, the F-measure we saw for clustering!): a trade-off between correctly classifying all datapoints of a given class and making sure that each predicted class contains points of only one class.
Precision: the proportion of datapoints correctly classified as class 1 over all datapoints classified as class 1.
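
As a worked illustration of these definitions (not part of the original slides), here is a minimal Python sketch that computes precision, recall, and the F-measure for a toy binary problem, taking class 1 as the positive class; the labels below are made-up example data.

```python
# Minimal sketch: precision, recall and F-measure for a binary classifier,
# with class 1 taken as the positive class. Labels are toy data.
import numpy as np

y_true = np.array([1, 1, 1, 0, 0, 0, 1, 0])   # ground-truth labels (toy)
y_pred = np.array([1, 0, 1, 0, 1, 0, 1, 0])   # classifier outputs (toy)

tp = np.sum((y_pred == 1) & (y_true == 1))    # correctly classified as class 1
fp = np.sum((y_pred == 1) & (y_true == 0))    # wrongly classified as class 1
fn = np.sum((y_pred == 0) & (y_true == 1))    # class-1 points that were missed

precision = tp / (tp + fp)                    # purity of the predicted class 1
recall    = tp / (tp + fn)                    # coverage of the true class 1
f_measure = 2 * precision * recall / (precision + recall)
print(precision, recall, f_measure)
```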

2 Performance Measures
The ROC curve plots the true positive rate (the fraction of class y=+1 samples correctly classified) against the false positive rate (the fraction of negative samples incorrectly classified as positive). Each point on the curve corresponds to a different value of the classifier's parameter (usually a threshold, e.g. a threshold on the Bayes classification rule).
[ROC plot: p(TP) on the vertical axis versus p(FP) on the horizontal axis, both from 0% to 100%; curves bending toward the upper left mean performance improves, curves toward the lower right mean performance drops.]
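
A minimal sketch (not from the slides) of how such a curve is traced in practice, assuming the classifier outputs a real-valued score per sample: sweeping the decision threshold produces one (p(FP), p(TP)) point per threshold. Scores and labels below are toy values.

```python
# Minimal sketch: trace an ROC curve by sweeping a decision threshold over
# real-valued classifier scores. Each threshold yields one (FPR, TPR) point.
import numpy as np

y_true = np.array([1, 1, 1, 1, 0, 0, 0, 0])                      # toy labels
scores = np.array([0.9, 0.8, 0.4, 0.3, 0.7, 0.2, 0.1, 0.05])     # toy scores

for thr in np.sort(np.unique(scores))[::-1]:
    y_pred = (scores >= thr).astype(int)
    tpr = np.sum((y_pred == 1) & (y_true == 1)) / np.sum(y_true == 1)  # p(TP)
    fpr = np.sum((y_pred == 1) & (y_true == 0)) / np.sum(y_true == 0)  # p(FP)
    print(f"threshold={thr:.2f}  p(FP)={fpr:.2f}  p(TP)={tpr:.2f}")
```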

3 Gaussian Mixture Models (GMM) + Bayes
How does it work: fit a GMM on each class (e.g. 2 mixture components per class), then compare the class-conditional pdfs and assign the positive class if p+(x) > p-(x).
What does it learn: a density model for each class.
Why is it good: a small number of parameters gives good generalization, and it learns the importance of each dimension.
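
A minimal sketch of this scheme, assuming scikit-learn is available: one 2-component GMM is fitted per class, and a query point is assigned to the class whose density is higher. The data are synthetic toy values.

```python
# Minimal sketch: GMM + Bayes classification. Fit one GMM per class
# (2 mixture components each, as on the slide) and classify a query point
# by comparing the class-conditional log densities log p+(x) vs log p-(x).
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
X_pos = rng.normal(loc=[2.0, 2.0], scale=0.5, size=(100, 2))    # toy class +1
X_neg = rng.normal(loc=[-2.0, -2.0], scale=0.5, size=(100, 2))  # toy class -1

gmm_pos = GaussianMixture(n_components=2).fit(X_pos)  # density model of class +1
gmm_neg = GaussianMixture(n_components=2).fit(X_neg)  # density model of class -1

x = np.array([[1.5, 1.8]])                    # query point
log_p_pos = gmm_pos.score_samples(x)[0]       # log p+(x)
log_p_neg = gmm_neg.score_samples(x)[0]       # log p-(x)
print(+1 if log_p_pos > log_p_neg else -1)
```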

4 Multi-Layer Perceptron (with Back-Propagation)
How does it work: each neuron cuts the input space with a plane; we combine n neurons together to get a non-linear classifier from input x to output y.
[figure: networks with n = 1, 2, 3, 4 hidden neurons; input x, hidden layer, output neuron y]
What does it learn: cuts the space into hyperplanes that are combined together.
Why is it good: fixed model size (the size of the hidden layer); extremely fast at testing time.
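
A minimal sketch, assuming scikit-learn is available: a single hidden layer of 4 neurons learning a toy XOR problem, which no single hyperplane can solve. The layer size, activation and solver are arbitrary illustrative choices, not the slide's settings.

```python
# Minimal sketch: a small MLP whose hidden neurons each cut the space with a
# plane; combining them yields a non-linear decision boundary (toy XOR data).
import numpy as np
from sklearn.neural_network import MLPClassifier

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)  # toy inputs x
y = np.array([0, 1, 1, 0])                                    # XOR labels y

mlp = MLPClassifier(hidden_layer_sizes=(4,), activation="tanh",
                    solver="lbfgs", max_iter=2000, random_state=0)
mlp.fit(X, y)            # may need another seed or more neurons if it stalls
print(mlp.predict(X))    # prediction combines the 4 learned hyperplanes
```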

5 K-Nearest Neighbors
How does it work: find the k nearest training samples (e.g. k=5); if more of them belong to the positive class, the output is positive:
$y = \operatorname{sgn}\!\left(\sum_{i=1}^{k} C(x_i)\right)$
where $C(x_i)$ is the $\pm 1$ class label of the $i$-th nearest sample.
What does it learn: how close a sample is to each class.
Why is it good: extremely simple to implement; training = copying the data.
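
A minimal NumPy sketch of exactly this rule, using synthetic training data; note that "training" really is just storing the data.

```python
# Minimal sketch: k-nearest-neighbour classification as on the slide,
# y = sgn( sum_{i=1..k} C(x_i) ), with C(x_i) the +1/-1 label of the
# i-th nearest training sample.
import numpy as np

def knn_predict(X_train, y_train, x, k=5):
    dists = np.linalg.norm(X_train - x, axis=1)   # distance to every training sample
    nearest = np.argsort(dists)[:k]               # indices of the k nearest samples
    return np.sign(np.sum(y_train[nearest]))      # majority vote via the sign

rng = np.random.default_rng(0)
X_train = rng.normal(size=(50, 2))                              # toy "training" = stored data
y_train = np.where(X_train[:, 0] + X_train[:, 1] > 0, 1, -1)    # toy +1/-1 labels
print(knn_predict(X_train, y_train, np.array([0.5, 0.5]), k=5))
```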

6 Gaussian Process Classification
How does it work: a smooth version of KNN; it measures how close a query is to the known (training) samples.
What does it learn: a conditional density over a latent function modeled on the training data; the latent function is then used for classification.
Why is it good: very high precision; a very good density model for each class.
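
A minimal sketch, assuming scikit-learn is available: an RBF kernel measures closeness to the training samples, and the fitted latent function yields a smooth class probability for a query point. Data and kernel settings are illustrative assumptions, not the slide's.

```python
# Minimal sketch: Gaussian-process classification with an RBF kernel.
# The GP models a latent function over the training data; squashing its value
# at a query point gives a smooth class probability.
import numpy as np
from sklearn.gaussian_process import GaussianProcessClassifier
from sklearn.gaussian_process.kernels import RBF

rng = np.random.default_rng(0)
X = rng.normal(size=(60, 2))                     # toy training inputs
y = (X[:, 0] + X[:, 1] > 0).astype(int)          # toy binary labels

gpc = GaussianProcessClassifier(kernel=1.0 * RBF(length_scale=1.0)).fit(X, y)
x_query = np.array([[0.3, 0.2]])
print(gpc.predict(x_query), gpc.predict_proba(x_query))  # label + smooth probability
```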

7 A Comparison
[chart comparing the algorithms (GP, KNN, GMM, Boosting, MLP, RANSAC, SVM, RVM, Bagging) at training time and at testing time]
WARNING: most of these algorithms require a certain amount of tweaking of the hyperparameters to get optimal results.

10 Other factors to be aware of
Accuracy: given sufficient tweaking of the hyperparameters, all or most of the algorithms will provide similarly good accuracies.
Hyper-parameters: methods such as KNN or GMM have relatively few hyper-parameters, while methods like GP or Boosting have a larger number of parameters to choose.
Implementations: often enough, finding an implementation of an algorithm that exposes all of its customization is not trivial; many researchers end up using the vanilla version of the algorithms for lack of better implementations.

11 What these classifiers have in common
All of them compute a decision function of the form
$f(x) = \sum_i \alpha_i \, d(x, \theta_i) + b$
with some sort of weight $\alpha_i$, some sort of distance $d(\cdot,\cdot)$, some parameters $\theta_i$, and some sort of bias $b$. To get the class label we then use sgn(), argmax(), ...

Algorithm        | alpha                 | d()                 | theta                     | b
SVM / RVM        | Lagrange multipliers  | kernel function     | support vectors           | bias
Gaussian Process | scaling factor        | kernel function     | all datapoints            | -
GMM              | prior                 | Gaussian function   | means, variances          | -
Boosting         | weights               | weak learner        | params of the weak learners | bias
MLP              | weights               | activation function | params of the activation fct. | bias
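
To make this common structure concrete, here is a small illustrative sketch (not from the slides) of the generic decision function $f(x) = \sum_i \alpha_i \, d(x, \theta_i) + b$, instantiated with an RBF "distance" and made-up weights, stored points and bias.

```python
# Minimal sketch: the generic decision function shared by these classifiers,
# f(x) = sum_i alpha_i * d(x, theta_i) + b, with the class label given by sgn(f(x)).
import numpy as np

def generic_decision(x, alphas, thetas, d, b=0.0):
    """Return sgn( sum_i alphas[i] * d(x, thetas[i]) + b )."""
    return np.sign(sum(a * d(x, t) for a, t in zip(alphas, thetas)) + b)

# Example instantiation as an RBF kernel machine (all values are toy/illustrative):
rbf = lambda x, t: np.exp(-np.sum((x - t) ** 2))           # some sort of distance d()
thetas = [np.array([1.0, 1.0]), np.array([-1.0, -1.0])]    # some parameters theta_i
alphas = [0.7, -0.7]                                        # some sort of weight alpha_i
print(generic_decision(np.array([0.8, 0.9]), alphas, thetas, rbf, b=0.0))
```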