Distribution-free Predictive Approaches

The methods discussed in the previous sections are essentially model-based. Model-free approaches, such as tree-based classification, also exist and are popular for their intuitive appeal. In this section we discuss two such predictive approaches in detail: nearest-neighbor methods and classification trees.

Nearest-neighbor Approaches
Perhaps the simplest and most intuitive of all predictive approaches is k-nearest-neighbor classification. For a given k, the class of a new observation is predicted by identifying its k closest neighbors in the training dataset and assigning the class with the most support among those neighbors. Ties may be broken at random or by some other rule.
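
As a concrete illustration (not part of the lecture), here is a minimal Python sketch of this rule; the function name knn_predict and the toy data are my own, and Euclidean distance with random tie-breaking is assumed.

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_new, k=3):
    """Predict the class of x_new by majority vote among its k nearest
    training points (Euclidean distance); ties are broken at random."""
    dists = np.linalg.norm(X_train - x_new, axis=1)   # distance to every training point
    nearest = np.argsort(dists)[:k]                   # indices of the k closest neighbors
    votes = Counter(y_train[nearest])
    top = max(votes.values())
    tied = [label for label, count in votes.items() if count == top]
    return np.random.choice(tied)                     # random tie-breaking

# toy usage with made-up data
X_train = np.array([[1.0, 2.0], [1.5, 1.8], [5.0, 8.0], [6.0, 9.0]])
y_train = np.array([0, 0, 1, 1])
print(knn_predict(X_train, y_train, np.array([1.2, 1.9]), k=3))   # likely class 0
```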

Similarity of Nearest-neighbor Approaches with Regression
Despite the simple intuition behind the k-nearest-neighbor method, there is some similarity between it and regression. To see this, notice that the regression function f(x) = E(Y | X = x) minimizes the expected (squared) prediction error. Relaxing the conditioning at a point to conditioning on a region close to that point, and replacing squared error with 0-1 loss, leads to the nearest-neighbor approach.
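
To make the relaxation explicit, here is a brief LaTeX sketch of the three steps; the neighborhood notation N_k(x) is my addition, not the lecture's.

```latex
% regression function: conditional mean at the point x
\[ f(x) = \mathrm{E}\,(Y \mid X = x) \]

% relax the conditioning from the single point x to its k-point
% neighborhood N_k(x) in the training set:
\[ \hat f(x) = \mathrm{Ave}\{\, y_i : x_i \in N_k(x) \,\} \]

% replacing squared error by 0-1 loss turns the average into a majority vote:
\[ \hat G(x) = \arg\max_{g}\; \#\{\, i : x_i \in N_k(x),\; y_i = g \,\} \]
```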

Example (knn classification: k = , , 5)
[Figure: three panels showing the knn classification regions on the unscaled GPA and GMAT data, one per value of k.]
Should GPA and GMAT be on the same scale? Probably not!

Scaled scores: knn classification: k = , , 5
[Figure: the same three knn classifications recomputed after scaling GPA and GMAT.]
The answers are quite different and more reasonable!
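
A hedged sketch of why scaling matters, using scikit-learn; the GPA/GMAT values and admission categories below are invented placeholders, not the course dataset.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical training data: columns are GPA (roughly 2-4) and GMAT (roughly 300-700).
X = np.array([[3.2, 620], [2.4, 430], [3.8, 680], [2.9, 510], [3.5, 560], [2.1, 390]])
y = np.array([1, 3, 1, 2, 1, 3])          # made-up admission categories

# Without scaling, Euclidean distance is dominated by the GMAT axis, whose range
# spans hundreds of points while GPA spans only a couple of units.
unscaled = KNeighborsClassifier(n_neighbors=5).fit(X, y)

# Standardizing each feature to mean 0 and variance 1 puts GPA and GMAT on the same scale.
scaled = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5)).fit(X, y)

x_new = np.array([[3.0, 530]])
print(unscaled.predict(x_new), scaled.predict(x_new))   # the two answers can differ
```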

Choosing k in Nearest-neighbor Approaches The choice of k or the number of neighbors to be included in the classification at a new point is important. A common choice is to take k = but this can give rise to very irregular and jagged regions with high variances in prediction. Larger choices of k lead to smoother regions and less variable classifications, but do not capture local details and can have larger biases. 850

Choosing k by Cross-Validation
Since this is a predictive problem, the choice of k may be made using cross-validation. Cross-validation was used on the GMAT dataset to obtain k = as the optimal choice in terms of minimizing the predicted misclassification error. The cross-validated error was .5%. The distance function used here was Euclidean. The -nearest-neighbor classification predicted the test score (after scaling) to be in category .
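
A hedged sketch of how such a search might look in Python; the data, the grid of k values, and the 10-fold setup are placeholders rather than the course's actual configuration. scikit-learn's cross_val_score reports accuracy, so the misclassification error is one minus the averaged score.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score

# Hypothetical stand-ins for the GPA/GMAT data and admission categories.
rng = np.random.default_rng(1)
X = np.column_stack([rng.normal(3.0, 0.5, 90), rng.normal(500, 80, 90)])
y = rng.integers(1, 4, size=90)

cv_error = {}
for k in range(1, 16):
    model = make_pipeline(StandardScaler(),
                          KNeighborsClassifier(n_neighbors=k, metric="euclidean"))
    acc = cross_val_score(model, X, y, cv=10).mean()   # 10-fold CV accuracy
    cv_error[k] = 1.0 - acc                            # misclassification error

best_k = min(cv_error, key=cv_error.get)
print(best_k, cv_error[best_k])
```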

CV errors in knn classification
[Figure: the cross-validated error rates and the resulting nearest-neighbor classification.]

Issues with k-nn classification
On the face of it, k-nearest neighbors has only one parameter: the number of neighbors included in the majority vote that determines the predicted classification. However, the effective number of parameters to be fit is not really k but more like n/k. This is because the approach effectively divides the region spanned by the training set into approximately n/k parts, and each part is assigned a class by majority vote. The curse of dimensionality (Bellman, 1961) also impacts performance.

The curse of dimensionality in knn classification
As dimensionality increases, the data points in the training set become closer to the boundary of the sample space than to any other observation. Consequently, prediction is much more difficult at the edges of the training sample, since some extrapolation may be needed. Furthermore, for a p-dimensional input problem, the sampling density is proportional to n^(1/p), so the sample size required to maintain the same sampling density as in lower dimensions grows exponentially with the number of dimensions. The k-nearest-neighbors approach is therefore not immune to this degradation of performance in higher dimensions.
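
As a rough numerical illustration (entirely my own construction, not part of the lecture), the sketch below fixes the sample size at n = 1000 uniform points in the unit cube and shows that the distance from a query point at the centre to its nearest neighbor grows rapidly with the dimension p, eventually exceeding the distance 0.5 to the nearest face of the cube, so the "nearest" neighbors are no longer local.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000                      # training-set size held fixed across dimensions

for p in (1, 2, 5, 10, 20, 50):
    X = rng.uniform(size=(n, p))                          # n points uniform in [0,1]^p
    centre = np.full(p, 0.5)                              # query point at the centre
    nn_dist = np.linalg.norm(X - centre, axis=1).min()    # distance to nearest neighbor
    # The nearest face of the cube is always at distance 0.5 from the centre,
    # so once nn_dist approaches 0.5 the neighborhood is no longer local.
    print(f"p = {p:2d}   nearest-neighbor distance from centre = {nn_dist:.3f}")
```

A single run is noisy; averaging over several draws gives smoother numbers, but the qualitative growth with p is the point.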