Statistical Learning Part 2 Nonparametric Learning: The Main Ideas. R. Moeller Hamburg University of Technology


Instance-Based Learning. So far we saw statistical learning as parameter learning, i.e., given a specific parameter-dependent family of probability models, fit it to the data by tweaking the parameters. Often simple and effective. Fixed complexity. May be good when there is very little data.

Instance-Based Learning. So far we saw statistical learning as parameter learning. Nonparametric learning methods allow the hypothesis complexity to grow with the data: the more data we have, the wigglier the hypothesis can be.

Characteristics. An instance-based learner is a lazy learner and does all the work when the test example is presented. This is opposed to so-called eager learners, which build a parameterised, compact model of the target. It produces a local approximation to the target function (a different one for each test instance).

Nearest Neighbor Classifier. Basic idea: store all labelled instances (i.e., the training set) and compare new unlabelled instances (i.e., the test set) to the stored ones to assign them an appropriate label. The comparison is performed, for instance, by means of the Euclidean distance, and the labels of the k nearest neighbors of a new instance determine the assigned label. Other distance measures: Mahalanobis distance (for multidimensional space), ... Parameter: k (the number of nearest neighbors).

Nearest Neighbor Classifier. 1-Nearest neighbor: given a query instance x_q, first locate the nearest training example x_n, then $\hat{f}(x_q) := f(x_n)$. k-Nearest neighbor: given a query instance x_q, first locate the k nearest training examples; if the target function is discrete-valued, take a vote among the k nearest neighbors; else, if the target function is real-valued, take the mean of the f values of the k nearest neighbors: $\hat{f}(x_q) := \frac{1}{k} \sum_{i=1}^{k} f(x_i)$

Distance Between Examples. We need a measure of distance in order to know which examples are the neighbours. Assume that we have T attributes for the learning problem. Then one example point x has elements x_t ∈ R, t = 1, ..., T. The distance between two points x_i and x_j is often defined as the Euclidean distance: $d(x_i, x_j) = \sqrt{\sum_{t=1}^{T} (x_{ti} - x_{tj})^2}$
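
As a concrete illustration of the rule above, here is a minimal NumPy sketch of a k-NN predictor (the function names and the toy data at the end are illustrative, not part of the slides):

import numpy as np
from collections import Counter

def euclidean(a, b):
    # d(x_i, x_j) = sqrt( sum_t (x_ti - x_tj)^2 )
    return np.sqrt(np.sum((a - b) ** 2))

def knn_predict(X_train, y_train, x_query, k=3, regression=False):
    # distances from the query instance to every stored training instance
    dists = np.array([euclidean(x, x_query) for x in X_train])
    nearest = np.argsort(dists)[:k]           # indices of the k nearest neighbours
    if regression:
        return np.mean(y_train[nearest])      # real-valued target: mean of the k values
    votes = Counter(y_train[nearest])         # discrete target: majority vote
    return votes.most_common(1)[0][0]

# toy usage
X = np.array([[1.0, 1.0], [1.2, 0.8], [5.0, 5.0], [5.2, 4.8]])
y = np.array(["A", "A", "B", "B"])
print(knn_predict(X, y, np.array([1.1, 0.9]), k=3))   # prints "A"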

Difficulties. For higher dimensionality, neighborhoods must be large in the average case (the curse of dimensionality). There may be irrelevant attributes amongst the attributes. One has to calculate the distance of the test case from all training cases.

kNN vs. 1-NN: Voronoi Diagram

When to Consider kNN Algorithms? Instances map to points in $R^n$. Not more than, say, 20 attributes per instance. Lots of training data. Advantages: training is very fast; can learn complex target functions; we don't lose information. Disadvantages? (We will see them shortly.)

(Figure: eight examples, numbered one to eight; the eighth is marked with a question mark and is the instance to be classified.)

Training data
Number  Lines  Line types  Rectangles  Colours  Mondrian?
1       6      1           10          4        No
2       4      2            8          5        No
3       5      2            7          4        Yes
4       5      1            8          4        Yes
5       5      1           10          5        No
6       6      1            8          6        Yes
7       7      1           14          5        No

Test instance
Number  Lines  Line types  Rectangles  Colours  Mondrian?
8       7      2            9          4        ?

Keep Data in Normalized Form. One way to normalize the data x_t to x_t' is $x_t' = \frac{x_t - \bar{x}_t}{\sigma_t}$, where $\bar{x}_t$ is the mean of the t-th attribute and $\sigma_t$ is the standard deviation of the t-th attribute (roughly, the average distance of the data values from their mean).
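
A small sketch of this normalisation in NumPy; it assumes the population standard deviation and that the test data are scaled with the training-set statistics:

import numpy as np

def normalize(X_train, X_test):
    # x_t' = (x_t - mean_t) / sigma_t, with statistics taken from the training data only
    mean = X_train.mean(axis=0)
    std = X_train.std(axis=0)                 # population standard deviation (ddof=0)
    return (X_train - mean) / std, (X_test - mean) / std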

Mean and standard deviation Source: Wikipedia

Normalized Training Data
Number   Lines   Line types  Rectangles  Colours  Mondrian?
1         0.632  -0.632       0.327      -1.021   No
2        -1.581   1.581      -0.588       0.408   No
3        -0.474   1.581      -1.046      -1.021   Yes
4        -0.474  -0.632      -0.588      -1.021   Yes
5        -0.474  -0.632       0.327       0.408   No
6         0.632  -0.632      -0.588       1.837   Yes
7         1.739  -0.632       2.157       0.408   No

Test instance
Number   Lines   Line types  Rectangles  Colours  Mondrian?
8         1.739   1.581      -0.131      -1.021   ?

Distances of Test Instance From Training Data
Example  Distance of test from example  Mondrian?
1        2.517                          No
2        3.644                          No
3        2.395                          Yes
4        3.164                          Yes
5        3.472                          No
6        3.808                          Yes
7        3.490                          No

Classification:  1-NN: Yes   3-NN: Yes   5-NN: No   7-NN: No
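
The tables above can be reproduced with a short script (a sketch, assuming population standard deviations computed from the training data only; the rounded values should match the slides):

import numpy as np
from collections import Counter

# training data: Lines, Line types, Rectangles, Colours
X = np.array([[6, 1, 10, 4], [4, 2, 8, 5], [5, 2, 7, 4], [5, 1, 8, 4],
              [5, 1, 10, 5], [6, 1, 8, 6], [7, 1, 14, 5]], dtype=float)
y = np.array(["No", "No", "Yes", "Yes", "No", "Yes", "No"])
x_test = np.array([7, 2, 9, 4], dtype=float)

mean, std = X.mean(axis=0), X.std(axis=0)     # population standard deviation
Xn, xn = (X - mean) / std, (x_test - mean) / std

dists = np.sqrt(((Xn - xn) ** 2).sum(axis=1))
print(np.round(dists, 3))                     # ~ [2.517 3.644 2.395 3.164 3.472 3.490 3.808] reordered per example
order = np.argsort(dists)
for k in (1, 3, 5, 7):
    vote = Counter(y[order[:k]]).most_common(1)[0][0]
    print(f"{k}-NN: {vote}")                  # Yes, Yes, No, No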

What if the target function is real-valued? The k-nearest neighbor algorithm then simply calculates the mean of the k nearest neighbours.

Variant of kNN: Distance-Weighted kNN. We might want to weight nearer neighbors more heavily. Then it makes sense to use all training examples instead of just k: $\hat{f}(x_q) := \frac{\sum_{i=1}^{k} w_i f(x_i)}{\sum_{i=1}^{k} w_i}$, where $w_i = \frac{1}{d(x_q, x_i)^2}$.
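
A sketch of the distance-weighted variant for a real-valued target function; the small eps term that avoids division by zero for exact matches is an added safeguard, not part of the slide's formula:

import numpy as np

def weighted_knn_regress(X_train, f_train, x_query, eps=1e-12):
    # all training examples contribute, weighted by w_i = 1 / d(x_q, x_i)^2
    dists = np.sqrt(((X_train - x_query) ** 2).sum(axis=1))
    w = 1.0 / (dists ** 2 + eps)              # eps avoids division by zero for exact matches
    return np.sum(w * f_train) / np.sum(w)    # f_hat(x_q) = sum_i w_i f(x_i) / sum_i w_i

For a discrete target, the same weights can be accumulated per class label and the label with the largest weighted vote returned.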

Parzen classifier. Basic idea: estimate the densities of the distributions of instances for each class by summing the distance-weighted contributions of each instance in a class within a predefined window function. Parameter h: size of the window over which the interpolation takes place. As h approaches 0 the window approaches a Dirac delta function; for increasing values of h more smoothing is achieved.

(Figure: Gaussian Parzen windows with smoothing factor h, applied to Gaussian-distributed points.)

Parzen classification. Parzen classifiers estimate the densities for each category and classify a test instance by the label corresponding to the maximum a posteriori probability (MAP). The decision region of a Parzen-window classifier depends upon the choice of window function.
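
A sketch of a Parzen-window classifier along these lines, using a Gaussian window (the Gaussian form and the default h are illustrative assumptions):

import numpy as np

def parzen_density(X_class, x, h):
    # average of Gaussian windows of width h centred on the class's training instances
    d = X_class.shape[1]
    u = (X_class - x) / h
    kernel = np.exp(-0.5 * (u ** 2).sum(axis=1)) / ((2 * np.pi) ** (d / 2) * h ** d)
    return kernel.mean()

def parzen_classify(X_train, y_train, x, h=1.0):
    labels = np.unique(y_train)
    priors = {c: np.mean(y_train == c) for c in labels}
    # MAP decision: maximise prior times the class-conditional density estimate
    post = {c: priors[c] * parzen_density(X_train[y_train == c], x, h) for c in labels}
    return max(post, key=post.get)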

Neural Networks. Feed-forward networks. Single-layer networks (perceptrons): perceptron learning rule; easy to train; fast convergence, little data required; cannot learn complex functions. Multi-layer networks: backpropagation learning; hard to train; slow convergence, much data required.
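
For the single-layer case, a sketch of the perceptron learning rule (learning rate, epoch count and the appended bias input are illustrative choices):

import numpy as np

def train_perceptron(X, y, lr=0.1, epochs=100):
    # targets y must be in {-1, +1}; a constant 1 is appended to each instance as a bias input
    Xb = np.hstack([X, np.ones((len(X), 1))])
    w = np.zeros(Xb.shape[1])
    for _ in range(epochs):
        for x_i, y_i in zip(Xb, y):
            if y_i * np.dot(w, x_i) <= 0:     # instance misclassified (or on the boundary)
                w += lr * y_i * x_i           # perceptron update rule
    return w

def perceptron_predict(w, X):
    Xb = np.hstack([X, np.ones((len(X), 1))])
    return np.sign(Xb @ w)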

XOR problem

XOR problem

XOR problem. Solution: use multiple layers.

Backpropagation classifier. Basic idea: map the instances via a non-linear transformation into a space where they can be separated by a hyperplane that minimizes the classification error. Parameters: learning rate and number of non-linear basis functions.
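
A minimal backpropagation sketch that learns the XOR mapping from the previous slides (network size, sigmoid activations, cross-entropy gradient, learning rate and iteration count are all illustrative assumptions):

import numpy as np

rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)            # XOR targets

W1 = rng.normal(size=(2, 8)); b1 = np.zeros(8)             # hidden layer
W2 = rng.normal(size=(8, 1)); b2 = np.zeros(1)             # output layer
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

lr = 0.5
for _ in range(10000):
    h = sigmoid(X @ W1 + b1)                               # forward pass
    out = sigmoid(h @ W2 + b2)
    d_out = out - y                                        # output gradient (sigmoid + cross-entropy)
    d_h = (d_out @ W2.T) * h * (1 - h)                     # backpropagated hidden-layer gradient
    W2 -= lr * h.T @ d_out;  b2 -= lr * d_out.sum(axis=0)
    W1 -= lr * X.T @ d_h;    b1 -= lr * d_h.sum(axis=0)

print(np.round(out.ravel(), 2))                            # typically close to [0, 1, 1, 0]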

Radial Basis Function Classifier. Basic idea: represent the instances by a number of prototypes. Each prototype is activated by weighting its nearest instances. The prototype activations may be taken as input for a perceptron or another classifier. Parameter: number of prototypes (basis functions).
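
A sketch of such an RBF classifier; choosing the prototypes as a random subset of the training instances and fitting the read-out weights by least squares are illustrative simplifications:

import numpy as np

def rbf_features(X, prototypes, gamma=1.0):
    # Gaussian activation of every prototype for every instance
    d2 = ((X[:, None, :] - prototypes[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-gamma * d2)

def train_rbf(X, y, n_prototypes=10, gamma=1.0, seed=0):
    rng = np.random.default_rng(seed)
    protos = X[rng.choice(len(X), size=n_prototypes, replace=False)]   # random prototypes
    Phi = np.hstack([rbf_features(X, protos, gamma), np.ones((len(X), 1))])
    w, *_ = np.linalg.lstsq(Phi, y, rcond=None)             # linear read-out on +/-1 targets
    return protos, w

def predict_rbf(X, protos, w, gamma=1.0):
    Phi = np.hstack([rbf_features(X, protos, gamma), np.ones((len(X), 1))])
    return np.sign(Phi @ w)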

Example of a basis function

Support Vector Machine Classifier. Basic idea: map the instances from the two classes into a space where they become linearly separable. The mapping is achieved using a kernel function that operates on the instances near the margin of separation. Parameter: kernel type.

Nonlinear Separation. (Figure: the two classes, labelled y = -1 and y = +1.)

Feature mapping: $(x_1^2,\; x_2^2,\; \sqrt{2}\, x_1 x_2)$

Support Vectors. (Figure: separator, margin, and support vectors.)
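
A sketch using scikit-learn's SVC on toy data that is only separable after the quadratic mapping shown above (the library, the data and all parameter values are illustrative choices, not prescribed by the slides):

import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.uniform(-2, 2, size=(200, 2))
# class +1 inside a circle, class -1 outside: not linearly separable in the original space
y = np.where((X ** 2).sum(axis=1) < 1.5, 1, -1)

# a degree-2 polynomial kernel corresponds (up to scaling) to the mapping (x1^2, x2^2, sqrt(2) x1 x2)
clf = SVC(kernel="poly", degree=2, coef0=0.0, C=1.0).fit(X, y)
print("training accuracy:", clf.score(X, y))
print("support vectors per class:", clf.n_support_)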

Literature
Mitchell (1997). Machine Learning. http://www.cs.cmu.edu/~tom/mlbook.html
Duda, Hart, & Stork (2000). Pattern Classification. http://rii.ricoh.com/~stork/dhs.html
Hastie, Tibshirani, & Friedman (2001). The Elements of Statistical Learning. http://www-stat.stanford.edu/~tibs/elemstatlearn/

Literature (cont.)
Shawe-Taylor & Cristianini (2004). Kernel Methods for Pattern Analysis. http://www.kernel-methods.net/
Russell & Norvig (2004). Artificial Intelligence: A Modern Approach. http://aima.cs.berkeley.edu/