Nearest neighbors classifiers
1 Nearest neighbors classifiers. James McInerney. Adapted from slides by Daniel Hsu. Sept 11, 2017.
2 Housekeeping. We received 167 HW0 submissions on Gradescope before midnight Sept 10th. From a random sample, most look well done and are using the question assignment mechanism on Gradescope correctly. Example assignment submission. I received a few late submissions. This one time only, I will turn the Gradescope submission page on again after class until 8pm EST. We will go over the trickiest parts of the homework at the beginning of the lecture this Wednesday; therefore I have pushed the first office hours to Wednesday. Here are the dates of the two exams for this course: Exam 1: Wednesday October 18th, 2017. Exam 2: Monday December 11th, 2017.
3 Example: OCR for digits. 1. Classify images of handwritten digits by the actual digits they represent. 2. Classification problem: Y = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9} (a discrete set).
4 Nearest neighbor (NN) classifier. Given: labeled examples D := {(x_i, y_i)}_{i=1}^n. Predictor: f̂_D : X → Y. On input x: 1. Find the point x_i among {x_i}_{i=1}^n that is closest to x (the nearest neighbor). 2. Return y_i.
5 How to measure distance? A default choice for distance between points in R^d is the Euclidean distance (also called the ℓ_2 distance): ‖u − v‖_2 := ( Σ_{i=1}^d (u_i − v_i)^2 )^{1/2}, where u = (u_1, u_2, ..., u_d) and v = (v_1, v_2, ..., v_d). Grayscale pixel images: treat them as vectors (of 784 real-valued features) that live in R^784.
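The 1-NN rule with Euclidean distance is short enough to sketch directly. This is a minimal illustration, not course code; the names `euclidean` and `nn_predict` are my own.

```python
import math

def euclidean(u, v):
    # l2 distance between two points given as sequences of numbers
    return math.sqrt(sum((ui - vi) ** 2 for ui, vi in zip(u, v)))

def nn_predict(D, x, dist=euclidean):
    # D is a list of (x_i, y_i) pairs; return the label of the x_i closest to x
    nearest_x, nearest_y = min(D, key=lambda pair: dist(pair[0], x))
    return nearest_y

D = [((0.0, 0.0), "a"), ((5.0, 5.0), "b")]
print(nn_predict(D, (1.0, 1.0)))  # prints "a": (0,0) is the nearest training point
```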
10 Example: OCR for digits with NN classifier. Classify images of handwritten digits by the digits they depict. X = R^784, Y = {0, 1, ..., 9}. Given: labeled examples D := {(x_i, y_i)}_{i=1}^n ⊆ X × Y. Construct the NN classifier f̂_D using D. Question: Is this classifier any good?
13 Error rate. Error rate of classifier f on a set of labeled examples D: err_D(f) := (# of (x, y) ∈ D such that f(x) ≠ y) / |D|, i.e., the fraction of D on which f disagrees with the paired label. Sometimes we'll write this as err(f, D). Question: What is err_D(f̂_D)?
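The error-rate definition translates directly into code. A minimal sketch (the name `err` is mine, not from the slides):

```python
def err(f, D):
    # fraction of labeled examples (x, y) in D on which f(x) != y
    return sum(1 for x, y in D if f(x) != y) / len(D)

# a constant classifier on a toy set: it disagrees with 2 of the 4 labels
D = [(0, "a"), (1, "a"), (2, "b"), (3, "b")]
print(err(lambda x: "a", D))  # prints 0.5
```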
19 A better way to evaluate the classifier. Split the labeled examples {(x_i, y_i)}_{i=1}^n into two sets (randomly): training data S and test data T. Only use the training data S to construct the NN classifier f̂_S. Training error rate of f̂_S: err_S(f̂_S) = 0%. Use the test data T to evaluate the accuracy of f̂_S. Test error rate of f̂_S: err_T(f̂_S) = 3.09%. Is this good?
20 Diagnostics. Some mistakes made by the NN classifier (test point in T, nearest neighbor in S). The first mistake (correct label is 2) could have been avoided by looking at the three nearest neighbors (whose labels are 8, 2, and 2). (Slide shows each test point alongside its three nearest neighbors.)
21 k-nearest neighbors classifier. Given: labeled examples D := {(x_i, y_i)}_{i=1}^n. Predictor: f̂_{D,k} : X → Y. On input x: 1. Find the k points x_{i_1}, x_{i_2}, ..., x_{i_k} among {x_i}_{i=1}^n closest to x (the k nearest neighbors). 2. Return the plurality label among y_{i_1}, y_{i_2}, ..., y_{i_k}. (Break ties in both steps arbitrarily.)
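The two steps above can be sketched as follows. This is an illustration under my own naming (`knn_predict`), not the course's reference implementation; ties are broken by `Counter`'s insertion order, one arbitrary choice among many.

```python
import math
from collections import Counter

def euclidean(u, v):
    return math.sqrt(sum((ui - vi) ** 2 for ui, vi in zip(u, v)))

def knn_predict(D, x, k, dist=euclidean):
    # step 1: sort training points by distance to x and take the k closest
    neighbors = sorted(D, key=lambda pair: dist(pair[0], x))[:k]
    # step 2: return the plurality label among the k neighbors' labels
    votes = Counter(y for _, y in neighbors)
    return votes.most_common(1)[0][0]

D = [((0.0,), "a"), ((1.0,), "a"), ((2.0,), "b"), ((10.0,), "b")]
print(knn_predict(D, (1.5,), k=3))  # prints "a": 2 of the 3 closest labels are "a"
```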
22 Effect of k. Smaller k: smaller training error rate. Larger k: higher training error rate, but predictions are more stable due to voting. (Slide shows a table of test error rate against k for the OCR digits classification task; the values were lost in transcription.)
23 Choosing k: the hold-out set approach. 1. Pick a subset V ⊆ S (the hold-out set, a.k.a. validation set). 2. For each k ∈ {1, 3, 5, ...}: construct the k-NN classifier f̂_{S∖V,k} using S ∖ V, and compute the error rate of f̂_{S∖V,k} on V (the "hold-out error rate"). 3. Pick the k that gives the smallest hold-out error rate.
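The hold-out procedure can be sketched as below. All names (`knn_predict`, `holdout_select_k`) are mine, and the 1-d absolute-difference distance is just to keep the toy self-contained; the slide's procedure is distance-agnostic.

```python
import random
from collections import Counter

def knn_predict(train, x, k):
    # minimal 1-d k-NN: absolute-difference distance, plurality vote
    neighbors = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    return Counter(y for _, y in neighbors).most_common(1)[0][0]

def holdout_select_k(S, ks=(1, 3, 5), val_frac=0.2, seed=0):
    # step 1: split S into a hold-out set V and a training part S \ V
    S = list(S)
    random.Random(seed).shuffle(S)
    n_val = max(1, int(len(S) * val_frac))
    V, train = S[:n_val], S[n_val:]
    # step 2: hold-out error rate of the k-NN classifier built on S \ V
    def holdout_err(k):
        return sum(1 for x, y in V if knn_predict(train, x, k) != y) / len(V)
    # step 3: pick the k with the smallest hold-out error rate
    return min(ks, key=holdout_err)

# toy 1-d data: the label is the sign of the feature
S = [(x / 10, "pos" if x >= 0 else "neg") for x in range(-20, 21)]
print(holdout_select_k(S))
```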
27 Other distance functions. ℓ_p norm: dist(u, v) = (|x_1|^p + |x_2|^p + ⋯ + |x_d|^p)^{1/p}, where x = u − v. OCR digits classification: with ℓ_2 distance the test error rate is 3.09%; with ℓ_3 it is 2.83%. Manhattan distance: dist(u, v) = distance on a grid between u and v along a strictly horizontal/vertical path. String edit distance: dist(u, v) = # of insertions/deletions/mutations needed to change u into v.
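Two of these distances are easy to sketch: the ℓ_p family (ℓ_1 is the Manhattan distance) and the string edit (Levenshtein) distance, here via the standard one-row dynamic program. The function names are my own.

```python
def lp_dist(u, v, p):
    # (|x_1|^p + ... + |x_d|^p)^(1/p) with x = u - v; p = 1 is Manhattan, p = 2 Euclidean
    return sum(abs(ui - vi) ** p for ui, vi in zip(u, v)) ** (1.0 / p)

def edit_dist(u, v):
    # number of insertions/deletions/substitutions to turn string u into v,
    # computed row by row over the classic DP table
    dp = list(range(len(v) + 1))
    for i, cu in enumerate(u, 1):
        prev, dp[0] = dp[0], i
        for j, cv in enumerate(v, 1):
            prev, dp[j] = dp[j], min(dp[j] + 1, dp[j - 1] + 1, prev + (cu != cv))
    return dp[-1]

print(lp_dist((0, 0), (3, 4), 1))        # prints 7.0 (Manhattan)
print(lp_dist((0, 0), (3, 4), 2))        # prints 5.0 (Euclidean)
print(edit_dist("kitten", "sitting"))    # prints 3
```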
28 Bad features. Caution: the nearest neighbor classifier can be broken by bad/noisy features! (Slide shows two scatter plots of classes y = 0 and y = 1 over features Feature 1 and Feature 2.)
29 Questions of interest. 1. How good is the classifier learned using NN on your problem? 2. Is NN a good learning method in general?
31 Statistical learning theory. Basic assumption (main idea): the labeled examples {(x_i, y_i)}_{i=1}^n come from the same source as future examples. (Diagram: past labeled examples from P feed a learning algorithm, which outputs a learned predictor; a new unlabeled example from P goes into the predictor, which outputs a predicted label.) More formally: {(x_i, y_i)}_{i=1}^n is an i.i.d. sample from a probability distribution P over X × Y.
35 Prediction error rate. Define the (true) error rate of a classifier f : X → Y w.r.t. P to be err_P(f) := P(f(X) ≠ Y), where (X, Y) is a pair of random variables with joint distribution P (i.e., (X, Y) ∼ P). Let f̂_S be a classifier trained using the labeled examples S. The true error rate of f̂_S is err_P(f̂_S) := P(f̂_S(X) ≠ Y). We cannot compute this without knowing P.
39 Estimating the true error rate. Suppose {(x_i, y_i)}_{i=1}^n (assumed to be an i.i.d. sample from P) is randomly split into S and T, and f̂_S is based only on S. Then f̂_S and T are independent. If |T| = m, then the test error rate err_T(f̂_S) (conditional on S) is a binomial random variable scaled by 1/m: m · err_T(f̂_S) | S ∼ Bin(m, err_P(f̂_S)). The expected value of err_T(f̂_S) is err_P(f̂_S). (This means that err_T(f̂_S) is an unbiased estimate of the true error rate err_P(f̂_S).)
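A small simulation makes the binomial claim concrete. Under the assumption (mine, for illustration) that the trained classifier has a fixed true error rate of 0.3, each test point is misclassified independently with that probability, so m · err_T is Bin(m, 0.3), and averaging err_T over many test-set draws should land near 0.3.

```python
import random

rng = random.Random(0)
true_err = 0.3   # plays the role of err_P(f_S), fixed once S is fixed
m = 50           # size of the test set T

def test_error():
    # each test point is misclassified independently with prob. true_err,
    # so the number of mistakes is a Bin(m, true_err) draw
    mistakes = sum(rng.random() < true_err for _ in range(m))
    return mistakes / m

draws = [test_error() for _ in range(10000)]
avg = sum(draws) / len(draws)
print(round(avg, 2))  # close to true_err = 0.3
```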
44 Limits of prediction. Binary classification: Y = {0, 1}. Probability distribution P over X × {0, 1}; let (X, Y) ∼ P. Think of P as comprising two parts: 1. the marginal distribution µ of X (a distribution over X); 2. the conditional distribution of Y given X = x, for each x ∈ X: η(x) := P(Y = 1 | X = x). If η(x) is 0 or 1 for all x ∈ X where µ(x) > 0, then the optimal error rate is zero (i.e., min_f err_P(f) = 0). Otherwise it is non-zero.
47 Bayes optimality. What is the classifier with the smallest true error rate? f*(x) := 0 if η(x) ≤ 1/2; 1 if η(x) > 1/2. (Do you see why?) f* is called the Bayes (optimal) classifier, and err_P(f*) = min_f err_P(f) = E[min{η(X), 1 − η(X)}], which is called the Bayes (optimal) error rate. Question: How far from optimal is the classifier produced by the NN learning method?
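On a finite X with explicit µ and η, the Bayes classifier and Bayes error can be computed directly from the formulas above. The three-point distribution below is an invented toy, chosen only so the numbers are easy to check by hand.

```python
# toy distribution on a three-point X: marginal mu and eta(x) = P(Y=1 | X=x)
mu  = {"x1": 0.5, "x2": 0.3, "x3": 0.1}
eta = {"x1": 0.9, "x2": 0.5, "x3": 0.1}
mu["x3"] = 0.2  # make mu sum to 1: 0.5 + 0.3 + 0.2

def bayes_classifier(x):
    # predict 1 iff eta(x) > 1/2 (ties go to 0, as on the slide)
    return 1 if eta[x] > 0.5 else 0

# Bayes error rate: E[min(eta(X), 1 - eta(X))] under the marginal mu
bayes_error = sum(mu[x] * min(eta[x], 1 - eta[x]) for x in mu)
print(round(bayes_error, 3))  # 0.5*0.1 + 0.3*0.5 + 0.2*0.1 = 0.22
```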
48 Consistency of k-NN. We say that a learning algorithm A is consistent if lim_{n→∞} E[err_P(f̂_n)] = err_P(f*), where f̂_n is the classifier learned using A on an i.i.d. sample of size n. Theorem (e.g., Cover and Hart, 1967). Assume η is continuous. Then: 1-NN is consistent if min_f err_P(f) = 0; k-NN is consistent, provided that k := k_n is chosen as an increasing but sublinear function of n, i.e., lim_{n→∞} k_n = ∞ and lim_{n→∞} k_n / n = 0.
49 Key takeaways. 1. The k-NN learning procedure; the role of k, distance functions, and features. 2. Training and test error rates. 3. The framework of statistical learning theory; estimating the true error rate; Bayes optimality; the high-level idea of consistency.
58 Recap: definitions. Denote our classifier as f; what does f(x) mean? Denote the dataset as D; what is err_D(f)? It is (# of (x, y) ∈ D such that f(x) ≠ y) / |D|. What does P(X, Y) mean? What is err_P(f)? It is E_P[I[f(X) ≠ Y]] = P(f(X) ≠ Y).
66 Recap: unbiased estimator. What is an estimator? Using the definitions from the previous slide, what is err_D(f) an estimator of? err_P(f). What is an unbiased estimator? One for which E[err_D(f)] = err_P(f). What on earth does that mean? If we repeatedly draw datasets of size n, D_n ∼ P(X, Y), and calculate err_D(f), the average will converge to err_P(f). Why should we care?
73 Recap: consistent algorithm. Let f̂_n mean a classifier trained using n examples. What is the property lim_{n→∞} E[err_P(f̂_n)] = err_P(f*)? Consistency. What is the ideal classifier f* also known as? The Bayes optimal classifier. How can we define f*? By err_P(f*) = min_f err_P(f) = E[min{η(X), 1 − η(X)}].
More informationNatural Language Processing
Natural Language Processing Classification III Dan Klein UC Berkeley 1 Classification 2 Linear Models: Perceptron The perceptron algorithm Iteratively processes the training set, reacting to training errors
More informationEmpirical risk minimization (ERM) A first model of learning. The excess risk. Getting a uniform guarantee
A first model of learning Let s restrict our attention to binary classification our labels belong to (or ) Empirical risk minimization (ERM) Recall the definitions of risk/empirical risk We observe the
More informationUVA CS 6316/4501 Fall 2016 Machine Learning. Lecture 15: K-nearest-neighbor Classifier / Bias-Variance Tradeoff. Dr. Yanjun Qi. University of Virginia
UVA CS 6316/4501 Fall 2016 Machine Learning Lecture 15: K-nearest-neighbor Classifier / Bias-Variance Tradeoff Dr. Yanjun Qi University of Virginia Department of Computer Science 11/9/16 1 Rough Plan HW5
More informationThe exam is closed book, closed notes except your one-page (two-sided) cheat sheet.
CS 189 Spring 2015 Introduction to Machine Learning Final You have 2 hours 50 minutes for the exam. The exam is closed book, closed notes except your one-page (two-sided) cheat sheet. No calculators or
More informationClustering. Mihaela van der Schaar. January 27, Department of Engineering Science University of Oxford
Department of Engineering Science University of Oxford January 27, 2017 Many datasets consist of multiple heterogeneous subsets. Cluster analysis: Given an unlabelled data, want algorithms that automatically
More informationHomework 3 Solutions
CS3510 Design & Analysis of Algorithms Section A Homework 3 Solutions Released: 7pm, Wednesday Nov 8, 2017 This homework has a total of 4 problems on 4 pages. Solutions should be submitted to GradeScope
More informationK-Means Clustering 3/3/17
K-Means Clustering 3/3/17 Unsupervised Learning We have a collection of unlabeled data points. We want to find underlying structure in the data. Examples: Identify groups of similar data points. Clustering
More informationInstance-Based Learning.
Instance-Based Learning www.biostat.wisc.edu/~dpage/cs760/ Goals for the lecture you should understand the following concepts k-nn classification k-nn regression edited nearest neighbor k-d trees for nearest
More informationNon-Parametric Modeling
Non-Parametric Modeling CE-725: Statistical Pattern Recognition Sharif University of Technology Spring 2013 Soleymani Outline Introduction Non-Parametric Density Estimation Parzen Windows Kn-Nearest Neighbor
More informationNaïve Bayes for text classification
Road Map Basic concepts Decision tree induction Evaluation of classifiers Rule induction Classification using association rules Naïve Bayesian classification Naïve Bayes for text classification Support
More informationUsing Machine Learning to Optimize Storage Systems
Using Machine Learning to Optimize Storage Systems Dr. Kiran Gunnam 1 Outline 1. Overview 2. Building Flash Models using Logistic Regression. 3. Storage Object classification 4. Storage Allocation recommendation
More informationMachine Learning. Classification
10-701 Machine Learning Classification Inputs Inputs Inputs Where we are Density Estimator Probability Classifier Predict category Today Regressor Predict real no. Later Classification Assume we want to
More informationCS446: Machine Learning Fall Problem Set 4. Handed Out: October 17, 2013 Due: October 31 th, w T x i w
CS446: Machine Learning Fall 2013 Problem Set 4 Handed Out: October 17, 2013 Due: October 31 th, 2013 Feel free to talk to other members of the class in doing the homework. I am more concerned that you
More informationMachine Learning Classifiers and Boosting
Machine Learning Classifiers and Boosting Reading Ch 18.6-18.12, 20.1-20.3.2 Outline Different types of learning problems Different types of learning algorithms Supervised learning Decision trees Naïve
More informationCS246: Mining Massive Datasets Jure Leskovec, Stanford University
CS246: Mining Massive Datasets Jure Leskovec, Stanford University http://cs246.stanford.edu [Kumar et al. 99] 2/13/2013 Jure Leskovec, Stanford CS246: Mining Massive Datasets, http://cs246.stanford.edu
More informationTopic 7 Machine learning
CSE 103: Probability and statistics Winter 2010 Topic 7 Machine learning 7.1 Nearest neighbor classification 7.1.1 Digit recognition Countless pieces of mail pass through the postal service daily. A key
More informationA Dendrogram. Bioinformatics (Lec 17)
A Dendrogram 3/15/05 1 Hierarchical Clustering [Johnson, SC, 1967] Given n points in R d, compute the distance between every pair of points While (not done) Pick closest pair of points s i and s j and
More informationSupervised Learning: K-Nearest Neighbors and Decision Trees
Supervised Learning: K-Nearest Neighbors and Decision Trees Piyush Rai CS5350/6350: Machine Learning August 25, 2011 (CS5350/6350) K-NN and DT August 25, 2011 1 / 20 Supervised Learning Given training
More informationMachine Learning (CSE 446): Concepts & the i.i.d. Supervised Learning Paradigm
Machine Learning (CSE 446): Concepts & the i.i.d. Supervised Learning Paradigm Sham M Kakade c 2018 University of Washington cse446-staff@cs.washington.edu 1 / 17 Review 1 / 17 Decision Tree: Making a
More informationInstance-based Learning CE-717: Machine Learning Sharif University of Technology. M. Soleymani Fall 2015
Instance-based Learning CE-717: Machine Learning Sharif University of Technology M. Soleymani Fall 2015 Outline Non-parametric approach Unsupervised: Non-parametric density estimation Parzen Windows K-Nearest
More informationMachine Learning and Pervasive Computing
Stephan Sigg Georg-August-University Goettingen, Computer Networks 17.12.2014 Overview and Structure 22.10.2014 Organisation 22.10.3014 Introduction (Def.: Machine learning, Supervised/Unsupervised, Examples)
More informationMachine Learning A W 1sst KU. b) [1 P] Give an example for a probability distributions P (A, B, C) that disproves
Machine Learning A 708.064 11W 1sst KU Exercises Problems marked with * are optional. 1 Conditional Independence I [2 P] a) [1 P] Give an example for a probability distribution P (A, B, C) that disproves
More informationIntroduction to Artificial Intelligence
Introduction to Artificial Intelligence COMP307 Machine Learning 2: 3-K Techniques Yi Mei yi.mei@ecs.vuw.ac.nz 1 Outline K-Nearest Neighbour method Classification (Supervised learning) Basic NN (1-NN)
More informationApplication of Spectral Clustering Algorithm
1/27 Application of Spectral Clustering Algorithm Danielle Middlebrooks dmiddle1@math.umd.edu Advisor: Kasso Okoudjou kasso@umd.edu Department of Mathematics University of Maryland- College Park Advance
More informationThese slides follow closely the (English) course textbook Pattern Recognition and Machine Learning by Christopher Bishop
Machine Learning Algorithms (IFT6266 A7) Prof. Douglas Eck, Université de Montréal These slides follow closely the (English) course textbook Pattern Recognition and Machine Learning by Christopher Bishop
More informationUVA CS 4501: Machine Learning. Lecture 10: K-nearest-neighbor Classifier / Bias-Variance Tradeoff. Dr. Yanjun Qi. University of Virginia
UVA CS 4501: Machine Learning Lecture 10: K-nearest-neighbor Classifier / Bias-Variance Tradeoff Dr. Yanjun Qi University of Virginia Department of Computer Science 1 Where are we? è Five major secfons
More informationCS7267 MACHINE LEARNING NEAREST NEIGHBOR ALGORITHM. Mingon Kang, PhD Computer Science, Kennesaw State University
CS7267 MACHINE LEARNING NEAREST NEIGHBOR ALGORITHM Mingon Kang, PhD Computer Science, Kennesaw State University KNN K-Nearest Neighbors (KNN) Simple, but very powerful classification algorithm Classifies
More informationWhat to come. There will be a few more topics we will cover on supervised learning
Summary so far Supervised learning learn to predict Continuous target regression; Categorical target classification Linear Regression Classification Discriminative models Perceptron (linear) Logistic regression
More informationTree-based methods for classification and regression
Tree-based methods for classification and regression Ryan Tibshirani Data Mining: 36-462/36-662 April 11 2013 Optional reading: ISL 8.1, ESL 9.2 1 Tree-based methods Tree-based based methods for predicting
More informationMachine Learning (CSE 446): Decision Trees
Machine Learning (CSE 446): Decision Trees Sham M Kakade c 28 University of Washington cse446-staff@cs.washington.edu / 8 Announcements First assignment posted. Due Thurs, Jan 8th. Remember the late policy
More informationNotes slides from before lecture. CSE 21, Winter 2017, Section A00. Lecture 9 Notes. Class URL:
Notes slides from before lecture CSE 21, Winter 2017, Section A00 Lecture 9 Notes Class URL: http://vlsicad.ucsd.edu/courses/cse21-w17/ Notes slides from before lecture Notes February 8 (1) HW4 is due
More informationMissing Data. Where did it go?
Missing Data Where did it go? 1 Learning Objectives High-level discussion of some techniques Identify type of missingness Single vs Multiple Imputation My favourite technique 2 Problem Uh data are missing
More informationNearest Neighbor Classification. Machine Learning Fall 2017
Nearest Neighbor Classification Machine Learning Fall 2017 1 This lecture K-nearest neighbor classification The basic algorithm Different distance measures Some practical aspects Voronoi Diagrams and Decision
More informationIntroduction to Algorithms / Algorithms I Lecturer: Michael Dinitz Topic: Approximation algorithms Date: 11/18/14
600.363 Introduction to Algorithms / 600.463 Algorithms I Lecturer: Michael Dinitz Topic: Approximation algorithms Date: 11/18/14 23.1 Introduction We spent last week proving that for certain problems,
More informationNonparametric Classification Methods
Nonparametric Classification Methods We now examine some modern, computationally intensive methods for regression and classification. Recall that the LDA approach constructs a line (or plane or hyperplane)
More information9.1. K-means Clustering
424 9. MIXTURE MODELS AND EM Section 9.2 Section 9.3 Section 9.4 view of mixture distributions in which the discrete latent variables can be interpreted as defining assignments of data points to specific
More informationWild Mushrooms Classification Edible or Poisonous
Wild Mushrooms Classification Edible or Poisonous Yulin Shen ECE 539 Project Report Professor: Hu Yu Hen 2013 Fall ( I allow code to be released in the public domain) pg. 1 Contents Introduction. 3 Practicality
More informationCIS 520, Machine Learning, Fall 2015: Assignment 7 Due: Mon, Nov 16, :59pm, PDF to Canvas [100 points]
CIS 520, Machine Learning, Fall 2015: Assignment 7 Due: Mon, Nov 16, 2015. 11:59pm, PDF to Canvas [100 points] Instructions. Please write up your responses to the following problems clearly and concisely.
More information1 Document Classification [60 points]
CIS519: Applied Machine Learning Spring 2018 Homework 4 Handed Out: April 3 rd, 2018 Due: April 14 th, 2018, 11:59 PM 1 Document Classification [60 points] In this problem, you will implement several text
More informationSimple Model Selection Cross Validation Regularization Neural Networks
Neural Nets: Many possible refs e.g., Mitchell Chapter 4 Simple Model Selection Cross Validation Regularization Neural Networks Machine Learning 10701/15781 Carlos Guestrin Carnegie Mellon University February
More informationMachine Learning A WS15/16 1sst KU Version: January 11, b) [1 P] For the probability distribution P (A, B, C, D) with the factorization
Machine Learning A 708.064 WS15/16 1sst KU Version: January 11, 2016 Exercises Problems marked with * are optional. 1 Conditional Independence I [3 P] a) [1 P] For the probability distribution P (A, B,
More information