Nearest neighbors classifiers


1 Nearest neighbors classifiers James McInerney Adapted from slides by Daniel Hsu Sept 11, 2017. 1 / 25

2 Housekeeping We received 167 HW0 submissions on Gradescope before midnight Sept 10th. From a random sample, most look well done and are using the question assignment mechanism on Gradescope correctly. Example assignment submission. I received a few late submissions. This one time only, I will turn the Gradescope submission page on again after class until 8pm EST. We will go over the trickiest parts of the homework at the beginning of the talk this Wednesday. Therefore I have pushed the first office hours to Wednesday. Here are the dates of the two exams for this course: Exam 1: Wednesday October 18th, 2017. Exam 2: Monday December 11th, 2017. 2 / 25

3 Example: OCR for digits 1. Classify images of handwritten digits by the actual digits they represent. 2. Classification problem: Y = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9} (a discrete set). 3 / 25

4 Nearest neighbor (NN) classifier Given: labeled examples D := {(x_i, y_i)}_{i=1}^n. Predictor: f̂_D : X → Y. On input x, 1. Find the point x_i among {x_i}_{i=1}^n that is closest to x (the nearest neighbor). 2. Return y_i. 4 / 25

5 How to measure distance? A default choice for distance between points in R^d is the Euclidean distance (also called ℓ_2 distance): ‖u − v‖_2 := √( Σ_{i=1}^d (u_i − v_i)^2 ) (where u = (u_1, u_2, ..., u_d) and v = (v_1, v_2, ..., v_d)). Grayscale pixel images: treat them as vectors (of 784 real-valued features) that live in R^784. 5 / 25
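
As an illustration (not part of the original slides), here is a minimal sketch of the 1-NN rule with Euclidean distance in Python/NumPy; the array names train_X, train_y, and x are assumptions made for this example.

import numpy as np

def nn_predict(train_X, train_y, x):
    # train_X: n-by-d array of training points; train_y: length-n array of labels.
    # Euclidean distance from x to every training point.
    dists = np.sqrt(((train_X - x) ** 2).sum(axis=1))
    # Return the label of the nearest neighbor.
    return train_y[np.argmin(dists)]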

6 Example: OCR for digits with NN classifier Classify images of handwritten digits by the digits they depict. 6 / 25

7 Example: OCR for digits with NN classifier Classify images of handwritten digits by the digits they depict. X = R^784, Y = {0, 1, ..., 9}. 6 / 25

8 Example: OCR for digits with NN classifier Classify images of handwritten digits by the digits they depict. X = R^784, Y = {0, 1, ..., 9}. Given: labeled examples D := {(x_i, y_i)}_{i=1}^n ⊂ X × Y. 6 / 25

9 Example: OCR for digits with NN classifier Classify images of handwritten digits by the digits they depict. X = R^784, Y = {0, 1, ..., 9}. Given: labeled examples D := {(x_i, y_i)}_{i=1}^n ⊂ X × Y. Construct NN classifier f̂_D using D. 6 / 25

10 Example: OCR for digits with NN classifier Classify images of handwritten digits by the digits they depict. X = R^784, Y = {0, 1, ..., 9}. Given: labeled examples D := {(x_i, y_i)}_{i=1}^n ⊂ X × Y. Construct NN classifier f̂_D using D. Question: Is this classifier any good? 6 / 25

11 Error rate Error rate of classifier f on a set of labeled examples D: err_D(f) := (# of (x, y) ∈ D such that f(x) ≠ y) / |D| (i.e., the fraction of D on which f disagrees with the paired label). 7 / 25

12 Error rate Error rate of classifier f on a set of labeled examples D: err_D(f) := (# of (x, y) ∈ D such that f(x) ≠ y) / |D| (i.e., the fraction of D on which f disagrees with the paired label). Sometimes, we'll write this as err(f, D). 7 / 25

13 Error rate Error rate of classifier f on a set of labeled examples D: err_D(f) := (# of (x, y) ∈ D such that f(x) ≠ y) / |D| (i.e., the fraction of D on which f disagrees with the paired label). Sometimes, we'll write this as err(f, D). Question: What is err_D(f̂_D)? 7 / 25
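
A direct translation of this definition into code (a sketch; it assumes a predictor f that maps one input to a label, and a dataset D given as a list of (x, y) pairs):

def error_rate(f, D):
    # Fraction of labeled examples (x, y) in D on which f(x) disagrees with y.
    mistakes = sum(1 for x, y in D if f(x) != y)
    return mistakes / len(D)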

14 A better way to evaluate the classifier Split the labeled examples {(x_i, y_i)}_{i=1}^n into two sets (randomly). Training data S. Test data T. 8 / 25

15 A better way to evaluate the classifier Split the labeled examples {(x_i, y_i)}_{i=1}^n into two sets (randomly). Training data S. Test data T. Only use training data S to construct NN classifier f̂_S. 8 / 25

16 A better way to evaluate the classifier Split the labeled examples {(x_i, y_i)}_{i=1}^n into two sets (randomly). Training data S. Test data T. Only use training data S to construct NN classifier f̂_S. Training error rate of f̂_S: err_S(f̂_S) = 0%. 8 / 25

17 A better way to evaluate the classifier Split the labeled examples {(x_i, y_i)}_{i=1}^n into two sets (randomly). Training data S. Test data T. Only use training data S to construct NN classifier f̂_S. Training error rate of f̂_S: err_S(f̂_S) = 0%. Use test data T to evaluate accuracy of f̂_S. 8 / 25

18 A better way to evaluate the classifier Split the labeled examples {(x_i, y_i)}_{i=1}^n into two sets (randomly). Training data S. Test data T. Only use training data S to construct NN classifier f̂_S. Training error rate of f̂_S: err_S(f̂_S) = 0%. Use test data T to evaluate accuracy of f̂_S. Test error rate of f̂_S: err_T(f̂_S) = 3.09%. 8 / 25

19 A better way to evaluate the classifier Split the labeled examples {(x_i, y_i)}_{i=1}^n into two sets (randomly). Training data S. Test data T. Only use training data S to construct NN classifier f̂_S. Training error rate of f̂_S: err_S(f̂_S) = 0%. Use test data T to evaluate accuracy of f̂_S. Test error rate of f̂_S: err_T(f̂_S) = 3.09%. Is this good? 8 / 25
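
The split-and-evaluate recipe above can be reproduced with scikit-learn. The sketch below is an illustration only: it uses the small built-in 8x8 digits dataset as a stand-in for the 28x28 MNIST images from the slides, so the exact error rates will differ.

import numpy as np
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_digits(return_X_y=True)   # 8x8 digit images, flattened to 64 features

# Randomly split the labeled examples into training data S and test data T.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Construct the NN classifier using only the training data.
clf = KNeighborsClassifier(n_neighbors=1).fit(X_train, y_train)

# Training error rate (typically 0% for 1-NN) and test error rate.
train_err = np.mean(clf.predict(X_train) != y_train)
test_err = np.mean(clf.predict(X_test) != y_test)
print(f"train error: {train_err:.2%}, test error: {test_err:.2%}")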

20 Diagnostics Some mistakes made by the NN classifier (test point in T, nearest neighbor in S): First mistake (correct label is 2) could've been avoided by looking at the three nearest neighbors (whose labels are 8, 2, and 2). [Figure: the test point and its three nearest neighbors.] 9 / 25

21 k-nearest neighbors classifier Given: labeled examples D := {(x_i, y_i)}_{i=1}^n. Predictor: f̂_{D,k} : X → Y. On input x, 1. Find the k points x_{i_1}, x_{i_2}, ..., x_{i_k} among {x_i}_{i=1}^n closest to x (the k nearest neighbors). 2. Return the plurality label among y_{i_1}, y_{i_2}, ..., y_{i_k}. (Break ties in both steps arbitrarily.) 10 / 25
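
A from-scratch sketch of this rule (illustration only; the argument names are assumptions for the example), using Euclidean distance and a plurality vote over the k nearest labels:

import numpy as np
from collections import Counter

def knn_predict(train_X, train_y, x, k):
    # Euclidean distances from x to all training points.
    dists = np.sqrt(((train_X - x) ** 2).sum(axis=1))
    # Indices of the k nearest neighbors.
    nearest = np.argsort(dists)[:k]
    # Plurality vote among their labels (Counter breaks ties by first occurrence,
    # which is one way of breaking ties "arbitrarily").
    return Counter(train_y[nearest]).most_common(1)[0][0]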

22 Effect of k Smaller k: smaller training error rate. Larger k: higher training error rate, but predictions are more stable due to voting. [Table: test error rate on the OCR digits classification task for different values of k.] 11 / 25

23 Choosing k The hold-out set approach: 1. Pick a subset V ⊂ S (hold-out set, a.k.a. validation set). 2. For each k ∈ {1, 3, 5, ...}: Construct k-NN classifier f̂_{S\V,k} using S \ V. Compute error rate of f̂_{S\V,k} on V (the hold-out error rate). 3. Pick the k that gives the smallest hold-out error rate. 12 / 25
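
A sketch of this procedure with scikit-learn (the candidate values of k and the 20% hold-out fraction are choices made for this example, not prescriptions from the slides):

import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

def choose_k(S_X, S_y, candidates=(1, 3, 5, 7, 9)):
    # Step 1: split the training data S into S \ V and a hold-out set V.
    train_X, val_X, train_y, val_y = train_test_split(
        S_X, S_y, test_size=0.2, random_state=0)
    errs = {}
    for k in candidates:
        # Step 2: train k-NN on S \ V and measure its error rate on V.
        clf = KNeighborsClassifier(n_neighbors=k).fit(train_X, train_y)
        errs[k] = np.mean(clf.predict(val_X) != val_y)
    # Step 3: return the k with the smallest hold-out error rate.
    return min(errs, key=errs.get)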

24 Other distance functions ℓ_p norm: dist(u, v) = (|x_1|^p + |x_2|^p + ... + |x_d|^p)^(1/p), where x = u − v. 13 / 25

25 Other distance functions ℓ_p norm: dist(u, v) = (|x_1|^p + |x_2|^p + ... + |x_d|^p)^(1/p), where x = u − v. OCR digits classification test error rate: 3.09% with ℓ_2, 2.83% with ℓ_3. 13 / 25

26 Other distance functions ℓ_p norm: dist(u, v) = (|x_1|^p + |x_2|^p + ... + |x_d|^p)^(1/p), where x = u − v. OCR digits classification test error rate: 3.09% with ℓ_2, 2.83% with ℓ_3. Manhattan distance: dist(u, v) = distance on a grid between u and v along a strict horizontal/vertical path. 13 / 25

27 Other distance functions ℓ_p norm: dist(u, v) = (|x_1|^p + |x_2|^p + ... + |x_d|^p)^(1/p), where x = u − v. OCR digits classification test error rate: 3.09% with ℓ_2, 2.83% with ℓ_3. Manhattan distance: dist(u, v) = distance on a grid between u and v along a strict horizontal/vertical path. String edit distance: dist(u, v) = # insertions/deletions/mutations needed to change u to v. 13 / 25
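
For concreteness (an illustration, not from the slides), the ℓ_p family fits in a few lines of NumPy; p = 1 recovers the Manhattan distance and p = 2 the Euclidean distance:

import numpy as np

def lp_distance(u, v, p=2):
    # (|u_1 - v_1|^p + ... + |u_d - v_d|^p)^(1/p)
    return float((np.abs(np.asarray(u) - np.asarray(v)) ** p).sum() ** (1.0 / p))

# e.g. lp_distance([0, 0], [3, 4], p=2) -> 5.0, and with p=1 -> 7.0 (Manhattan).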

28 Bad features Caution: nearest neighbor classifier can be broken by bad/noisy features! [Figure: scatter plots of classes y = 0 and y = 1 plotted against Feature 1 and Feature 2.] 14 / 25

29 Questions of interest 1. How good is the classifier learned using NN on your problem? 2. Is NN a good learning method in general? 15 / 25

30 Statistical learning theory Basic assumption (main idea): labeled examples {(x_i, y_i)}_{i=1}^n come from the same source as future examples. 16 / 25

31 Statistical learning theory Basic assumption (main idea): labeled examples {(x_i, y_i)}_{i=1}^n come from the same source as future examples. [Diagram: past labeled examples from P are fed to a learning algorithm, which outputs a learned predictor; a new (unlabeled) example from P is passed to that predictor to produce a predicted label.] More formally: {(x_i, y_i)}_{i=1}^n is an i.i.d. sample from a probability distribution P over X × Y. 16 / 25

32 Prediction error rate Define the (true) error rate of a classifier f : X → Y w.r.t. P to be err_P(f) := P(f(X) ≠ Y), where (X, Y) is a pair of random variables with joint distribution P (i.e., (X, Y) ∼ P). 17 / 25

33 Prediction error rate Define the (true) error rate of a classifier f : X → Y w.r.t. P to be err_P(f) := P(f(X) ≠ Y), where (X, Y) is a pair of random variables with joint distribution P (i.e., (X, Y) ∼ P). Let f̂_S be the classifier trained using labeled examples S. 17 / 25

34 Prediction error rate Define the (true) error rate of a classifier f : X → Y w.r.t. P to be err_P(f) := P(f(X) ≠ Y), where (X, Y) is a pair of random variables with joint distribution P (i.e., (X, Y) ∼ P). Let f̂_S be the classifier trained using labeled examples S. The true error rate of f̂_S is err_P(f̂_S) := P(f̂_S(X) ≠ Y). 17 / 25

35 Prediction error rate Define the (true) error rate of a classifier f : X → Y w.r.t. P to be err_P(f) := P(f(X) ≠ Y), where (X, Y) is a pair of random variables with joint distribution P (i.e., (X, Y) ∼ P). Let f̂_S be the classifier trained using labeled examples S. The true error rate of f̂_S is err_P(f̂_S) := P(f̂_S(X) ≠ Y). We cannot compute this without knowing P. 17 / 25

36 Estimating the true error rate Suppose {(x_i, y_i)}_{i=1}^n (assumed to be an i.i.d. sample from P) is randomly split into S and T, and f̂_S is based only on S. 18 / 25

37 Estimating the true error rate Suppose {(x_i, y_i)}_{i=1}^n (assumed to be an i.i.d. sample from P) is randomly split into S and T, and f̂_S is based only on S. f̂_S and T are independent, and the test error rate of f̂_S is an unbiased estimate of the true error rate of f̂_S. 18 / 25

38 Estimating the true error rate Suppose {(x_i, y_i)}_{i=1}^n (assumed to be an i.i.d. sample from P) is randomly split into S and T, and f̂_S is based only on S. f̂_S and T are independent, and the test error rate of f̂_S is an unbiased estimate of the true error rate of f̂_S. If |T| = m, then the test error rate err_T(f̂_S) of f̂_S (conditional on S) is a binomial random variable (scaled by 1/m): m · err_T(f̂_S) | S ∼ Bin(m, err_P(f̂_S)). 18 / 25

39 Estimating the true error rate Suppose {(x_i, y_i)}_{i=1}^n (assumed to be an i.i.d. sample from P) is randomly split into S and T, and f̂_S is based only on S. f̂_S and T are independent, and the test error rate of f̂_S is an unbiased estimate of the true error rate of f̂_S. If |T| = m, then the test error rate err_T(f̂_S) of f̂_S (conditional on S) is a binomial random variable (scaled by 1/m): m · err_T(f̂_S) | S ∼ Bin(m, err_P(f̂_S)). The expected value of err_T(f̂_S) is err_P(f̂_S). (This means that err_T(f̂_S) is an unbiased estimator of err_P(f̂_S).) 18 / 25
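
A tiny simulation (illustration only, with a made-up true error rate) showing this behaviour: the test error rate is distributed as Bin(m, err_P)/m, and its average over many hypothetical test sets matches the true error rate.

import numpy as np

rng = np.random.default_rng(0)
true_err, m = 0.0309, 1000            # pretend err_P(f_S) = 3.09% and |T| = m = 1000

# Each trial: m test points, each misclassified independently with probability true_err.
test_errs = rng.binomial(m, true_err, size=10_000) / m

print(test_errs.mean())               # close to 0.0309, i.e. unbiased
print(test_errs.std())                # typical deviation of the estimate from the truth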

40 Limits of prediction Binary classification: Y = {0, 1}. 19 / 25

41 Limits of prediction Binary classification: Y = {0, 1}. Probability distribution P over X × {0, 1}; let (X, Y) ∼ P. 19 / 25

42 Limits of prediction Binary classification: Y = {0, 1}. Probability distribution P over X × {0, 1}; let (X, Y) ∼ P. Think of P as being composed of two parts: 1. Marginal distribution µ of X (a distribution over X). 2. Conditional distribution of Y given X = x, for each x ∈ X: η(x) := P(Y = 1 | X = x). 19 / 25

43 Limits of prediction Binary classification: Y = {0, 1}. Probability distribution P over X × {0, 1}; let (X, Y) ∼ P. Think of P as being composed of two parts: 1. Marginal distribution µ of X (a distribution over X). 2. Conditional distribution of Y given X = x, for each x ∈ X: η(x) := P(Y = 1 | X = x). If η(x) is 0 or 1 for all x ∈ X where µ(x) > 0, then the optimal error rate is zero (i.e., min_f err_P(f) = 0). 19 / 25

44 Limits of prediction Binary classification: Y = {0, 1}. Probability distribution P over X × {0, 1}; let (X, Y) ∼ P. Think of P as being composed of two parts: 1. Marginal distribution µ of X (a distribution over X). 2. Conditional distribution of Y given X = x, for each x ∈ X: η(x) := P(Y = 1 | X = x). If η(x) is 0 or 1 for all x ∈ X where µ(x) > 0, then the optimal error rate is zero (i.e., min_f err_P(f) = 0). Otherwise it is non-zero. 19 / 25
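
To make the two-part view concrete, here is a toy sketch (made-up µ and η over three values of X) that draws labeled examples by first sampling X ~ µ and then Y ~ Bernoulli(η(X)):

import numpy as np

rng = np.random.default_rng(0)
xs  = np.array([0, 1, 2])            # a tiny discrete input space X
mu  = np.array([0.5, 0.3, 0.2])      # marginal distribution of X
eta = np.array([0.0, 1.0, 0.7])      # eta(x) = P(Y = 1 | X = x)

def draw(n):
    X = rng.choice(xs, size=n, p=mu)    # X ~ mu
    Y = rng.binomial(1, eta[X])         # Y | X = x ~ Bernoulli(eta(x))
    return X, Y

Here η(2) = 0.7 lies strictly between 0 and 1, so even the best possible classifier must make mistakes on x = 2.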

45 Bayes optimality What is the classifier with smallest true error rate? f*(x) := 0 if η(x) ≤ 1/2; 1 if η(x) > 1/2. (Do you see why?) 20 / 25

46 Bayes optimality What is the classifier with smallest true error rate? f*(x) := 0 if η(x) ≤ 1/2; 1 if η(x) > 1/2. (Do you see why?) f* is called the Bayes (optimal) classifier, and err_P(f*) = min_f err_P(f) = E[min{η(X), 1 − η(X)}], which is called the Bayes (optimal) error rate. 20 / 25

47 Bayes optimality What is the classifier with smallest true error rate? f*(x) := 0 if η(x) ≤ 1/2; 1 if η(x) > 1/2. (Do you see why?) f* is called the Bayes (optimal) classifier, and err_P(f*) = min_f err_P(f) = E[min{η(X), 1 − η(X)}], which is called the Bayes (optimal) error rate. Question: How far from optimal is the classifier produced by the NN learning method? 20 / 25
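
Continuing the toy example above (same made-up µ and η), the Bayes classifier and the Bayes error rate can be computed directly:

import numpy as np

mu  = np.array([0.5, 0.3, 0.2])      # marginal distribution of X over {0, 1, 2}
eta = np.array([0.0, 1.0, 0.7])      # eta(x) = P(Y = 1 | X = x)

f_star = (eta > 0.5).astype(int)                    # predict 1 iff eta(x) > 1/2
bayes_err = (mu * np.minimum(eta, 1 - eta)).sum()   # E[min(eta(X), 1 - eta(X))]

print(f_star)      # [0 1 1]
print(bayes_err)   # 0.2 * 0.3 = 0.06: the unavoidable error, all of it on x = 2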

48 Consistency of k-nn We say that a learning algorithm A is consistent if [ lim E err P ( ˆf ] n n) = err(f ), where ˆf n is the classifier learned using A on an i.i.d. sample of size n. Theorem (e.g., Cover and Hart 1967) Assume η is continuous. Then: 1-NN is consistent if min f err P (f) = 0. k-nn is consistent, provided that k := k n is chosen as an increasing but sublinear function of n: k n lim kn =, lim n n n = / 25

49 Key takeaways 1. k-nn learning procedure; role of k, distance functions, features. 2. Training and test error rates. 3. Framework of statistical learning theory; estimating the true error rate; Bayes optimality; high-level idea of consistency. 22 / 25

50 Recap: definitions denote our classifier as f 23 / 25

51 Recap: definitions denote our classifier as f what does f(x) mean? 23 / 25

52 Recap: definitions denote our classifier as f what does f(x) mean? denote the dataset as D 23 / 25

53 Recap: definitions denote our classifier as f what does f(x) mean? denote the dataset as D what is err_D(f)? 23 / 25

54 Recap: definitions denote our classifier as f what does f(x) mean? denote the dataset as D what is err_D(f)? (# of (x, y) ∈ D such that f(x) ≠ y) / |D| 23 / 25

55 Recap: definitions denote our classifier as f what does f(x) mean? denote the dataset as D what is err_D(f)? (# of (x, y) ∈ D such that f(x) ≠ y) / |D| what does P(X, Y) mean? 23 / 25

56 Recap: definitions denote our classifier as f what does f(x) mean? denote the dataset as D what is err_D(f)? (# of (x, y) ∈ D such that f(x) ≠ y) / |D| what does P(X, Y) mean? what is err_P(f)? 23 / 25

57 Recap: definitions denote our classifier as f what does f(x) mean? denote the dataset as D what is err_D(f)? (# of (x, y) ∈ D such that f(x) ≠ y) / |D| what does P(X, Y) mean? what is err_P(f)? E_P[1[f(X) ≠ Y]] 23 / 25

58 Recap: definitions denote our classifier as f what does f(x) mean? denote the dataset as D what is err_D(f)? (# of (x, y) ∈ D such that f(x) ≠ y) / |D| what does P(X, Y) mean? what is err_P(f)? E_P[1[f(X) ≠ Y]] = P(f(X) ≠ Y) 23 / 25

59 Recap: unbiased estimator what is an estimator? 24 / 25

60 Recap: unbiased estimator what is an estimator? using definitions from previous slide, what is err_D(f) an estimator of? 24 / 25

61 Recap: unbiased estimator what is an estimator? using definitions from previous slide, what is err_D(f) an estimator of? err_P(f) 24 / 25

62 Recap: unbiased estimator what is an estimator? using definitions from previous slide, what is err_D(f) an estimator of? err_P(f) what is an unbiased estimator? 24 / 25

63 Recap: unbiased estimator what is an estimator? using definitions from previous slide, what is err_D(f) an estimator of? err_P(f) what is an unbiased estimator? E[err_D(f)] = err_P(f) 24 / 25

64 Recap: unbiased estimator what is an estimator? using definitions from previous slide, what is err_D(f) an estimator of? err_P(f) what is an unbiased estimator? E[err_D(f)] = err_P(f) what on earth does that mean? 24 / 25

65 Recap: unbiased estimator what is an estimator? using definitions from previous slide, what is err_D(f) an estimator of? err_P(f) what is an unbiased estimator? E[err_D(f)] = err_P(f) what on earth does that mean? if we repeatedly draw datasets D of size n from P(X, Y) and calculate err_D(f), the average will converge to err_P(f) 24 / 25

66 Recap: unbiased estimator what is an estimator? using definitions from previous slide, what is err_D(f) an estimator of? err_P(f) what is an unbiased estimator? E[err_D(f)] = err_P(f) what on earth does that mean? if we repeatedly draw datasets D of size n from P(X, Y) and calculate err_D(f), the average will converge to err_P(f) why should we care? 24 / 25

67 Recap: consistent algorithm let f̂_n mean a classifier trained using n examples 25 / 25

68 Recap: consistent algorithm let f̂_n mean a classifier trained using n examples what is the property lim_{n→∞} E[err_P(f̂_n)] = err_P(f*)? 25 / 25

69 Recap: consistent algorithm let f̂_n mean a classifier trained using n examples what is the property lim_{n→∞} E[err_P(f̂_n)] = err_P(f*)? consistency 25 / 25

70 Recap: consistent algorithm let f̂_n mean a classifier trained using n examples what is the property lim_{n→∞} E[err_P(f̂_n)] = err_P(f*)? consistency what is the ideal classifier f* also known as? 25 / 25

71 Recap: consistent algorithm let f̂_n mean a classifier trained using n examples what is the property lim_{n→∞} E[err_P(f̂_n)] = err_P(f*)? consistency what is the ideal classifier f* also known as? the Bayes optimal classifier 25 / 25

72 Recap: consistent algorithm let f̂_n mean a classifier trained using n examples what is the property lim_{n→∞} E[err_P(f̂_n)] = err_P(f*)? consistency what is the ideal classifier f* also known as? the Bayes optimal classifier how can we define f*? 25 / 25

73 Recap: consistent algorithm let f̂_n mean a classifier trained using n examples what is the property lim_{n→∞} E[err_P(f̂_n)] = err_P(f*)? consistency what is the ideal classifier f* also known as? the Bayes optimal classifier how can we define f*? err_P(f*) = min_f err_P(f) = E[min{η(X), 1 − η(X)}] 25 / 25
