MLCC 2018 Local Methods and Bias Variance Trade-Off. Lorenzo Rosasco UNIGE-MIT-IIT


1 MLCC 2018 Local Methods and Bias Variance Trade-Off Lorenzo Rosasco UNIGE-MIT-IIT

2 About this class. 1. Introduce a basic class of learning methods, namely local methods. 2. Discuss the fundamental concept of bias-variance trade-off to understand parameter tuning (a.k.a. model selection).

3 Outline

4 The problem. What is the price of one house given its area?

5 The problem. What is the price of one house given its area? Start from data...

6 The problem. What is the price of one house given its area? Start from data... Sample entries, area (m²) vs. price (€): x_1 = 62, y_1 = 99,200; x_2 = 64, y_2 = 135,700; x_3 = 65, y_3 = 93,300; x_4 = 66, y_4 = 114,… Let S be the houses example dataset (n = 100), $S = \{(x_1, y_1), \dots, (x_n, y_n)\}$.

7 The problem. What is the price of one house given its area? Start from data... Sample entries, area (m²) vs. price (€): x_1 = 62, y_1 = 99,200; x_2 = 64, y_2 = 135,700; x_3 = 65, y_3 = 93,300; x_4 = 66, y_4 = 114,… Let S be the houses example dataset (n = 100), $S = \{(x_1, y_1), \dots, (x_n, y_n)\}$. Given a new point x we want to predict y by means of S.

8 Example. Let x be a 300 m² house.

9 Example. Let x be a 300 m² house. Nearby entries, area (m²) vs. price (€): x_93 = 255, y_93 = 274,600; x_94 = 264, y_94 = 324,900; x_95 = 310, y_95 = 311,200; x_96 = 480, y_96 = 515,…

10 Example. Let x be a 300 m² house. Nearby entries, area (m²) vs. price (€): x_93 = 255, y_93 = 274,600; x_94 = 264, y_94 = 324,900; x_95 = 310, y_95 = 311,200; x_96 = 480, y_96 = 515,… What is its price?

11 Nearest Neighbors. Nearest neighbor: $\hat y$ is the value of the closest point to x in S. Here the closest entry is x_95 = 310, so $\hat y$ = 311,200. Entries, area (m²) vs. price (€): x_93 = 255, y_93 = 274,600; x_94 = 264, y_94 = 324,900; x_95 = 310, y_95 = 311,200; x_96 = 480, y_96 = 515,…

12 Nearest Neighbors. Nearest neighbor: $\hat y$ is the value of the closest point to x in S; for the 300 m² house, $\hat y$ = 311,200 (closest entry x_95 = 310).

13 Nearest Neighbors. $S = \{(x_i, y_i)\}_{i=1}^n$ with $x_i \in \mathbb{R}^D$, $y_i \in \mathbb{R}$.

14 Nearest Neighbors. $S = \{(x_i, y_i)\}_{i=1}^n$ with $x_i \in \mathbb{R}^D$, $y_i \in \mathbb{R}$; x the new point, $x \in \mathbb{R}^D$.

15 Nearest Neighbors. $S = \{(x_i, y_i)\}_{i=1}^n$ with $x_i \in \mathbb{R}^D$, $y_i \in \mathbb{R}$; x the new point, $x \in \mathbb{R}^D$; $\hat y$ the predicted output, $\hat y = \hat f(x)$ where $\hat y = y_j$, $j = \arg\min_{i=1,\dots,n} \|x - x_i\|$.

16 Nearest Neighbors. $S = \{(x_i, y_i)\}_{i=1}^n$ with $x_i \in \mathbb{R}^D$, $y_i \in \mathbb{R}$; x the new point, $x \in \mathbb{R}^D$; $\hat y$ the predicted output, $\hat y = \hat f(x)$ where $\hat y = y_j$, $j = \arg\min_{i=1,\dots,n} \|x - x_i\|$. Computational cost O(nD): we compute the n distances $\|x - x_i\|$, each costing O(D).

17 Nearest Neighbors. $S = \{(x_i, y_i)\}_{i=1}^n$ with $x_i \in \mathbb{R}^D$, $y_i \in \mathbb{R}$; x the new point, $x \in \mathbb{R}^D$; $\hat y$ the predicted output, $\hat y = \hat f(x)$ where $\hat y = y_j$, $j = \arg\min_{i=1,\dots,n} \|x - x_i\|$. Computational cost O(nD): we compute the n distances $\|x - x_i\|$, each costing O(D). In general, let $d : \mathbb{R}^D \times \mathbb{R}^D \to \mathbb{R}$ be a distance on the input space; then $\hat f(x) = y_j$, $j = \arg\min_{i=1,\dots,n} d(x, x_i)$.
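The 1-NN rule above is easy to implement directly. Here is a minimal sketch in Python/NumPy (not from the original slides; the function and variable names are illustrative), using the Euclidean distance:

```python
import numpy as np

def nearest_neighbor_predict(X_train, y_train, x_new):
    """1-NN prediction: return the label of the training point closest to x_new."""
    dists = np.linalg.norm(X_train - x_new, axis=1)  # n distances, each O(D)
    j = np.argmin(dists)                             # index of the closest point
    return y_train[j]

# Toy usage with three of the houses from the example (area in m^2, price in euros).
X = np.array([[255.0], [264.0], [310.0]])
y = np.array([274_600, 324_900, 311_200])
print(nearest_neighbor_predict(X, y, np.array([300.0])))  # closest is 310 -> 311200
```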

18 Extensions. Nearest neighbor takes $\hat y$ as the value of the closest point to x in S (same houses entries as above).

19 Extensions. Nearest neighbor takes $\hat y$ as the value of the closest point to x in S (same houses entries as above). Can we do better? (for example, using more points)

20 K-Nearest Neighbors. K-nearest neighbor: $\hat y$ is the mean of the values of the K closest points to x in S. If K = 3 we have $\hat y$ = (274,600 + 324,900 + 311,200) / 3 ≈ 303,567 (same houses entries as above).

21 K-Nearest Neighbors. K-nearest neighbor: $\hat y$ is the mean of the values of the K closest points to x in S. If K = 3 we have $\hat y$ = (274,600 + 324,900 + 311,200) / 3 ≈ 303,567.

22 K-Nearest Neighbors. $S = \{(x_i, y_i)\}_{i=1}^n$ with $x_i \in \mathbb{R}^D$, $y_i \in \mathbb{R}$; x the new point, $x \in \mathbb{R}^D$.

23 K-Nearest Neighbors. $S = \{(x_i, y_i)\}_{i=1}^n$ with $x_i \in \mathbb{R}^D$, $y_i \in \mathbb{R}$; x the new point, $x \in \mathbb{R}^D$. Let K be an integer with $K \ll n$.

24 K-Nearest Neighbors. $S = \{(x_i, y_i)\}_{i=1}^n$ with $x_i \in \mathbb{R}^D$, $y_i \in \mathbb{R}$; x the new point, $x \in \mathbb{R}^D$. Let K be an integer with $K \ll n$. Define $j_1, \dots, j_K$ as $j_1 = \arg\min_{i \in \{1,\dots,n\}} \|x - x_i\|$ and $j_t = \arg\min_{i \in \{1,\dots,n\} \setminus \{j_1,\dots,j_{t-1}\}} \|x - x_i\|$ for $t \in \{2,\dots,K\}$.

25 K-Nearest Neighbors. $S = \{(x_i, y_i)\}_{i=1}^n$ with $x_i \in \mathbb{R}^D$, $y_i \in \mathbb{R}$; x the new point, $x \in \mathbb{R}^D$. Let K be an integer with $K \ll n$. Define $j_1, \dots, j_K$ as $j_1 = \arg\min_{i \in \{1,\dots,n\}} \|x - x_i\|$ and $j_t = \arg\min_{i \in \{1,\dots,n\} \setminus \{j_1,\dots,j_{t-1}\}} \|x - x_i\|$ for $t \in \{2,\dots,K\}$. Predicted output: $\hat y = \frac{1}{K} \sum_{i \in \{j_1,\dots,j_K\}} y_i$.

26 K-Nearest Neighbors (cont.). $\hat f(x) = \frac{1}{K} \sum_{i=1}^{K} y_{j_i}$.

27 K-Nearest Neighbors (cont.). $\hat f(x) = \frac{1}{K} \sum_{i=1}^{K} y_{j_i}$. Computational cost O(nD + n log n): compute the n distances $\|x - x_i\|$ for $i \in \{1,\dots,n\}$ (each costs O(D)), then order them, O(n log n).

28 K-Nearest Neighbors (cont.). $\hat f(x) = \frac{1}{K} \sum_{i=1}^{K} y_{j_i}$. Computational cost O(nD + n log n): compute the n distances $\|x - x_i\|$ for $i \in \{1,\dots,n\}$ (each costs O(D)), then order them, O(n log n). General metric d: $\hat f$ is the same, but $j_1, \dots, j_K$ are defined as $j_1 = \arg\min_{i \in \{1,\dots,n\}} d(x, x_i)$ and $j_t = \arg\min_{i \in \{1,\dots,n\} \setminus \{j_1,\dots,j_{t-1}\}} d(x, x_i)$ for $t \in \{2,\dots,K\}$.
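As a quick illustration, the K-NN regressor can also be written in a few lines; this is a sketch (not the course's own code), and a general metric d could replace the Euclidean distance below:

```python
import numpy as np

def knn_predict(X_train, y_train, x_new, K):
    """K-NN regression: average the labels of the K training points closest to x_new."""
    dists = np.linalg.norm(X_train - x_new, axis=1)  # n distances, O(nD)
    neighbors = np.argsort(dists)[:K]                # sorting costs O(n log n)
    return y_train[neighbors].mean()

# Houses example with K = 3: the prediction for a 300 m^2 house is the mean of
# 274,600, 324,900 and 311,200, i.e. about 303,567 euros.
X = np.array([[255.0], [264.0], [310.0], [480.0]])
y = np.array([274_600.0, 324_900.0, 311_200.0, 515_000.0])  # last price is a placeholder; it was truncated in the table
print(knn_predict(X, y, np.array([300.0]), K=3))
```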

29 Parzen Windows. K-NN puts equal weights on the values of the selected points.

30 Parzen Windows. K-NN puts equal weights on the values of the selected points. Can we generalize it?

31 Parzen Windows. K-NN puts equal weights on the values of the selected points. Can we generalize it? Points closer to x should influence its value more.

32 Parzen Windows. K-NN puts equal weights on the values of the selected points. Can we generalize it? Points closer to x should influence its value more. Parzen windows: $\hat f(x) = \frac{\sum_{i=1}^{n} y_i\, k(x, x_i)}{\sum_{i=1}^{n} k(x, x_i)}$, where k is a similarity function: $k(x, x') \geq 0$ for all $x, x' \in \mathbb{R}^D$; $k(x, x') \to 1$ when $\|x - x'\| \to 0$; $k(x, x') \to 0$ when $\|x - x'\| \to \infty$.

33 Parzen Windows. Examples of k: $k_1(x, x') = \mathrm{sign}\!\left(1 - \frac{\|x - x'\|}{\sigma}\right)_+$ with $\sigma > 0$; $k_2(x, x') = \left(1 - \frac{\|x - x'\|}{\sigma}\right)_+$ with $\sigma > 0$; $k_3(x, x') = \left(1 - \frac{\|x - x'\|^2}{\sigma^2}\right)_+$ with $\sigma > 0$; $k_4(x, x') = e^{-\frac{\|x - x'\|^2}{2\sigma^2}}$ with $\sigma > 0$; $k_5(x, x') = e^{-\frac{\|x - x'\|}{\sigma}}$ with $\sigma > 0$.
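As a sketch of the Parzen-window predictor (not from the slides; it assumes the Gaussian similarity $k_4$ and illustrative names), the estimator is just a normalized weighted average, so every training point contributes with a weight that decays with distance:

```python
import numpy as np

def parzen_predict(X_train, y_train, x_new, sigma):
    """Parzen-window regression with the Gaussian similarity k4."""
    sq_dists = np.sum((X_train - x_new) ** 2, axis=1)
    weights = np.exp(-sq_dists / (2 * sigma ** 2))   # closer points get larger weights
    return np.sum(weights * y_train) / np.sum(weights)

# Toy usage on the houses example: predict the price of a 300 m^2 house.
X = np.array([[255.0], [264.0], [310.0]])
y = np.array([274_600.0, 324_900.0, 311_200.0])
print(parzen_predict(X, y, np.array([300.0]), sigma=30.0))
```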

34–40 K-NN example. K-nearest neighbor depends on K. [Figures: the fitted K-NN function for several different values of K; the specific values appeared only in the plots.]

41 K-NN example. K-nearest neighbor depends on K. Changing K, the result changes a lot! How to select K?

42 Outline

43 Optimal choice for the hyper-parameters. $S = \{(x_i, y_i)\}_{i=1}^n$ training set; write $Y = (y_1, \dots, y_n)$ and $X = (x_1, \dots, x_n)$. $K \in \mathbb{N}$ is the hyperparameter of the learning algorithm; $\hat f_{S,K}$ is the learned function (it depends on S and K).

44 Optimal choice for the hyper-parameters. $S = \{(x_i, y_i)\}_{i=1}^n$ training set; write $Y = (y_1, \dots, y_n)$ and $X = (x_1, \dots, x_n)$. $K \in \mathbb{N}$ is the hyperparameter of the learning algorithm; $\hat f_{S,K}$ is the learned function (it depends on S and K). The expected loss $E_K$ is $E_K = \mathbb{E}_S\, \mathbb{E}_{x,y}\, (y - \hat f_{S,K}(x))^2$.

45 Optimal choice for the hyper-parameters. $S = \{(x_i, y_i)\}_{i=1}^n$ training set; write $Y = (y_1, \dots, y_n)$ and $X = (x_1, \dots, x_n)$. $K \in \mathbb{N}$ is the hyperparameter of the learning algorithm; $\hat f_{S,K}$ is the learned function (it depends on S and K). The expected loss $E_K$ is $E_K = \mathbb{E}_S\, \mathbb{E}_{x,y}\, (y - \hat f_{S,K}(x))^2$. The optimal hyperparameter $K^*$ should minimize $E_K$.

46 Optimal choice for the hyper-parameters. $S = \{(x_i, y_i)\}_{i=1}^n$ training set; write $Y = (y_1, \dots, y_n)$ and $X = (x_1, \dots, x_n)$. $K \in \mathbb{N}$ is the hyperparameter of the learning algorithm; $\hat f_{S,K}$ is the learned function (it depends on S and K). The expected loss $E_K$ is $E_K = \mathbb{E}_S\, \mathbb{E}_{x,y}\, (y - \hat f_{S,K}(x))^2$. The optimal hyperparameter $K^*$ should minimize $E_K$: $K^* = \arg\min_{K \in \mathcal{K}} E_K$, over a set $\mathcal{K}$ of candidate values.

47 Optimal choice for the hyper-parameters (cont.). The optimal hyperparameter $K^*$ should minimize $E_K$.

48 Optimal choice for the hyper-parameters (cont.). The optimal hyperparameter $K^*$ should minimize $E_K$: $K^* = \arg\min_{K \in \mathcal{K}} E_K$.

49 Optimal choice for the hyper-parameters (cont.). The optimal hyperparameter $K^*$ should minimize $E_K$: $K^* = \arg\min_{K \in \mathcal{K}} E_K$. Ideally! (In practice we don't have access to the distribution.) We can still try to understand the above minimization problem: does a solution exist? What does it depend on? Yet, ultimately, we need something we can compute!

50 Example: regression problem. Define the pointwise expected loss $E_K(x) = \mathbb{E}_S\, \mathbb{E}_{y|x}\, (y - \hat f_{S,K}(x))^2$.

51 Example: regression problem. Define the pointwise expected loss $E_K(x) = \mathbb{E}_S\, \mathbb{E}_{y|x}\, (y - \hat f_{S,K}(x))^2$. By definition, $E_K = \mathbb{E}_x\, E_K(x)$.

52 Example: regression problem. Define the pointwise expected loss $E_K(x) = \mathbb{E}_S\, \mathbb{E}_{y|x}\, (y - \hat f_{S,K}(x))^2$. By definition, $E_K = \mathbb{E}_x\, E_K(x)$. Regression setting: regression model $y = f^*(x) + \delta$.

53 Example: regression problem. Define the pointwise expected loss $E_K(x) = \mathbb{E}_S\, \mathbb{E}_{y|x}\, (y - \hat f_{S,K}(x))^2$. By definition, $E_K = \mathbb{E}_x\, E_K(x)$. Regression setting: regression model $y = f^*(x) + \delta$, with $\mathbb{E}\delta = 0$, $\mathbb{E}\delta^2 = \sigma^2$.

54 Example: regression problem. Define the pointwise expected loss $E_K(x) = \mathbb{E}_S\, \mathbb{E}_{y|x}\, (y - \hat f_{S,K}(x))^2$. By definition, $E_K = \mathbb{E}_x\, E_K(x)$. Regression setting: regression model $y = f^*(x) + \delta$, with $\mathbb{E}\delta = 0$, $\mathbb{E}\delta^2 = \sigma^2$. Now $E_K(x) = \mathbb{E}_S\, \mathbb{E}_{y|x}\, (y - \hat f_{S,K}(x))^2$.

55 Example: regression problem. Define the pointwise expected loss $E_K(x) = \mathbb{E}_S\, \mathbb{E}_{y|x}\, (y - \hat f_{S,K}(x))^2$. By definition, $E_K = \mathbb{E}_x\, E_K(x)$. Regression setting: regression model $y = f^*(x) + \delta$, with $\mathbb{E}\delta = 0$, $\mathbb{E}\delta^2 = \sigma^2$. Now $E_K(x) = \mathbb{E}_S\, \mathbb{E}_{y|x}\, (y - \hat f_{S,K}(x))^2 = \mathbb{E}_S\, \mathbb{E}_{y|x}\, (f^*(x) + \delta - \hat f_{S,K}(x))^2$.

56 Example: regression problem. Define the pointwise expected loss $E_K(x) = \mathbb{E}_S\, \mathbb{E}_{y|x}\, (y - \hat f_{S,K}(x))^2$. By definition, $E_K = \mathbb{E}_x\, E_K(x)$. Regression setting: regression model $y = f^*(x) + \delta$, with $\mathbb{E}\delta = 0$, $\mathbb{E}\delta^2 = \sigma^2$. Now $E_K(x) = \mathbb{E}_S\, \mathbb{E}_{y|x}\, (y - \hat f_{S,K}(x))^2 = \mathbb{E}_S\, \mathbb{E}_{y|x}\, (f^*(x) + \delta - \hat f_{S,K}(x))^2$, that is, $E_K(x) = \mathbb{E}_S\, (f^*(x) - \hat f_{S,K}(x))^2 + \sigma^2$ ...
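The last step uses the fact that the cross term vanishes; a short check (not spelled out on the slides), assuming the noise $\delta$ at the test point is independent of the training set $S$:

$E_K(x) = \mathbb{E}_S\, \mathbb{E}_{y|x}\, \big(f^*(x) - \hat f_{S,K}(x) + \delta\big)^2 = \mathbb{E}_S\, \big(f^*(x) - \hat f_{S,K}(x)\big)^2 + 2\, \mathbb{E}[\delta]\, \mathbb{E}_S\, \big(f^*(x) - \hat f_{S,K}(x)\big) + \mathbb{E}[\delta^2] = \mathbb{E}_S\, \big(f^*(x) - \hat f_{S,K}(x)\big)^2 + \sigma^2,$

since $\mathbb{E}\delta = 0$ and $\mathbb{E}\delta^2 = \sigma^2$.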

57 Bias-Variance trade-off for K-NN. Define the noiseless K-NN (it is ideal!): $\bar f_{S,K}(x) = \frac{1}{K} \sum_{l \in K_x} f^*(x_l)$, where $K_x$ denotes the set of indices of the K nearest neighbors of x.

58 Bias-Variance trade-off for K-NN. Define the noiseless K-NN (it is ideal!): $\bar f_{S,K}(x) = \frac{1}{K} \sum_{l \in K_x} f^*(x_l)$. Note that $\bar f_{S,K}(x) = \mathbb{E}_{Y|X}\, \hat f_{S,K}(x)$.

59 Bias-Variance trade-off for K-NN. Define the noiseless K-NN (it is ideal!): $\bar f_{S,K}(x) = \frac{1}{K} \sum_{l \in K_x} f^*(x_l)$. Note that $\bar f_{S,K}(x) = \mathbb{E}_{Y|X}\, \hat f_{S,K}(x)$. Consider $E_K(x) = \underbrace{(f^*(x) - \mathbb{E}_X\, \bar f_{S,K}(x))^2}_{\text{bias}} + \underbrace{\mathbb{E}_S\, (\bar f_{S,K}(x) - \hat f_{S,K}(x))^2}_{\text{variance}} + \sigma^2$ ...

60 Bias-Variance trade-off for K-NN. Define the noiseless K-NN (it is ideal!): $\bar f_{S,K}(x) = \frac{1}{K} \sum_{l \in K_x} f^*(x_l)$. Note that $\bar f_{S,K}(x) = \mathbb{E}_{Y|X}\, \hat f_{S,K}(x)$. Consider $E_K(x) = \underbrace{(f^*(x) - \mathbb{E}_X\, \bar f_{S,K}(x))^2}_{\text{bias}} + \underbrace{\frac{1}{K^2} \sum_{l \in K_x} \mathbb{E}_X\, \mathbb{E}_{y_l|x_l}\, (y_l - f^*(x_l))^2}_{\text{variance}} + \sigma^2$ ...

61 Bias-Variance trade-off for K-NN. Define the noiseless K-NN (it is ideal!): $\bar f_{S,K}(x) = \frac{1}{K} \sum_{l \in K_x} f^*(x_l)$. Note that $\bar f_{S,K}(x) = \mathbb{E}_{Y|X}\, \hat f_{S,K}(x)$. Consider $E_K(x) = \underbrace{(f^*(x) - \mathbb{E}_X\, \bar f_{S,K}(x))^2}_{\text{bias}} + \underbrace{\frac{\sigma^2}{K}}_{\text{variance}} + \sigma^2$ ...

62 Bias-Variance trade-off. [Figure: error curves as functions of K, showing the bias and variance terms and their trade-off.]
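A small simulation can make this trade-off tangible. The sketch below (not from the slides; the target function, noise level, and grid of K values are illustrative choices) repeatedly draws training sets from $y = f^*(x) + \delta$, runs K-NN for several K, and estimates the squared bias and the variance of the prediction at a fixed test point; as K grows the variance shrinks roughly like $\sigma^2/K$ while the bias grows.

```python
import numpy as np

rng = np.random.default_rng(0)

def f_star(x):
    # Illustrative target function.
    return np.sin(2 * np.pi * x)

def knn_predict_1d(x_train, y_train, x_new, k):
    # Average the labels of the k training points closest to x_new.
    idx = np.argsort(np.abs(x_train - x_new))[:k]
    return y_train[idx].mean()

n, sigma, x_test, n_trials = 200, 0.3, 0.25, 500
for k in [1, 5, 20, 80]:
    preds = []
    for _ in range(n_trials):
        x_train = rng.uniform(0, 1, n)
        y_train = f_star(x_train) + sigma * rng.normal(size=n)
        preds.append(knn_predict_1d(x_train, y_train, x_test, k))
    preds = np.array(preds)
    bias2 = (preds.mean() - f_star(x_test)) ** 2   # squared bias at x_test
    var = preds.var()                              # variance over training sets
    print(f"K={k:3d}  bias^2={bias2:.4f}  variance={var:.4f}")
```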

63 How to choose the hyper-parameters. The bias-variance trade-off is theoretical, but shows that:

64 How to choose the hyper-parameters. The bias-variance trade-off is theoretical, but shows that: an optimal parameter exists, and

65 How to choose the hyper-parameters. The bias-variance trade-off is theoretical, but shows that: an optimal parameter exists, and it depends on the noise and the unknown target function.

66 How to choose the hyper-parameters. The bias-variance trade-off is theoretical, but shows that: an optimal parameter exists, and it depends on the noise and the unknown target function. How to choose K in practice?

67 How to choose the hyper-parameters. The bias-variance trade-off is theoretical, but shows that: an optimal parameter exists, and it depends on the noise and the unknown target function. How to choose K in practice? Idea: train on some data and validate the parameter on new unseen data, as a proxy for the ideal case.

68 Hold-out Cross-validation. For each K:

69 Hold-out Cross-validation. For each K: 1. shuffle and split S into T (training) and V (validation).

70 Hold-out Cross-validation. For each K: 1. shuffle and split S into T (training) and V (validation); 2. train the algorithm on T and compute the empirical loss on V, $\hat E_K = \frac{1}{|V|} \sum_{(x,y) \in V} (y - \hat f_{T,K}(x))^2$.

71 Hold-out Cross-validation. For each K: 1. shuffle and split S into T (training) and V (validation); 2. train the algorithm on T and compute the empirical loss on V, $\hat E_K = \frac{1}{|V|} \sum_{(x,y) \in V} (y - \hat f_{T,K}(x))^2$.

72 Hold-out Cross-validation. For each K: 1. shuffle and split S into T (training) and V (validation); 2. train the algorithm on T and compute the empirical loss on V, $\hat E_K = \frac{1}{|V|} \sum_{(x,y) \in V} (y - \hat f_{T,K}(x))^2$; 3. select $\hat K$ that minimizes $\hat E_K$.

73 Hold-out Cross-validation. For each K: 1. shuffle and split S into T (training) and V (validation); 2. train the algorithm on T and compute the empirical loss on V, $\hat E_K = \frac{1}{|V|} \sum_{(x,y) \in V} (y - \hat f_{T,K}(x))^2$; 3. select $\hat K$ that minimizes $\hat E_K$. The above procedure can be repeated to improve stability, and K selected to minimize the error over trials.

74 Hold-out Cross-validation. For each K: 1. shuffle and split S into T (training) and V (validation); 2. train the algorithm on T and compute the empirical loss on V, $\hat E_K = \frac{1}{|V|} \sum_{(x,y) \in V} (y - \hat f_{T,K}(x))^2$; 3. select $\hat K$ that minimizes $\hat E_K$. The above procedure can be repeated to improve stability, and K selected to minimize the error over trials. There are other related parameter selection methods (k-fold cross-validation, leave-one-out, ...).
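In code, the hold-out procedure for selecting K might look as follows; this is a sketch (not the course's reference implementation), it reuses the illustrative knn_predict helper from the earlier K-NN sketch, and the split ratio and candidate grid are assumptions:

```python
import numpy as np

def holdout_select_K(X, y, candidates, frac_train=0.5, seed=0):
    """Hold-out cross-validation: train on T, score each K on V, return the best K."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(len(y))                      # 1. shuffle and split S
    n_train = int(frac_train * len(y))
    T, V = perm[:n_train], perm[n_train:]
    errors = {}
    for K in candidates:                                # 2. empirical loss on V
        preds = np.array([knn_predict(X[T], y[T], x, K) for x in X[V]])
        errors[K] = np.mean((y[V] - preds) ** 2)
    best_K = min(errors, key=errors.get)                # 3. K minimizing the error
    return best_K, errors

# Example call: best_K, errs = holdout_select_K(X, y, candidates=[1, 3, 5])
```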

75 Training and Validation Error behavior. [Figure: training vs. validation error as functions of K, with T = 50% of S and n = 200.]

76 Training and Validation Error behavior. [Figure: training vs. validation error as functions of K (T = 50% of S, n = 200), with the selected $\hat K$ marked at the minimum of the validation error.]

77 Wrapping up. In this class we made our first encounter with learning algorithms (local methods) and the problem of tuning their parameters (via the bias-variance trade-off and cross-validation) to avoid overfitting and achieve generalization.

78 Next Class. High Dimensions: Beyond local methods!
