MLCC 2018 Local Methods and Bias Variance Trade-Off. Lorenzo Rosasco UNIGE-MIT-IIT
1 MLCC 2018 Local Methods and Bias Variance Trade-Off Lorenzo Rosasco UNIGE-MIT-IIT
2 About this class 1. Introduce a basic class of learning methods, namely local methods. 2. Discuss the fundamental concept of bias-variance trade-off to understand parameter tuning (a.k.a. model selection) MLCC
3 Outline MLCC
4 The problem What is the price of one house given its area? MLCC
5 The problem What is the price of one house given its area? Start from data... MLCC
6 The problem What is the price of one house given its area? Start from data... Area (m²), Price (€): x_1 = 62, y_1 = 99,200; x_2 = 64, y_2 = 135,700; x_3 = 65, y_3 = 93,300; x_4 = 66, y_4 = 114,…; ... Let S be the houses example dataset (n = 100): S = {(x_1, y_1), ..., (x_n, y_n)} MLCC
7 The problem What is the price of one house given its area? Start from data... Area (m²), Price (€): x_1 = 62, y_1 = 99,200; x_2 = 64, y_2 = 135,700; x_3 = 65, y_3 = 93,300; x_4 = 66, y_4 = 114,…; ... Let S be the houses example dataset (n = 100): S = {(x_1, y_1), ..., (x_n, y_n)}. Given a new point x*, we want to predict y* by means of S. MLCC
8 Example Let x* be a 300 m² house. MLCC
9 Example Let x* be a 300 m² house. Area (m²), Price (€): ..., x_93 = 255, y_93 = 274,600; x_94 = 264, y_94 = 324,900; x_95 = 310, y_95 = 311,200; x_96 = 480, y_96 = 515,… MLCC
10 Example Let x* be a 300 m² house. Area (m²), Price (€): ..., x_93 = 255, y_93 = 274,600; x_94 = 264, y_94 = 324,900; x_95 = 310, y_95 = 311,200; x_96 = 480, y_96 = 515,… What is its price? MLCC
11 Nearest Neighbors Nearest Neighbor: y* is the value of the closest point to x* in S. y* = 311,200. Area (m²), Price (€): ..., x_93 = 255, y_93 = 274,600; x_94 = 264, y_94 = 324,900; x_95 = 310, y_95 = 311,200; x_96 = 480, y_96 = 515,… MLCC
12 Nearest Neighbors Nearest Neighbor: y* is the value of the closest point to x* in S. y* = 311,200. Area (m²), Price (€): ..., x_93 = 255, y_93 = 274,600; x_94 = 264, y_94 = 324,900; x_95 = 310, y_95 = 311,200; x_96 = 480, y_96 = 515,… MLCC
13 Nearest Neighbors S = {(x_i, y_i)}_{i=1}^n with x_i ∈ R^D, y_i ∈ R MLCC
14 Nearest Neighbors S = {(x_i, y_i)}_{i=1}^n with x_i ∈ R^D, y_i ∈ R; x* the new point, x* ∈ R^D MLCC
15 Nearest Neighbors S = {(x_i, y_i)}_{i=1}^n with x_i ∈ R^D, y_i ∈ R; x* the new point, x* ∈ R^D; y* the predicted output, y* = f̂(x*), where y* = y_j, j = argmin_{i=1,...,n} ‖x* − x_i‖ MLCC
16 Nearest Neighbors S = {(x_i, y_i)}_{i=1}^n with x_i ∈ R^D, y_i ∈ R; x* the new point, x* ∈ R^D; y* the predicted output, y* = f̂(x*), where y* = y_j, j = argmin_{i=1,...,n} ‖x* − x_i‖. Computational cost O(nD): we compute n times the distance ‖x* − x_i‖, each costing O(D) MLCC
17 Nearest Neighbors S = {(x_i, y_i)}_{i=1}^n with x_i ∈ R^D, y_i ∈ R; x* the new point, x* ∈ R^D; y* the predicted output, y* = f̂(x*), where y* = y_j, j = argmin_{i=1,...,n} ‖x* − x_i‖. Computational cost O(nD): we compute n times the distance ‖x* − x_i‖, each costing O(D). In general, let d : R^D × R^D → R be a distance on the input space; then f̂(x) = y_j, j = argmin_{i=1,...,n} d(x, x_i) MLCC
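A minimal sketch of this nearest-neighbor rule (illustrative code, not from the slides; it assumes the training set is stored as NumPy arrays X of shape (n, D) and y of shape (n,)):

```python
import numpy as np

def nn_predict(X, y, x_new):
    """1-Nearest Neighbor: return y_j with j = argmin_i ||x_new - x_i||."""
    dists = np.linalg.norm(X - x_new, axis=1)  # n distances, each O(D): total O(nD)
    j = np.argmin(dists)
    return y[j]
```

For the 300 m² house of the example, this returns the price of the closest house in S (the 310 m² one), i.e. 311,200, as on slide 11.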
18 Extensions Nearest Neighbor takes y* to be the value of the closest point to x* in S. Area (m²), Price (€): ..., x_93 = 255, y_93 = 274,600; x_94 = 264, y_94 = 324,900; x_95 = 310, y_95 = 311,200; x_96 = 480, y_96 = 515,… MLCC
19 Extensions Nearest Neighbor takes y* to be the value of the closest point to x* in S. Area (m²), Price (€): ..., x_93 = 255, y_93 = 274,600; x_94 = 264, y_94 = 324,900; x_95 = 310, y_95 = 311,200; x_96 = 480, y_96 = 515,… Can we do better? (for example using more points) MLCC
20 K-Nearest Neighbors K-Nearest Neighbor: y* is the mean of the values of the K closest points to x* in S. If K = 3 we have y* = (274,600 + 324,900 + 311,200)/3 ≈ 303,567. Area (m²), Price (€): ..., x_93 = 255, y_93 = 274,600; x_94 = 264, y_94 = 324,900; x_95 = 310, y_95 = 311,200; x_96 = 480, y_96 = 515,… MLCC
21 K-Nearest Neighbors K-Nearest Neighbor: y* is the mean of the values of the K closest points to x* in S. If K = 3 we have y* = (274,600 + 324,900 + 311,200)/3 ≈ 303,567. Area (m²), Price (€): ..., x_93 = 255, y_93 = 274,600; x_94 = 264, y_94 = 324,900; x_95 = 310, y_95 = 311,200; x_96 = 480, y_96 = 515,… MLCC
22 K-Nearest Neighbors S = {(x_i, y_i)}_{i=1}^n with x_i ∈ R^D, y_i ∈ R; x* the new point, x* ∈ R^D MLCC
23 K-Nearest Neighbors S = {(x_i, y_i)}_{i=1}^n with x_i ∈ R^D, y_i ∈ R; x* the new point, x* ∈ R^D. Let K be an integer, K ≪ n MLCC
24 K-Nearest Neighbors S = {(x_i, y_i)}_{i=1}^n with x_i ∈ R^D, y_i ∈ R; x* the new point, x* ∈ R^D. Let K be an integer, K ≪ n, and j_1, ..., j_K defined as j_1 = argmin_{i ∈ {1,...,n}} ‖x* − x_i‖ and j_t = argmin_{i ∈ {1,...,n}\{j_1,...,j_{t−1}}} ‖x* − x_i‖ for t ∈ {2, ..., K} MLCC
25 K-Nearest Neighbors S = {(x_i, y_i)}_{i=1}^n with x_i ∈ R^D, y_i ∈ R; x* the new point, x* ∈ R^D. Let K be an integer, K ≪ n, and j_1, ..., j_K defined as j_1 = argmin_{i ∈ {1,...,n}} ‖x* − x_i‖ and j_t = argmin_{i ∈ {1,...,n}\{j_1,...,j_{t−1}}} ‖x* − x_i‖ for t ∈ {2, ..., K}; predicted output y* = (1/K) Σ_{i ∈ {j_1,...,j_K}} y_i MLCC
26 K-Nearest Neighbors (cont.) f̂(x) = (1/K) Σ_{i=1}^K y_{j_i} MLCC
27 K-Nearest Neighbors (cont.) f̂(x) = (1/K) Σ_{i=1}^K y_{j_i}. Computational cost O(nD + n log n): compute the n distances ‖x − x_i‖ for i ∈ {1, ..., n} (each costs O(D)), then order them, O(n log n). MLCC
28 K-Nearest Neighbors (cont.) f̂(x) = (1/K) Σ_{i=1}^K y_{j_i}. Computational cost O(nD + n log n): compute the n distances ‖x − x_i‖ for i ∈ {1, ..., n} (each costs O(D)), then order them, O(n log n). General metric d: f̂ is the same, but j_1, ..., j_K are defined as j_1 = argmin_{i ∈ {1,...,n}} d(x, x_i) and j_t = argmin_{i ∈ {1,...,n}\{j_1,...,j_{t−1}}} d(x, x_i) for t ∈ {2, ..., K} MLCC
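A corresponding K-NN sketch, with an optional user-supplied metric d (again illustrative, under the same NumPy-array assumptions; the Euclidean distance is the default):

```python
import numpy as np

def knn_predict(X, y, x_new, K=3, d=None):
    """K-Nearest Neighbors: average the labels of the K points closest to x_new."""
    if d is None:
        dists = np.linalg.norm(X - x_new, axis=1)       # Euclidean distance, O(nD)
    else:
        dists = np.array([d(x_new, x_i) for x_i in X])  # general metric d
    nearest = np.argsort(dists)[:K]                      # ordering: O(n log n)
    return y[nearest].mean()                             # (1/K) * sum of the K nearest labels
```

With K = 3 on the house example this corresponds to the ≈ 303,567 prediction computed above.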
29 Parzen Windows K-NN puts equal weights on the values of the selected points. MLCC
30 Parzen Windows K-NN puts equal weights on the values of the selected points. Can we generalize it? MLCC
31 Parzen Windows K-NN puts equal weights on the values of the selected points. Can we generalize it? Points closer to x should influence its value more MLCC
32 Parzen Windows K-NN puts equal weights on the values of the selected points. Can we generalize it? Points closer to x should influence its value more. PARZEN WINDOWS: f̂(x) = Σ_{i=1}^n y_i k(x, x_i) / Σ_{i=1}^n k(x, x_i), where k is a similarity function: k(x, x′) ≥ 0 for all x, x′ ∈ R^D; k(x, x′) → 1 when ‖x − x′‖ → 0; k(x, x′) → 0 when ‖x − x′‖ → ∞ MLCC
33 Parzen Windows Examples of k: k_1(x, x′) = sign(1 − ‖x − x′‖/σ)_+ with σ > 0; k_2(x, x′) = (1 − ‖x − x′‖/σ)_+ with σ > 0; k_3(x, x′) = (1 − ‖x − x′‖²/σ²)_+ with σ > 0; k_4(x, x′) = e^{−‖x − x′‖²/(2σ²)} with σ > 0; k_5(x, x′) = e^{−‖x − x′‖/σ} with σ > 0 MLCC
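A sketch of the Parzen-window estimator with the Gaussian similarity k_4 (illustrative; here σ plays the role of a bandwidth hyperparameter, to be tuned just like K):

```python
import numpy as np

def parzen_predict(X, y, x_new, sigma=1.0):
    """Parzen windows: f_hat(x) = sum_i y_i k(x, x_i) / sum_i k(x, x_i)."""
    sq_dists = np.sum((X - x_new) ** 2, axis=1)
    k = np.exp(-sq_dists / (2.0 * sigma ** 2))  # k_4(x, x') = exp(-||x - x'||^2 / (2 sigma^2))
    return np.sum(y * k) / np.sum(k)            # closer points receive larger weights
```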
34–40 K-NN example K-Nearest Neighbor depends on K. [Slides 34–40 each show a plot of the K-NN fit for a different value of K; the plots and the specific K values are not reproduced in this transcription.] MLCC
41 K-NN example K-Nearest Neighbor depends on K. Changing K, the result changes a lot! How to select K? MLCC
42 Outline MLCC
43 Optimal choice for the Hyper-parameters S = {(x_i, y_i)}_{i=1}^n training set. Name Y = (y_1, ..., y_n) and X = (x_1, ..., x_n). K ∈ N hyperparameter of the learning algorithm; f̂_{S,K} learned function (depends on S and K) MLCC
44 Optimal choice for the Hyper-parameters S = {(x_i, y_i)}_{i=1}^n training set. Name Y = (y_1, ..., y_n) and X = (x_1, ..., x_n). K ∈ N hyperparameter of the learning algorithm; f̂_{S,K} learned function (depends on S and K). The expected loss E_K is E_K = E_S E_{x,y} (y − f̂_{S,K}(x))² MLCC
45 Optimal choice for the Hyper-parameters S = {(x_i, y_i)}_{i=1}^n training set. Name Y = (y_1, ..., y_n) and X = (x_1, ..., x_n). K ∈ N hyperparameter of the learning algorithm; f̂_{S,K} learned function (depends on S and K). The expected loss E_K is E_K = E_S E_{x,y} (y − f̂_{S,K}(x))². Optimal hyperparameter K* should minimize E_K MLCC
46 Optimal choice for the Hyper-parameters S = {(x_i, y_i)}_{i=1}^n training set. Name Y = (y_1, ..., y_n) and X = (x_1, ..., x_n). K ∈ N hyperparameter of the learning algorithm; f̂_{S,K} learned function (depends on S and K). The expected loss E_K is E_K = E_S E_{x,y} (y − f̂_{S,K}(x))². Optimal hyperparameter K* should minimize E_K: K* = argmin_K E_K MLCC
47 Optimal choice for the Hyper-parameters (cont.) Optimal hyperparameter K* should minimize E_K MLCC
48 Optimal choice for the Hyper-parameters (cont.) Optimal hyperparameter K* should minimize E_K: K* = argmin_K E_K MLCC
49 Optimal choice for the Hyper-parameters (cont.) Optimal hyperparameter K* should minimize E_K: K* = argmin_K E_K. Ideally! (In practice we don't have access to the distribution.) We can still try to understand the above minimization problem: does a solution exist? What does it depend on? Yet, ultimately, we need something we can compute! MLCC
50 Example: regression problem Define the pointwise expected loss E_K(x) = E_S E_{y|x} (y − f̂_{S,K}(x))² MLCC
51 Example: regression problem Define the pointwise expected loss E_K(x) = E_S E_{y|x} (y − f̂_{S,K}(x))². By definition E_K = E_x E_K(x). MLCC
52 Example: regression problem Define the pointwise expected loss E_K(x) = E_S E_{y|x} (y − f̂_{S,K}(x))². By definition E_K = E_x E_K(x). Regression setting: regression model y = f*(x) + δ MLCC
53 Example: regression problem Define the pointwise expected loss E_K(x) = E_S E_{y|x} (y − f̂_{S,K}(x))². By definition E_K = E_x E_K(x). Regression setting: regression model y = f*(x) + δ, with Eδ = 0, Eδ² = σ² MLCC
54 Example: regression problem Define the pointwise expected loss E_K(x) = E_S E_{y|x} (y − f̂_{S,K}(x))². By definition E_K = E_x E_K(x). Regression setting: regression model y = f*(x) + δ, with Eδ = 0, Eδ² = σ². Now E_K(x) = E_S E_{y|x} (y − f̂_{S,K}(x))² MLCC
55 Example: regression problem Define the pointwise expected loss E_K(x) = E_S E_{y|x} (y − f̂_{S,K}(x))². By definition E_K = E_x E_K(x). Regression setting: regression model y = f*(x) + δ, with Eδ = 0, Eδ² = σ². Now E_K(x) = E_S E_{y|x} (y − f̂_{S,K}(x))² = E_S E_{y|x} (f*(x) + δ − f̂_{S,K}(x))² MLCC
56 Example: regression problem Define the pointwise expected loss E_K(x) = E_S E_{y|x} (y − f̂_{S,K}(x))². By definition E_K = E_x E_K(x). Regression setting: regression model y = f*(x) + δ, with Eδ = 0, Eδ² = σ². Now E_K(x) = E_S E_{y|x} (y − f̂_{S,K}(x))² = E_S E_{y|x} (f*(x) + δ − f̂_{S,K}(x))², that is, E_K(x) = E_S (f*(x) − f̂_{S,K}(x))² + σ² ... MLCC
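The last equality uses Eδ = 0, Eδ² = σ² and the fact that the noise δ at the test point is independent of the training set S; spelled out as an added step:

```latex
\mathbb{E}_S \mathbb{E}_{y|x}\big(f_*(x) + \delta - \hat f_{S,K}(x)\big)^2
  = \mathbb{E}_S\Big[\big(f_*(x) - \hat f_{S,K}(x)\big)^2
    + 2\,\mathbb{E}[\delta]\big(f_*(x) - \hat f_{S,K}(x)\big)
    + \mathbb{E}[\delta^2]\Big]
  = \mathbb{E}_S\big(f_*(x) - \hat f_{S,K}(x)\big)^2 + \sigma^2 .
```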
57 Bias Variance trade-off for K-NN Define the noiseless K-NN (it is ideal!) f̄_{S,K}(x) = (1/K) Σ_{l ∈ K_x} f*(x_l), where K_x indexes the K nearest neighbors of x MLCC
58 Bias Variance trade-off for K-NN Define the noiseless K-NN (it is ideal!) f̄_{S,K}(x) = (1/K) Σ_{l ∈ K_x} f*(x_l), where K_x indexes the K nearest neighbors of x. Note that f̄_{S,K}(x) = E_{Y|X} f̂_{S,K}(x) (the expectation of the K-NN estimator over the training labels Y, given the training inputs X). MLCC
59 Bias Variance trade-off for K-NN Define the noiseless K-NN (it is ideal!) f̄_{S,K}(x) = (1/K) Σ_{l ∈ K_x} f*(x_l). Note that f̄_{S,K}(x) = E_{Y|X} f̂_{S,K}(x). Consider E_K(x) = (f*(x) − E_X f̄_{S,K}(x))² [bias] + E_S (f̄_{S,K}(x) − f̂_{S,K}(x))² [variance] + σ² ... MLCC
60 Bias Variance trade-off for K-NN Define the noiseless K-NN (it is ideal!) f̄_{S,K}(x) = (1/K) Σ_{l ∈ K_x} f*(x_l). Note that f̄_{S,K}(x) = E_{Y|X} f̂_{S,K}(x). Consider E_K(x) = (f*(x) − E_X f̄_{S,K}(x))² [bias] + (1/K²) Σ_{l ∈ K_x} E_X E_{y_l|x_l} (y_l − f*(x_l))² [variance] + σ² ... MLCC
61 Bias Variance trade-off for K-NN Define the noiseless K-NN (it is ideal!) f̄_{S,K}(x) = (1/K) Σ_{l ∈ K_x} f*(x_l). Note that f̄_{S,K}(x) = E_{Y|X} f̂_{S,K}(x). Consider E_K(x) = (f*(x) − E_X f̄_{S,K}(x))² [bias] + σ²/K [variance] + σ² ... MLCC
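The σ²/K variance term follows from the previous slide because f̂_{S,K}(x) − f̄_{S,K}(x) = (1/K) Σ_{l ∈ K_x} (y_l − f*(x_l)) and the noise terms at distinct training points are independent with mean zero, so the cross terms vanish (added check):

```latex
\mathbb{E}_S\big(\bar f_{S,K}(x) - \hat f_{S,K}(x)\big)^2
  = \frac{1}{K^2} \sum_{l \in K_x} \mathbb{E}_X\, \mathbb{E}_{y_l|x_l}\big(y_l - f_*(x_l)\big)^2
  = \frac{1}{K^2}\, K\,\sigma^2
  = \frac{\sigma^2}{K}.
```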
62 Bias Variance trade-off [Plot: error as a function of K; the variance term decreases with K while the bias term increases, giving the trade-off.] MLCC
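A toy simulation in the spirit of this plot (entirely illustrative: a made-up 1-D target f*(x) = sin(x), synthetic noisy training sets, and the squared bias and variance of the K-NN prediction at a single test point estimated over repeated draws):

```python
import numpy as np

rng = np.random.default_rng(0)
f_star = np.sin          # made-up target function
sigma = 0.3              # noise standard deviation
n, n_trials = 200, 500   # training-set size, number of repeated draws
x_test = 1.0             # test point

def knn_1d(X, y, x_new, K):
    nearest = np.argsort(np.abs(X - x_new))[:K]
    return y[nearest].mean()

for K in [1, 5, 20, 50, 100]:
    preds = []
    for _ in range(n_trials):
        X = rng.uniform(0.0, np.pi, n)              # training inputs
        y = f_star(X) + sigma * rng.normal(size=n)  # noisy labels y = f*(x) + delta
        preds.append(knn_1d(X, y, x_test, K))
    preds = np.array(preds)
    bias2 = (preds.mean() - f_star(x_test)) ** 2    # squared bias at x_test
    var = preds.var()                               # variance, roughly sigma^2 / K
    print(f"K = {K:3d}   bias^2 = {bias2:.4f}   variance = {var:.4f}")
```

As K grows the variance shrinks roughly like σ²/K while the squared bias grows, which is the trade-off the plot illustrates.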
63 How to choose the hyper-parameters Bias-Variance trade-off is theoretical, but shows that: MLCC
64 How to choose the hyper-parameters Bias-Variance trade-off is theoretical, but shows that: an optimal parameter exists and MLCC
65 How to choose the hyper-parameters Bias-Variance trade-off is theoretical, but shows that: an optimal parameter exists and it depends on the noise and the unknown target function. MLCC
66 How to choose the hyper-parameters Bias-Variance trade-off is theoretical, but shows that: an optimal parameter exists and it depends on the noise and the unknown target function. How to choose K in practice? MLCC
67 How to choose the hyper-parameters Bias-Variance trade-off is theoretical, but shows that: an optimal parameter exists and it depends on the noise and the unknown target function. How to choose K in practice? Idea: train on some data and validate the parameter on new unseen data as a proxy for the ideal case. MLCC
68 Hold-out Cross-validation For each K MLCC
69 Hold-out Cross-validation For each K 1. shuffle and split S in T (training) and V (validation) MLCC
70 Hold-out Cross-validation For each K 1. shuffle and split S in T (training) and V (validation) 2. train the algorithm on T and compute the empirical loss on V: Ê_K = (1/|V|) Σ_{(x,y) ∈ V} (y − f̂_{T,K}(x))² MLCC
71 Hold-out Cross-validation For each K 1. shuffle and split S in T (training) and V (validation) 2. train the algorithm on T and compute the empirical loss on V: Ê_K = (1/|V|) Σ_{(x,y) ∈ V} (y − f̂_{T,K}(x))² MLCC
72 Hold-out Cross-validation For each K 1. shuffle and split S in T (training) and V (validation) 2. train the algorithm on T and compute the empirical loss on V: Ê_K = (1/|V|) Σ_{(x,y) ∈ V} (y − f̂_{T,K}(x))² 3. Select the K̂ that minimizes Ê_K. MLCC
73 Hold-out Cross-validation For each K 1. shuffle and split S in T (training) and V (validation) 2. train the algorithm on T and compute the empirical loss on V: Ê_K = (1/|V|) Σ_{(x,y) ∈ V} (y − f̂_{T,K}(x))² 3. Select the K̂ that minimizes Ê_K. The above procedure can be repeated to improve stability, with K selected to minimize the error over trials. MLCC
74 Hold-out Cross-validation For each K 1. shuffle and split S in T (training) and V (validation) 2. train the algorithm on T and compute the empirical loss on V: Ê_K = (1/|V|) Σ_{(x,y) ∈ V} (y − f̂_{T,K}(x))² 3. Select the K̂ that minimizes Ê_K. The above procedure can be repeated to improve stability, with K selected to minimize the error over trials. There are other related parameter selection methods (k-fold cross-validation, leave-one-out, ...). MLCC
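A sketch of the hold-out procedure for choosing K (illustrative; it reuses the knn_predict function sketched earlier, and the split fraction is a free choice):

```python
import numpy as np

def holdout_select_K(X, y, K_grid, frac_train=0.5, seed=0):
    """Hold-out cross-validation: train on T, evaluate each K on V, pick the best K."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(len(y))                  # 1. shuffle ...
    n_train = int(frac_train * len(y))
    T, V = perm[:n_train], perm[n_train:]           #    ... and split S into T and V
    errors = {}
    for K in K_grid:                                # 2. "train" on T, empirical loss on V
        preds = np.array([knn_predict(X[T], y[T], x, K) for x in X[V]])
        errors[K] = np.mean((y[V] - preds) ** 2)
    K_hat = min(errors, key=errors.get)             # 3. K_hat minimizing the hold-out error
    return K_hat, errors
```

Repeating this over several random splits and averaging the errors per K gives the more stable variant mentioned above.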
75 Training and Validation Error behavior [Plot: training error and validation error as functions of K, with T = 50% of S and n = 200.] MLCC
76 Training and Validation Error behavior [Same plot, with the selected K̂ marked at the minimum of the validation error.] MLCC
77 Wrapping up In this class we made our first encounter with learning algorithms (local methods) and the problem of tuning their parameters (via bias-variance trade-off and cross-validation) to avoid overfitting and achieve generalization. MLCC
78 Next Class High Dimensions: Beyond local methods! MLCC