10-601 Introduction to Machine Learning
Machine Learning Department
School of Computer Science
Carnegie Mellon University

Experimental Design + k-Nearest Neighbors (KNN)

KNN Readings: Mitchell 8.2; HTF 13.3; Bishop 2.5.2
Probability Readings (next lecture): Lecture notes from 10-600 (see Piazza post for the pointers); Murphy Ch. 2; Bishop Ch. 2

Matt Gormley
Lecture 3
January 25, 2017
Reminders
- Background Exercises (Homework 1)
  - Released: Wed, Jan. 25
  - Due: Mon, Jan. 30 at 5:30pm
- Website updates
  - Office hours Google calendar on the People page
  - Readings on the Schedule page
- Meet AIs: Sarah, Daniel, Brynn
Outline
- k-Nearest Neighbors (KNN)
  - Special cases
  - Choosing k
- Case Study: KNN on Fisher Iris Data
- Case Study: KNN on 2D Gaussian Data
- Experimental Design
  - Train error vs. test error
  - Train / validation / test splits
  - Cross-validation
- Function Approximation View of ML
K-NEAREST NEIGHBORS
k-Nearest Neighbors
Whiteboard:
- Special cases
- Choosing k
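To make the whiteboard discussion concrete, here is a minimal Python/NumPy sketch of a KNN classifier (the function name knn_predict and the choice of Euclidean distance are illustrative assumptions, not from the slides). The special cases fall out of the choice of k: k = 1 is Nearest Neighbor, and k = N is Majority Vote over the whole training set.

import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_query, k):
    # Euclidean distance from the query point to every training point
    dists = np.linalg.norm(X_train - x_query, axis=1)
    # Indices of the k closest training points
    nearest = np.argsort(dists)[:k]
    # Majority vote over the neighbors' labels
    votes = Counter(y_train[i] for i in nearest)
    return votes.most_common(1)[0][0]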
KNN ON FISHER IRIS DATA
Fisher Iris Dataset
Fisher (1936) used 150 measurements of flowers from 3 different species: Iris setosa (0), Iris versicolor (1), Iris virginica (2), collected by Anderson (1936).

Species  Sepal Length  Sepal Width  Petal Length  Petal Width
0        4.3           3.0          1.1           0.1
0        4.9           3.6          1.4           0.1
0        5.3           3.7          1.5           0.2
1        4.9           2.4          3.3           1.0
1        5.7           2.8          4.1           1.3
1        6.3           3.3          4.7           1.6
1        6.7           3.0          5.0           1.7

Full dataset: https://en.wikipedia.org/wiki/Iris_flower_data_set
KNN on Fisher Iris Data

[Sequence of plot slides: KNN decision boundaries on the Fisher Iris data as k varies, including the special cases of Nearest Neighbor (k = 1) and Majority Vote.]
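A hedged sketch of how plots like these could be reproduced, assuming scikit-learn and matplotlib are available (the feature pair, grid resolution, and values of k are illustrative choices):

import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier

iris = load_iris()
X, y = iris.data[:, :2], iris.target    # sepal length vs. sepal width

for k in [1, 5, 20, len(X)]:            # k = len(X) is the majority-vote case
    clf = KNeighborsClassifier(n_neighbors=k).fit(X, y)
    # Evaluate the classifier on a dense grid to visualize the decision regions
    xx, yy = np.meshgrid(np.linspace(X[:, 0].min(), X[:, 0].max(), 200),
                         np.linspace(X[:, 1].min(), X[:, 1].max(), 200))
    zz = clf.predict(np.c_[xx.ravel(), yy.ravel()]).reshape(xx.shape)
    plt.contourf(xx, yy, zz, alpha=0.3)
    plt.scatter(X[:, 0], X[:, 1], c=y)
    plt.title(f"KNN on Fisher Iris Data (k = {k})")
    plt.show()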
KNN ON GAUSSIAN DATA
KNN on Gaussian Data

[Sequence of plot slides: KNN decision boundaries on the 2D Gaussian data as k varies.]
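A minimal sketch of how such a two-class 2D Gaussian dataset could be generated with NumPy (the means, covariances, and sample sizes are illustrative assumptions):

import numpy as np

rng = np.random.default_rng(0)
# Two classes, each drawn from its own 2D Gaussian
X0 = rng.multivariate_normal(mean=[0, 0], cov=[[1, 0], [0, 1]], size=100)
X1 = rng.multivariate_normal(mean=[2, 2], cov=[[1, 0.5], [0.5, 1]], size=100)
X = np.vstack([X0, X1])
y = np.array([0] * 100 + [1] * 100)
# The knn_predict sketch from earlier can then be applied to (X, y)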
CHOOSING THE NUMBER OF NEIGHBORS
F-Fold Cross-Validation
(Name changed from K-Fold Cross-Validation to avoid confusion with the k in KNN)

Key idea: rather than just a single validation set, use many! (The error estimate is more stable, but computation is slower.)

Divide the data D = {(x^(1), y^(1)), (x^(2), y^(2)), ..., (x^(N), y^(N))} into folds (e.g. 4):
1. Train on folds {1, 2, 3} and predict on {4}
2. Train on folds {1, 2, 4} and predict on {3}
3. Train on folds {1, 3, 4} and predict on {2}
4. Train on folds {2, 3, 4} and predict on {1}
Concatenate all the predictions and evaluate the error.
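A minimal sketch of this procedure, reusing the hypothetical knn_predict from earlier (the round-robin fold assignment and 0-1 error metric are illustrative choices):

import numpy as np

def f_fold_cv_error(X, y, k, num_folds=4):
    N = len(X)
    fold_of = np.arange(N) % num_folds      # round-robin fold assignment
    predictions = np.empty(N, dtype=y.dtype)
    for f in range(num_folds):
        train = fold_of != f                # train on all other folds...
        test = fold_of == f                 # ...and predict on the held-out fold
        for i in np.where(test)[0]:
            predictions[i] = knn_predict(X[train], y[train], X[i], k)
    # Concatenate all the predictions and evaluate the error
    return np.mean(predictions != y)

Sweeping k over a range of candidate values and keeping the one with the lowest cross-validation error is one way to choose the number of neighbors.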
Math as Code
How to implement y_max = argmax_{y ∈ Y} f(y)?
It depends on how large the set Y is! If it's a small enumerable set Y = {1, 2, ..., 77}, then:

ymax, fmax = None, float('-inf')
for y in range(1, 78):       # Y = {1, 2, ..., 77}
    if f(y) > fmax:          # track the best value seen so far...
        ymax, fmax = y, f(y) # ...and the argument that achieved it
return ymax
Math as Code
How to implement v_max = max_{y ∈ Y} f(y)?
It depends on how large the set Y is! If it's a small enumerable set Y = {1, 2, ..., 77}, then:

vmax = float('-inf')
for y in range(1, 78):       # Y = {1, 2, ..., 77}
    if f(y) > vmax:
        vmax = f(y)          # keep the largest value seen so far
return vmax
Function Approximation View of ML
(Whiteboard)
Beyond the Scope of This Lecture
- k-Nearest Neighbors (KNN) for Regression
- Distance-weighted KNN (see the sketch below)
- Cover & Hart (1967) bound on the Bayes error rate
- KNN for Facial Recognition (see Eigenfaces in the PCA lecture)
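Of these, distance-weighted KNN is a small twist on the earlier knn_predict sketch; a hedged illustration, assuming inverse-distance weights (one common choice among several):

import numpy as np

def weighted_knn_predict(X_train, y_train, x_query, k, eps=1e-8):
    # Distance-weighted KNN: closer neighbors get larger votes
    dists = np.linalg.norm(X_train - x_query, axis=1)
    nearest = np.argsort(dists)[:k]
    votes = {}
    for i in nearest:
        # eps guards against division by zero when the query equals a training point
        votes[y_train[i]] = votes.get(y_train[i], 0.0) + 1.0 / (dists[i] + eps)
    return max(votes, key=votes.get)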
Takeaways
- k-Nearest Neighbors requires a careful choice of k (the number of neighbors).
- Experimental design can be just as important as the learning algorithm itself.
- Function Approximation View:
  - Assumption: inputs are sampled from some unknown distribution
  - Assumption: outputs come from a fixed unknown function (e.g. a human annotator)
  - Goal: learn a hypothesis which closely approximates that function