10-601 Introduction to Machine Learning
Machine Learning Department
School of Computer Science
Carnegie Mellon University

Experimental Design + k-Nearest Neighbors (KNN)

KNN Readings: Mitchell 8.2; HTF 13.3; Bishop 2.5.2
Probability Readings (next lecture): Lecture notes from 10-600 (see Piazza post for the pointers); Murphy Ch. 2; Bishop Ch. 2

Matt Gormley
Lecture 3
January 25, 2017
Reminders
- Background Exercises (Homework 1)
  - Released: Wed, Jan. 25
  - Due: Mon, Jan. 30 at 5:30pm
- Website updates
  - Office hours Google calendar on the People page
  - Readings on the Schedule page
- Meet AIs: Sarah, Daniel, Brynn
Outline
- k-Nearest Neighbors (KNN)
  - Special cases
  - Choosing k
- Case Study: KNN on Fisher Iris Data
- Case Study: KNN on 2D Gaussian Data
- Experimental Design
  - Train error vs. test error
  - Train / validation / test splits
  - Cross-validation
- Function Approximation View of ML
K-NEAREST NEIGHBORS
k-Nearest Neighbors
Whiteboard:
- Special cases
- Choosing k
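To make the whiteboard discussion concrete, here is a minimal Python/NumPy sketch of a KNN classifier (the function name knn_predict and the choice of Euclidean distance are illustrative assumptions, not from the slides). The special cases fall out of the choice of k: k = 1 is Nearest Neighbor, and k = N is Majority Vote over the whole training set.

import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_query, k):
    # Euclidean distance from the query point to every training point
    dists = np.linalg.norm(X_train - x_query, axis=1)
    # Indices of the k closest training points
    nearest = np.argsort(dists)[:k]
    # Majority vote over the neighbors' labels
    votes = Counter(y_train[i] for i in nearest)
    return votes.most_common(1)[0][0]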
KNN ON FISHER IRIS DATA
Fisher Iris Dataset
Fisher (1936) used 150 measurements of flowers from 3 different species: Iris setosa (0), Iris versicolor (1), Iris virginica (2), collected by Anderson (1936).

Species  Sepal Length  Sepal Width  Petal Length  Petal Width
0        4.3           3.0          1.1           0.1
0        4.9           3.6          1.4           0.1
0        5.3           3.7          1.5           0.2
1        4.9           2.4          3.3           1.0
1        5.7           2.8          4.1           1.3
1        6.3           3.3          4.7           1.6
1        6.7           3.0          5.0           1.7

Full dataset: https://en.wikipedia.org/wiki/Iris_flower_data_set
KNN on Fisher Iris Data

[Sequence of plot slides: KNN decision boundaries on the Fisher Iris data as k varies, including the special cases of Nearest Neighbor (k = 1) and Majority Vote.]
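A hedged sketch of how plots like these could be reproduced, assuming scikit-learn and matplotlib are available (the feature pair, grid resolution, and values of k are illustrative choices):

import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier

iris = load_iris()
X, y = iris.data[:, :2], iris.target    # sepal length vs. sepal width

for k in [1, 5, 20, len(X)]:            # k = len(X) is the majority-vote case
    clf = KNeighborsClassifier(n_neighbors=k).fit(X, y)
    # Evaluate the classifier on a dense grid to visualize the decision regions
    xx, yy = np.meshgrid(np.linspace(X[:, 0].min(), X[:, 0].max(), 200),
                         np.linspace(X[:, 1].min(), X[:, 1].max(), 200))
    zz = clf.predict(np.c_[xx.ravel(), yy.ravel()]).reshape(xx.shape)
    plt.contourf(xx, yy, zz, alpha=0.3)
    plt.scatter(X[:, 0], X[:, 1], c=y)
    plt.title(f"KNN on Fisher Iris Data (k = {k})")
    plt.show()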
KNN ON GAUSSIAN DATA
KNN on Gaussian Data

[Sequence of plot slides: KNN decision boundaries on the 2D Gaussian data as k varies.]
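A minimal sketch of how such a two-class 2D Gaussian dataset could be generated with NumPy (the means, covariances, and sample sizes are illustrative assumptions):

import numpy as np

rng = np.random.default_rng(0)
# Two classes, each drawn from its own 2D Gaussian
X0 = rng.multivariate_normal(mean=[0, 0], cov=[[1, 0], [0, 1]], size=100)
X1 = rng.multivariate_normal(mean=[2, 2], cov=[[1, 0.5], [0.5, 1]], size=100)
X = np.vstack([X0, X1])
y = np.array([0] * 100 + [1] * 100)
# The knn_predict sketch from earlier can then be applied to (X, y)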
CHOOSING THE NUMBER OF NEIGHBORS
F-Fold Cross-Validation
(Name changed from K-Fold Cross-Validation to avoid confusion with the k in KNN)

Key idea: rather than just a single validation set, use many! (The error estimate is more stable, but computation is slower.)

Divide the data D = {(x^(1), y^(1)), (x^(2), y^(2)), ..., (x^(N), y^(N))} into folds (e.g. 4):
1. Train on folds {1, 2, 3} and predict on {4}
2. Train on folds {1, 2, 4} and predict on {3}
3. Train on folds {1, 3, 4} and predict on {2}
4. Train on folds {2, 3, 4} and predict on {1}
Concatenate all the predictions and evaluate the error.
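A minimal sketch of this procedure, reusing the hypothetical knn_predict from earlier (the round-robin fold assignment and 0-1 error metric are illustrative choices):

import numpy as np

def f_fold_cv_error(X, y, k, num_folds=4):
    N = len(X)
    fold_of = np.arange(N) % num_folds      # round-robin fold assignment
    predictions = np.empty(N, dtype=y.dtype)
    for f in range(num_folds):
        train = fold_of != f                # train on all other folds...
        test = fold_of == f                 # ...and predict on the held-out fold
        for i in np.where(test)[0]:
            predictions[i] = knn_predict(X[train], y[train], X[i], k)
    # Concatenate all the predictions and evaluate the error
    return np.mean(predictions != y)

Sweeping k over a range of candidate values and keeping the one with the lowest cross-validation error is one way to choose the number of neighbors.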
Math as Code
How to implement y_max = argmax_{y ∈ Y} f(y)?
It depends on how large the set Y is! If it's a small enumerable set Y = {1, 2, ..., 77}, then:

ymax, fmax = None, float('-inf')
for y in range(1, 78):       # Y = {1, 2, ..., 77}
    if f(y) > fmax:          # track the best value seen so far...
        ymax, fmax = y, f(y) # ...and the argument that achieved it
return ymax
Math as Code
How to implement v_max = max_{y ∈ Y} f(y)?
It depends on how large the set Y is! If it's a small enumerable set Y = {1, 2, ..., 77}, then:

vmax = float('-inf')
for y in range(1, 78):       # Y = {1, 2, ..., 77}
    if f(y) > vmax:
        vmax = f(y)          # keep the largest value seen so far
return vmax
Function Approximation View of ML
(Whiteboard)
Beyond the Scope of This Lecture
- k-Nearest Neighbors (KNN) for Regression
- Distance-weighted KNN (see the sketch below)
- Cover & Hart (1967) bound on the Bayes error rate
- KNN for Facial Recognition (see Eigenfaces in the PCA lecture)
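Of these, distance-weighted KNN is a small twist on the earlier knn_predict sketch; a hedged illustration, assuming inverse-distance weights (one common choice among several):

import numpy as np

def weighted_knn_predict(X_train, y_train, x_query, k, eps=1e-8):
    # Distance-weighted KNN: closer neighbors get larger votes
    dists = np.linalg.norm(X_train - x_query, axis=1)
    nearest = np.argsort(dists)[:k]
    votes = {}
    for i in nearest:
        # eps guards against division by zero when the query equals a training point
        votes[y_train[i]] = votes.get(y_train[i], 0.0) + 1.0 / (dists[i] + eps)
    return max(votes, key=votes.get)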
Takeaways
- k-Nearest Neighbors requires a careful choice of k (the number of neighbors).
- Experimental design can be just as important as the learning algorithm itself.
- Function Approximation View:
  - Assumption: inputs are sampled from some unknown distribution
  - Assumption: outputs come from a fixed unknown function (e.g. a human annotator)
  - Goal: learn a hypothesis which closely approximates that function