CSCI567 Machine Learning (Fall 2014)

1 CSCI567 Machine Learning (Fall 2014) Drs. Sha & Liu ({feisha,yanliu.cs}@usc.edu) September 9, 2014

2 Administration Outline 1 Administration 2 First learning algorithm: Nearest neighbor classifier 3 A deeper understanding of NNC 4 Some practical sides of NNC 5 What we have learned

3 Administration Entrance exam All were graded; about 87% have passed. 96 students in Prof. Sha's section and 48 in Prof. Liu's section. Those who passed the threshold were granted D-clearance; please enroll ASAP, by Friday noon. In several cases, advisors have sent out inquiries; please respond by Thursday noon per their instructions. If you have not been granted D-clearance or been contacted by an advisor, you can assume that you are not permitted to enroll. You can ask the TAs to check your grade only if you strongly believe you did well.

4 First learning algorithm: Nearest neighbor classifier Outline 1 Administration 2 First learning algorithm: Nearest neighbor classifier Intuitive example General setup for classification Algorithm 3 A deeper understanding of NNC 4 Some practical sides of NNC 5 What we have learned

5 First learning algorithm: Nearest neighbor classifier Intuitive example Recognizing flowers Types of Iris: setosa, versicolor, and virginica

6 First learning algorithm: Nearest neighbor classifier Intuitive example Measuring the properties of the flowers Features and attributes: the widths and lengths of sepal and petal

7 First learning algorithm: Nearest neighbor classifier Intuitive example Pairwise scatter plots of 131 flower specimens Visualization of data helps to identify the right learning model to use. Each colored point is a flower specimen: setosa, versicolor, or virginica. [Figure: pairwise scatter plots; the axes are sepal length, sepal width, petal length, and petal width.]

8 First learning algorithm: Nearest neighbor classifier Intuitive example Different types seem well-clustered and separable Using two features: petal width and sepal length. [Figure: scatter plot of petal width versus sepal length, colored by type.]

9 First learning algorithm: Nearest neighbor classifier Intuitive example Labeling an unknown flower type Closer to the red cluster: so labeling it as setosa? [Figure: the scatter plot with the unknown specimen marked.]

10 First learning algorithm: Nearest neighbor classifier General setup for classification Multi-class classification Classify data into one of multiple categories. Input (feature vector): x ∈ R^D. Output (label): y ∈ [C] = {1, 2, ..., C}. Learning goal: y = f(x). Special case: binary classification. Number of classes: C = 2; labels: {0, 1} or {-1, +1}.

11 First learning algorithm: Nearest neighbor classifier General setup for classification More terminology Training data (set): N samples/instances: D^train = {(x_1, y_1), (x_2, y_2), ..., (x_N, y_N)}. They are used for learning f(·). Test (evaluation) data: M samples/instances: D^test = {(x_1, y_1), (x_2, y_2), ..., (x_M, y_M)}. They are used for assessing how well f(·) will do in predicting an unseen x ∉ D^train. Training data and test data should not overlap: D^train ∩ D^test = ∅.

12 First learning algorithm: Nearest neighbor classifier General setup for classification Often, data is conveniently organized as a table. Ex: the Iris data, with 4 features and 3 classes.

13 First learning algorithm: Nearest neighbor classifier Algorithm Nearest neighbor classification (NNC) Nearest neighbor: x(1) = x_{nn(x)}, where nn(x) ∈ [N] = {1, 2, ..., N}, i.e., the index of one of the training instances: nn(x) = argmin_{n ∈ [N]} ||x - x_n||_2^2 = argmin_{n ∈ [N]} Σ_{d=1}^D (x_d - x_{nd})^2. Classification rule: y = f(x) = y_{nn(x)}.
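As an illustration only (not from the slides), here is a minimal NumPy sketch of this rule; the function and variable names are my own:

    import numpy as np

    def nnc_predict(x, X_train, y_train):
        # Squared Euclidean distance from x to every training instance x_n
        dists = np.sum((X_train - x) ** 2, axis=1)
        nn_index = np.argmin(dists)   # nn(x): index of the nearest training instance
        return y_train[nn_index]      # y = f(x) = y_{nn(x)}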

14 First learning algorithm: Nearest neighbor classifier Algorithm Visual example In this 2-dimensional example, the nearest point to x is a red training instance; thus, x will be labeled as red. [Figure (a): training points and the query x in the (x_1, x_2) plane.]

15 First learning algorithm: Nearest neighbor classifier Algorithm Example: classify Iris with two features Training data: a table of ID (n), petal width (x_1), sepal length (x_2), and category (y), with rows for setosa, versicolor, and virginica. Flower with unknown category: petal width = 1.8 and sepal length = 6.4. Calculating distance: distance = sqrt((x_1 - x_{n1})^2 + (x_2 - x_{n2})^2) for each training ID. Thus, the predicted category is versicolor (the real category is virginica).

16 First learning algorithm: Nearest neighbor classifier Algorithm Decision boundary For every point in the space, we can determine its label using the NNC rule. This gives rise to a decision boundary that partitions the space into different regions. [Figure (b): NNC decision boundary in the (x_1, x_2) plane.]

17 First learning algorithm: Nearest neighbor classifier Algorithm How to measure nearness with other distances? Previously, we used the Euclidean distance: nn(x) = argmin_{n ∈ [N]} ||x - x_n||_2^2. We can also use alternative distances, e.g., the following L_1 distance (i.e., city block distance, or Manhattan distance): nn(x) = argmin_{n ∈ [N]} ||x - x_n||_1 = argmin_{n ∈ [N]} Σ_{d=1}^D |x_d - x_{nd}|. [Figure: the green line is the Euclidean distance; the red, blue, and yellow lines are L_1 distances.]

18 First learning algorithm: Nearest neighbor classifier Algorithm K-nearest neighbor (KNN) classification Increase the number of nearest neighbors to use? 1st-nearest neighbor: nn_1(x) = argmin_{n ∈ [N]} ||x - x_n||_2^2. 2nd-nearest neighbor: nn_2(x) = argmin_{n ∈ [N] \ {nn_1(x)}} ||x - x_n||_2^2. 3rd-nearest neighbor: nn_3(x) = argmin_{n ∈ [N] \ {nn_1(x), nn_2(x)}} ||x - x_n||_2^2. The set of K nearest neighbors: let x(k) = x_{nn_k(x)}; then knn(x) = {nn_1(x), nn_2(x), ..., nn_K(x)} and ||x - x(1)||_2^2 ≤ ||x - x(2)||_2^2 ≤ ... ≤ ||x - x(K)||_2^2.
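A small sketch (my own, not from the slides) of how the K nearest indices might be collected with NumPy:

    import numpy as np

    def knn_indices(x, X_train, K):
        # Squared Euclidean distances to all training instances
        dists = np.sum((X_train - x) ** 2, axis=1)
        # Indices nn_1(x), ..., nn_K(x), sorted by increasing distance
        return np.argsort(dists)[:K]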

19 First learning algorithm: Nearest neighbor classifier Algorithm How to classify with K neighbors? Classification rule Every neighbor votes: suppose y_n (the true label) for x_n is c; then the vote for c is 1, and the vote for any c' ≠ c is 0. We use the indicator function I(y_n == c) to represent this. Aggregate everyone's votes: v_c = Σ_{n ∈ knn(x)} I(y_n == c), for c ∈ [C]. Label with the majority: y = f(x) = argmax_{c ∈ [C]} v_c.
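A possible implementation of this voting rule, assuming integer class labels in {0, ..., C-1} (a sketch of mine, not the course's reference code):

    import numpy as np

    def knn_predict(x, X_train, y_train, K=3):
        dists = np.sum((X_train - x) ** 2, axis=1)
        neighbors = np.argsort(dists)[:K]        # the set knn(x)
        votes = np.bincount(y_train[neighbors])  # v_c for each class c
        return np.argmax(votes)                  # y = f(x) = argmax_c v_c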

20 First learning algorithm: Nearest neighbor classifier Algorithm Example [Figure: three panels in the (x_1, x_2) plane showing the same query point classified with K = 1 (label: red), K = 3 (label: red), and K = 5 (label: blue).]

21 First learning algorithm: Nearest neighbor classifier Algorithm How to choose an optimal K? [Figure: decision boundaries for K = 1, K = 3, and K = 31.] When K increases, the decision boundary becomes smoother.

22 First learning algorithm: Nearest neighbor classifier Algorithm Mini-summary Advantages of NNC: Computationally, simple and easy to implement: just computing the distances. Theoretically, has strong guarantees of doing the right thing. Disadvantages of NNC: Computationally intensive for large-scale problems: O(ND) for labeling a data point. We need to carry the training data around; without it, we cannot do classification (this type of method is called nonparametric). Choosing the right distance measure and K can be involved.

23 Outline 1 Administration 2 First learning algorithm: Nearest neighbor classifier 3 A deeper understanding of NNC Measuring performance The ideal classifier Comparing NNC to the ideal classifier 4 Some practical sides of NNC 5 What we have learned

24 Is NNC too simple to do the right thing? To answer this question, we proceed in 3 steps: 1 We define a performance metric for a classifier/algorithm. 2 We then propose an ideal classifier. 3 We then compare our simple NNC classifier to the ideal one and show that it performs nearly as well.

25 Measuring performance How to measure performance of a classifier? Intuition We should compute the accuracy (the percentage of data points correctly classified) or the error rate (the percentage of data points incorrectly classified). Two versions: which one to use? Defined on the training data set: A^train = (1/N) Σ_n I[f(x_n) == y_n], ε^train = (1/N) Σ_n I[f(x_n) ≠ y_n]. Defined on the test (evaluation) data set: A^test = (1/M) Σ_m I[f(x_m) == y_m], ε^test = (1/M) Σ_m I[f(x_m) ≠ y_m].
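These quantities are straightforward to compute; a minimal sketch (my own naming, not from the slides):

    import numpy as np

    def accuracy_and_error(y_pred, y_true):
        # A = fraction of correct predictions, eps = 1 - A
        acc = np.mean(np.asarray(y_pred) == np.asarray(y_true))
        return acc, 1.0 - acc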

29 Measuring performance Example Training data and test data (shown as figures). What are A^train and ε^train? A^train = 100%, ε^train = 0%. What are A^test and ε^test? A^test = 0%, ε^test = 100%.

31 Measuring performance Leave-one-out (LOO) Idea For each training instance x_n, take it out of the training set and then label it. For NNC, x_n's nearest neighbor will not be itself, so the error rate would not necessarily become 0. Training data (shown as a figure). What are the LOO versions of A^train and ε^train? A^train = 66.67% (i.e., 4/6), ε^train = 33.33% (i.e., 2/6).
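A sketch of how the LOO error of 1-NN could be computed (illustrative only; assumes the NumPy arrays used in the earlier sketches):

    import numpy as np

    def loo_error_1nn(X_train, y_train):
        N = len(y_train)
        mistakes = 0
        for n in range(N):
            dists = np.sum((X_train - X_train[n]) ** 2, axis=1).astype(float)
            dists[n] = np.inf   # exclude x_n itself from its own neighbors
            if y_train[np.argmin(dists)] != y_train[n]:
                mistakes += 1
        return mistakes / N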

34 Measuring performance Drawback of the metrics They are dataset-specific: given a different training (or test) dataset, A^train (or A^test) will change. Thus, if we get a dataset randomly, these quantities are random: A^test(D_1), A^test(D_2), ..., A^test(D_q), ... These are called empirical accuracies (or errors). Can we understand the algorithm itself in a more certain way, by removing the uncertainty caused by the datasets?

37 Measuring performance Expected mistakes Setup Assume our data (x, y) is drawn from the joint and unknown distribution p(x, y). Classification mistake on a single data point x with ground-truth label y: L(f(x), y) = 0 if f(x) = y, and 1 if f(x) ≠ y. Expected classification mistake on a single data point x: R(f, x) = E_{y ~ p(y|x)} L(f(x), y). The average classification mistake by the classifier itself: R(f) = E_{x ~ p(x)} R(f, x) = E_{(x,y) ~ p(x,y)} L(f(x), y).

41 Measuring performance Jargon L(f(x), y) is called the 0/1 loss function; many other forms of loss functions exist for different learning problems. Expected conditional risk: R(f, x) = E_{y ~ p(y|x)} L(f(x), y). Expected risk: R(f) = E_{(x,y) ~ p(x,y)} L(f(x), y). Empirical risk: R_D(f) = (1/N) Σ_n L(f(x_n), y_n). Obviously, this is our empirical error rate.

44 Measuring performance Ex: binary classification Expected conditional risk of a single data point x: R(f, x) = E_{y ~ p(y|x)} L(f(x), y) = P(y = 1|x) I[f(x) = 0] + P(y = 0|x) I[f(x) = 1]. Let η(x) = P(y = 1|x); we have R(f, x) = η(x) I[f(x) = 0] + (1 - η(x)) I[f(x) = 1] = 1 - {η(x) I[f(x) = 1] + (1 - η(x)) I[f(x) = 0]}, where the term in braces is the expected conditional accuracy. Exercise: please verify the last equality.
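For reference, one way to verify the last equality (my own working, using I[f(x) = 0] + I[f(x) = 1] = 1):

    \begin{align*}
    \eta(x)\, I[f(x)=0] + (1-\eta(x))\, I[f(x)=1]
      &= \eta(x)\bigl(1 - I[f(x)=1]\bigr) + (1-\eta(x))\bigl(1 - I[f(x)=0]\bigr) \\
      &= 1 - \bigl\{ \eta(x)\, I[f(x)=1] + (1-\eta(x))\, I[f(x)=0] \bigr\}.
    \end{align*}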

46 The ideal classifier Bayes optimal classifier Consider the following classifier, using the posterior probability η(x) = p(y = 1|x): f*(x) = 1 if η(x) ≥ 1/2, and 0 if η(x) < 1/2. Equivalently, f*(x) = 1 if p(y = 1|x) ≥ p(y = 0|x), and 0 if p(y = 1|x) < p(y = 0|x). Theorem For any labeling function f(·), R(f*, x) ≤ R(f, x). Similarly, R(f*) ≤ R(f). Namely, f*(·) is optimal.
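As a conceptual sketch only: if the posterior η(x) = p(y = 1|x) were available as a function (which, as the remarks on slide 50 note, it is not in practice), the rule would be trivial to code:

    def bayes_optimal_predict(x, eta):
        # eta is assumed to be the true posterior p(y = 1 | x); hypothetical, for illustration
        return 1 if eta(x) >= 0.5 else 0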

48 The ideal classifier Proof (not required) From the definition: R(f, x) = 1 - {η(x) I[f(x) = 1] + (1 - η(x)) I[f(x) = 0]}, and R(f*, x) = 1 - {η(x) I[f*(x) = 1] + (1 - η(x)) I[f*(x) = 0]}. Thus, R(f, x) - R(f*, x) = η(x) {I[f*(x) = 1] - I[f(x) = 1]} + (1 - η(x)) {I[f*(x) = 0] - I[f(x) = 0]} = η(x) {I[f*(x) = 1] - I[f(x) = 1]} + (1 - η(x)) {1 - I[f*(x) = 1] - 1 + I[f(x) = 1]} = (2η(x) - 1) {I[f*(x) = 1] - I[f(x) = 1]} ≥ 0.

50 The ideal classifier Bayes optimal classifier in general form For the multi-class classification problem: f*(x) = argmax_{c ∈ [C]} p(y = c|x). When C = 2, this reduces to detecting whether or not η(x) = p(y = 1|x) is greater than 1/2. We refer to p(y = c|x) as the posterior probability. Remarks The Bayes optimal classifier is generally not computable, as it assumes knowledge of p(x, y) or p(y|x). However, it is useful as a conceptual tool to formalize how well a classifier can do without knowing the joint distribution.

51 Comparing NNC to the ideal classifier Comparing NNC to Bayes optimal classifier How well does our NNC do? Theorem For the NNC rule f^nnc for binary classification, we have R(f*) ≤ R(f^nnc) ≤ 2R(f*)(1 - R(f*)) ≤ 2R(f*). Namely, the expected risk of the classifier is at worst twice that of the Bayes optimal classifier. In short, NNC seems to be doing a reasonable thing.

56 Comparing NNC to the ideal classifier Proof sketches (not required) Step 1: show that when N → +∞, x and its nearest neighbor x(1) are similar; in particular, p(y = 1|x) and p(y = 1|x(1)) are the same. Step 2: show that the mistake made by our classifier on x is attributed to the probability that x and x(1) have different labels. Step 3: show that the probability of having different labels is R(f^nnc, x) = 2p(y = 1|x)[1 - p(y = 1|x)] = 2η(x)(1 - η(x)). Step 4: show that the expected risk of the Bayes optimal classifier on x is R(f*, x) = min{η(x), 1 - η(x)}. Step 5: tie everything together: R(f^nnc, x) = 2R(f*, x)(1 - R(f*, x)). We are almost there, but one more step is needed (and omitted here).

57 Comparing NNC to the ideal classifier Mini-summary Advantages of NNC: Computationally, simple and easy to implement: just computing the distances. Theoretically, has strong guarantees of doing the right thing. Disadvantages of NNC: Computationally intensive for large-scale problems: O(ND) for labeling a data point. We need to carry the training data around; without it, we cannot do classification (this type of method is called nonparametric). Choosing the right distance measure and K can be involved.

58 Some practical sides of NNC Outline 1 Administration 2 First learning algorithm: Nearest neighbor classifier 3 A deeper understanding of NNC 4 Some practical sides of NNC How to tune to get the best out of it? Preprocessing data 5 What we have learned

59 Some practical sides of NNC How to tune to get the best out of it? Hyperparameters in NNC Two practical issues about NNC: Choosing K, i.e., the number of nearest neighbors (default is 1). Choosing the right distance measure (default is Euclidean distance), for example, from the following generalized distance measure for p ≥ 1: ||x - x_n||_p = (Σ_d |x_d - x_{nd}|^p)^{1/p}. These are not specified by the algorithm itself; resolving them requires empirical studies, and the choices are task/dataset-specific.
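A one-line sketch of the generalized distance (my own helper, for illustration):

    import numpy as np

    def minkowski_distance(x, x_n, p=2):
        # ||x - x_n||_p; p = 2 gives Euclidean, p = 1 gives Manhattan
        return np.sum(np.abs(x - x_n) ** p) ** (1.0 / p)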

60 Some practical sides of NNC How to tune to get the best out of it? Tuning by using a validation dataset Training data (set): N samples/instances: D^train = {(x_1, y_1), (x_2, y_2), ..., (x_N, y_N)}. They are used for learning f(·). Test (evaluation) data: M samples/instances: D^test = {(x_1, y_1), (x_2, y_2), ..., (x_M, y_M)}. They are used for assessing how well f(·) will do in predicting an unseen x ∉ D^train. Development (or validation) data: L samples/instances: D^dev = {(x_1, y_1), (x_2, y_2), ..., (x_L, y_L)}. They are used to optimize hyperparameter(s). Training data, validation data, and test data should not overlap!

61 Some practical sides of NNC How to tune to get the best out of it? Recipe For each possible value of the hyperparameter (say K = 1, 3, ..., 100): train a model using D^train; evaluate the performance of the model on D^dev. Choose the model with the best performance on D^dev. Evaluate that model on D^test.
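A sketch of this recipe for tuning K, reusing the knn_predict sketch given earlier (names and candidate values are mine, not the course's):

    import numpy as np

    def tune_k_with_dev_set(X_train, y_train, X_dev, y_dev, candidate_ks=(1, 3, 5, 7, 9)):
        best_k, best_acc = None, -1.0
        for k in candidate_ks:
            preds = np.array([knn_predict(x, X_train, y_train, K=k) for x in X_dev])
            acc = np.mean(preds == y_dev)       # performance on D_dev
            if acc > best_acc:
                best_k, best_acc = k, acc
        return best_k, best_acc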

62 Some practical sides of NNC How to tune to get the best out of it? Cross-validation What if we do not have validation data? We split the training data into S equal parts. We use each part in turn as a validation dataset and use the others as a training dataset. We choose the hyperparameter for which, on average, the model performs the best. S = 5: 5-fold cross-validation. Special case: when S = N, this is leave-one-out.

63 Some practical sides of NNC How to tune to get the best out of it? Recipe Split the training data into S equal parts; denote each part as D^train_s. For each possible value of the hyperparameter (say K = 1, 3, ..., 100): for every s ∈ [1, S], train a model using D^train_{\s} = D^train \ D^train_s and evaluate the performance of the model on D^train_s; then average the S performance metrics. Choose the hyperparameter corresponding to the best averaged performance. Use the best hyperparameter to train a model using all of D^train. Evaluate the model on D^test.
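A sketch of the S-fold recipe, again reusing the knn_predict sketch from above (illustrative, with my own names):

    import numpy as np

    def cross_validate_k(X, y, candidate_ks=(1, 3, 5, 7, 9), S=5):
        folds = np.array_split(np.random.permutation(len(y)), S)
        avg_acc = {}
        for k in candidate_ks:
            accs = []
            for s in range(S):
                held_out = folds[s]
                keep = np.concatenate([folds[t] for t in range(S) if t != s])
                preds = np.array([knn_predict(x, X[keep], y[keep], K=k) for x in X[held_out]])
                accs.append(np.mean(preds == y[held_out]))
            avg_acc[k] = np.mean(accs)          # average of the S performance metrics
        return max(avg_acc, key=avg_acc.get)    # best averaged performance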

64 Some practical sides of NNC Preprocessing data Yet another practical issue with NNC Distances depend on the units of the features! (Shown on screen or blackboard: how proximity can change due to a change in feature scale.)

65 Some practical sides of NNC Preprocessing data Preprocess data Normalize the data so that each feature looks like it comes from a normal distribution. Compute the mean and standard deviation of each feature: x̄_d = (1/N) Σ_n x_{nd}, s_d^2 = (1/(N-1)) Σ_n (x_{nd} - x̄_d)^2. Scale the feature accordingly: x_{nd} ← (x_{nd} - x̄_d) / s_d. There are many other ways of normalizing data; you would need/want to try different ones and pick among them using (cross-)validation.
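A minimal sketch of this standardization step (my own helper; note that the test set is scaled with the training-set statistics):

    import numpy as np

    def standardize(X_train, X_test):
        mean = X_train.mean(axis=0)
        std = X_train.std(axis=0, ddof=1)   # ddof=1 matches the 1/(N-1) variance on the slide
        return (X_train - mean) / std, (X_test - mean) / std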

66 What we have learned Outline 1 Administration 2 First learning algorithm: Nearest neighbor classifier 3 A deeper understanding of NNC 4 Some practical sides of NNC 5 What we have learned

67 What we have learned Summary so far Described a simple learning algorithm, used intensively in practical applications; you will get a taste of it in your homework. Discussed a few practical aspects, such as tuning hyperparameters with (cross-)validation. Briefly studied its theoretical properties: concepts (loss function, risks, Bayes optimal classifier) and theoretical guarantees explaining why NNC would work.
