MODULE 7
Nearest Neighbour Classifier and its variants

LESSON 11
Nearest Neighbour Classifier

Keywords: k Neighbours, Weighted, Nearest Neighbour
Nearest neighbour classifiers

This is amongst the simplest of all classification algorithms in supervised learning. It classifies a pattern based on the class labels of the closest training patterns in the feature space. The common algorithms here are the nearest neighbour (nn) algorithm, the k-nearest neighbour (knn) algorithm and the modified k-nearest neighbour (mknn) algorithm. These are non-parametric methods: no model is fitted using the training patterns. The accuracy of nearest neighbour classifiers is good; asymptotically, the nn classifier is guaranteed to yield an error rate no worse than twice the Bayes error rate (explained in Module 10), which is the optimal error rate. No training time is required for this classifier; in other words, there is no design time for training the classifier. However, every time a test pattern is to be classified, it has to be compared with all the training patterns to find the closest pattern. This classification time can be large if the training patterns are large in number or if the dimensionality of the patterns is high.

Nearest neighbour algorithm

If there are n patterns X_1, X_2, ..., X_n in the training data and a test pattern P, and X_k is the pattern most similar to P among X_1, ..., X_n, then the class of P is the class of X_k. Similarity is usually measured by computing the distance from P to the patterns X_1, X_2, ..., X_n. If d(P, X_i) is the distance from P to X_i, then P is assigned the class label of X_k, where

d(P, X_k) = min { d(P, X_i) },  i = 1, ..., n
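As an illustration, here is a minimal sketch of the nearest neighbour rule in Python, assuming Euclidean distance as d; the names nn_classify, X_train and y_train are illustrative, not from any particular library:

    import numpy as np

    def nn_classify(X_train, y_train, P):
        # Distance from P to every training pattern X_1, ..., X_n
        distances = np.linalg.norm(X_train - P, axis=1)
        # Index of the closest pattern: d(P, X_k) = min_i d(P, X_i)
        k = np.argmin(distances)
        # P gets the class label of X_k
        return y_train[k]

Note that every call scans all n training patterns, which is exactly the classification-time cost described above.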
k-nearest Neighbour (knn) classification algorithm

An object is classified by a majority vote of the classes of its neighbours: it is assigned to the class most common amongst its k nearest neighbours. If k = 1, this becomes the nearest neighbour algorithm. The knn algorithm may classify boundary patterns more correctly than the nn algorithm. The value of k has to be specified by the user, and the best choice depends on the data. Larger values of k reduce the effect of noise on the classification, and k can be increased when the training data set is large. A good way to choose k is to use a validation set and pick the value of k giving the best accuracy on it. The main disadvantage of the knn algorithm is that it is very time consuming, especially when the training data is large. To overcome this problem, a number of algorithms have been proposed to find the k nearest patterns as quickly as possible.
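A corresponding sketch of the knn rule, again with illustrative names, takes a majority vote over the k nearest patterns:

    import numpy as np
    from collections import Counter

    def knn_classify(X_train, y_train, P, k=3):
        # Distances from P to all n training patterns
        distances = np.linalg.norm(X_train - P, axis=1)
        # Indices of the k nearest patterns
        nearest = np.argsort(distances)[:k]
        # Majority vote among their class labels
        votes = Counter(y_train[i] for i in nearest)
        return votes.most_common(1)[0][0]

With k = 1 this reduces to nn_classify above; in practice one would evaluate several values of k on a validation set, as described.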
Modified k-nearest Neighbour (mknn) classifier

Here the contribution of each neighbour to the classification is weighted according to its distance from the test pattern, so the nearest neighbour contributes more to the classification decision than the neighbours further away. One weighting scheme gives each neighbour a weight of 1/d, where d is the distance from P to that neighbour. Another weighting scheme computes the weight of the i-th nearest neighbour, for i = 1, ..., k, as

    w_i = (d_k - d_i) / (d_k - d_1)   if d_k is not equal to d_1
    w_i = 1                           if d_k = d_1

where d_i is the distance from P to the i-th nearest neighbour, so that d_1 and d_k are the distances to the closest and the farthest of the k neighbours. The value of w_i varies from 1 for the closest pattern to 0 for the farthest pattern among the k closest patterns. With this modification, outliers do not affect the classification as much as in the knn classifier.
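This weighting scheme can be sketched directly in the same notation (a sketch, with illustrative names; the tie case d_k = d_1 gives every neighbour weight 1, as in the formula):

    import numpy as np

    def mknn_classify(X_train, y_train, P, k=5):
        distances = np.linalg.norm(X_train - P, axis=1)
        nearest = np.argsort(distances)[:k]
        d = distances[nearest]            # d[0] = d_1 (closest), d[-1] = d_k (farthest)
        if d[-1] == d[0]:
            w = np.ones(k)                # all k neighbours equidistant: w_i = 1
        else:
            w = (d[-1] - d) / (d[-1] - d[0])    # w_i = (d_k - d_i) / (d_k - d_1)
        # Accumulate each neighbour's weight into its class total
        totals = {}
        for weight, i in zip(w, nearest):
            totals[y_train[i]] = totals.get(y_train[i], 0.0) + weight
        # P gets the class with the largest total weight
        return max(totals, key=totals.get)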
Example

[Figure 1: Two class problem. The patterns X_1 to X_9 and the test pattern P plotted in the (f1, f2) feature space.]

Consider the two class problem shown in Figure 1. There are four patterns X_1, X_2, X_3 and X_4 in Class 1, and five patterns X_5, X_6, X_7, X_8 and X_9 in Class 2 (marked + in the figure). The test pattern is P. Using the nearest neighbour algorithm, the closest pattern to P is X_1, whose class is 1; therefore P will be assigned to Class 1. If the knn algorithm is used, after X_1, P is closest to X_6 and X_7, so if k = 3, P will be assigned to Class 2. It can be seen that the value of k is crucial to the classification. If k = 1, the method reduces to the nn classifier. If k = 4, the next closest pattern could be X_2. If k = 5 and X_3 is closer to P than X_5, then, again by majority vote, P will be assigned to Class 1. This shows how important the value of k is to the classification.

If P is an outlier of one class but is closest to a pattern of another class, taking a majority vote can prevent the misclassification of P.

[Figure 2: Another two class problem. Class 1 patterns X_1 to X_5, of which X_5 is an outlier lying near the Class 2 patterns (marked +), and the test pattern P in the (f1, f2) feature space.]

For example, in Figure 2 the test pattern P is closest to X_5, which belongs to Class 1; therefore P would be classified as belonging to Class 1 if the nn classifier is used. However, X_5 is an outlier of Class 1, and classifying P as belonging to Class 2 would be more meaningful. If the knn algorithm is used with k = 3, P would indeed be classified as belonging to Class 2.
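The figures give no coordinates, so the following sketch uses hypothetical coordinates chosen only to reproduce the Figure 2 situation (a Class 1 outlier sitting inside the Class 2 region); it checks the nn-versus-knn behaviour with scikit-learn's KNeighborsClassifier:

    import numpy as np
    from sklearn.neighbors import KNeighborsClassifier

    # Hypothetical coordinates in the spirit of Figure 2: the last
    # Class 1 pattern is an outlier lying inside the Class 2 region.
    X = np.array([[1.0, 4.0], [1.5, 3.2], [2.0, 4.2], [2.5, 3.6], [5.5, 2.0],   # Class 1
                  [6.5, 1.0], [7.0, 1.5], [6.0, 2.5], [5.8, 3.2], [7.2, 2.3]])  # Class 2
    y = np.array([1, 1, 1, 1, 1, 2, 2, 2, 2, 2])
    P = np.array([[6.0, 1.8]])

    print(KNeighborsClassifier(n_neighbors=1).fit(X, y).predict(P))  # [1]: nearest is the outlier
    print(KNeighborsClassifier(n_neighbors=3).fit(X, y).predict(P))  # [2]: majority of 3 nearest

Incidentally, KNeighborsClassifier(weights='distance') implements inverse-distance weighting of the neighbours, the 1/d scheme mentioned earlier.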
Using mknn, the classification depends on the distances of the closest patterns from the test pattern. In the knn algorithm, all the k patterns have equal importance; in mknn, the closest pattern is given more significance than the farthest, so the weight given to the class of the closest pattern is more than that for the second closest pattern, and so on.

For example, suppose the 5 nearest neighbours to P are X_1, X_2, X_3, X_4 and X_5, where d(X_1, P) = 1, d(X_2, P) = 2, d(X_3, P) = 2.5, d(X_4, P) = 4 and d(X_5, P) = 5, and suppose X_1 and X_4 belong to Class 1 while X_2, X_3 and X_5 belong to Class 2. Then:

The weight given to Class 1 by X_1 will be W_11 = 1.
The weight given to Class 1 by X_4 will be W_14 = (5 - 4)/(5 - 1) = 0.25.
The total weight of Class 1 will be w_1 = W_11 + W_14 = 1.0 + 0.25 = 1.25.
The weight given to Class 2 by X_2 will be W_22 = (5 - 2)/(5 - 1) = 0.75.
The weight given to Class 2 by X_3 will be W_23 = (5 - 2.5)/(5 - 1) = 0.625.
The weight given to Class 2 by X_5, W_25, will be 0, since X_5 is the farthest of the 5 neighbours.
The total weight of Class 2 will be w_2 = W_22 + W_23 + W_25 = 0.75 + 0.625 + 0 = 1.375.

Since w_2 > w_1, P is classified as belonging to Class 2.
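The arithmetic of this example can be checked with a few lines (distances and classes copied from above):

    # Sorted neighbour distances d_1 ... d_5 and their classes
    d = [1.0, 2.0, 2.5, 4.0, 5.0]
    c = [1, 2, 2, 1, 2]
    w = [(d[-1] - di) / (d[-1] - d[0]) for di in d]   # [1.0, 0.75, 0.625, 0.25, 0.0]
    w1 = sum(wi for wi, ci in zip(w, c) if ci == 1)   # 1.25
    w2 = sum(wi for wi, ci in zip(w, c) if ci == 2)   # 1.375
    print(w1, w2)   # w2 > w1, so P goes to Class 2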
If we consider Figure 1 again, the closest points to P are X_1, X_6, X_7, X_2 and X_5. Suppose the distances from P to X_1, X_6, X_7, X_2 and X_5 are 0.3, 1.0, 1.1, 1.5 and 1.6 respectively. Calculating the weight given to the two classes:

The weight given to Class 1 by X_1 will be W_11 = 1.
The weight given to Class 1 by X_2 will be W_12 = (1.6 - 1.5)/(1.6 - 0.3) = 0.077.
The total weight of Class 1 will be w_1 = W_11 + W_12 = 1 + 0.077 = 1.077.
The weight given to Class 2 by X_6 will be W_26 = (1.6 - 1.0)/(1.6 - 0.3) = 0.462.
The weight given to Class 2 by X_7 will be W_27 = (1.6 - 1.1)/(1.6 - 0.3) = 0.385.
The weight given to Class 2 by X_5, W_25, is 0, since X_5 is the farthest of the 5 neighbours.
The total weight of Class 2 will be w_2 = W_26 + W_27 + W_25 = 0.462 + 0.385 + 0 = 0.847.

Since w_1 > w_2, P is classified as belonging to Class 1. One point to be noted here is that while the knn algorithm classifies P as belonging to Class 2, the mknn algorithm classifies P as belonging to Class 1. The classification decisions of knn and mknn may therefore differ.
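The same check applied to the Figure 1 distances makes the knn/mknn disagreement explicit:

    # Distances to X_1, X_6, X_7, X_2, X_5 and their classes
    d = [0.3, 1.0, 1.1, 1.5, 1.6]
    c = [1, 2, 2, 1, 2]
    w = [(d[-1] - di) / (d[-1] - d[0]) for di in d]
    w1 = sum(wi for wi, ci in zip(w, c) if ci == 1)   # about 1.077
    w2 = sum(wi for wi, ci in zip(w, c) if ci == 2)   # about 0.847
    print(w1 > w2)   # True: mknn picks Class 1, though a 5-neighbour majority vote picks Class 2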