1-Nearest Neighbor Boundary


Whitney Carr
5 years ago

1 Linear Models

2 Bankruptcy example
R is the ratio of earnings to expenses; L is the number of late credit-card payments over the past year. We would like to draw a linear separator here and thereby obtain a classifier.

3 1-Nearest Neighbor Boundary
The decision boundary will be the boundary between cells defined by points of different classes, as illustrated by the bold line shown here.
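
A 1-NN classifier labels a query point with the class of its closest training point, so the boundary described above is exactly where the nearest training point switches from one class to the other. The sketch below assumes Euclidean distance and uses a small hypothetical data set (not the bankruptcy data); the function name `nn1_classify` is illustrative only.

```python
import math

def nn1_classify(train, query):
    """Return the label of the training point closest to `query`.

    `train` is a list of ((x1, x2, ...), label) pairs; distance is Euclidean.
    Ties are broken in favour of the point listed first.
    """
    _, nearest_label = min(train, key=lambda item: math.dist(item[0], query))
    return nearest_label

# Hypothetical toy data: two points of class "O" and one of class "X".
train = [((1.0, 1.0), "O"), ((2.0, 1.0), "O"), ((4.0, 4.0), "X")]
print(nn1_classify(train, (1.5, 2.0)))  # O
print(nn1_classify(train, (3.5, 3.0)))  # X
```

Because every training point is its own nearest neighbor, this classifier agrees with all training labels, which is the property the next slide compares against decision trees.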

4 Decision Tree Boundary
Similarly, a decision tree defines a decision boundary in the attribute space. Although 1-NN and decision trees agree on all the training points, they disagree on the precise decision boundary and so will classify some query points differently. This is the essential difference between learning algorithms.


5 Linear Classification
Linear separators are characterized by a single linear decision boundary in the space. The bankruptcy data can be successfully separated in this manner. But there is no guarantee that a single linear separator will successfully classify an arbitrary set of training data. In spaces with more dimensions, the separator is a plane or a hyperplane.

6 Linear classification - Perceptron
Assume that the data is linearly separable. For a data point $x = (x_1, \dots, x_d)$ with attributes $x_1$ to $x_d$ (in $d$-dimensional space) and class value $y$:
x is bankrupt if $\sum_{j=1}^{d} w_j x_j > \text{threshold}$
x is not bankrupt if $\sum_{j=1}^{d} w_j x_j < \text{threshold}$
We can write the hypothesis set as:
$h(x) = \mathrm{sign}\left(\sum_{j=1}^{d} w_j x_j - \text{threshold}\right)$

7 Perceptron Model
$h(x) = \mathrm{sign}\left(\sum_{j=1}^{d} w_j x_j - \text{threshold}\right)$
Each choice of the weights $w_j$ and the threshold corresponds to a separating line.
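
The threshold form of the hypothesis translates directly into a few lines of code. This is a minimal sketch: the weights and threshold below are hypothetical values chosen to draw the line $x_1 + x_2 = 3$, not values fitted to the bankruptcy data, and mapping a signal of exactly 0 to -1 is an arbitrary convention of this sketch.

```python
def h_threshold(w, threshold, x):
    """Perceptron hypothesis h(x) = sign(sum_j w_j x_j - threshold).

    Returns +1 (e.g. "bankrupt" in the slide's convention) or -1.
    A signal of exactly 0 is mapped to -1 here by convention.
    """
    signal = sum(wj * xj for wj, xj in zip(w, x)) - threshold
    return 1 if signal > 0 else -1

# Hypothetical separating line x1 + x2 = 3.
w, threshold = (1.0, 1.0), 3.0
print(h_threshold(w, threshold, (2.5, 2.5)))  # 1
print(h_threshold(w, threshold, (1.0, 1.0)))  # -1
```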

8 Perceptron Model
A change in notation: replace $-\text{threshold}$ with a weight $w_0$:
$h(x) = \mathrm{sign}\left(w_0 + \sum_{j=1}^{d} w_j x_j\right)$
This requires introducing an artificial attribute (coordinate) $x_0 = 1$ for each data point, so that $x = (x_0, x_1, \dots, x_d)$. Now we have a simplified formula:
$h(x) = \mathrm{sign}\left(\sum_{j=0}^{d} w_j x_j\right)$
In vector form, the perceptron is $h(x) = \mathrm{sign}(w^T x)$, where $w^T x$ is called the signal.
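
To see that the change of notation does not change the classifier, the sketch below prepends $x_0 = 1$ to each point and sets $w_0 = -\text{threshold}$. The helper names `augment` and `h_vector` and the example numbers are hypothetical; the threshold 3.0 and weights (1.0, 1.0) are illustrative, not taken from the bankruptcy data.

```python
def augment(x):
    """Prepend the artificial coordinate x_0 = 1 to a data point."""
    return (1.0,) + tuple(x)

def h_vector(w, x_aug):
    """h(x) = sign(w^T x) for an augmented point; w already includes w_0."""
    signal = sum(wj * xj for wj, xj in zip(w, x_aug))
    return 1 if signal > 0 else -1

# Fold the threshold into the weight vector: w_0 = -threshold.
weights, threshold = (1.0, 1.0), 3.0
w = (-threshold,) + weights
for point in [(2.5, 2.5), (1.0, 1.0), (0.0, 4.0)]:
    print(point, h_vector(w, augment(point)))
```

Each printed label matches what the threshold form $\mathrm{sign}(\sum_j w_j x_j - \text{threshold})$ gives for the same point, since $w^T x$ with $x_0 = 1$ expands to exactly that sum.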

9 Perceptron Learning Algorithm (PLA)
How do we find a linear separator?
Algorithm:
Pick an initial weight vector, e.g. $w = [0.1, \dots, 0.1]$.
Repeat until all points are correctly classified:
In each iteration, for each training data point $x_i$, check whether $\mathrm{sign}(w^T x_i) = y_i$. If true, $x_i$ is correctly classified; if not, $x_i$ is misclassified.
Pick a misclassified point $x_i$ and update the weight vector in proportion to $y_i x_i$, that is, set $w \leftarrow w + y_i x_i$.
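
The steps above can be sketched in plain Python as follows. Assumptions: points are already augmented with $x_0 = 1$, labels are +1/-1, the first misclassified point is the one picked for each update, and `max_iters` is a safety cap that the slide's version does not have. The data is a small hypothetical separable set, not the bankruptcy data.

```python
def sign(v):
    # Map a signal to a label; 0 goes to -1 (a convention of this sketch).
    return 1 if v > 0 else -1

def pla(X, y, w_init=None, max_iters=10_000):
    """Perceptron Learning Algorithm on augmented points (x_0 = 1 prepended).

    Repeatedly picks the first misclassified point and applies
    w <- w + y_i * x_i. Returns (w, number_of_updates).
    """
    d = len(X[0])
    w = list(w_init) if w_init is not None else [0.1] * d
    for updates in range(max_iters):
        misclassified = [i for i, x in enumerate(X)
                         if sign(sum(wj * xj for wj, xj in zip(w, x))) != y[i]]
        if not misclassified:      # every point is on the correct side
            return w, updates
        i = misclassified[0]
        w = [wj + y[i] * xj for wj, xj in zip(w, X[i])]
    raise RuntimeError("no separator found within max_iters")

# Hypothetical linearly separable data, already augmented with x_0 = 1.
X = [(1.0, 0.0, 0.0), (1.0, 1.0, 0.0), (1.0, 3.0, 3.0), (1.0, 4.0, 3.0)]
y = [-1, -1, 1, 1]
w, n_updates = pla(X, y)
print(w, n_updates)
```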

10 Discussion
Changes in $w$ made for different misclassified points interfere with each other, so a single pass through the points will generally not produce a correct weight vector; in general, we have to go around multiple times. The algorithm is guaranteed to terminate with the weights of a separating hyperplane as long as the data is linearly separable. The proof of this fact is beyond the scope of this course.

11 The Pocket Algorithm
If the data is not linearly separable, PLA gets into an infinite loop and never converges. The pocket algorithm is a simple modification of PLA:
Set the number of iterations to a finite number (e.g. 1000).
In each iteration, compute the error (i.e. the number of misclassified points).
Always keep the best separator (the one with the smallest error) produced so far.
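
A minimal sketch of the pocket algorithm, assuming augmented points and +1/-1 labels. Choosing the misclassified point uniformly at random (with a fixed seed for reproducibility) is an assumption of this sketch; the slide only says to pick a misclassified point. The XOR-style data is hypothetical and chosen because no single line can classify all four points correctly.

```python
import random

def sign(v):
    # Map a signal to a label; 0 goes to -1 (a convention of this sketch).
    return 1 if v > 0 else -1

def dot(w, x):
    return sum(wj * xj for wj, xj in zip(w, x))

def errors(w, X, y):
    """Number of points misclassified by sign(w^T x)."""
    return sum(sign(dot(w, x)) != yi for x, yi in zip(X, y))

def pocket(X, y, n_iters=1000, seed=0):
    """Run PLA updates for at most n_iters iterations, keeping the best w seen."""
    rng = random.Random(seed)
    w = [0.1] * len(X[0])
    best_w, best_err = list(w), errors(w, X, y)
    for _ in range(n_iters):
        wrong = [i for i, x in enumerate(X) if sign(dot(w, x)) != y[i]]
        if not wrong:              # data turned out to be separable: stop early
            break
        i = rng.choice(wrong)      # pick a misclassified point at random
        w = [wj + y[i] * xj for wj, xj in zip(w, X[i])]
        err = errors(w, X, y)
        if err < best_err:         # keep the best separator "in the pocket"
            best_w, best_err = list(w), err
    return best_w, best_err

# Hypothetical XOR-style data (augmented with x_0 = 1): not linearly separable.
X = [(1.0, 0.0, 0.0), (1.0, 1.0, 1.0), (1.0, 0.0, 1.0), (1.0, 1.0, 0.0)]
y = [-1, -1, 1, 1]
w, err = pocket(X, y)
print(w, err)
```

Note that PLA keeps updating the current $w$, not the pocketed one; the pocket only records the best weights seen, so the loop always ends after at most `n_iters` iterations.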


12 Perceptron algorithm, Bankruptcy data
The algorithm takes 49 iterations through the bankruptcy data to stop. The separator at the end of the loop is [0.4, 0.94, -2.2].
We can pick some small "rate" constant, called eta ($\eta$), to scale the change to $w$: $w \leftarrow w + \eta\, y_i x_i$.
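
A single update with a learning rate is just a scaled version of the plain PLA step. The sketch below shows one update with hypothetical numbers; it does not reproduce the separator [0.4, 0.94, -2.2], since the bankruptcy data itself is not listed here. The function name `pla_update` is illustrative.

```python
def pla_update(w, x, y, eta=0.1):
    """One perceptron update w <- w + eta * y * x for a misclassified point x."""
    return [wj + eta * y * xj for wj, xj in zip(w, x)]

# Hypothetical values: augmented point x = (1, 2, 3) with label y = +1.
print(pla_update([0.0, 1.0, -1.0], (1.0, 2.0, 3.0), 1, eta=0.5))  # [0.5, 2.0, 0.5]
```

Because $\mathrm{sign}(c\, w^T x) = \mathrm{sign}(w^T x)$ for any $c > 0$, starting from $w = 0$ every choice of $\eta > 0$ produces the same sequence of classifications; $\eta$ matters relative to a nonzero initial weight vector, where it sets how large each correction is compared with the starting weights.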

13 Non-linearly Separable Data

14 Transform the data nonlinearly
After the transformation, it is very easy to separate the X's from the O's with a linear separator. A possible transformation: $(x_1, x_2) \to (x_1^2, x_2^2)$. Every point in the initial space is mapped to a point in the new space.
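
The transformation $(x_1, x_2) \to (x_1^2, x_2^2)$ turns a circular boundary $x_1^2 + x_2^2 = 1$ into the straight line $z_1 + z_2 = 1$. The sketch below checks this on hypothetical points (O's inside the unit circle, X's outside); the data and the helper name `side` are illustrative only.

```python
def phi(x):
    """The slide's example transform (x1, x2) -> (x1^2, x2^2)."""
    x1, x2 = x
    return (x1 ** 2, x2 ** 2)

def side(z):
    # In z-space the unit circle becomes the line z1 + z2 = 1.
    return "X" if z[0] + z[1] > 1 else "O"

# Hypothetical data: O's inside the unit circle, X's outside it.
inner = [(0.2, 0.3), (-0.5, 0.1), (0.0, -0.6)]
outer = [(1.5, 0.0), (-1.2, 1.1), (0.3, -1.8)]
print([side(phi(p)) for p in inner])  # ['O', 'O', 'O']
print([side(phi(p)) for p in outer])  # ['X', 'X', 'X']
```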

15 Nonlinear Transformation
Nonlinear transforms are very useful: by moving the data to a new space of sufficiently high dimension, they let us linearly separate data points that are not linearly separable in the original space. They are a tool for obtaining more sophisticated decision surfaces in the initial space while still using linear techniques. Any transformation $x \to z$ of the data preserves the linearity of the model, $w^T z$: the signal is linear in the weights, and since the transformed data points are constants, transforming them does not affect linearity in the weights.

16 Method: Main Idea
Move all training data points (vectors) into another space (potentially of higher dimension) using some transformation function $\phi$: $z_1 = \phi(x_1), z_2 = \phi(x_2), \dots, z_n = \phi(x_n)$.
Find a linear separator in the new space, i.e. using the new vectors $z_1, z_2, \dots, z_n$.
To classify a new data point $x$, apply the Z-space classifier to the transformed point $\phi(x)$.
Note: the choice of transformation function plays an important role in the performance of the final linear model (classifier).
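
The three steps of the method can be put together in one short sketch: transform, run PLA in Z space, then classify new points by transforming them first. Assumptions: the transform is the earlier example $(x_1, x_2) \to (x_1^2, x_2^2)$ plus the artificial coordinate $z_0 = 1$, PLA starts from $w = 0$ and updates on the first misclassified point, and the data is hypothetical. The names `train_in_z_space` and `classify` are illustrative.

```python
def sign(v):
    # Map a signal to a label; 0 goes to -1 (a convention of this sketch).
    return 1 if v > 0 else -1

def dot(w, z):
    return sum(wj * zj for wj, zj in zip(w, z))

def phi(x):
    # Assumed transform: (x1, x2) -> (1, x1^2, x2^2), i.e. z_0 = 1 plus squares.
    x1, x2 = x
    return (1.0, x1 ** 2, x2 ** 2)

def train_in_z_space(X, y, max_iters=10_000):
    Z = [phi(x) for x in X]                   # step 1: move the data to Z space
    w = [0.0] * len(Z[0])
    for _ in range(max_iters):                # step 2: PLA on z_1, ..., z_n
        wrong = [i for i, z in enumerate(Z) if sign(dot(w, z)) != y[i]]
        if not wrong:
            return w
        i = wrong[0]
        w = [wj + y[i] * zj for wj, zj in zip(w, Z[i])]
    raise RuntimeError("no separator found within max_iters")

def classify(w, x):
    return sign(dot(w, phi(x)))               # step 3: transform, then apply w

# Hypothetical data: -1 inside the unit circle, +1 outside it.
X = [(0.2, 0.3), (-0.5, 0.1), (0.0, -0.6), (1.5, 0.0), (-1.2, 1.1), (0.3, -1.8)]
y = [-1, -1, -1, 1, 1, 1]
w = train_in_z_space(X, y)
print(w, classify(w, (0.1, 0.1)), classify(w, (3.0, 3.0)))
```

The separator is linear in $z$ but corresponds to an ellipse-shaped boundary $w_0 + w_1 x_1^2 + w_2 x_2^2 = 0$ in the original space.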
