Instance-Based Representations. k-nearest Neighbor. Exemplars + distance measure. Challenges.


Augustine Byrd
5 years ago

CPSC 444: Artificial Intelligence Spring

Instance-Based Representations
- exemplars + distance measure
- learned structure is not explicitly represented

Challenges.
- choice of distance measure
  - should have the property that a smaller distance means a greater likelihood of belonging to the same class
  - the specific measure may depend on the domain
  - Euclidean distance becomes less discriminating as the number of attributes increases
  - may need to scale attribute values to avoid having some dominate

k-nearest Neighbor
- algorithm: IB1
- classify based on the majority class of the k nearest neighbors

Idea.
- find the k nearest neighbors to the item in the dataset
- choose the majority class of the neighbors as the class for the item

Challenges.
- choosing k
  - too low means that the result can be sensitive to noise
  - too high means that the neighborhood may include too many items from other classes
- combining class labels
  - a majority vote can be problematic if the neighbors vary widely in distance, as all are given the same weight
  - a weighted vote weights a neighbor's vote by its distance d, commonly 1/d^2
- classification of an item is relatively expensive
  - must locate the k nearest neighbors
  - there are improvements, e.g. condensing (eliminating stored items) and proximity graphs (to quickly find neighbors)
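The voting schemes described above can be sketched in a few lines. This is a minimal illustration, not the IB1 implementation: the function name `knn_classify`, the toy points, and the small constant added to d^2 (to avoid dividing by zero when a neighbor coincides with the query) are all assumptions made for the example.

```python
import math
from collections import defaultdict

def knn_classify(train, query, k=3, weighted=True):
    """train: list of (feature_vector, label); query: a feature vector."""
    # Euclidean distance from the query to every stored exemplar.
    dists = sorted((math.dist(x, query), label) for x, label in train)
    votes = defaultdict(float)
    for d, label in dists[:k]:
        # Weighted vote uses 1/d^2; the 1e-9 guard is an assumption of this sketch.
        votes[label] += 1.0 / (d * d + 1e-9) if weighted else 1.0
    return max(votes, key=votes.get)

# Hypothetical toy data: two nearby "a" exemplars, three distant "b" exemplars.
train = [((0.0, 0.0), "a"), ((0.1, 0.0), "a"),
         ((1.0, 1.0), "b"), ((1.1, 1.0), "b"), ((0.9, 1.0), "b")]
query = (0.05, 0.0)
print(knn_classify(train, query, k=5, weighted=False))  # plain majority vote
print(knn_classify(train, query, k=5, weighted=True))   # distance-weighted vote
```

With k = 5 the plain majority vote is dominated by the three distant "b" items, while the 1/d^2 weighting lets the two close "a" items win, which is the situation the weighted vote is meant to handle.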

- easy to understand and implement
- fast to build model
- can perform well in many situations in spite of its simplicity

SVM

Idea.
- binary classification (two classes)
- find the hyperplane that maximizes the margin between the two classes
- margin = shortest distance between the plane and the item closest to it

H1 does not separate the classes
H2 separates with a small margin
H3 separates with maximum margin

outlook   temperature  humidity  windy  play
rainy     mild         normal    FALSE  yes
overcast  hot          normal    FALSE  yes
rainy     cool         normal    FALSE  yes
rainy     mild         high      FALSE  yes
overcast  hot          high      FALSE  yes
overcast  cool         normal    TRUE   yes
sunny     cool         normal    FALSE  yes
overcast  mild         high      TRUE   yes
sunny     mild         normal    TRUE   yes
rainy     mild         high      TRUE   no
rainy     cool         normal    TRUE   no
sunny     mild         high      FALSE  no
sunny     hot          high      TRUE   no
sunny     hot          high      FALSE  no

Kernel used: Linear Kernel: K(x,y) = <x,y>

  0.543 * outlook=sunny
      … * outlook=overcast
      … * outlook=rainy
      … * temperature=hot
      … * temperature=mild
      … * temperature=cool
      … * humidity=normal
      … * windy=false

- each attribute=value term is 1 if the input instance matches the specified value, 0 if not
- result < 0 denotes one class, > 0 the other

  a  b   <-- classified as
  7  2 | a = yes
  3  2 | b = no

Correctly Classified Instances: 9 (64.… %)
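The notion of margin can be made concrete with a small sketch. The code below is an illustration only: it measures the margin of a given hyperplane w.x + b = 0 rather than searching for the maximum-margin one, and the function name `margin`, the four 2-D points, and the three candidate planes (standing in for H1, H2, H3) are hypothetical.

```python
import math

def margin(w, b, points):
    """Shortest distance from any point to the plane w.x + b = 0,
    or None if the plane puts some point on the wrong side.
    points: list of (feature_vector, label) with label in {-1, +1}."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    best = math.inf
    for x, y in points:
        score = sum(wi * xi for wi, xi in zip(w, x)) + b
        if y * score <= 0:          # wrong side (or on the plane): not separated
            return None
        best = min(best, abs(score) / norm)
    return best

# Hypothetical data: class -1 on the left, class +1 on the right.
points = [((0, 0), -1), ((0, 1), -1), ((2, 0), +1), ((2, 1), +1)]
print(margin((0, 1), -0.5, points))  # like H1: does not separate
print(margin((1, 0), -0.5, points))  # like H2: separates, small margin
print(margin((1, 0), -1.0, points))  # like H3: separates, larger margin
```

An SVM trainer effectively searches over w and b for the plane with the largest such margin; that optimization is not shown here.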

Extensions.
- use a soft margin to handle errors
  - allow some items to be on the wrong side of the plane
- with different kernel functions, can be used when classes aren't linearly separable

outlook   temperature  humidity  windy  play  output
rainy     mild         normal    FALSE  yes   -1.8843
overcast  hot          normal    FALSE  yes   -1.7728
rainy     cool         normal    FALSE  yes   -1.1417
rainy     mild         high      FALSE  yes   -1.0008
overcast  hot          high      FALSE  yes   -1.…
overcast  cool         normal    TRUE   yes   -1
sunny     cool         normal    FALSE  yes
overcast  mild         high      TRUE   yes
sunny     mild         normal    TRUE   yes
rainy     mild         high      TRUE   no
rainy     cool         normal    TRUE   no
sunny     mild         high      FALSE  no
sunny     hot          high      TRUE   no
sunny     hot          high      FALSE  no

Kernel used: Poly Kernel: K(x,y) = <x,y>^…

  … * <…> * X]   (ten terms of this form)

- <x,y> denotes the dot product of vectors x and y (dot product = sum of the pairwise products of the components)
- X is the input instance to be classified
- <…> * X refers to K(<…>, X)

  a  b   <-- classified as
  6  3 | a = yes
  3  2 | b = no

Correctly Classified Instances: … %

Extensions.
- for multiple classes, use pairwise classification (1-vs-1) or the one-against-all method
- pairwise
  - train separate classifiers for each pairing of classes
  - pick the majority classification
- one-against-all
  - train separate classifiers for each class to distinguish that class from everything else
  - pick the highest-confidence classification
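The kernel notation and the pairwise scheme can be sketched as follows. Everything here is a hedged illustration: the exponent `p=2` is an assumption (the exponent is not legible in the output above), the coefficients and support vectors are made up, and `decision` and `pairwise_vote` are hypothetical helper names rather than any library's API.

```python
from collections import Counter
from itertools import combinations

def dot(x, y):
    # <x,y>: sum of the pairwise products of the components.
    return sum(a * b for a, b in zip(x, y))

def poly_kernel(x, y, p=2):
    # Polynomial kernel <x,y>^p; p=2 is an assumed exponent.
    return dot(x, y) ** p

def decision(support, bias, X, kernel=poly_kernel):
    """Sum of coefficient * K(support_vector, X) terms plus a bias.
    The sign of the result indicates the class."""
    return sum(c * kernel(sv, X) for c, sv in support) + bias

def pairwise_vote(classes, pair_classifiers, X):
    """1-vs-1: run one classifier per pair of classes, return the majority label."""
    votes = Counter(pair_classifiers[pair](X) for pair in combinations(classes, 2))
    return votes.most_common(1)[0][0]

# Hypothetical model with two support vectors.
support = [(1.0, (1, 0)), (-1.0, (0, 1))]
print(decision(support, 0.0, (2, 1)))

# Hypothetical pairwise classifiers that ignore X, to show the voting only.
clfs = {("a", "b"): lambda X: "a", ("a", "c"): lambda X: "a", ("b", "c"): lambda X: "c"}
print(pairwise_vote(["a", "b", "c"], clfs, (2, 1)))
```

For one-against-all, the same `decision` function would be evaluated once per class and the class with the largest value chosen.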

SVM Multiple Classes

Kernel used: Linear Kernel: K(x,y) = <x,y>

Classifier for classes: Iris setosa, Iris versicolor
  … * sepallength
  … * sepalwidth
  … * petallength
  … * petalwidth

Classifier for classes: Iris setosa, Iris virginica
  … * sepallength
  … * sepalwidth
  … * petallength
  … * petalwidth

Classifier for classes: Iris versicolor, Iris virginica
  … * sepallength
  … * sepalwidth
  … * petallength
  … * petalwidth

  a  b  c   <-- classified as
  …  …  … | a = Iris setosa
  …  …  … | b = Iris versicolor
  …  …  … | c = Iris virginica

Correctly Classified Instances: … %

SVM
- has proven to be robust and accurate in many cases
- does not require large training sets
- not sensitive to the number of dimensions
- efficient training methods
- solid theoretical foundation

Naive Bayes

Idea.
- binary classification (two classes)
- based on the posterior probability = probability of an occurrence given evidence
- assume attributes are independent

Example of the idea, for the instance

outlook  temperature  humidity  windy
rainy    mild         normal    FALSE

- for yes outcomes, consider separately the probability of a rainy outlook, a mild temperature, a normal humidity, and not windy
- for independent attributes, the probability of all of these things happening at once is the product of the individual probabilities
- also factor in the likelihood of a yes outcome
- compare to no outcomes

Compute

$\ln\frac{P(1 \mid x)}{P(0 \mid x)} = \ln\frac{P(x \mid 1)\,P(1)}{P(x \mid 0)\,P(0)}$

- P(i | x) = probability of x belonging to class i
- P(i) = probability of an object belonging to class i
- P(x | i) = probability of x within class i
  - if the components of x are independent, can estimate it as the product of P(x_j | i) for each component x_j of x
- the sign of the log indicates whether the probability of x belonging to class 1 is larger or smaller than the probability of x belonging to class 0, so the sign of the result indicates the class
- challenge: if probabilities are estimated from the training set, it could be the case that P(x_j | i) = 0, which zeroes the whole product
- solution: use Laplace smoothing
  - use count+1 and total+number of possible values instead
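As a worked instance of the log ratio with Laplace smoothing, the sketch below classifies the instance rainy / mild / normal / FALSE. The raw counts were tallied by hand from the 14-instance weather data (9 yes, 5 no); the function names, and the choice to smooth the class priors as well, are assumptions of this example.

```python
import math

def smoothed(count, total, n_values):
    # Laplace smoothing: count+1 over total+number of possible values.
    return (count + 1) / (total + n_values)

def log_posterior_ratio(counts1, n1, counts0, n0, n_values):
    """ln[P(x|1)P(1)] - ln[P(x|0)P(0)] under the independence assumption.
    counts1/counts0: per-attribute counts of the instance's values in each class."""
    n = n1 + n0
    log1 = math.log(smoothed(n1, n, 2))   # smoothed prior of class 1 (assumption)
    log0 = math.log(smoothed(n0, n, 2))   # smoothed prior of class 0 (assumption)
    for c1, c0, v in zip(counts1, counts0, n_values):
        log1 += math.log(smoothed(c1, n1, v))
        log0 += math.log(smoothed(c0, n0, v))
    return log1 - log0

# Counts of rainy, mild, normal, FALSE among the yes (class 1) and no (class 0) rows.
yes_counts, no_counts = [3, 4, 6, 6], [2, 2, 1, 2]
values_per_attribute = [3, 3, 2, 2]       # outlook, temperature, humidity, windy
ratio = log_posterior_ratio(yes_counts, 9, no_counts, 5, values_per_attribute)
label = "yes" if ratio > 0 else "no"
print(ratio, label)
```

Working in logs turns the product of probabilities into a sum, which avoids underflow when there are many attributes.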

Example

                 Class
Attribute          yes      no
                (0.63)  (0.38)
=============================
outlook
  sunny
  overcast
  rainy
  [total]
temperature
  hot
  mild
  cool
  [total]
humidity
  high
  normal
  [total]
windy
  TRUE
  FALSE
  [total]

- uses Laplace smoothing, so counts are increased by 1 and totals are increased by the number of possible values for the attribute (avoids 0s if there are no training instances with a given value)

  a  b   <-- classified as
  7  2 | a = yes
  4  1 | b = no

Correctly Classified Instances: … %

For more than two classes
- compute P(x | i) P(i) for each class i
- choose the class i that maximizes P(x | i) P(i)

Example

outlook   temperature  humidity  windy  play
sunny     hot          high      TRUE   no
sunny     hot          high      FALSE  no
rainy     mild         high      TRUE   no
sunny     mild         high      FALSE  no
rainy     mild         high      FALSE  yes
sunny     mild         normal    TRUE   yes
overcast  mild         high      TRUE   yes
overcast  hot          high      FALSE  yes
rainy     cool         normal    TRUE   no
sunny     cool         normal    FALSE  yes
rainy     mild         normal    FALSE  yes
rainy     cool         normal    FALSE  yes
overcast  cool         normal    TRUE   yes
overcast  hot          normal    FALSE  yes

Example

                   Class
Attribute           soft    hard    none
                  (0.22)  (0.19)  (0.59)
==========================================
age
  young
  pre-presbyopic
  presbyopic
  [total]
spectacle-prescrip
  myope
  hypermetrope
  [total]
astigmatism
  no
  yes
  [total]
tear-prod-rate
  reduced
  normal
  [total]

  a  b  c   <-- classified as
  …  …  … | a = soft
  …  …  … | b = hard
  …  …  … | c = none

Correctly Classified Instances: … %
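The smoothed counts and the "choose the class that maximizes P(x | i) P(i)" rule can be recomputed directly from the 14 weather instances. This is a sketch under stated assumptions: `smoothed_counts` and `classify` are hypothetical helper names, the class priors are smoothed the same way as the attribute counts, and an attribute value never seen in the data would raise a `KeyError`.

```python
from collections import Counter

DATA = [line.split() for line in """\
sunny hot high TRUE no
sunny hot high FALSE no
rainy mild high TRUE no
sunny mild high FALSE no
rainy mild high FALSE yes
sunny mild normal TRUE yes
overcast mild high TRUE yes
overcast hot high FALSE yes
rainy cool normal TRUE no
sunny cool normal FALSE yes
rainy mild normal FALSE yes
rainy cool normal FALSE yes
overcast cool normal TRUE yes
overcast hot normal FALSE yes""".splitlines()]

def smoothed_counts(rows, attr):
    """{class: {value: count + 1}} for the attribute in column `attr`."""
    values = sorted({r[attr] for r in rows})
    classes = sorted({r[-1] for r in rows})
    raw = Counter((r[-1], r[attr]) for r in rows)
    return {c: {v: raw[(c, v)] + 1 for v in values} for c in classes}

def classify(rows, x):
    """Return the class i that maximizes the smoothed estimate of P(x|i) P(i)."""
    class_n = Counter(r[-1] for r in rows)
    tables = [smoothed_counts(rows, a) for a in range(len(x))]
    def score(c):
        s = (class_n[c] + 1) / (len(rows) + len(class_n))   # smoothed prior
        for a, v in enumerate(x):
            # The smoothed total is the sum of the smoothed counts for this class.
            s *= tables[a][c][v] / sum(tables[a][c].values())
        return s
    return max(class_n, key=score)

print(smoothed_counts(DATA, 0))
print(classify(DATA, ["rainy", "mild", "normal", "FALSE"]))
```

The same `classify` function works unchanged for three or more classes, since it only compares per-class scores.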

CPSC 444: Artificial Intelligence, Spring 2019

Example

age             spectacle-prescrip  astigmatism  tear-prod-rate  contact-lenses
young           hypermetrope        yes          normal          hard
pre-presbyopic  myope               yes          normal          hard
presbyopic      myope               yes          normal          hard
young           myope               yes          normal          hard
young           myope               no           reduced         none
young           myope               yes          reduced         none
young           hypermetrope        no           reduced         none
pre-presbyopic  myope               no           reduced         none
young           hypermetrope        yes          reduced         none
pre-presbyopic  myope               yes          reduced         none
pre-presbyopic  hypermetrope        no           reduced         none
presbyopic      myope               no           reduced         none
pre-presbyopic  hypermetrope        yes          normal          none
presbyopic      hypermetrope        yes          normal          none
pre-presbyopic  hypermetrope        yes          reduced         none
presbyopic      myope               yes          reduced         none
presbyopic      hypermetrope        no           reduced         none
presbyopic      hypermetrope        yes          reduced         none
presbyopic      myope               no           normal          none
presbyopic      hypermetrope        no           normal          soft
young           myope               no           normal          soft
pre-presbyopic  myope               no           normal          soft
young           hypermetrope        no           normal          soft
pre-presbyopic  hypermetrope        no           normal          soft

- easy to implement
- easy to interpret / understand the resulting classification
- can be applied to large datasets
- tends to perform well
- frequently used in text classification and spam filtering
- many extensions / modifications

Observations.
- the assumption of independence of attributes is not necessarily a problem
  - can start with attribute selection to eliminate highly correlated attributes
  - even with correlated attributes, results based on the independence assumption aren't necessarily wrong
- for numeric values
  - discretize
  - can assume a normal distribution and compute probabilities based on that

Ensemble Learning

Idea.
- use multiple classifiers to improve on the performance of any one

                        Class
Attribute         Iris setosa  Iris versicolor  Iris virginica
                       (0.33)           (0.33)          (0.33)
===============================================================
sepallength
  mean
  std. dev.
  weight sum
  precision
sepalwidth
  mean
  std. dev.
  weight sum
  precision
petallength
  mean
  std. dev.
  weight sum
  precision
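For numeric attributes under the normal-distribution assumption, each P(x_j | i) is replaced by a Gaussian density built from the per-class mean and standard deviation. The sketch below shows the idea only; the class names and the mean / std. dev. values are hypothetical and are not the Iris figures from the table above.

```python
import math

def gaussian_pdf(x, mean, std):
    # Density of a normal distribution with the given mean and standard deviation.
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2 * math.pi))

def classify(x, priors, params):
    """priors: {class: P(class)}; params: {class: [(mean, std), ...]} per attribute.
    Returns the class maximizing P(class) times the product of densities."""
    def score(c):
        s = priors[c]
        for xi, (m, sd) in zip(x, params[c]):
            s *= gaussian_pdf(xi, m, sd)
        return s
    return max(priors, key=score)

# Hypothetical two-class model with a single numeric attribute.
priors = {"short": 0.5, "long": 0.5}
params = {"short": [(1.5, 0.2)], "long": [(5.0, 0.5)]}
print(classify([1.4], priors, params))
```

Densities can exceed 1, which is fine here because only the comparison between classes matters.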

AdaBoost
- works with weak classifiers (accuracy just above random chance)
  - often decision stumps (single-level decision trees)
- simple algorithm
- accurate
- often does not overfit (but it can)
- solid theoretical foundation

AdaBoost Algorithm.
- assign equal weights (1/n) to each training instance
  - n = size of the training set
- repeat for T rounds or until no further improvement
  - train a classifier using the current training set weights
    - if the classifier algorithm can't deal with weights directly, choose training elements in accordance with their weights
  - test the classifier on the training examples and determine the error
  - adjust the weights based on the error
    - increase the weight of incorrectly classified examples
- use weighted majority voting amongst the classifiers from each round to determine the class
  - the weight is based on the error
  - more accurate models are given higher weights
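The algorithm above can be sketched with decision stumps on one numeric attribute. The slides do not give the exact weight formulas, so the classifier weight 0.5 * ln((1 - err) / err) and the exponential reweighting used here are the commonly used AdaBoost formulation and should be read as an assumption; the function names and toy data are hypothetical.

```python
import math

def stump_predict(x, t, sign):
    # Decision stump: predict `sign` at or below threshold t, `-sign` above it.
    return sign if x <= t else -sign

def train_stump(xs, ys, w):
    """Return (weighted error, threshold, sign) of the best stump under weights w."""
    best = None
    for t in sorted(set(xs)):
        for sign in (1, -1):
            err = sum(wi for wi, x, y in zip(w, xs, ys) if stump_predict(x, t, sign) != y)
            if best is None or err < best[0]:
                best = (err, t, sign)
    return best

def adaboost(xs, ys, rounds=3):
    n = len(xs)
    w = [1.0 / n] * n                       # equal initial weights
    model = []
    for _ in range(rounds):
        err, t, sign = train_stump(xs, ys, w)
        if err >= 0.5:                      # no better than chance: stop
            break
        err = max(err, 1e-10)               # avoid log of infinity on a perfect stump
        alpha = 0.5 * math.log((1 - err) / err)   # more accurate => larger weight
        model.append((alpha, t, sign))
        # Misclassified examples are scaled up, correct ones down, then renormalized.
        w = [wi * math.exp(-alpha * y * stump_predict(x, t, sign))
             for wi, x, y in zip(w, xs, ys)]
        total = sum(w)
        w = [wi / total for wi in w]
    return model

def predict(model, x):
    # Weighted majority vote over the stumps from each round.
    score = sum(alpha * stump_predict(x, t, sign) for alpha, t, sign in model)
    return 1 if score >= 0 else -1

# Hypothetical data that no single stump can fit (labels +1, +1, -1, -1, +1, +1).
xs, ys = [1, 2, 3, 4, 5, 6], [1, 1, -1, -1, 1, 1]
model = adaboost(xs, ys, rounds=3)
print([predict(model, x) for x in xs])
```

Each stump alone misclassifies at least two of the six points, yet the weighted vote of three stumps fits all of them, which is the point of combining weak classifiers.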
