Data Mining and Machine Learning. Instance-Based Learning: Rote Learning, k-Nearest-Neighbor Classification, IBL and Rule Learning


Data Mining and Machine Learning
Instance-Based Learning: Rote Learning; k-Nearest-Neighbor Classification (prediction, weighted prediction, choosing k, feature weighting (RELIEF), instance weighting (PEBLS), efficiency, kd-trees); IBL and Rule Learning (NEAR: Nearest Nested Hyper-Rectangles, RISE)
Acknowledgements: Some slides adapted from Tom Mitchell, Eibe Frank & Ian Witten, Tan, Steinbach & Kumar, Ricardo Gutierrez-Osuna, Gunter Grieser

Instance-Based Classifiers
No model is learned: the stored training instances themselves represent the knowledge. The training instances are searched for the instance that most closely resembles the new instance (lazy learning).
Examples:
Rote-learner: memorizes the entire training data and performs classification only if the attributes of the record match one of the training examples exactly

Rote Learning
Day    Temperature  Outlook   Humidity  Windy  Play Golf?
07-05  hot          sunny     high      false  no
07-06  hot          sunny     high      true   no
07-07  hot          overcast  high      false  yes
07-09  cool         rain      normal    false  yes
07-10  cool         overcast  normal    true   yes
07-12  mild         sunny     high      false  no
07-14  cool         sunny     normal    false  yes
07-15  mild         rain      normal    false  yes
07-20  mild         sunny     normal    true   yes
07-21  mild         overcast  high      true   yes
07-22  hot          overcast  normal    false  yes
07-23  mild         rain      high      true   no
07-26  cool         rain      normal    true   no
07-30  mild         rain      high      false  yes
today  cool         sunny     normal    false  yes    (exact match with 07-14)

Instance-Based Classifiers
No model is learned: the stored training instances themselves represent the knowledge. The training instances are searched for the instance that most closely resembles the new instance (lazy learning).
Examples:
Rote-learner: memorizes the entire training data and performs classification only if the attributes of the record match one of the training examples exactly
Nearest-neighbor classifier: uses the k closest points (nearest neighbors) for performing classification

Nearest Neighbor Classification
Day       Temperature  Outlook   Humidity  Windy  Play Golf?
07-05     hot          sunny     high      false  no
07-06     hot          sunny     high      true   no
07-07     hot          overcast  high      false  yes
07-09     cool         rain      normal    false  yes
07-10     cool         overcast  normal    true   yes
07-12     mild         sunny     high      false  no
07-14     cool         sunny     normal    false  yes
07-15     mild         rain      normal    false  yes
07-20     mild         sunny     normal    true   yes
07-21     mild         overcast  high      true   yes
07-22     hot          overcast  normal    false  yes
07-23     mild         rain      high      true   no
07-26     cool         rain      normal    true   no
12-30     mild         rain      high      false  yes
tomorrow  mild         sunny     normal    false  yes

Nearest Neighbor Classifier
k-Nearest-Neighbor algorithms classify a new example by comparing it to all previously seen examples. The classifications of the k most similar previous cases are used for predicting the classification of the current example.
Training: the training examples are used for providing a library of sample cases and for re-scaling the similarity function to maximize performance.
(figure: a new example is compared to the stored sample cases to obtain its classification)

Nearest Neighbors
(figure: (a) 1-nearest neighbor, (b) 2-nearest neighbor, (c) 3-nearest neighbor of a query point x)
The k nearest neighbors of an example x are the data points that have the k smallest distances to x.

Prediction
The predicted class is determined from the nearest-neighbor list.
Classification: take the majority vote of the class labels among the k nearest neighbors:
$y = \arg\max_c \sum_{i=1}^{k} \mathbb{1}(y_i = c)$
where $\mathbb{1}(y_i = c)$ is the indicator function (1 if $y_i = c$, 0 if $y_i \neq c$).
This can easily be extended to regression: predict the average of the class values of the k nearest neighbors:
$y = \frac{1}{k} \sum_{i=1}^{k} y_i$

Weighted Prediction
Often the prediction can be improved if the influence of each neighbor is weighted:
$y = \frac{\sum_{i=1}^{k} w_i y_i}{\sum_{i=1}^{k} w_i}$
Weights typically depend on the distance, e.g. $w_i = \frac{1}{d(x_i, x)^2}$
Note: with weighted distances, we could use all examples for classification (inverse distance weighting).
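
A minimal sketch of distance-weighted prediction in Python (not from the slides): `neighbors` is assumed to be a list of (distance, label-or-value) pairs for the k nearest neighbors, and all names are illustrative.

```python
# Sketch of distance-weighted prediction with w_i = 1 / d(x_i, x)^2.
from collections import defaultdict

def weighted_vote(neighbors, eps=1e-9):
    """Weighted majority vote over (distance, class_label) pairs."""
    scores = defaultdict(float)
    for dist, label in neighbors:
        scores[label] += 1.0 / (dist ** 2 + eps)   # eps avoids division by zero
    return max(scores, key=scores.get)

def weighted_average(neighbors, eps=1e-9):
    """Weighted regression estimate: sum(w_i * y_i) / sum(w_i)."""
    weights = [1.0 / (d ** 2 + eps) for d, _ in neighbors]
    return sum(w * y for w, (_, y) in zip(weights, neighbors)) / sum(weights)
```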

Nearest-Neighbor Classifiers
Require three things: the set of stored examples, a distance metric to compute the distance between examples, and the value of k, the number of nearest neighbors to retrieve.
To classify an unknown example: compute its distance to the training examples, identify the k nearest neighbors, and use the class labels of the nearest neighbors to determine the class label of the unknown example (e.g., by taking a majority vote).
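
The three ingredients above fit into a few lines of Python. This is a hedged sketch, assuming purely numeric feature vectors and plain Euclidean distance; `knn_classify` and all other names are illustrative, not taken from the slides.

```python
# Minimal k-NN classifier sketch: stored examples, a distance metric, and k.
import math
from collections import Counter

def euclidean(x1, x2):
    """Straight-line distance between two numeric feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x1, x2)))

def knn_classify(query, examples, k=3, distance=euclidean):
    """examples: list of (feature_vector, class_label) pairs."""
    # 1. compute the distance of the query to every stored example (lazy learning)
    neighbors = sorted(examples, key=lambda ex: distance(query, ex[0]))
    # 2. keep the k closest ones
    k_nearest = neighbors[:k]
    # 3. majority vote over their class labels
    votes = Counter(label for _, label in k_nearest)
    return votes.most_common(1)[0][0]

# usage: examples = [([1.0, 5.0], "yes"), ([2.0, 3.0], "no"), ...]
# knn_classify([1.5, 4.0], examples, k=1)
```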

Voronoi Diagram
Shows, for a given set of points, the region of the space that is closest to each point. The boundaries of these regions correspond to potential decision boundaries of the 1-NN classifier.

Lazy Learning Algorithms
kNN is considered a lazy learning algorithm: it defers data processing until it receives a request to classify an unlabelled example, replies to the request by combining its stored training data, and discards the constructed answer and any intermediate results.
Other names for lazy algorithms: memory-based, instance-based, exemplar-based, case-based, experience-based.
This strategy is opposed to eager learning algorithms, which compile the data into a compressed description or model, discard the training data after compilation of the model, and classify incoming patterns using the induced model.

Choosing the value of k
If k is too small: sensitive to noise in the data (misclassified examples).
If k is too large: the neighborhood may include points from other classes; in the limiting case k → |D| all examples are considered and the largest class is always predicted.
Good values can be found, e.g., by evaluating various values of k with cross-validation on the training data.
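
A possible way to pick k by cross-validation on the training data, as a sketch; it reuses the hypothetical `knn_classify` from the earlier sketch, and the fold count and candidate values of k are arbitrary choices.

```python
# Sketch: choose k by cross-validation on the training data.
import random

def choose_k(examples, candidate_ks=(1, 3, 5, 7, 9), folds=5, seed=0):
    data = examples[:]
    random.Random(seed).shuffle(data)
    fold_size = len(data) // folds
    best_k, best_acc = None, -1.0
    for k in candidate_ks:
        correct = 0
        for f in range(folds):
            test = data[f * fold_size:(f + 1) * fold_size]
            train = data[:f * fold_size] + data[(f + 1) * fold_size:]
            correct += sum(knn_classify(x, train, k=k) == y for x, y in test)
        acc = correct / (folds * fold_size)
        if acc > best_acc:
            best_k, best_acc = k, acc
    return best_k
```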

Distance Functions
A distance function computes the distance between two examples so that we can find the nearest neighbor to a given example.
General idea: reduce the distance $d(x_1, x_2)$ of two examples to the distances $d_A(v_1, v_2)$ between two values of attribute A.
Popular choices:
Euclidean distance (straight line between two points): $d(x_1, x_2) = \sqrt{\sum_A d_A(v_{1,A}, v_{2,A})^2}$
Manhattan or city-block distance (sum of axis-parallel line segments): $d(x_1, x_2) = \sum_A d_A(v_{1,A}, v_{2,A})$

Distance Functions for Numerical Attributes
Numerical attributes: distance between two attribute values $d_A(v_1, v_2) = |v_1 - v_2|$
Normalization: different attributes are measured on different scales, so values need to be normalized to [0,1]:
$v_i' = \frac{v_i - \min_j v_j}{\max_j v_j - \min_j v_j}$
Note: this normalization assumes a (roughly) uniform distribution of attribute values. For other distributions, other normalizations might be preferable (e.g., logarithmic for salaries).
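
A small sketch of min-max normalization under the assumptions above (per-attribute ranges taken from the training data); all names are illustrative.

```python
# Sketch of min-max normalization: rescale every numeric attribute to [0, 1]
# using the minima and maxima observed in the training data.

def fit_min_max(vectors):
    """vectors: list of numeric feature vectors; returns per-attribute (min, max)."""
    cols = list(zip(*vectors))
    return [(min(c), max(c)) for c in cols]

def normalize(vector, ranges):
    out = []
    for v, (lo, hi) in zip(vector, ranges):
        out.append(0.0 if hi == lo else (v - lo) / (hi - lo))
    return out

# ranges = fit_min_max([x for x, _ in examples])
# examples = [(normalize(x, ranges), y) for x, y in examples]
```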

Distance Functions for Symbolic Attributes
0/1 distance: $d_A(v_1, v_2) = 0$ if $v_1 = v_2$, and $1$ if $v_1 \neq v_2$.
Value Difference Metric (VDM) (Stanfill & Waltz 1986): two values are similar if they have approximately the same distribution over all classes (similar relative frequencies in all classes). Sum over all classes the difference between the percentage of examples with value $v_1$ in this class and the percentage of examples with value $v_2$ in this class:
$d_A(v_1, v_2) = \sum_c \left| \frac{n_{1,c}}{n_1} - \frac{n_{2,c}}{n_2} \right|^k$
where $n_{i,c}$ is the number of examples with value $v_i$ in class $c$ and $n_i$ the total number of examples with value $v_i$. k is a user-settable parameter (e.g., k = 2); used with k = 1 in PEBLS (Parallel Exemplar-Based Learning System; Cost & Salzberg, 1993).

VDM Example
Tid  Refund  Marital Status  Taxable Income  Cheat
1    Yes     Single          125K            No
2    No      Married         100K            No
3    No      Single          70K             No
4    Yes     Married         120K            No
5    No      Divorced        95K             Yes
6    No      Married         60K             No
7    Yes     Divorced        220K            No
8    No      Single          85K             Yes
9    No      Married         75K             No
10   No      Single          90K             Yes

Class counts for Refund:
Class  Refund=Yes  Refund=No
Yes    0           3
No     3           4

Distance between values:
d(Refund=Yes, Refund=No) = |0/3 - 3/7| + |3/3 - 4/7| = 3/7 + 3/7 = 6/7

VDM Example
Using the same training data as above, class counts for Marital Status:
Class  Single  Married  Divorced
Yes    2       0        1
No     2       4        1

Distances between values:
d(Single, Married)   = |2/4 - 0/4| + |2/4 - 4/4| = 1
d(Single, Divorced)  = |2/4 - 1/2| + |2/4 - 1/2| = 0
d(Married, Divorced) = |0/4 - 1/2| + |4/4 - 1/2| = 1
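
A sketch of the VDM computed from class-conditional value counts (here with exponent k = 1); the data layout is an assumption, and the usage lines reproduce the marital-status example above.

```python
# Sketch of the Value Difference Metric:
# d_A(v1, v2) = sum_c | n_{1,c}/n_1 - n_{2,c}/n_2 |^k   (here k = 1).
from collections import Counter, defaultdict

def vdm(values, labels, v1, v2, k=1):
    """values/labels: the attribute column and class column of the training set."""
    counts = defaultdict(Counter)          # counts[value][class] = n_{value, class}
    for v, c in zip(values, labels):
        counts[v][c] += 1
    n1, n2 = sum(counts[v1].values()), sum(counts[v2].values())
    classes = set(labels)
    return sum(abs(counts[v1][c] / n1 - counts[v2][c] / n2) ** k for c in classes)

# Marital-status column and Cheat column from the table above (Tid order):
status = ["Single", "Married", "Single", "Married", "Divorced",
          "Married", "Divorced", "Single", "Married", "Single"]
cheat  = ["No", "No", "No", "No", "Yes", "No", "No", "Yes", "No", "Yes"]
print(vdm(status, cheat, "Single", "Married"))    # 1.0
print(vdm(status, cheat, "Single", "Divorced"))   # 0.0
```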

Other Distance Functions
Other distances are possible:
Hierarchical attributes: distance of the values in the hierarchy, e.g., the length of the shortest path from v1 to v2: d(Canada, USA) = 2, d(Canada, Japan) = 4
String values: edit distance, e.g., d(Virginia, Vermont) = 5
In general, distances are domain-dependent and can be chosen appropriately.
Distances for missing values: not all attribute values may be specified for an example. Common policy: assume missing values to be maximally distant.

Feature Weighting
Not all dimensions are equally important: comparisons on some dimensions might even be completely irrelevant for the prediction task, yet straightforward distance functions give equal weight to all dimensions.
Idea: use a weight for each attribute to denote its importance, e.g., weighted Euclidean distance:
$d(x_1, x_2) = \sqrt{\sum_A w_A \, d_A(v_{1,A}, v_{2,A})^2}$
The weights $w_A$ can be set by the user or determined automatically.
Survey of feature weighting algorithms: Dietrich Wettschereck, David W. Aha, Takao Mohri: A Review and Empirical Evaluation of Feature Weighting Methods for a Class of Lazy Learning Algorithms. Artificial Intelligence Review 11(1-5): 273-314 (1997)

RELIEF (Kira & Rendell, ICML-92)
Basic idea: in a local neighborhood around an example x, a good attribute A should
allow to discriminate x from all examples of different classes (the set of misses): the probability that the attribute has a different value for x and a miss m should be high
have the same value for all examples of the same class as x (the set of hits): the probability that the attribute has a different value for x and a hit h should be low
Try to estimate and maximize $w_A = \Pr(v_x \neq v_m) - \Pr(v_x \neq v_h)$, where $v_x$ is the value of attribute A in example x. This probability can be estimated via the average distance.

RELIEF (Kira & Rendell, ICML-92)
1. set all attribute weights w_A = 0.0
2. for i = 1 to r (a user-settable parameter):
   select a random example x
   find h: the nearest neighbor of x of the same class (near hit)
        m: the nearest neighbor of x of a different class (near miss)
   for each attribute A:
      $w_A \leftarrow w_A + \frac{1}{r} \left( d_A(m, x) - d_A(h, x) \right)$
where $d_A(x, y)$ is the distance in attribute A between examples x and y (normalized to the [0,1] range).
Note: when used for feature weighting, all w_A < 0.0 are set to 0 in the end.
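
A sketch of the RELIEF loop above for purely numeric feature vectors, with per-attribute distances normalized to [0,1] via the observed value ranges; the sample size r, the distance used to find hits and misses, and all names are illustrative assumptions.

```python
# Sketch of RELIEF feature weighting (numeric attributes only).
import math, random

def relief(examples, r=100, seed=0):
    """examples: list of (feature_vector, class_label); returns attribute weights."""
    rng = random.Random(seed)
    n_attrs = len(examples[0][0])
    # per-attribute value ranges, used to normalize d_A to [0, 1]
    cols = list(zip(*[x for x, _ in examples]))
    ranges = [max(c) - min(c) or 1.0 for c in cols]
    d_attr = lambda a, x, y: abs(x[a] - y[a]) / ranges[a]
    dist = lambda x, y: math.sqrt(sum(d_attr(a, x, y) ** 2 for a in range(n_attrs)))
    weights = [0.0] * n_attrs
    for _ in range(r):
        x, cls = rng.choice(examples)
        hits   = [e for e in examples if e[1] == cls and e[0] is not x]
        misses = [e for e in examples if e[1] != cls]
        if not hits or not misses:
            continue
        h = min(hits,   key=lambda e: dist(x, e[0]))[0]   # near hit
        m = min(misses, key=lambda e: dist(x, e[0]))[0]   # near miss
        for a in range(n_attrs):
            weights[a] += (d_attr(a, m, x) - d_attr(a, h, x)) / r
    return [max(w, 0.0) for w in weights]   # clip negative weights to 0
```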

Learning Prototypes
Only those instances involved in a decision need to be stored; noisy instances should be filtered out. Idea: only use prototypical examples.

Learning Prototypes: IB-algorithms
Case study for prototype selection: Aha, Kibler and Albert: Instance-Based Learning Algorithms. Machine Learning, 1991.
IB1: store all examples (high noise tolerance, high memory demands)
IB2: store a new example only if it is misclassified by the stored examples (low noise tolerance, low memory demands)
IB3: like IB2, but maintain a counter for the number of times each example participated in correct and incorrect classifications, and use a significance test for filtering noisy examples (improved noise tolerance, low memory demands)
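
A sketch of IB2-style instance selection, assuming the hypothetical `knn_classify` from the earlier sketch with k = 1; it is meant to illustrate the storage policy, not the exact IB2 algorithm of Aha et al.

```python
# Sketch of IB2-style instance selection: keep a new training example only if
# the instances stored so far misclassify it.

def ib2(training_stream):
    """training_stream: iterable of (feature_vector, class_label) pairs."""
    stored = []
    for x, y in training_stream:
        if not stored or knn_classify(x, stored, k=1) != y:
            stored.append((x, y))      # misclassified -> worth remembering
        # correctly classified examples are discarded (low memory demands)
    return stored
```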

Instance Weighting
Idea: we assign a weight to each instance. Instances with lower weights are always more distant and hence have a low impact on classification; an instance weight of $w_x = 0$ completely ignores the instance x. Selecting instances is a special case of instance weighting.
Similarity function used in PEBLS (Cost & Salzberg, 1993):
$d(x_1, x_2) = \frac{1}{w_{x_1} w_{x_2}} \sum_A d_A(v_1, v_2)^k$
with
$w_x = \frac{\text{number of times } x \text{ has correctly predicted the class}}{\text{number of times } x \text{ has been used for prediction}}$
so that $w_x \approx 1$ if instance x predicts well and $w_x$ approaches 0 if instance x does not predict well.
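
A sketch of this instance-weighting scheme as reconstructed above: each stored exemplar keeps usage statistics, and distances are scaled by $1/(w_{x_1} w_{x_2})$. The per-attribute distance function, the exponent k, the smoothing prior, and the choice to give a query instance weight 1 are all illustrative assumptions, not prescribed by PEBLS.

```python
# Sketch of PEBLS-style instance weighting:
# w_x = (#correct predictions by x) / (#times x was used), and
# d(x1, x2) = 1 / (w_x1 * w_x2) * sum_A d_A(v1, v2)^k.

class Exemplar:
    def __init__(self, features, label):
        self.features, self.label = features, label
        self.used, self.correct = 0, 0              # usage statistics

    def weight(self, prior=1):
        # assumption: a small prior keeps fresh exemplars from having w = 0
        return (self.correct + prior) / (self.used + prior)

def pebls_distance(exemplar, query, d_attr, k=2, query_weight=1.0):
    """d_attr(a, v1, v2): per-attribute distance (e.g. VDM); query weight assumed 1."""
    total = sum(d_attr(a, v1, v2) ** k
                for a, (v1, v2) in enumerate(zip(exemplar.features, query)))
    return total / (exemplar.weight() * query_weight)

def update_statistics(exemplar, was_correct):
    exemplar.used += 1
    exemplar.correct += int(was_correct)
```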

Efficiency of NN algorithms
Very efficient in training: only store the training data.
Not so efficient in testing: computing the distance measure to every training example is much more expensive than, e.g., classifying with a learned rule set.
Note that kNN and 1NN are equal in terms of efficiency: retrieving the k nearest neighbors is (almost) no more expensive than retrieving a single nearest neighbor, since the k nearest neighbors can be maintained in a queue.

Finding nearest neighbors efficiently
The simplest way of finding the nearest neighbour is a linear scan of the data; classification then takes time proportional to the product of the number of instances in the training and test sets.
Nearest-neighbor search can be done more efficiently using appropriate data structures: kd-trees and ball trees.

kd-trees
Common setting (others are possible): each level of the tree corresponds to one of the attributes; the order of attributes can be arbitrary, but is fixed and cyclic. Each level splits according to its attribute: ideally on the median value (results in balanced trees), often simply on the value of the next example.
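
A sketch of kd-tree construction with a cyclic attribute order and median splits; the node layout (one example stored per node) and all names are assumptions for illustration.

```python
# Sketch of building a kd-tree: cycle through the attributes, split on the
# median value at each level (balanced tree), store one example per node.

class KDNode:
    def __init__(self, point, axis, left=None, right=None):
        self.point, self.axis = point, axis
        self.left, self.right = left, right

def build_kdtree(points, depth=0):
    if not points:
        return None
    axis = depth % len(points[0])               # cycle through the attributes
    points = sorted(points, key=lambda p: p[axis])
    median = len(points) // 2                   # median value -> balanced tree
    return KDNode(points[median], axis,
                  left=build_kdtree(points[:median], depth + 1),
                  right=build_kdtree(points[median + 1:], depth + 1))
```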

Building kd-trees incrementally
A big advantage of instance-based learning is that the classifier can be updated incrementally: just add a new training instance after it arrives. Can we do the same with kd-trees?
Heuristic strategy: find the leaf node containing the new instance. If the leaf is empty, place the instance into the leaf; else split the leaf according to the next dimension. Alternatively, split according to the longest dimension (idea: preserve squareness).
The tree should be re-built occasionally, e.g., if its depth grows to twice the optimum depth.

Using kd-trees
The effect of a kd-tree is to partition the (multi-dimensional) sample space according to the underlying data distribution: finer partitioning in regions with high density, coarser partitioning in regions with low density.
For a given query point: descend the tree to find the data point lying in the cell that contains the query point, then examine surrounding cells if they overlap the ball centered at the query point with radius equal to the distance to the closest data point found so far. To do so, recursively back up one level and check the distance to the split point; if the ball overlaps the other branch, search that branch as well. In this way only a few cells have to be searched.

Using kd-trees: example
Assume we have the query example [1,5] and use the unweighted Euclidean distance $d(e_1, e_2) = \sqrt{\sum_A d_A(e_1, e_2)^2}$.
Sort the example down the tree: it ends in the left successor of [4,7].
Compute the distance to the example in that leaf: $d([1,5],[4,7]) = \sqrt{(1-4)^2 + (5-7)^2} = \sqrt{13}$.
Now we have to look into rectangles that may contain a nearer example; remember the distance to the closest example so far, $d_{min} = \sqrt{13}$.
(figure: kd-tree partition of the plane with the query point [1,5] and the ball of radius d_min)

Using kd-trees: example
Go up one level (to example [4,7]) and compute the distance to the closest point on this split (difference only in X): $d([1,5],[4,*]) = \sqrt{(4-1)^2 + 0^2} = 3$.
Since this is smaller than the current best distance, $d([1,5],[4,*]) = 3 < \sqrt{13} = d_{min}$, we could have a closer example in the right subtree of [4,7], which in our case does not contain any example. Done with this branch.

Using kd-trees: example
Go up one level (to example [5,4]) and compute the distance to the closest point on this split (difference only in Y): $d([1,5],[*,4]) = \sqrt{0^2 + (5-4)^2} = 1$.
Since this is smaller than the current best distance, $d([1,5],[*,4]) = 1 < \sqrt{13} = d_{min}$, we could have a closer example in the area Y < 4: go down the other branch and repeat recursively.

Using kd-trees: example
Go down to the leaf [2,3] and compute the distance to the example in this leaf: $d([1,5],[2,3]) = \sqrt{(1-2)^2 + (5-3)^2} = \sqrt{5}$.
Since this is smaller than the current best distance, $d([1,5],[2,3]) = \sqrt{5} < \sqrt{13} = d_{min}$, the example in the leaf is the new nearest neighbor and $d_{min} = \sqrt{5}$.
This is recursively repeated until we have processed the root node and no more distances have to be computed.
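
A sketch of the backtracking search just walked through, assuming the `KDNode` structure from the construction sketch (every node stores one example) and Euclidean distance; all names are illustrative.

```python
# Sketch of nearest-neighbor search in a kd-tree with backtracking.
import math

def nn_search(node, query, best=None, best_dist=float("inf")):
    if node is None:
        return best, best_dist
    # distance to the example stored in this node
    d = math.sqrt(sum((q - p) ** 2 for q, p in zip(query, node.point)))
    if d < best_dist:
        best, best_dist = node.point, d
    # descend first into the side of the split that contains the query
    diff = query[node.axis] - node.point[node.axis]
    near, far = (node.left, node.right) if diff < 0 else (node.right, node.left)
    best, best_dist = nn_search(near, query, best, best_dist)
    # only search the other side if the splitting plane is closer than d_min
    if abs(diff) < best_dist:
        best, best_dist = nn_search(far, query, best, best_dist)
    return best, best_dist

# usage: tree = build_kdtree(training_points); nn_search(tree, [1, 5])
# in the worked example above, [2, 3] at distance sqrt(5) ends up as the
# nearest neighbor of the query [1, 5].
```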

Ball trees
A problem with kd-trees is their corners. Observation: there is no need to make sure that the regions don't overlap, so we can use balls (hyperspheres) instead of hyperrectangles. A ball tree organizes the data into a tree of k-dimensional hyperspheres, which normally allows for a better fit to the data and thus more efficient search.

Nearest Hyper-Rectangle
Nearest-neighbor approaches can be extended to compute the distance to the nearest hyper-rectangle: a hyper-rectangle corresponds to a rule whose conditions are intervals along each dimension. Rectangles may be non-overlapping or nested.
To do so, we need to adapt the distance measure: the distance of a point to a rectangle instead of a point-to-point distance.

Rectangle-to-Point Distance
(figure: the regions around a rectangle R in two attributes A and B)
If x lies inside the rectangle: d(x, R) = 0.
If x lies outside only along attribute A: d(x, R) = d_A(x, R); analogously, outside only along B: d(x, R) = d_B(x, R).
If x lies outside along both attributes (corner regions): $d(x, R) = \sqrt{d_A(x, R)^2 + d_B(x, R)^2}$.

Rectangle-to-Point Attribute Distance
Numeric attributes: the distance of the point to the closest edge of the rectangle along this attribute (i.e., to the upper/lower bound of the interval). If rule R uses $v_{min,A,R} \leq A \leq v_{max,A,R}$ as its condition for attribute A:
$d_A(v, R) = 0$ if $v_{min,A,R} \leq v \leq v_{max,A,R}$, $\quad v - v_{max,A,R}$ if $v > v_{max,A,R}$, $\quad v_{min,A,R} - v$ if $v < v_{min,A,R}$
Symbolic attributes: 0/1 distance. If rule R uses $A = v_{A,R}$ as its condition for attribute A:
$d_A(v, R) = 0$ if $v = v_{A,R}$, and $1$ if $v \neq v_{A,R}$
One can also adapt other distances; RISE uses a version of the VDM.
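
A sketch of the point-to-rectangle distance with the per-attribute cases above, combined Euclidean-style as in the corner regions of the previous slide; representing a rule as a dict from attribute index to either a numeric interval or a required symbolic value is an assumption for illustration.

```python
# Sketch of the point-to-rectangle (rule) distance. Attributes without a
# condition contribute distance 0.
import math

def attr_distance(value, condition):
    if isinstance(condition, tuple):            # numeric: interval (lo, hi)
        lo, hi = condition
        if value < lo:
            return lo - value
        if value > hi:
            return value - hi
        return 0.0                              # inside the interval
    return 0.0 if value == condition else 1.0   # symbolic: 0/1 distance

def rectangle_distance(x, rule):
    """Euclidean combination of the per-attribute distances to the rectangle."""
    return math.sqrt(sum(attr_distance(x[a], cond) ** 2
                         for a, cond in rule.items()))

# usage: rule = {0: (2.0, 4.0), 1: "sunny"}
# rectangle_distance([5.0, "rain"], rule)  ->  sqrt(1.0**2 + 1**2)
```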

NEAR (Salzberg, 1991)
1. randomly choose r seed examples and convert them into rules
2. for each example x:
   choose the rule $R_{min} = \arg\min_R d(x, R)$
   if x is classified correctly by $R_{min}$: enlarge the condition of $R_{min}$ so that x is covered (for each numeric attribute enlarge the interval if necessary; for each symbolic attribute delete the condition if necessary)
   else (x is classified incorrectly by $R_{min}$): add example x as a new rule
NEAR uses both instance and feature weighting:
$d(x, R) = w_x \sqrt{\sum_A w_A^2 \, d_A(x, R)^2}$

Instance and Feature Weighting in NEAR
Instance weighting as in PEBLS.
Feature weights are computed incrementally:
If an example is incorrectly classified, the weights of all matching attributes are increased by a fixed percentage (20%), which has the effect of moving the example farther away along these dimensions, and the weights of all attributes that do not match are decreased by a fixed percentage (20%).
If an example is correctly classified, do the opposite (decrease matching and increase non-matching weights analogously).
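
A sketch of this incremental weight update; `matches(a)` (does the example satisfy the rule's condition on attribute a?) and the fixed 20% step come from the description above, everything else is an illustrative assumption.

```python
# Sketch of NEAR's incremental feature-weight update.

def update_feature_weights(weights, matches, correct, delta=0.2):
    """weights: list of attribute weights; matches(a): bool; correct: bool."""
    for a in range(len(weights)):
        if correct == matches(a):
            weights[a] *= 1.0 - delta   # correct & matching, or wrong & non-matching
        else:
            weights[a] *= 1.0 + delta   # wrong & matching, or correct & non-matching
    return weights
```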

Second Chance Heuristic
An improved version used a second chance heuristic: if the nearest rule did not classify correctly, try the second-nearest one. If this one matches, expand it to cover the example; if not, add the example as a new rule.
This can lead to the generation of nested rules, i.e., rectangles inside other rectangles; at classification time, the smallest matching rectangle is used. Such nested rules may be interpreted as rules with exceptions.
However, this did not work well (overfitting?).

RISE (Domingos, 1996) (Rule Induction from a Set of Exemplars)
1. turn each example into a rule, resulting in a theory T
2. repeat:
   for each rule R in T:
   i. choose the closest uncovered example $x_{min} = \arg\min_x d(x, R)$
   ii. R' = minimalGeneralisation(R, $x_{min}$)
   iii. replace R with R' if this does not decrease the accuracy of T
   iv. delete R' if it is already part of T (duplicate rule)
3. until no further increase in accuracy
RISE uses the simple distance function $d(x, R) = \sum_A d_A(x, R)^k$

RISE (Domingos, 1996)
Classification of an example: use the rule that is closest to the example; if multiple rules have the same distance, use the one with the highest Laplace-corrected precision.
Leave-one-out estimation of the accuracy of a theory: for classifying an example, the rule that encodes it is ignored, but only if that rule has not been generalized yet.
This can be computed efficiently if each example remembers the distance to the rule by which it is classified: if a rule is changed, go once through all examples and see whether the new rule classifies any examples that were previously classified by some other rule. Count the improvements (+1) or mistakes (-1) only for those examples, and check whether their sum is > 0 or < 0.

Differences between NEAR and RISE
NEAR: focuses on examples; incremental training; instance-weighted and feature-weighted Euclidean distance; tie breaking using the smallest rule.
RISE: focuses on rules; batch training; straightforward Manhattan distance; tie breaking with the Laplace heuristic.

Discussion
Nearest-neighbor methods are often very accurate, but assume all attributes are equally important (remedy: attribute selection or attribute weights).
Possible remedies against noisy instances: take a majority vote over the k nearest neighbors, or remove noisy instances from the dataset (difficult!).
Statisticians have used k-NN since the early 1950s: if $n \to \infty$ and $k/n \to 0$, the error approaches the minimum.
k-NN can model arbitrary decision boundaries, but is somewhat inefficient at classification time; a straightforward application may be too slow. kd-trees become inefficient when the number of attributes is too large (approximately > 10); ball trees work well in higher-dimensional spaces.
Instance-based learning has several similarities with rule learning.