Instance and case-based reasoning


ML for NLP
Lecturer: Kevin Koidl
Assist. Lecturer: Alfredo Maldonado

Instance-based Learning

- Instance-based learning approximates real-valued or discrete-valued target functions.
- k-nearest neighbour: new instances are related to similar instances held in memory.
- Key difference from other approaches: a different approximation is built for each instance queried.
- It constructs only local approximations, not one approximation for the entire instance space.
- Instance-based learning can use complex, symbolic representations.
- Typical examples are help desks, reasoning about legal cases, complex scheduling...

Instance-based Learning: disadvantages

- High classification cost, so indexing approaches become very important.
- All stored instances need to be considered for comparison with the new one (specifically in k-NN).

The importance of being lazy

- Lazy learning: generalising beyond the training examples is postponed until a new instance must be classified.
- Instead of estimating the target function once for the whole instance space, estimate it locally and differently for each new instance.

A family of related techniques:

- k-nearest neighbour
- Locally weighted regression

- Radial basis functions
- Case-based reasoning
- Lazy vs. eager learning

Instance-Based Learning

k-nearest neighbours does not learn an explicit mapping $f$ from the training data. Two classification approaches:

- Nearest neighbour: given a query instance $x_q$, first locate the nearest training example $x_n$, then estimate

  $$\hat{f}(x_q) \leftarrow f(x_n)$$

- k-nearest neighbour: given $x_q$,
  - take a vote among its $k$ nearest neighbours, if the target function is discrete-valued:

    $$\hat{f}(x_q) \leftarrow \arg\max_{v \in V} \sum_{i=1}^{k} \delta(v, f(x_i))$$

  - take the mean of the $f$ values of its $k$ nearest neighbours, if the target function is real-valued:

    $$\hat{f}(x_q) \leftarrow \frac{\sum_{i=1}^{k} f(x_i)}{k}$$

Representation

- All instances correspond to points in the n-dimensional space $\mathbb{R}^n$.
- As before, an instance $x$ is described by a feature vector:

  $$\langle a_1(x), a_2(x), \ldots, a_n(x) \rangle$$

- Nearest neighbours can be defined in terms of the standard Euclidean distance (but other measures are possible):

  $$d(x_i, x_j) = \sqrt{\sum_{r=1}^{n} \left(a_r(x_i) - a_r(x_j)\right)^2}$$

The k-nearest neighbour algorithm

Consider learning a discrete-valued function with signature $f : \mathbb{R}^n \to V$, for a finite $V = \{v_1, \ldots, v_s\}$.

Training algorithm:

- For each training example $\langle x, f(x) \rangle$, add the example to tlist.

Classification algorithm:

- Input: $x_q$, a query instance to be classified.
- Let $x_1, \ldots, x_k$ be the $k$ instances in tlist nearest to $x_q$.
- Return

  $$\hat{f}(x_q) \leftarrow \arg\max_{v \in V} \sum_{i=1}^{k} \delta(v, f(x_i))$$

  where $\delta(a, b) = 1$ if $a = b$, and $0$ otherwise (Kronecker function).
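The training and classification steps above can be sketched in a few lines of Python. This is only an illustrative sketch, not the lecture's own code: the function names (`euclidean`, `k_nearest`, `knn_classify`, `knn_regress`) and the toy data are assumptions made for the example, and the neighbour search is a plain linear scan over tlist with no indexing.

```python
import math
from collections import Counter

def euclidean(a, b):
    """Standard Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((ar - br) ** 2 for ar, br in zip(a, b)))

def k_nearest(tlist, xq, k):
    """Return the k stored examples (x, f(x)) closest to the query xq."""
    return sorted(tlist, key=lambda example: euclidean(example[0], xq))[:k]

def knn_classify(tlist, xq, k):
    """Discrete-valued target: majority vote among the k nearest neighbours.

    Ties are broken in favour of the label met first, i.e. the nearer one.
    """
    votes = Counter(label for _, label in k_nearest(tlist, xq, k))
    return votes.most_common(1)[0][0]

def knn_regress(tlist, xq, k):
    """Real-valued target: mean of f over the k nearest neighbours."""
    neighbours = k_nearest(tlist, xq, k)
    return sum(value for _, value in neighbours) / len(neighbours)

# "Training" is just storing the examples (hypothetical toy data).
tlist = [((1.0, 1.0), "a"), ((1.5, 2.0), "a"),
         ((5.0, 5.0), "b"), ((6.0, 5.5), "b"), ((5.5, 6.5), "b")]

print(knn_classify(tlist, (1.2, 1.4), k=1))  # a
print(knn_classify(tlist, (1.2, 1.4), k=5))  # b (all five examples vote)
```

Every query scans and sorts the whole of tlist, which is the high classification cost noted earlier as a disadvantage of instance-based learning.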

Decision Surfaces

[Figure: query instance x_q]

- How would a 1-nearest neighbour algorithm classify $x_q$?
- What would a 5-nearest neighbour algorithm do?
- What does the decision surface (for the 1-NN classifier) look like? Voronoi diagrams.

Distance-Weighted k-NN

One might want to weight nearer neighbours more heavily. For the discrete case:

$$\hat{f}(x_q) \leftarrow \arg\max_{v \in V} \sum_{i=1}^{k} w_i \, \delta(v, f(x_i)) \qquad (1)$$

where

$$w_i \equiv \frac{1}{d(x_q, x_i)^2}$$

and $d(x_q, x_i)$ is the distance between $x_q$ and $x_i$. If $d(x_q, x_i) = 0$, assign $\hat{f}(x_q) \stackrel{\text{def}}{=} f(x_i)$.

For real-valued target functions:

$$\hat{f}(x_q) \leftarrow \frac{\sum_{i=1}^{k} w_i f(x_i)}{\sum_{i=1}^{k} w_i} \qquad (2)$$

Now we could use all training examples instead of just $k$:

- Local method: only the $k$ nearest examples are used.
- Global method: all training examples are used.
- Shepard's method: rule (2) applied as a global method.

Locally Weighted Regression

k-NN forms a local approximation to $f$ for each query point $x_q$. So why not form an explicit approximation $\hat{f}(x)$ for the region surrounding $x_q$? Ways in which this could be done:

- Fit a linear function to the $k$ nearest neighbours
- Fit a quadratic, ...

This produces a piecewise approximation to $f$.
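As a rough illustration of the distance-weighted rules for the discrete and the real-valued case, the sketch below weights each neighbour by the inverse square of its distance to the query. The function names and the `k=None` convention (meaning "use every stored example", i.e. the global variant) are assumptions introduced for this example only.

```python
import math
from collections import defaultdict

def euclidean(a, b):
    """Standard Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((ar - br) ** 2 for ar, br in zip(a, b)))

def _ranked(tlist, xq, k):
    """(distance, f(x)) pairs sorted by distance; k=None keeps all of them."""
    pairs = sorted(((euclidean(x, xq), fx) for x, fx in tlist),
                   key=lambda pair: pair[0])
    return pairs if k is None else pairs[:k]

def weighted_knn_classify(tlist, xq, k=None):
    """Discrete case: each neighbour votes with weight 1 / d^2."""
    scores = defaultdict(float)
    for d, label in _ranked(tlist, xq, k):
        if d == 0.0:              # query coincides with a stored instance
            return label
        scores[label] += 1.0 / d ** 2
    return max(scores, key=scores.get)

def weighted_knn_regress(tlist, xq, k=None):
    """Real-valued case: weighted mean of f with weights 1 / d^2."""
    numerator = denominator = 0.0
    for d, value in _ranked(tlist, xq, k):
        if d == 0.0:              # query coincides with a stored instance
            return value
        w = 1.0 / d ** 2
        numerator += w * value
        denominator += w
    return numerator / denominator

# Hypothetical toy data: one near "a" against two farther "b" examples.
data = [((0.0,), "a"), ((3.0,), "b"), ((4.0,), "b")]
print(weighted_knn_classify(data, (1.0,), k=3))  # a
```

In the toy data an unweighted 3-NN vote would return "b" (two votes to one), while the weighted vote returns "a" because the single "a" example is much closer to the query.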

(N.B.: Locally Weighted Regression:

- Local: based only on data near $x_q$
- Weighted: the contribution of each instance is weighted by its distance to $x_q$
- Regression: approximates real-valued functions)

A global approximation

Consider approximating $f$ near $x_q$ by a linear function

$$\hat{f}(x) = w_0 + w_1 a_1(x) + \cdots + w_n a_n(x)$$

One could use gradient descent to find the coefficients that minimise the error in fitting $\hat{f}$ to the training set $D$:

$$E = \frac{1}{2} \sum_{x \in D} \left(f(x) - \hat{f}(x)\right)^2$$

The gradient descent training rule:

$$\Delta w_j = \eta \sum_{x \in D} \left(f(x) - \hat{f}(x)\right) a_j(x)$$

(Recall the LMS algorithm from Lecture 3)

Other ways of minimising error

Gradient descent isn't the only way to find the coefficients for, say,

$$\hat{f}(x) = w_0 + w_1 a_1(x) + \cdots + w_n a_n(x)$$

One could also use a variety of search methods such as simulated annealing, genetic algorithms, etc.

But first... the global approximation given by gradient descent (or GA, etc.) needs to be adapted...

Lazy and Eager Learning

- Lazy: wait for a query before generalising
  - k-nearest neighbour, case-based reasoning
- Eager: generalise before seeing the query
  - Radial basis function (RBF) networks, ID3, Backpropagation, Naive Bayes, ...

Does it matter?

- An eager learner must create a global approximation.
- A lazy learner can create many local approximations.
- If they use the same hypothesis space H, the lazy learner can represent more complex functions (e.g., consider H = linear functions).
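The gradient descent training rule for the linear approximation can be sketched as a batch update over the training set D. This is a minimal sketch under stated assumptions: the names `predict` and `fit_linear`, the default learning rate and epoch count, and the toy data are illustrative choices, and the constant attribute a_0(x) = 1 is assumed so that w_0 follows the same rule as the other coefficients.

```python
def predict(w, x):
    """f_hat(x) = w_0 + w_1 a_1(x) + ... + w_n a_n(x)."""
    return w[0] + sum(wj * aj for wj, aj in zip(w[1:], x))

def fit_linear(D, eta=0.01, epochs=5000):
    """Batch gradient descent on E = 1/2 * sum over D of (f(x) - f_hat(x))^2.

    D is a list of (x, f(x)) pairs, with x a tuple of attribute values.
    """
    n = len(D[0][0])
    w = [0.0] * (n + 1)
    for _ in range(epochs):
        # Errors are computed once per epoch, so all weights move together.
        errors = [fx - predict(w, x) for x, fx in D]
        w[0] += eta * sum(errors)          # a_0(x) = 1 (assumed bias term)
        for j in range(n):
            w[j + 1] += eta * sum(e * x[j] for e, (x, _) in zip(errors, D))
    return w

# Hypothetical toy data generated by f(x) = 1 + 2x.
D = [((0.0,), 1.0), ((1.0,), 3.0), ((2.0,), 5.0), ((3.0,), 7.0)]
w = fit_linear(D)
print([round(wj, 3) for wj in w])  # [1.0, 2.0]
```

Calling `fit_linear` on the whole of D gives one global fit. Passing only the k nearest neighbours of a query as D would give one of the local fits discussed under locally weighted regression, though without any distance weighting.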

Presentation based on (Mitchell, 1997, ch. 8).

References

Mitchell, T. M. (1997). Machine Learning. McGraw-Hill.

Sycara, K., Navin Chandra, D., Guttal, R., Koning, J., and Narasimhan, S. (1992). CADET: a case-based synthesis tool for engineering design. International Journal for Expert Systems, 4(2).

Yang, Y. (1994). Expert network: effective and efficient learning from human decisions in text categorisation and retrieval. In Proceedings of SIGIR-94, 17th ACM International Conference on Research and Development in Information Retrieval, pages 13-22, Dublin, Ireland. ACM Press.
