Non-Parametric Modeling

Nickolas Holland
5 years ago

1 Non-Parametric Modeling CE-725: Statistical Pattern Recognition Sharif University of Technology Spring 2013 Soleymani

2 Outline
- Introduction
- Non-Parametric Density Estimation
  - Parzen Windows
  - $k_n$-Nearest Neighbor Density Estimation
- K-Nearest Neighbor (KNN) Decision Rule

3 Introduction
- Goal: estimation of arbitrary density functions.
- Parametric density functions usually cannot fit the densities encountered in practical problems; e.g., the common parametric densities are unimodal, while many practical densities are multimodal.
- Non-parametric methods do not assume that the model (form) of the underlying densities is known in advance.
- Non-parametric methods (for classification) can be categorized into:
  - Generative: estimate $p(x \mid \omega_i)$ from $D_i$ using non-parametric density estimation.
  - Discriminative: estimate $P(\omega_i \mid x)$ from $D$.

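4 Histogram Approximation Idea
- Histogram approximation of an unknown pdf: $P(b_j) \approx k_n(b_j)/n$, $j = 1, \dots, M$
  - $k_n(b_j)$: number of samples (among the $n$) lying in the bin $b_j$
- The corresponding estimated pdf: $\hat{p}(x) = P(b_j)/h$ for $|x - \bar{x}_{b_j}| \le h/2$
  - $\bar{x}_{b_j}$: mid-point of the bin $b_j$; $h$: bin width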
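
A minimal sketch of the histogram estimate in one dimension: the fraction of samples in the bin containing x, divided by the bin width h. The function name `histogram_density` and the `origin` parameter (where the first bin starts) are illustrative assumptions, not part of the slides.

```python
import math

def histogram_density(samples, x, h, origin=0.0):
    """Histogram estimate of p(x): (samples in x's bin / n) / h."""
    bin_index = math.floor((x - origin) / h)
    # Count the samples that fall in the same bin as x.
    k = sum(1 for s in samples if math.floor((s - origin) / h) == bin_index)
    return k / (len(samples) * h)

samples = [0.1, 0.2, 0.25, 0.7, 0.9, 1.4, 1.6, 1.8]
# Bin [0, 0.5) holds 3 of the 8 samples, so the estimate is 3 / (8 * 0.5).
print(histogram_density(samples, 0.3, h=0.5))  # 0.75
```

The estimate is piecewise constant: every x in the same bin receives the same value.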

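5 Non-Parametric Density Estimation
- Probability of falling in a region $R$: $P = \int_R p(x')\,dx'$ (a smoothed version of $p(x)$)
- We can estimate the smoothed $p$ by estimating $P$:
  - $D = \{x_1, \dots, x_n\}$: a set of samples drawn i.i.d. according to $p(x)$
  - The probability that $k$ of the $n$ samples fall in $R$: $P_k = \binom{n}{k} P^k (1 - P)^{n-k}$, with $E[k] = nP$
  - This binomial distribution peaks sharply about the mean, so $k/n$ serves as an estimate for $P$
  - The estimate is more accurate for larger $n$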
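
To illustrate how the binomial distribution of the count k concentrates around its mean nP, the following sketch evaluates the pmf directly. The values n = 100 and P = 0.3 are arbitrary choices for illustration.

```python
import math

def binomial_pmf(k, n, p):
    """Probability that exactly k of n i.i.d. samples fall in the region."""
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

n, P = 100, 0.3  # illustrative values, not taken from the slides
pmf = [binomial_pmf(k, n, P) for k in range(n + 1)]

# The most likely count sits at the mean n * P = 30.
mode = max(range(n + 1), key=lambda k: pmf[k])
# Probability mass within 10 counts of the mean.
mass_near_mean = sum(pmf[k] for k in range(20, 41))
print(mode, mass_near_mean)
```

Most of the probability mass lies in a narrow band around nP, which is why k/n is a reasonable estimate of P.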

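6 Non-Parametric Density Estimation
- Assumptions: $p(x)$ is continuous and the region $R$ enclosing $x$ is so small that $p$ is nearly constant in it:
  $P = \int_R p(x')\,dx' \approx p(x)\,V$, hence $p(x) \approx \dfrac{k/n}{V}$
  - $V$: volume of the region $R$
- Let $V$ approach zero if we want to find $p(x)$ instead of the averaged version.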
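
The basic estimate, the fraction of samples in a small region divided by the region's volume, can be sketched in one dimension as follows. The region is assumed to be an interval centered at x, and the uniform test data and the name `region_estimate` are illustrative assumptions.

```python
import random

def region_estimate(samples, x, width):
    """Estimate p(x) as (k / n) / V, where the region is the interval
    of length `width` centered at x, so V = width."""
    k = sum(1 for s in samples if abs(s - x) <= width / 2.0)
    return (k / len(samples)) / width

random.seed(0)
# Samples from a uniform density on [0, 1], whose true value is 1 everywhere.
samples = [random.random() for _ in range(10000)]
print(region_estimate(samples, 0.5, 0.1))  # should be close to 1.0
```

Shrinking `width` reduces the averaging over the region, but leaves fewer samples inside it, so the estimate becomes noisier.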

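7 Necessary Conditions for Convergence
- $p_n(x)$ is the estimate of $p(x)$ using $n$ samples: $p_n(x) = \dfrac{k_n/n}{V_n}$
  - $V_n$: the volume of the region around $x$
  - $k_n$: the number of samples falling in the region
- Necessary conditions for convergence of $p_n(x)$ to $p(x)$:
  - $\lim_{n \to \infty} V_n = 0$
  - $\lim_{n \to \infty} k_n = \infty$
  - $\lim_{n \to \infty} k_n / n = 0$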

8 Convergence of $p_n(x)$ to $p(x)$
(Figure not recovered.)

9 Non-parametric Density Estimation: Main Approaches
Two approaches to satisfying the conditions:
- Parzen windows: $V_n$ is a function of $n$ and constant for all $x$ (e.g., $V_n = 1/\sqrt{n}$)
  - The number of points falling inside the volume can vary from point to point.
- $k_n$-nearest neighbor: $k_n$ is a function of $n$ and constant for all $x$ (e.g., $k_n = \sqrt{n}$)
  - The volume grows until it contains $k_n$ neighbors of $x$.
(Figure: the density is estimated at the center of the circle.)

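10 Parzen Windows
- Extension of the histogram idea: the $d$-dimensional space is divided into hypercubes with side length $h_n$ (i.e., volume $V_n = h_n^d$).
- Hypercube as a simple window function:
  $\varphi(u) = 1$ if $|u_j| \le 1/2$ for $j = 1, \dots, d$, and $\varphi(u) = 0$ otherwise
- Number of samples in the hypercube around $x$: $k_n = \sum_{i=1}^{n} \varphi\left(\dfrac{x - x_i}{h_n}\right)$
(Figure: the unit hypercube window, equal to 1 on $[-1/2, 1/2]$ along each axis.)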

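11 Window Function
- $p_n(x) = \dfrac{1}{n} \sum_{i=1}^{n} \dfrac{1}{V_n}\,\varphi\left(\dfrac{x - x_i}{h_n}\right) = \dfrac{1}{n} \sum_{i=1}^{n} \dfrac{1}{h_n^d}\,\varphi\left(\dfrac{x - x_i}{h_n}\right)$
- Conditions on the window function for $p_n(x)$ to be a legitimate density function:
  - $\varphi(x) \ge 0$
  - $\int \varphi(x)\,dx = 1$
- Windows are also called kernels or potential functions.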
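
A minimal sketch of the Parzen-window estimate with the hypercube window, for points given as tuples in d dimensions. The names `hypercube_window` and `parzen_estimate` are illustrative, and the window width h is passed in directly rather than derived from n.

```python
def hypercube_window(u):
    """phi(u) = 1 if every coordinate satisfies |u_j| <= 1/2, else 0."""
    return 1.0 if all(abs(uj) <= 0.5 for uj in u) else 0.0

def parzen_estimate(x, samples, h, window=hypercube_window):
    """p_n(x) = (1/n) * sum_i (1/h^d) * phi((x - x_i) / h)."""
    d = len(x)
    volume = h ** d
    total = sum(
        window([(xj - sj) / h for xj, sj in zip(x, s)]) for s in samples
    )
    return total / (len(samples) * volume)

samples = [(0.0, 0.0), (0.2, 0.1), (1.0, 1.0), (0.4, -0.3)]
# Three of the four samples lie in the unit square centered at (0.1, 0.0).
print(parzen_estimate((0.1, 0.0), samples, h=1.0))  # 0.75
```

Any other window that is non-negative and integrates to one, such as a Gaussian, can be passed as `window` without changing the rest of the code.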

12 Density Estimation: Non-parametric
(Figure adapted from Shakhnarovich's slides.)

13 Window Function: Width Parameter
- Choosing $h_n$:
  - Too large: low resolution
  - Too small: much variability in $p_n(x)$
- For unlimited $n$, by letting $V_n$ slowly approach zero as $n$ increases, $p_n(x)$ converges to $p(x)$.

14 Parzen Window: Example
- $p(x) = N(0, 1)$, $\varphi(u) = N(0, 1)$, $h_n = h_1/\sqrt{n}$

15 Parzen Window: Example
- $\varphi(u) = N(0, 1)$, $h_n = h_1/\sqrt{n}$

16 Window Width
- For fixed $n$, a smaller $h$ results in higher variance, while a larger $h$ leads to higher bias.
- For a fixed $h$, the variance decreases as the number of sample points $n$ tends to infinity.
  - For a large enough number of samples, the smaller the $h$, the better the accuracy of the resulting estimate.
- In practice, where only a finite number of samples is available, a compromise between $h$ and $n$ must be made.
- $h$ can be set using techniques like cross-validation.
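
One possible way to set h by cross-validation is to score each candidate width by the leave-one-out log-likelihood of the data and keep the best one. This is a sketch of that specific variant, assumed here for illustration; the slides do not say which cross-validation scheme is intended. The Gaussian window, the function names, and the `1e-300` floor that guards against log(0) are all assumptions.

```python
import math

def gaussian_window(u):
    """Standard normal density used as the window function."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def loo_log_likelihood(samples, h):
    """Sum of log densities, each sample scored by the Parzen estimate
    built from all the other samples (1-D data)."""
    n = len(samples)
    total = 0.0
    for i, xi in enumerate(samples):
        dens = sum(
            gaussian_window((xi - xj) / h)
            for j, xj in enumerate(samples) if j != i
        ) / ((n - 1) * h)
        total += math.log(max(dens, 1e-300))  # floor avoids log(0)
    return total

def select_width(samples, candidates):
    """Return the candidate h with the highest leave-one-out score."""
    return max(candidates, key=lambda h: loo_log_likelihood(samples, h))

samples = [-1.2, -0.7, -0.3, 0.0, 0.1, 0.4, 0.9, 1.5]
print(select_width(samples, [0.001, 0.5, 100.0]))
```

A very small h gives each held-out point almost no density from its neighbors, and a very large h spreads the density too thinly, so both extremes score poorly.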

17 Practical Issues: Curse of Dimensionality
- A large $n$ is necessary to find an acceptable density estimate in high-dimensional feature spaces.
  - $n$ must grow exponentially with the dimensionality $d$.
  - If $n$ equidistant points are required to densely fill a one-dimensional interval, $n^d$ points are needed to fill the corresponding $d$-dimensional hypercube.
  - We need an exponentially large quantity of training data to ensure that the cells are not empty.
- Time and memory requirements grow as well.
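
The exponential growth can be made concrete with a few lines of arithmetic. The choice of 10 points per axis is an arbitrary illustration.

```python
def points_needed(points_per_axis, dims):
    """Grid points needed to fill a d-dimensional hypercube as densely
    as `points_per_axis` equidistant points fill one axis."""
    return points_per_axis ** dims

for d in (1, 2, 3, 10):
    print(d, points_needed(10, d))
# 1 10
# 2 100
# 3 1000
# 10 10000000000
```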

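18 Parzen Window & Classification
- Class-conditional estimates: $p_n(x \mid \omega_i) = \dfrac{1}{N_i h^d} \sum_{x^{(j)} \in D_i} \varphi\left(\dfrac{x - x^{(j)}}{h}\right)$
- If $\dfrac{p_n(x \mid \omega_1)}{p_n(x \mid \omega_2)} > \dfrac{P(\omega_2)}{P(\omega_1)}$, decide $\omega_1$; otherwise decide $\omega_2$.
  - $N_i = |D_i|$ ($i = 1, 2$): number of training samples in class $\omega_i$
  - $D_i$: set of training samples labeled as $\omega_i$
- For large $n$, the classifier has high time and memory requirements, since all training samples must be stored and evaluated.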
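
A minimal two-class sketch of Parzen-window classification in one dimension. It compares prior-weighted class-conditional estimates, which is equivalent to comparing the likelihood ratio against the ratio of priors. The Gaussian window, the function names, and the default of estimating priors from class frequencies are assumptions made for illustration.

```python
import math

def gaussian_window(u):
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def parzen_density(x, samples, h):
    """1-D Parzen estimate: (1 / (N h)) * sum_j phi((x - x_j) / h)."""
    return sum(gaussian_window((x - s) / h) for s in samples) / (len(samples) * h)

def parzen_classify(x, class1, class2, h, prior1=None):
    """Return 1 if P(w1) p(x|w1) > P(w2) p(x|w2), otherwise 2."""
    n1, n2 = len(class1), len(class2)
    if prior1 is None:
        prior1 = n1 / (n1 + n2)  # assumption: priors from class frequencies
    score1 = prior1 * parzen_density(x, class1, h)
    score2 = (1.0 - prior1) * parzen_density(x, class2, h)
    return 1 if score1 > score2 else 2

class1 = [-1.0, -0.5, 0.0]
class2 = [2.0, 2.5, 3.0]
print(parzen_classify(-0.2, class1, class2, h=0.5))  # 1
print(parzen_classify(2.4, class1, class2, h=0.5))   # 2
```

Every prediction loops over all stored training samples, which is the source of the time and memory cost for large training sets.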

19 Parzen Window & Classification: Example
(Figure: results for a smaller $h$ and a larger $h$.)

20 $k_n$-Nearest Neighbor Estimation
- The cell volume is a function of the point location.
  - To estimate $p(x)$, let the cell around $x$ grow until it captures $k_n$ samples, called the $k_n$ nearest neighbors of $x$.
  - $k_n$ is a function of $n$.
- Two possibilities can occur:
  - High density near $x$: the cell will be small, which provides good resolution.
  - Low density near $x$: the cell will grow large and stop only when higher-density regions are reached.

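21 $k_n$-Nearest Neighbor Estimation
- Necessary and sufficient conditions of convergence:
  - $\lim_{n \to \infty} k_n = \infty$
  - $\lim_{n \to \infty} k_n / n = 0$
- A family of estimates is obtained by setting $k_n = k_1 \sqrt{n}$ and choosing different values for $k_1$.
- Since $p_n(x) = \dfrac{k_n/n}{V_n}$, taking $k_n = \sqrt{n}$ gives $V_n \approx V_1/\sqrt{n}$ with $V_1 \approx 1/p(x)$.
  - $V_1$ is a function of $x$.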
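
A minimal one-dimensional sketch of the nearest-neighbor density estimate: the cell around x grows until it contains k samples, and the estimate is (k/n) divided by the cell length. The function name and the handling of a zero-length cell are illustrative assumptions.

```python
import math

def knn_density(x, samples, k):
    """1-D k-NN estimate: p_n(x) = (k / n) / V, where V = 2 * r and
    r is the distance from x to its k-th nearest sample."""
    distances = sorted(abs(x - s) for s in samples)
    radius = distances[k - 1]
    if radius == 0.0:
        return math.inf  # x coincides with at least k samples
    return k / (len(samples) * 2.0 * radius)

samples = [0.0, 1.0, 2.0, 4.0]
k = round(math.sqrt(len(samples)))  # k_n = sqrt(n) = 2
# The 2nd nearest sample to 1.5 is at distance 0.5, so V = 1.0.
print(knn_density(1.5, samples, k))  # 0.5
```

Unlike the Parzen estimate, the cell size adapts to the data: it is small where samples are dense and large where they are sparse.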

22 $k_n$-Nearest Neighbor Estimation: Example
(Figure: k-nearest-neighbor density estimates for $k = 3$ and $k = 5$; note the discontinuities in the slopes.)

23 $k_n$-Nearest Neighbor Estimation: Example
- In one dimension with $n = 1$ and $k_n = 1$: $p_n(x) = \dfrac{1}{2\,|x - x_1|}$

24 $k_n$-Nearest Neighbor Estimation: Parameter $k_n$
- For classification, a proper value of $k$ can be found using techniques such as cross-validation.
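
One way to pick k by cross-validation is to compute the leave-one-out error of a nearest-neighbor vote for each candidate and keep the value with the lowest error. This is a sketch under that assumption; the helper names and the toy data (including one deliberately mislabeled point at 1.5) are illustrative.

```python
import math
from collections import Counter

def knn_classify(x, training, k):
    """Majority label among the k training points closest to x."""
    neighbors = sorted(training, key=lambda pair: math.dist(x, pair[0]))[:k]
    return Counter(label for _, label in neighbors).most_common(1)[0][0]

def loo_error(training, k):
    """Fraction of points misclassified when each is held out in turn."""
    errors = 0
    for i, (point, label) in enumerate(training):
        rest = training[:i] + training[i + 1:]
        if knn_classify(point, rest, k) != label:
            errors += 1
    return errors / len(training)

def select_k(training, candidates):
    return min(candidates, key=lambda k: loo_error(training, k))

training = [((0.0,), "a"), ((1.0,), "a"), ((2.0,), "a"), ((3.0,), "a"),
            ((1.5,), "b"),  # noisy label inside the "a" cluster
            ((10.0,), "b"), ((11.0,), "b"), ((12.0,), "b"), ((13.0,), "b")]
print(select_k(training, [1, 3]))  # 3
```

With k = 1 the noisy point also corrupts the predictions for its two neighbors, while k = 3 outvotes it.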

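25 $k_n$-Nearest Neighbor Estimation & Classification
- If $\dfrac{p_n(x \mid \omega_1)}{p_n(x \mid \omega_2)} \approx \dfrac{k/(N_1 V_1)}{k/(N_2 V_2)} = \dfrac{N_2 V_2}{N_1 V_1} > \dfrac{P(\omega_2)}{P(\omega_1)}$, decide $\omega_1$; otherwise decide $\omega_2$.
  - $N_i = |D_i|$ ($i = 1, 2$): number of training samples in class $\omega_i$
  - $D_i$: set of training samples labeled as $\omega_i$
  - $V_i$: the hypersphere volumes
  - $r_i$: the radius of the hypersphere centered at $x$ containing $k$ samples of the class $\omega_i$ ($i = 1, 2$)
- $k$ need not be the same for all classes.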
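
A one-dimensional sketch of this decision rule: for each class, the cell around x grows until it holds k samples of that class, and the class whose cell is smaller (after weighting by class sizes and priors) wins. In one dimension the "hypersphere volume" is the interval length 2r. The function names, the shared k for both classes, and the frequency-based default priors are assumptions for illustration.

```python
def kth_distance(x, samples, k):
    """Distance from x to its k-th nearest sample (1-D)."""
    return sorted(abs(x - s) for s in samples)[k - 1]

def knn_density_classify(x, class1, class2, k, prior1=None):
    """Decide class 1 if N2*V2 / (N1*V1) > P(w2) / P(w1), else class 2."""
    n1, n2 = len(class1), len(class2)
    if prior1 is None:
        prior1 = n1 / (n1 + n2)  # assumption: priors from class frequencies
    v1 = 2.0 * kth_distance(x, class1, k)
    v2 = 2.0 * kth_distance(x, class2, k)
    # Cross-multiplied form of the ratio test, which avoids dividing by zero.
    return 1 if n2 * v2 * prior1 > n1 * v1 * (1.0 - prior1) else 2

class1 = [0.0, 0.2, 0.4]
class2 = [2.0, 2.2, 2.4]
print(knn_density_classify(0.3, class1, class2, k=2))  # 1
print(knn_density_classify(2.1, class1, class2, k=2))  # 2
```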

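26 Estimation of Posterior Probabilities
- To estimate $P(\omega_i \mid x)$:
  - We place a cell of volume $V_n$ around $x$ that captures $k_n$ samples, $k_i$ of them labeled $\omega_i$:
    $p_n(x, \omega_i) = \dfrac{k_i/n}{V_n}$, $\quad P_n(\omega_i \mid x) = \dfrac{p_n(x, \omega_i)}{\sum_j p_n(x, \omega_j)} = \dfrac{k_i}{k_n}$
  - $P_n(\omega_i \mid x)$ is the fraction of samples within the cell labeled $\omega_i$.
- Size of the cell:
  - Parzen window: e.g., $V_n = 1/\sqrt{n}$
  - $k_n$-nearest neighbor: e.g., $k_n = \sqrt{n}$
- If $n$ is large and the cell sufficiently small, the performance will approach the best possible.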
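
The posterior estimate as a label fraction can be sketched with a fixed-size cell, here assumed to be a ball of a given radius around x. The function name, the Euclidean ball, and returning an empty dictionary for an empty cell are illustrative choices.

```python
import math
from collections import Counter

def cell_posteriors(x, training, radius):
    """Estimate P(w_i | x) as the fraction of samples inside the cell
    (a ball of the given radius around x) that carry label w_i."""
    inside = [label for point, label in training
              if math.dist(x, point) <= radius]
    if not inside:
        return {}  # empty cell: no estimate available
    counts = Counter(inside)
    return {label: c / len(inside) for label, c in counts.items()}

training = [((0.0, 0.0), "a"), ((0.5, 0.0), "a"),
            ((0.0, 0.5), "b"), ((3.0, 3.0), "b")]
# Three samples fall in the cell: two labeled "a" and one labeled "b".
print(cell_posteriors((0.0, 0.0), training, radius=1.0))
```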

27 k-nearest-neighbor (kNN) Rule
- To classify $x$:
  - Find the $k$ nearest training samples to $x$.
  - Out of these $k$ samples, identify the number $k_i$ of samples belonging to class $\omega_i$ ($i = 1, \dots, c$).
  - Assign $x$ to the class $\omega_{i^*}$, where $i^* = \arg\max_{i = 1, \dots, c} k_i$.
- It can be considered a discriminative method.
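
The three steps of the rule can be sketched as follows, assuming Euclidean distance and training data stored as (point, label) pairs. Both are assumptions, since the slides do not fix a distance measure or a data layout.

```python
import math
from collections import Counter

def knn_classify(x, training, k):
    """Assign x to the class with the most members among its k nearest
    training samples (Euclidean distance)."""
    # Step 1: find the k nearest training samples.
    neighbors = sorted(training, key=lambda pair: math.dist(x, pair[0]))[:k]
    # Step 2: count how many neighbors belong to each class.
    counts = Counter(label for _, label in neighbors)
    # Step 3: pick the class with the largest count.
    return counts.most_common(1)[0][0]

training = [((0.0, 0.0), "a"), ((0.0, 1.0), "a"), ((1.0, 0.0), "a"),
            ((5.0, 5.0), "b"), ((5.0, 6.0), "b"), ((6.0, 5.0), "b")]
print(knn_classify((0.5, 0.5), training, k=3))  # a
print(knn_classify((5.2, 5.1), training, k=3))  # b
```

An odd k avoids tied votes in two-class problems; with ties, this sketch returns whichever tied label was counted first.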

28 k-nearest-neighbor (kNN) Rule
(Figure: a query point $x = [x_1, x_2]$, marked "?", among labeled training samples in the $(x_1, x_2)$ plane.)

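29 Nearest-Neighbor Rule
- Nearest-Neighbor: kNN with $k = 1$.
- It leads to an error rate that is no smaller than the Bayes error, and is usually greater.
- If $n$ is large (unlimited), the error rate of the nearest-neighbor classifier is never worse than twice the Bayes rate.
  - If $n \to \infty$, it is always possible to find $x'$ sufficiently close to $x$ so that $P(\omega_i \mid x') \approx P(\omega_i \mid x)$.
  - If $P(\omega_m \mid x) \approx 1$ for the most probable class $\omega_m$, then the nearest-neighbor selection is almost always the same as the Bayes selection.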

30 Nearest-Neighbor Classifier: Example
- Voronoi tessellation:
  - Each cell consists of all points closer to a given training point than to any other training point.
  - All points in a cell are labeled by the category of the corresponding training point.

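31 Bound on the Nearest Neighbor Error Rate
- $c$-class problem given infinite training samples:
  $P^* \le P \le P^* \left(2 - \dfrac{c}{c - 1}\,P^*\right)$
  - $P = \lim_{n \to \infty} P_n(e)$: the nearest-neighbor error rate
  - $P^*$: the Bayes error rate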
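
For reference, the classical asymptotic bound on the nearest-neighbor error rate for a c-class problem is P* <= P <= P* (2 - c/(c-1) P*), where P* is the Bayes error rate. The short sketch below evaluates the upper bound for a few values; the function name is illustrative.

```python
def nn_error_upper_bound(bayes_error, num_classes):
    """Upper bound P* (2 - c/(c-1) P*) on the asymptotic
    nearest-neighbor error rate."""
    c = num_classes
    return bayes_error * (2.0 - c / (c - 1.0) * bayes_error)

for p_star in (0.0, 0.1, 0.25, 0.5):
    print(p_star, nn_error_upper_bound(p_star, 2))
```

For small Bayes error the bound is close to twice the Bayes rate. For two classes it meets the lower bound at P* = 0 and at P* = 0.5.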

32 Bound on the k-Nearest Neighbor Error Rate
- Bounds on the k-NN error rate for different values of $k$ (infinite training data).
- As $k$ increases, the upper bounds get closer to the lower bound (the Bayes error rate).
- When $k \to \infty$, the two bounds meet and the k-NN rule becomes optimal.

33 Non-Parametric Density Estimation: Summary
- Generality of distributions:
  - With enough samples, convergence to an arbitrarily complicated target density can be obtained.
- The number of samples required to assure convergence must be very large.
  - It grows exponentially with the dimensionality of the feature space.
- These methods are very sensitive to the choice of window width.
- There may be severe requirements for computation time and storage (needed to save all training samples).
