DATA DEPTH AND ITS APPLICATIONS IN CLASSIFICATION
Ondrej Vencalek
Department of Mathematical Analysis and Applications of Mathematics
Palacky University Olomouc, CZECH REPUBLIC

Abstract

The concept of data depth provides one possible approach to the analysis of multivariate data. Among other applications, it can also be used for classification. The present paper summarizes the main ideas of how depth can be used in classification. The two-step scheme of typical depth-based classifiers is discussed. Different choices of the depth function and of the classification procedure used in this scheme lead to different classifiers with different properties, and different distributional assumptions might lead to a preference for either global or local depth functions and classification procedures.

1 Introduction

A depth function is basically any function which provides an ordering (or quasi-ordering) of points in multidimensional space. The existence of such an ordering enables the generalization of many nonparametric techniques proposed for univariate variables, and data depth thus forms one possible basis of nonparametric multivariate data analysis.

Data depth has also been applied in classification. The aim of classification is to create a rule for allocating new observations into one of two (or more) groups. Several such rules, known as classifiers, based on data depth have been proposed. The aim of the present paper is to summarize the main ideas of how depth can be used in classification and to present new trends in this area.

2 Concept of data depth

There are several depth functions commonly used in applications: halfspace depth, projection depth, spatial depth, Mahalanobis depth, zonoid depth, simplicial depth and some others. A survey can be found in [8].
We recall here the first three of the depth functions listed above, since they are used most frequently in the context of classification.

The halfspace depth of a point x in R^d with respect to a probability measure P is defined as the minimum probability mass carried by any closed halfspace containing x, that is

    D(x, P) = inf { P(H) : H a closed halfspace in R^d, x ∈ H }.
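The empirical halfspace depth replaces P(H) by the fraction of sample points in H. Computing it exactly is expensive in higher dimensions, but it can be approximated from above by minimizing over randomly chosen directions. A minimal sketch in the plane (the function name and the number of directions are illustrative, not from the paper):

```python
import math
import random

def halfspace_depth(x, sample, n_dir=500, seed=0):
    """Approximate the empirical halfspace (Tukey) depth of a 2-D point x.

    For each random unit direction u, count the fraction of sample points
    lying in the closed halfspace {y : u.y >= u.x}; the depth estimate is
    the minimum of these fractions over all tried directions, an upper
    bound on the exact empirical depth that improves with n_dir.
    """
    rng = random.Random(seed)
    n = len(sample)
    depth = 1.0
    for _ in range(n_dir):
        theta = rng.uniform(0.0, 2.0 * math.pi)
        u = (math.cos(theta), math.sin(theta))
        proj_x = u[0] * x[0] + u[1] * x[1]
        count = sum(1 for p in sample
                    if u[0] * p[0] + u[1] * p[1] >= proj_x)
        depth = min(depth, count / n)
    return depth
```

For a point far outside the convex hull of the sample, some direction yields an empty halfspace and the estimate drops to zero, illustrating the behaviour of the empirical halfspace depth discussed below.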
The projection depth of a point x in R^d with respect to a probability measure P is defined as

    D(x, P) = 1 / (1 + O(x, P)),  where  O(x, P) = sup_{||u|| = 1} |u^T x − μ(P_u)| / σ(P_u),

and where μ(P_u) is some location and σ(P_u) some scale measure of the distribution of the random variable u^T X, usually the median and the MAD.

The spatial depth (also called L1-depth) of a point x in R^d with respect to a probability measure P with variance matrix Σ is defined as

    D(x, P) = 1 − || E [ Σ^{−1/2}(x − X) / ||Σ^{−1/2}(x − X)|| ] ||.

All three depth functions listed above have desirable properties: affine invariance, maximality at a point of symmetry (if the distribution is symmetric in some sense, e.g. angularly) and monotonicity on rays from the point with maximal depth, the so-called deepest point, which can be considered a multivariate analogue of the median. See [10] for a discussion of the desirable properties of depth functions.

A very important difference among the considered depth functions lies in the behaviour of their empirical versions. While the empirical halfspace depth equals zero for any point lying outside the convex hull of the data, the empirical projection depth (as well as the empirical spatial depth) is nonzero everywhere.

The depth of a point is a characteristic specifying its centrality or outlyingness with respect to the considered distribution. Since the whole distribution is taken into account, depth is said to be a global characteristic of the point. In recent years, however, there have been attempts to localize depth, see [7] or [3]. Later we will discuss the importance of these attempts for classification purposes.

3 Maximal depth classifier

Let us first briefly recall the two-class classification problem. We consider two unknown absolutely continuous probability distributions P_1 and P_2 on R^d. Independent random samples from these distributions are available; together they constitute the so-called training set.
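Given such training samples, the depths of Section 2 are computed from their empirical versions, with the expectation replaced by a sample average. A minimal sketch of the empirical spatial depth in the plane, taking Σ = I for simplicity (which sacrifices the affine invariance of the full definition):

```python
import math

def spatial_depth(x, sample):
    """Empirical spatial (L1) depth of a 2-D point x with Sigma = I.

    Averages the unit vectors pointing from x to each sample point and
    returns 1 minus the norm of that average: near the spatial median
    the unit vectors cancel out (depth close to 1), far outside the data
    they all point the same way (depth close to 0).
    """
    sx = sy = 0.0
    n = 0
    for p in sample:
        dx, dy = p[0] - x[0], p[1] - x[1]
        norm = math.hypot(dx, dy)
        if norm == 0.0:  # skip a sample point coinciding with x
            continue
        sx += dx / norm
        sy += dy / norm
        n += 1
    return 1.0 - math.hypot(sx, sy) / n
```

Unlike the empirical halfspace depth, this estimate stays strictly positive even far outside the convex hull of the data, only approaching zero in the limit.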
The empirical distributions based on the training set are denoted P̂_1 and P̂_2. The class to which an observation x is assigned is denoted d(x). The maximal depth classifier, the very first depth-based classifier, is simple:

    d(x) = arg max_{i=1,2} D(x; P̂_i).   (1)

The points close to the centre (the multivariate median) of some distribution have high depth with respect to that distribution, and it seems natural to classify them to this distribution.
Figure 1: Scheme of a typical depth-based classifier.

The idea of the maximal depth classifier is thus in accordance with common sense. However, Ghosh and Chaudhuri [2] proved that the maximal depth classifier attains the minimal possible probability of misclassification (known as the Bayes risk) only in very special cases: they showed optimality when the considered distributions are elliptically symmetric with density decreasing from the centre, differing only in location, and having equal prior probabilities. The optimality is lost even if only one of these assumptions is not fulfilled.

4 Advanced depth-based classifiers

The paper by Ghosh and Chaudhuri [2] started the search for depth-based classifiers applicable to a broader class of distributional settings. A typical depth-based classifier can be described as a two-step procedure:

1. The first step consists in the computation of the depths of the new observation x with respect to both parts of the training set. Each point is characterized by a pair of depths; these pairs lie in the so-called DD-space (depth-versus-depth space). Typically the DD-space is a subset of [0, 1] × [0, 1] ⊂ R^2, and thus the first step can usually be considered a reduction of dimensionality from R^d to a compact subset of R^2. This step is connected with the question: Which depth function should be used?

2. The second step consists in the application of some classification procedure in the DD-space. This step is connected with the question: Which classification procedure should be applied in the DD-space?

The scheme of a typical depth-based classification procedure is shown in Figure 1. The classifiers proposed in the literature differ mainly in their answers to these two questions: different depth functions can be applied in the first step and different classification procedures in the second step.
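The two steps above can be sketched in a few lines. In this illustration the Mahalanobis depth with identity scatter stands in for the depth function of step 1, and the simple maximal depth rule (1) stands in for the DD-space classifier of step 2; all function names are illustrative, not from the paper:

```python
def depth(x, sample):
    """Stand-in depth function: Mahalanobis depth with identity scatter,
    D(x) = 1 / (1 + ||x - mean(sample)||^2), for 2-D points."""
    mx = sum(p[0] for p in sample) / len(sample)
    my = sum(p[1] for p in sample) / len(sample)
    return 1.0 / (1.0 + (x[0] - mx) ** 2 + (x[1] - my) ** 2)

def dd_coordinates(x, train1, train2):
    """Step 1: reduce x from R^d to its depth pair in [0,1] x [0,1]."""
    return (depth(x, train1), depth(x, train2))

def classify(x, train1, train2):
    """Step 2, simplest choice: the maximal depth rule (1) applied
    to the DD-coordinates (ties broken in favour of class 1)."""
    d1, d2 = dd_coordinates(x, train1, train2)
    return 1 if d1 >= d2 else 2
```

Replacing `depth` or `classify` with other choices gives the other classifiers discussed below; the two-step structure stays the same.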
4.1 Which depth function should be used

Classifiers which use a global depth function perform well only if the considered distributions have some global properties like symmetry or unimodality. In more general settings, for example in the case of multimodality or nonconvexity of the level sets of the density, some local depth should be used; see e.g. [1] or [4].

4.2 Which classification procedure should be applied in the DD-space

The procedures applied in the DD-space can also be either global or local in nature. As global we denote the procedures that take into account all points of the training set when constructing the classifier, while local procedures take into account only the points of the training set close to the point being classified. Let us mention at least some of both the global and the local procedures.

The DD-classifier proposed by Li et al. [6] belongs to the global depth-based classifiers. The idea of the classifier is to separate the groups in the DD-space by a line (or, more generally, by a polynomial) so as to minimize the number of errors when classifying the points of the training set.

The DD-alpha procedure proposed by Lange et al. [5] is another global depth-based classifier. Instead of the pair [D(x, P̂_1), D(x, P̂_2)], it works with a vector

    z := [D(x, P̂_1), D(x, P̂_2), D(x, P̂_1)·D(x, P̂_2), D(x, P̂_1)^2, D(x, P̂_2)^2].

Lange et al. proposed a heuristic for finding proper parameters specifying the separating hyperplane given by the equation

    a·D(x, P̂_1) + b·D(x, P̂_2) + c·D(x, P̂_1)·D(x, P̂_2) + d·D(x, P̂_1)^2 + e·D(x, P̂_2)^2 = 0.

The procedure was successfully tested on many real datasets, usually leading to low misclassification rates. It is implemented in the R package ddalpha.

The classifier which uses kernel density estimation, proposed by Ghosh and Chaudhuri [2], is a local depth-based classifier. In the case of elliptically symmetric distributions, the classifier which minimizes the probability of misclassification can be expressed as

    d(x) = arg max_{i=1,2} π_i θ_i(D(x, P_i)),

where θ_i, i = 1, 2, are some unknown real functions that can be estimated by kernel density estimation. This procedure can be used even if the distributions are nonelliptical.
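Returning to the DD-alpha procedure: once its five coefficients have been fitted (Lange et al. [5] give a heuristic for this; the coefficients, sign convention and function name below are purely illustrative), classification reduces to evaluating the separating polynomial at the depth pair and taking the sign:

```python
def dd_alpha_score(d1, d2, coef):
    """Evaluate the separating polynomial
        a*d1 + b*d2 + c*d1*d2 + d*d1**2 + e*d2**2
    at the depth pair (d1, d2) = (D(x, P1), D(x, P2)).
    With the sign convention chosen here, a positive score assigns x
    to class 1 and a negative one to class 2."""
    a, b, c, d, e = coef
    return a * d1 + b * d2 + c * d1 * d2 + d * d1 ** 2 + e * d2 ** 2
```

With coef = (1, -1, 0, 0, 0) the rule degenerates to comparing the two depths, which shows that the family of DD-alpha separating polynomials contains the maximal depth classifier (1) as a special case.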
The k depth-nearest neighbour classifier used by Vencalek [9] is another local depth-based procedure. The idea is quite simple: to use the well-known k-nearest neighbour procedure in the DD-space. An open question is which metric should be used to measure distances between distinct points there.

5 Conclusion

Data depth provides a basis for nonparametric inference on multidimensional data. It can also be applied in classification. Although one can expect broad applicability of
the nonparametric depth-based classifiers, their optimality can usually be guaranteed only under rather restrictive assumptions. Global depth functions and global classification techniques applied in the DD-space lead to good results only if the considered distributions have some global properties like unimodality. In more general settings localization is needed: one can use local depth functions or local classifiers in the DD-space.

Acknowledgement

The research was supported by the Czech Science Foundation grant GA S.

References

[1] Dutta, S., Chaudhuri, P., Ghosh, A. K. (2012). Classification using Localized Spatial Depth with Multiple Localization. Communicated for publication.

[2] Ghosh, A. K., Chaudhuri, P. (2005). On maximum depth and related classifiers. Scandinavian Journal of Statistics, Vol. 32.

[3] Hlubinka, D., Kotik, L., Vencalek, O. (2010). Weighted data depth. Kybernetika, Vol. 46, No. 1.

[4] Hlubinka, D., Vencalek, O. (2013). Depth-Based Classification for Distributions with Nonconvex Support. Journal of Probability and Statistics.

[5] Lange, T., Mosler, K., Mozharovskyi, P. (2014). Fast nonparametric classification based on data depth. Statistical Papers, Vol. 55, No. 1.

[6] Li, J., Cuesta-Albertos, J. A., Liu, R. (2012). DD-classifier: nonparametric classification procedure based on DD-plot. Journal of the American Statistical Association, Vol. 107, No. 498.

[7] Paindaveine, D., Van Bever, G. (2013). From depth to local depth: a focus on centrality. Journal of the American Statistical Association, Vol. 108, No. 503.

[8] Vencalek, O. (2011). Weighted data depth and depth based classification. PhD thesis. URL: venco2am/datadepth.html

[9] Vencalek, O. (2014). New depth-based modification of the k-nearest neighbour method. SOP Transactions on Statistics and Analysis.

[10] Zuo, Y., Serfling, R. (2000). General notion of statistical depth function. Annals of Statistics, Vol. 28.
More informationPart I. Hierarchical clustering. Hierarchical Clustering. Hierarchical clustering. Produces a set of nested clusters organized as a
Week 9 Based in part on slides from textbook, slides of Susan Holmes Part I December 2, 2012 Hierarchical Clustering 1 / 1 Produces a set of nested clusters organized as a Hierarchical hierarchical clustering
More informationClassification with Diffuse or Incomplete Information
Classification with Diffuse or Incomplete Information AMAURY CABALLERO, KANG YEN Florida International University Abstract. In many different fields like finance, business, pattern recognition, communication
More informationElemental Set Methods. David Banks Duke University
Elemental Set Methods David Banks Duke University 1 1. Introduction Data mining deals with complex, high-dimensional data. This means that datasets often combine different kinds of structure. For example:
More information2. Convex sets. x 1. x 2. affine set: contains the line through any two distinct points in the set
2. Convex sets Convex Optimization Boyd & Vandenberghe affine and convex sets some important examples operations that preserve convexity generalized inequalities separating and supporting hyperplanes dual
More informationCluster Analysis. Mu-Chun Su. Department of Computer Science and Information Engineering National Central University 2003/3/11 1
Cluster Analysis Mu-Chun Su Department of Computer Science and Information Engineering National Central University 2003/3/11 1 Introduction Cluster analysis is the formal study of algorithms and methods
More informationApplication of Support Vector Machine Algorithm in Spam Filtering
Application of Support Vector Machine Algorithm in E-Mail Spam Filtering Julia Bluszcz, Daria Fitisova, Alexander Hamann, Alexey Trifonov, Advisor: Patrick Jähnichen Abstract The problem of spam classification
More informationAnalysis of high dimensional data via Topology. Louis Xiang. Oak Ridge National Laboratory. Oak Ridge, Tennessee
Analysis of high dimensional data via Topology Louis Xiang Oak Ridge National Laboratory Oak Ridge, Tennessee Contents Abstract iii 1 Overview 1 2 Data Set 1 3 Simplicial Complex 5 4 Computation of homology
More informationConvex Optimization. Convex Sets. ENSAE: Optimisation 1/24
Convex Optimization Convex Sets ENSAE: Optimisation 1/24 Today affine and convex sets some important examples operations that preserve convexity generalized inequalities separating and supporting hyperplanes
More informationLifting Transform, Voronoi, Delaunay, Convex Hulls
Lifting Transform, Voronoi, Delaunay, Convex Hulls Subhash Suri Department of Computer Science University of California Santa Barbara, CA 93106 1 Lifting Transform (A combination of Pless notes and my
More informationClustering CS 550: Machine Learning
Clustering CS 550: Machine Learning This slide set mainly uses the slides given in the following links: http://www-users.cs.umn.edu/~kumar/dmbook/ch8.pdf http://www-users.cs.umn.edu/~kumar/dmbook/dmslides/chap8_basic_cluster_analysis.pdf
More informationApplication of Characteristic Function Method in Target Detection
Application of Characteristic Function Method in Target Detection Mohammad H Marhaban and Josef Kittler Centre for Vision, Speech and Signal Processing University of Surrey Surrey, GU2 7XH, UK eep5mm@ee.surrey.ac.uk
More informationNearest neighbor classification DSE 220
Nearest neighbor classification DSE 220 Decision Trees Target variable Label Dependent variable Output space Person ID Age Gender Income Balance Mortgag e payment 123213 32 F 25000 32000 Y 17824 49 M 12000-3000
More informationMULTIVARIATE TEXTURE DISCRIMINATION USING A PRINCIPAL GEODESIC CLASSIFIER
MULTIVARIATE TEXTURE DISCRIMINATION USING A PRINCIPAL GEODESIC CLASSIFIER A.Shabbir 1, 2 and G.Verdoolaege 1, 3 1 Department of Applied Physics, Ghent University, B-9000 Ghent, Belgium 2 Max Planck Institute
More informationSimplicial Global Optimization
Simplicial Global Optimization Julius Žilinskas Vilnius University, Lithuania September, 7 http://web.vu.lt/mii/j.zilinskas Global optimization Find f = min x A f (x) and x A, f (x ) = f, where A R n.
More informationLearning a Manifold as an Atlas Supplementary Material
Learning a Manifold as an Atlas Supplementary Material Nikolaos Pitelis Chris Russell School of EECS, Queen Mary, University of London [nikolaos.pitelis,chrisr,lourdes]@eecs.qmul.ac.uk Lourdes Agapito
More informationExact adaptive parallel algorithms for data depth problems. Vera Rosta Department of Mathematics and Statistics McGill University, Montreal
Exact adaptive parallel algorithms for data depth problems Vera Rosta Department of Mathematics and Statistics McGill University, Montreal joint work with Komei Fukuda School of Computer Science McGill
More informationA Novel Criterion Function in Feature Evaluation. Application to the Classification of Corks.
A Novel Criterion Function in Feature Evaluation. Application to the Classification of Corks. X. Lladó, J. Martí, J. Freixenet, Ll. Pacheco Computer Vision and Robotics Group Institute of Informatics and
More informationMachine Learning and Pervasive Computing
Stephan Sigg Georg-August-University Goettingen, Computer Networks 17.12.2014 Overview and Structure 22.10.2014 Organisation 22.10.3014 Introduction (Def.: Machine learning, Supervised/Unsupervised, Examples)
More informationNetwork Traffic Measurements and Analysis
DEIB - Politecnico di Milano Fall, 2017 Introduction Often, we have only a set of features x = x 1, x 2,, x n, but no associated response y. Therefore we are not interested in prediction nor classification,
More informationarxiv: v3 [math.mg] 4 Feb 2016
A Classification of Non-Compact Coxeter Polytopes with n + Facets and One Non-Simple Vertex Mike Roberts Liverpool University michael.roberts@liverpool.ac.uk arxiv:.8v [math.mg] Feb Abstract In this paper
More informationMATH3016: OPTIMIZATION
MATH3016: OPTIMIZATION Lecturer: Dr Huifu Xu School of Mathematics University of Southampton Highfield SO17 1BJ Southampton Email: h.xu@soton.ac.uk 1 Introduction What is optimization? Optimization is
More informationOn data depth and distribution-free discriminant analysis using separating surfaces
Bernoulli 11(1), 2005, 1 27 On data depth and distribution-free discriminant analysis using separating surfaces ANIL K. GHOSH* and PROBAL CHAUDHURI** Theoretical Statistics and Mathematics Unit, Indian
More informationInternational Journal of Scientific Research & Engineering Trends Volume 4, Issue 6, Nov-Dec-2018, ISSN (Online): X
Analysis about Classification Techniques on Categorical Data in Data Mining Assistant Professor P. Meena Department of Computer Science Adhiyaman Arts and Science College for Women Uthangarai, Krishnagiri,
More information