Nonparametric Frontier Estimation: The Smooth Local Maximum Estimator and The Smooth FDH Estimator

Hudson da S. Torrent¹

July, 2013

Abstract. In this paper we propose two estimators for deterministic production frontier models, named Max-Smooth and Smooth-FDH. The proposed estimators have desirable properties: (i) they are smooth; (ii) they are not inherently biased, as the FDH and DEA estimators are; (iii) they are simple to implement; and (iv) they have lower variance than estimators based on conditional moments. The Max-Smooth estimator is constructed in three steps. The first step is a novel local maximum estimator. The second step smooths out the variance present in the first step. A third step then corrects the position of the estimated frontier. Smooth-FDH is similar to Max-Smooth, but with the first stage replaced by the FDH estimator. A cross-validation procedure for bandwidth selection is presented. We compare the relative performance of the proposed estimators with DEA and FDH in a simulation study. The results are very favorable to the proposed estimators, even in a simulation setup that is very suitable for the DEA estimator.

Keywords and phrases. nonparametric frontier models; local smoothing; local linear regression.

JEL Classifications. C14, C21

Area. Econometrics

¹ Department of Statistics and PPGE, Federal University of Rio Grande do Sul, Porto Alegre - Brazil, hudson.torrent@ufrgs.br.

1 Introduction

Estimation of production frontiers, and therefore of the efficiency (and inefficiency) of production processes, has been the subject of a vast and growing literature since Farrell (1957). The problem can be stated as follows. Let $x \in \mathbb{R}^p_+$ be a set of inputs used to produce a set of outputs $y \in \mathbb{R}^q_+$. There is a technological or production set defined as $\Psi = \{(x, y) \in \mathbb{R}^{p+q}_+ : x \text{ can produce } y\}$. The production frontier associated with $\Psi$ is defined as $\rho(x) = \sup\{y \in \mathbb{R}^q_+ : (x, y) \in \Psi\}$ for all $x \in \mathbb{R}^p_+$. Thus, for a given $(x_0, y_0) \in \Psi$, efficiency is measured by the distance between $y_0$ and $\rho(x_0)$. Given a random sample $\chi = \{(x_i, y_i),\ i = 1, \dots, n\}$, we intend to estimate the associated production frontier $\rho(\cdot)$ and efficiency measures for the observed production units.¹

The literature offers two traditional nonparametric estimation procedures for this problem: the Free Disposal Hull (FDH) estimator, introduced by Deprins et al. (1984), and Data Envelopment Analysis (DEA), represented by Charnes et al. (1978). The idea is to estimate a production set from an observed random sample without assuming any restrictive parametric structure either on the production frontier $\rho(\cdot)$ or on the joint density of $(X_i, Y_i)$. Many works apply these methodologies. Gijbels et al. (1999) and Park et al. (2000) have obtained asymptotic distributions for the DEA and FDH estimators, respectively. However, these estimators have some characteristics that may be undesirable. Both consist of obtaining the smallest set that envelops all the data, subject to free disposability (FDH and DEA) and convexity (DEA). The estimated set therefore never exceeds the true production set, and the frontier estimators are inherently biased downwards. Moreover, FDH produces a discontinuous function that envelops the data, and DEA produces a piecewise linear function.
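To make the FDH idea concrete, the following is a minimal sketch for the one-input, one-output case, where the FDH frontier at $x$ is the largest output observed among units using no more than $x$ of the input. The function name `fdh_frontier`, the toy sample, and the convention of returning 0 below the smallest observed input are illustrative assumptions, not part of the paper.

```python
def fdh_frontier(x, xs, ys):
    """FDH frontier estimate at input level x (one input, one output).

    Under free disposability, the output of any unit using no more than x
    of the input is attainable at x, so the estimate is the largest such
    observed output.
    """
    attainable = [yi for xi, yi in zip(xs, ys) if xi <= x]
    # Below the smallest observed input no unit qualifies; returning 0.0 is
    # an illustrative convention only.
    return max(attainable) if attainable else 0.0


# Hypothetical toy sample of four production units.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 2.0, 4.0]

frontier = [fdh_frontier(x, xs, ys) for x in xs]

# The step function envelops the data from above ...
assert all(f >= y for f, y in zip(frontier, ys))
# ... and is non-decreasing in the input level.
assert frontier == sorted(frontier)
```

The resulting step function illustrates both features discussed above: it never lies above the largest output observed so far, and it jumps at observed input levels.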
Martins-Filho and Yao (2007) propose a deterministic production frontier model and a three-stage nonparametric production frontier estimator. The first step estimates a conditional mean using local linear kernel estimation. The second step follows Fan and Yao (1998): a local linear kernel method is used to estimate the conditional variance function. The third and final step is an original estimator specific to their proposed production frontier model. The procedure relies on the estimated conditional variance (and the square root of the estimates) to obtain the shape of the frontier. They derive the asymptotic normality and consistency of both the production frontier and the efficiency estimators under assumptions that are reasonable in the nonparametric context. This estimator shares the flexible nonparametric structure and has some additional desirable properties compared with the FDH and DEA estimators: (i) the frontier estimator is a smooth function of input usage (neither discontinuous nor piecewise linear), and (ii) although the estimator envelops the data, it is not inherently biased as the FDH and DEA estimators are. However, an undesirable result may emerge in the second step, since the estimation procedure allows for a negative estimate of the variance. Furthermore, that estimator has a pronounced variance, since it is based on conditional moments instead of a conditional maximum.

We therefore propose in this paper an estimator that has advantages over the estimators cited above. Relative to DEA and FDH, our estimator is attractive because it incorporates the smoothness of nonparametric kernel methodology. Likewise, it has advantages over the estimator proposed by Martins-Filho and Yao (2007), since it is based on a conditional maximum, which results in a lower variance. From here on we denote our estimator by Max-Smooth. It is characterized by three steps. First, we estimate the maximum output conditional on a given input value via a simple non-smooth nonparametric procedure. In the second stage we smooth out the variance of the first stage by means of nonparametric kernel regression. Finally, we correct the position of the estimated frontier, placing it above all data points. We also consider an alternative version of Max-Smooth, denoted by Smooth-FDH, in which the first stage is replaced by the FDH estimator. This approach has the convenience that just one bandwidth is needed to implement the estimator, and it takes advantage of free disposability. Clearly, it is less flexible than Max-Smooth, which results in worse performance in some situations, as we shall see in Section 4.

¹ We consider here only the deterministic approach to the efficiency problem. Within this approach all observations are assumed to lie inside the technological set.

This paper is organized as follows. In Section 2 we present the model and the estimation procedure, giving details about our proposed estimators. In Section 3 we propose a cross-validation type procedure for bandwidth selection. In Section 4 the simulation study is presented in detail and we compare the proposed estimators with FDH and DEA. Finally, Section 5 states conclusions and final comments.

2 Stochastic Model and Estimation Procedure

In this section we present the stochastic model and the estimation procedure for that model. The problem may be viewed by considering a firm that makes only one product from $p$ inputs, that is, $(x, y) \in \mathbb{R}^p_+ \times \mathbb{R}_+$, where $x$ describes the $p$ inputs used for production and $y$ describes the output (one-output case) of a production unit.
The production set is defined as before. In the single-output case we have

$$\Psi = \{(x, y) \in \mathbb{R}^{p+1}_+ : x \text{ can produce } y\}.$$

The production function or frontier associated with $\Psi$ is $\rho(x) = \sup\{y \in \mathbb{R}_+ : (x, y) \in \Psi\}$ for all $x \in \mathbb{R}^p_+$. In practice $\Psi$ and its frontier are unknown, so our primary interest is in estimating this frontier from a set of observed firms: given a random sample of production units $\{(X_i, Y_i)\}_{i=1}^n$ that share a technology $\Psi$, we want to obtain estimates of $\rho(\cdot)$. By extension we are interested in constructing efficiency ranks and measuring the relative performance of production units. To see this, let $(x_0, y_0) \in \Psi$ characterize the performance of a production unit and define $0 \le R_0 \equiv y_0 / \rho(x_0) \le 1$ to be this unit's (inverse) Farrell output efficiency measure.² From estimates of $\rho$ we can obtain estimates of $R_0$.

² Note that if the production level $y_0$ associated with $x_0$ lies on the frontier function we have $y_0 = \rho(x_0)$. The production process is then efficient and $R_0 = 1$.

We propose to estimate the frontier using nonparametric methods as follows. First we estimate the maximum output conditional on a given input value via a simple non-smooth nonparametric procedure. Denote the estimated maximum output value for unit $i$ as $Y_i^{\max}$. The proposed estimator is

$$Y_i^{\max} = \max_{1 \le j \le n} \left( Y_j \, I_{[-1,1]}\!\left( \frac{X_j - X_i}{h_1} \right) \right), \quad i = 1, \dots, n; \tag{1}$$

where $I$ is an indicator function and $h_1 > 0$ is a bandwidth. This first stage may be viewed as estimating those values of output that best represent the more efficient units for a given level of input. The sequence $Y_i^{\max}$ is then viewed as follows:

$$Y_i^{\max} = \rho(X_i) + u_i, \quad i = 1, \dots, n;$$

where $E(u_i \mid X_i) \neq 0$. Now, let $\mu_u := E(u_i \mid X_i)$. Therefore,

$$Y_i^{\max} = m(X_i) + \epsilon_i, \quad i = 1, \dots, n; \tag{2}$$

where $E(\epsilon_i \mid X_i) \equiv E(u_i - \mu_u \mid X_i) = 0$, $\mathrm{Var}(\epsilon_i \mid X_i) = \sigma^2(X_i)$ and $m(X_i) = \rho(X_i) + \mu_u$. Equation (2) is suitable for nonparametric regression. We use the local linear kernel estimator of Fan (1992) with regressand $Y_i^{\max}$ and regressors $X_i$. That is, for any $x \in \mathbb{R}^p_+$ we obtain $\hat m(x) \equiv \hat\alpha$, where

$$(\hat\alpha, \hat\beta) = \arg\min_{\alpha, \beta} \sum_{i=1}^{n} \left( Y_i^{\max} - \alpha - \beta (X_i - x) \right)^2 K_{h_2}(X_i - x), \tag{3}$$

$K(\cdot) : \mathbb{R}^p \to \mathbb{R}$ is a symmetric density function, $K_h(u) = (1/h) K(u/h)$ and $h_2 > 0$ is a bandwidth. Since our main interest lies in estimating $\rho(\cdot)$, we propose to estimate $\mu_u$ by

$$\hat\mu_u = -\max_{1 \le i \le n} \left( Y_i - \hat m(X_i) \right). \tag{4}$$

Hence, we have for any $x \in \mathbb{R}^p_+$,

$$\hat\rho(x) = \hat m(x) - \hat\mu_u, \tag{5}$$

so that $\hat\rho(X_i) \ge Y_i$ for every observed unit.

Remark 1. The main purpose of the first step (eq. (1)) is to capture the shape of the frontier, even though it is in the wrong position and lacks smoothness. Note that the proposed estimator is very flexible in the sense that it imposes no particular assumption on the frontier's shape. From here on we call this estimator Max-Smooth. Nevertheless, imposing some structure or restriction on the first step may be useful if the restriction is in accordance with the data generating process (DGP). In this regard, free disposability seems to be a reasonable assumption about the technology. We therefore also consider an estimator that incorporates free disposability in the first step: we use the FDH estimator in place of eq. (1), keeping the other steps unchanged. From here on we call this estimator Smooth-FDH.
Clearly the Smooth-FDH estimator has the advantage of requiring just one bandwidth. Furthermore, for DGPs characterized by strict monotonicity, that estimator is likely to perform better than the Max-Smooth estimator. On the other hand, we conjecture that the flexibility of the Max-Smooth first step is valuable in some situations, for instance when the frontier is only weakly monotone and has nearly flat portions. Figures 1 and 2 give some examples of what the estimators described in Remark 1 look like in general. The production frontiers used in the examples are considered in a simulation study described in Section 4.
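The three Max-Smooth steps can be sketched for a single input ($p = 1$) using only the Python standard library. This is an illustrative sketch under stated assumptions, not the paper's implementation: the Epanechnikov kernel, the fallback to a local constant fit when the local design is degenerate, the function names, the toy frontier $\sqrt{x}$, and the bandwidth values are all assumptions. The third stage is written as an upward shift by the largest residual, which is the reading of eqs. (4)-(5) that places the curve above all data points.

```python
import math
import random


def local_max(xs, ys, h1):
    """Stage 1, eq. (1): largest output among units with |X_j - X_i| <= h1.

    Outputs are non-negative, so units outside the window (indicator = 0)
    never attain the maximum and can simply be skipped.
    """
    return [max(yj for xj, yj in zip(xs, ys) if abs(xj - xi) <= h1) for xi in xs]


def kernel(u):
    """Epanechnikov density on [-1, 1]; the choice of K is an assumption."""
    return 0.75 * (1.0 - u * u) if abs(u) < 1.0 else 0.0


def local_linear(x, xs, zs, h2):
    """Stage 2, eq. (3): intercept of the kernel-weighted line fitted at x."""
    w = [kernel((xi - x) / h2) / h2 for xi in xs]
    d = [xi - x for xi in xs]
    s0 = sum(w)
    if s0 <= 0.0:
        raise ValueError("no observations inside the kernel window")
    s1 = sum(wi * di for wi, di in zip(w, d))
    s2 = sum(wi * di * di for wi, di in zip(w, d))
    t0 = sum(wi * zi for wi, zi in zip(w, zs))
    t1 = sum(wi * di * zi for wi, di, zi in zip(w, d, zs))
    det = s0 * s2 - s1 * s1
    if det <= 1e-12 * s0 * s2:
        return t0 / s0  # degenerate design: fall back to a local constant fit
    return (s2 * t0 - s1 * t1) / det


def max_smooth(xs, ys, h1, h2):
    """Return the fitted Max-Smooth frontier as a function of x."""
    y_max = local_max(xs, ys, h1)

    def m_hat(x):
        return local_linear(x, xs, y_max, h2)

    # Stage 3, eqs. (4)-(5): shift so that no observation lies above the curve.
    mu_u = -max(y - m_hat(x) for x, y in zip(xs, ys))
    return lambda x: m_hat(x) - mu_u


# Hypothetical toy data: frontier sqrt(x), efficiency exp(-Z) in (0, 1].
random.seed(0)
xs = [random.uniform(4.0, 25.0) for _ in range(100)]
ys = [math.sqrt(x) * math.exp(-random.expovariate(3.0)) for x in xs]

rho = max_smooth(xs, ys, h1=2.0, h2=3.0)
scores = [y / rho(x) for x, y in zip(xs, ys)]  # estimated efficiency R_i
```

Each entry of `scores` is an estimate of the (inverse) Farrell output efficiency $R_0 = y_0 / \rho(x_0)$ for one unit. A Smooth-FDH variant would replace `local_max` with an FDH first stage and leave the other two steps unchanged.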

Figure 1: Frontier I - Illustration of the Max-Smooth and Smooth-FDH estimators. Panel "Max-Smooth" shows the curves True, 1st Stage, 2nd Stage and Max-Smooth; panel "Smooth-FDH" shows the curves True, FDH, 2nd Stage and Smooth-FDH. Axes: x (horizontal), y (vertical).

Remark 2. Alternatively, it is possible to estimate the frontier in just two steps, eliminating the positioning step. This would be accomplished by different choices for the bandwidths. Notably, $h_1$ would be larger than in the three-stage procedure, in order to obtain both the shape and the position of the frontier in the same stage. Two possible drawbacks of this approach are: (i) there is no guarantee that all observed points lie below the estimated frontier; and (ii) estimating the frontier in just two steps seems to lose precision, at least in finite samples, as we shall see in the simulation study in Section 4. On the other hand, we conjecture that the two-stage estimator may be more robust to extreme values, since it is not restricted to lie above all data points. We shall consider an outlier scenario in a future version of the paper.

Figure 2: Frontier II - Illustration of the Max-Smooth and Smooth-FDH estimators. Panel "Max-Smooth" shows the curves True, 1st Stage, 2nd Stage and Max-Smooth; panel "Smooth-FDH" shows the curves True, FDH, 2nd Stage and Smooth-FDH. Axes: x (horizontal), y (vertical).
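Drawback (i) of Remark 2 can be made concrete with a short calculation. The derivation below reads eqs. (4) and (5) as $\hat\mu_u = -\max_j (Y_j - \hat m(X_j))$ and $\hat\rho(x) = \hat m(x) - \hat\mu_u$; these signs are an assumption, chosen to agree with $m = \rho + \mu_u$ and with the statement that the positioned frontier lies above all data points. Under that reading, for every observed unit $i$:

```latex
\hat\rho(X_i) - Y_i
  = \hat m(X_i) - \hat\mu_u - Y_i
  = \max_{1 \le j \le n} \bigl( Y_j - \hat m(X_j) \bigr)
    - \bigl( Y_i - \hat m(X_i) \bigr)
  \;\ge\; 0 .
```

Equality holds for the unit that attains the maximum, so the three-stage frontier touches at least one observation. The two-stage variant uses $\hat m$ itself as the frontier estimate, so nothing prevents the residual $Y_i - \hat m(X_i)$ from being positive for some units.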

3 Bandwidth Selection

We need to select $h_1$ and $h_2$ in order to implement our estimator. In this section we describe a simple cross-validation type procedure for selecting those bandwidths. We select $h_1$ and $h_2$ over a grid of points. The algorithm for bandwidth selection may be described as follows.

1. Select a candidate for $h_1$ and estimate the first stage (eq. (1)).
2. For all candidates for $h_2$, estimate the second stage (eq. (3)), but for each $X_i$ exclude the corresponding $Y_i^{\max}$ (and all $Y^{\max}$ with the same value, if that is the case).
3. Estimate the third stage (eqs. (4)-(5)) in order to obtain the cross-validation estimated frontier, denoted here by $\hat\rho_{CV}(\cdot)$.
4. Evaluate the following function:

$$CV(h_1, h_2) = \sum_{i=1}^{n} \left( Y_i - \hat\rho_{CV}(X_i) \right)^2 \tag{6}$$

5. Repeat this process for all candidates for $h_1$. Pick the $h_1$ and $h_2$ that minimize eq. (6).

Note that the exclusion of $Y^{\max}$ in step 2 is important to avoid selecting a pair of bandwidths that in practice would interpolate the points. Furthermore, the validity of eq. (6) rests on the fact that all sample points are below the estimated frontier by construction of our estimator.

4 Simulations

In this section we attempt to highlight the finite sample properties of our estimator. We consider the following DGP:

$$Y_i = \frac{\sigma(X_i)}{\sigma_R} R_i,$$

with $p = 1$, where the $X_i$ are pseudo-random variables with uniform distribution on $[a, b]$, with $a, b$ specified in eq. (7) below, and $R_i = \exp(-Z_i)$, where the $Z_i$ are pseudo-random variables from an exponential distribution with parameter $\lambda$. We consider $\lambda = 3$, which fixes the mean efficiency of the production units. We consider two specifications for $\rho(\cdot)$:

Frontier I: $\rho_1(x)$ = (x), with $x \in [4, 25]$, and (7)
Frontier II: $\rho_2(x)$ = 3(x 1.5) x, with $x \in [1, 2]$.

In order to evaluate the performance of the estimators we consider two measures of error. One of them is denoted by MSE and is defined as

$$MSE(\hat\rho)_j = \sum_{i=1}^{n} \left( \hat\rho(X_i) - \rho(X_i) \right)^2, \quad j = 1, \dots, n_r, \tag{8}$$

where $\hat\rho(\cdot)$ is a given estimator and $n_r$ is the number of repetitions. We present boxplots of this measure for each estimator in each case considered. Moreover, we present boxplots of the error committed by an estimator around the true value of the frontier evaluated at a given point. In more detail, we define

$$Err(\hat\rho)_j = \hat\rho(x_k) - \rho(x_k), \quad k = 1, \dots, K; \quad j = 1, \dots, n_r. \tag{9}$$

We consider $K = 3$ points that correspond to the 0.1, 0.5 and 0.9 quantiles of $X$. That is, for Frontier I we have $x_1 = 6.1$ and $x_2 = 14.5$, with $x_3$ the 0.9 quantile. For Frontier II we have $x_1 = 1.1$, $x_2 = 1.5$ and $x_3 = 1.9$. We consider three sample sizes, 200, 400; and $n_r = 1000$ for each one of the experiments.

4.1 Simulation I

In the first set of simulations we investigate the questions raised in Remark 1 and Remark 2 in Section 2. We consider three estimators: the 3-stage Max-Smooth estimator; the 3-stage Smooth-FDH estimator; and a 2-stage estimator, denoted by MaxS-2S. In order to focus on the performance of the estimators themselves, we use an oracle bandwidth selection for the three estimators considered in this subsection. The oracle bandwidths are those that minimize the MSE for each sample, treating the true frontier as known. This study attempts to highlight the properties of the estimators without regard to bandwidth selection, and therefore to the error incurred by bandwidth selection. The results are presented and analyzed in subsection 4.3 below.

4.2 Simulation II

In the second set of simulations we attempt to shed some light on the performance of the proposed estimators in practice. That is, we now consider the performance of the estimators using a data-driven procedure to select the bandwidths. In particular, we are interested in investigating whether the need to select two bandwidths could hinder the applicability of the Max-Smooth estimator. We also include the traditional FDH estimator for both frontiers and the traditional DEA estimator for Frontier I.
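The data-driven bandwidth choice used in Simulation II follows the grid search of Section 3, which can be sketched as below for $p = 1$ with the standard library only. Several ingredients are assumptions made for illustration: the Epanechnikov kernel, the stand-in frontier $\sqrt{x}$ on $[4, 25]$, the simplified DGP $Y_i = \rho(X_i) \exp(-Z_i)$ with $\lambda = 3$ treated as a rate parameter, the grids, and the convention of scoring a bandwidth pair as infinite when some leave-out fit has no data in its kernel window.

```python
import math
import random


def local_max(xs, ys, h1):
    """First stage, eq. (1): largest output among units with |X_j - X_i| <= h1."""
    return [max(yj for xj, yj in zip(xs, ys) if abs(xj - xi) <= h1) for xi in xs]


def kernel(u):
    """Epanechnikov density on [-1, 1]; the choice of K is an assumption."""
    return 0.75 * (1.0 - u * u) if abs(u) < 1.0 else 0.0


def local_linear(x, xs, zs, h2):
    """Second stage, eq. (3): local linear intercept at x, or None when no
    observation falls inside the kernel window."""
    w = [kernel((xi - x) / h2) / h2 for xi in xs]
    d = [xi - x for xi in xs]
    s0 = sum(w)
    if s0 <= 0.0:
        return None
    s1 = sum(wi * di for wi, di in zip(w, d))
    s2 = sum(wi * di * di for wi, di in zip(w, d))
    t0 = sum(wi * zi for wi, zi in zip(w, zs))
    t1 = sum(wi * di * zi for wi, di, zi in zip(w, d, zs))
    det = s0 * s2 - s1 * s1
    if det <= 1e-12 * s0 * s2:
        return t0 / s0  # degenerate design: local constant fit
    return (s2 * t0 - s1 * t1) / det


def cv_score(xs, ys, h1, h2):
    """Steps 1-4: cross-validation criterion of eq. (6) for one (h1, h2)."""
    y_max = local_max(xs, ys, h1)
    m_cv = []
    for xi, zi in zip(xs, y_max):
        # Step 2: drop Y_i^max and every first-stage value equal to it.
        kept = [(xj, zj) for xj, zj in zip(xs, y_max) if zj != zi]
        fit = local_linear(xi, [p[0] for p in kept], [p[1] for p in kept], h2)
        if fit is None:
            return math.inf  # assumed convention: reject this bandwidth pair
        m_cv.append(fit)
    # Step 3: position the leave-out curve above every observation.
    shift = max(y - m for y, m in zip(ys, m_cv))
    return sum((y - (m + shift)) ** 2 for y, m in zip(ys, m_cv))


def select_bandwidths(xs, ys, grid1, grid2):
    """Step 5: the pair (h1, h2) minimizing the CV criterion over the grid."""
    scores = {(a, b): cv_score(xs, ys, a, b) for a in grid1 for b in grid2}
    return min(scores, key=scores.get)


# Hypothetical sample from the simplified DGP described in the lead-in.
random.seed(1)
xs = [random.uniform(4.0, 25.0) for _ in range(100)]
ys = [math.sqrt(x) * math.exp(-random.expovariate(3.0)) for x in xs]

grid1 = [1.0, 2.0, 4.0]
grid2 = [2.0, 4.0, 8.0]
h1, h2 = select_bandwidths(xs, ys, grid1, grid2)
```

For Smooth-FDH the same loop would run over `grid2` only, with an FDH first stage in place of `local_max`. An oracle selection as in Simulation I would replace `cv_score` with the error of eq. (8) computed against the known frontier.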
The results are presented and analyzed in the next subsection.

4.3 Analysis of the results

We now present and comment on the results from the simulations described above.

Simulation I

Frontier I (Figures 3-6): We see that the performance of all estimators improves as $n$ increases. Smooth-FDH presents the best performance in all cases considered, while MaxS-2S presents the worst. It seems that a 3-stage estimator is preferable to a 2-stage estimator. Regarding Max-Smooth and Smooth-FDH, it seems that the latter takes advantage of free disposability in this case, since the DGP has that characteristic.

Frontier II: In this scenario the best performance is presented by the Max-Smooth estimator, followed by MaxS-2S. This seems to be explained by a relatively poor performance of Smooth-FDH, especially in the flat portion of the frontier, as we see in Figures 8-10.

As a general conclusion, we conjecture that a 3-stage estimator is preferable to a 2-stage one. Furthermore, if the DGP is characterized by strict monotonicity with a high derivative, Smooth-FDH should be considered; if that is not the case, Max-Smooth is preferable.

Simulation II

Frontier I: Here we also consider the FDH and DEA estimators. It is worth noting that the DGP considered in this case is very favorable to DEA. Even so, the Smooth-FDH estimator outperforms its competitors in terms of MSE and in almost all situations presented in Figures 11-14.

Frontier II: In this simulation Max-Smooth shows the best performance among all estimators considered, in terms of MSE and in almost all situations presented in Figures 15-18. It is worth noting that in this set of simulations (Simulation II) we have to select two bandwidths in order to implement the Max-Smooth estimator, whereas Smooth-FDH requires just one. Even so, Max-Smooth exhibits better performance. This shows that selecting two bandwidths is worth the price in this case.

As a result, the conclusions from Simulation II are quite similar to those presented for Simulation I. Furthermore, selecting two bandwidths is not a big concern, since the relative performance of Smooth-FDH and Max-Smooth does not seem to be affected when we compare Simulations I and II.

5 Conclusion

In this paper we propose two estimators for deterministic production frontier models, named Max-Smooth and Smooth-FDH. The proposed estimators have desirable properties: (i) they are smooth; (ii) they are not inherently biased, as the FDH and DEA estimators are; and (iii) they are simple to implement. We also present a cross-validation procedure for bandwidth selection. We compare the relative performance of the proposed estimators with DEA and FDH in a simulation study. The results are very favorable to the proposed estimators, even in a simulation setup that is very suitable for the DEA estimator.
Although the results are very satisfactory, some extensions are desirable. In future work we intend to establish a bandwidth selection criterion for the 2-stage estimator (MaxS-2S) and to analyze the performance of that estimator in the presence of outliers. We conjecture that MaxS-2S is very appropriate in that case.

Figure 3: Frontier I - MSE of Frontier Estimators - Simulation I
Figure 4: Frontier I - Dispersion of Frontier Estimators around $x_1$ - Simulation I
Figure 5: Frontier I - Dispersion of Frontier Estimators around $x_2$ - Simulation I
Figure 6: Frontier I - Dispersion of Frontier Estimators around $x_3$ - Simulation I
Figure 7: Frontier II - MSE of Frontier Estimators - Simulation I
Figure 8: Frontier II - Dispersion of Frontier Estimators around $x_1$ - Simulation I
Figure 9: Frontier II - Dispersion of Frontier Estimators around $x_2$ - Simulation I
Figure 10: Frontier II - Dispersion of Frontier Estimators around $x_3$ - Simulation I
Figure 11: Frontier I - MSE of Frontier Estimators - Simulation II
Figure 12: Frontier I - Dispersion of Frontier Estimators around $x_1$ - Simulation II
Figure 13: Frontier I - Dispersion of Frontier Estimators around $x_2$ - Simulation II
Figure 14: Frontier I - Dispersion of Frontier Estimators around $x_3$ - Simulation II
Figure 15: Frontier II - MSE of Frontier Estimators - Simulation II
Figure 16: Frontier II - Dispersion of Frontier Estimators around $x_1$ - Simulation II
Figure 17: Frontier II - Dispersion of Frontier Estimators around $x_2$ - Simulation II
Figure 18: Frontier II - Dispersion of Frontier Estimators around $x_3$ - Simulation II

6 References

Aigner, D., C.A.K. Lovell and P. Schmidt, 1977, Formulation and estimation of stochastic frontiers production function models. Journal of Econometrics 6.
Cazals, C., J.-P. Florens and L. Simar, 2002, Nonparametric frontier estimation: a robust approach. Journal of Econometrics 106.
Charnes, A., W. Cooper and E. Rhodes, 1978, Measuring the efficiency of decision making units. European Journal of Operational Research 2.
Deprins, D., L. Simar and H. Tulkens, 1984, Measuring labor inefficiency in post offices, in: M. Marchand, P. Pestiau and H. Tulkens (Eds.), The performance of public enterprises: concepts and measurements. North Holland, Amsterdam.
Fan, J., 1992, Design-adaptive nonparametric regression. Journal of the American Statistical Association, Vol. 87, No. 420.
Fan, J. and I. Gijbels, 1995, Data driven bandwidth selection in local polynomial fitting: variable bandwidth and spatial adaptation. Journal of the Royal Statistical Society, B 57.
Fan, J. and I. Gijbels, 1996, Local Polynomial Modelling and Its Applications. London: Chapman and Hall.
Fan, J. and Q. Yao, 1998, Efficient estimation of conditional variance functions in stochastic regression. Biometrika 85.
Farrell, M., 1957, The measurement of productive efficiency. Journal of the Royal Statistical Society A 120.
Gijbels, I., E. Mammen, B. Park and L. Simar, 1999, On estimation of monotone and concave frontier functions. Journal of the American Statistical Association 94.
Korostelev, A. P., L. Simar and A. B. Tsybakov, 1995, Efficient estimation of monotone boundaries. Annals of Statistics 23.
Martins-Filho, C. and F. Yao, 2007, Nonparametric frontier estimation via local linear regression. Journal of Econometrics.
Park, B., L. Simar and Ch. Weiner, 2000, The FDH estimator for productivity efficient scores: asymptotic properties. Econometric Theory 16.
Seifford, L., 1996, Data envelopment analysis: the evolution of the state of the art ( ). Journal of Productivity Analysis 7.
