International Journal of Applied Science and Technology Vol. No. 9; November

A Bootstrap Approach to Robust Regression

Dr. Hamadu Dallah
Department of Actuarial Science and Insurance
University of Lagos
Akoka, Lagos, Nigeria

Abstract
We focus on deriving consistent estimates of the standard deviations (standard errors) of the parameter estimates of a multiple regression model fitted by a robust procedure, namely the so-called M (M for maximum likelihood) regression fitting method. M-regression is usually computed by weighted least squares (WLS). Most commonly used statistical packages that offer WLS assume that the weights are fixed, whereas in M-regression the weights are themselves estimated from the data. In this setting M-regression yields standard errors that are inconsistent and unstable, more so if the underlying sample is small. The alternative approach offered in this article is the bootstrap. Using the resampling mechanism inherent in bootstrapping, we demonstrate empirically that bootstrap standard errors are smaller than their M-regression counterparts.

Key words: M-Regression, WLS, Standard Errors, Bootstrap Methods, Bootstrap Standard Errors.

1. Introduction
Bootstrapping was first introduced into regression by Efron (1979). Since then much research has gone into investigating the performance of the bootstrap method in regression. Freedman (1981) offers an early theoretical analysis of the asymptotic theory of the bootstrap for regression and correlation models. Specifically, he showed that the bootstrap approximation to the distribution of least squares parameter estimates is valid. Freedman's work was extended by Wu (1986), whose contribution was itself extensively discussed by Efron and Tibshirani (1993) and by Wilcox. Freedman and Peters (1984) present the bootstrap in the context of an econometric regression model describing the demand for energy by industry. Their main finding is that, for generalized least squares with an estimated covariance matrix, the asymptotic formula for standard errors can be too optimistic, sometimes by quite large factors.
Thus, the bootstrap procedure is appreciably better than the conventional asymptotic approach when applied to finite samples. Stine (1985) uses the bootstrap to set prediction intervals in regression. These intervals approximate the nominal coverage probability in small samples without requiring specific assumptions about the sampling distribution. The asymptotic properties of the intervals do not depend on the sampling distribution, and Monte Carlo results suggest that this invariance holds approximately for relatively small samples. Stine notes, however, that the bootstrap still requires certain assumptions, for example that the specified model is the correct model. In the same vein, Efron (1983, 1986) extended the study of prediction rules to general exponential families, with emphasis on logistic regression. After establishing a general theory for prediction rules, Efron uses the bootstrap to estimate the error rate of a prediction rule and to determine how biased the apparent error rate is. Breiman (1996) demonstrates the use of the bootstrap for the more primary purpose of producing efficient estimates of regression parameters. Tibshirani and Knight (1999) have proposed a bootstrap-based method for enhancing a search through a space of models, including applications to regression models. Finally, Hamadu (2003) has extensively studied the use of bootstrapping under a variety of regression settings.

This article reports yet another contribution to the research efforts described above, namely the study of the performance of the bootstrap in regression. Specifically, we demonstrate empirically that the bootstrap is a veritable instrument for enhancing the efficiency of robust (M) regression. We briefly review M-regression in Section 2. Section 3 describes the critical steps of the bootstrap in regression. We present an empirical example in Section 4. The article concludes with a summary and some comments in Section 5.
Centre for Promoting Ideas, USA www.ijastnet.com

2. Review of M-Regression
The usual multiple regression model, in matrix notation, is
$$Y = X\beta + \varepsilon \qquad (2.1)$$
where Y is an $n \times 1$ vector of observations of the response variable, X is an $n \times p$ (design) matrix of known constants, β is a $p \times 1$ vector of unknown regression coefficients and ε is an $n \times 1$ vector of random errors. It is assumed that the elements of ε are independent and identically distributed with $V(\varepsilon) = \sigma^2 I_n$, where $I_n$ is the $n \times n$ identity matrix and $\sigma^2 > 0$ is a constant. For the estimation of β by ordinary least squares (OLS) it is further required that the data at hand be well behaved, that is, devoid of outliers. Robust regression, and M-regression in particular, is a good alternative to OLS when there are outliers in the data. M-regression is described as follows. Consider the function
$$\sum_{i=1}^{n} \rho\left(\frac{Y_i - X_i\beta}{\hat\sigma}\right) \qquad (2.2)$$
where $Y_i$ is the ith element of Y, $X_i$ is the ith row of X and $\hat\sigma$ is a robust estimate of σ. The function (2.2) is to be minimized with respect to the elements of β; when ρ is minus the log density of the scaled errors, this is the same as maximizing the likelihood, which is where the "M" comes from. Differentiating (2.2) partially with respect to each element $\beta_j$ of β and setting the derivatives equal to zero, we have
$$\sum_{i=1}^{n} \psi\left(\frac{Y_i - X_i\hat\beta}{\hat\sigma}\right) x_{ij} = 0, \qquad j = 1, 2, \ldots, p \qquad (2.3)$$
where $\psi(\cdot) = \rho'(\cdot)$ is the derivative of ρ and $x_{ij}$ is the (i, j)th element of X. The values $\hat\beta_1, \hat\beta_2, \ldots, \hat\beta_p$ that solve these p equations are called the M estimators of the elements $\beta_1, \beta_2, \ldots, \beta_p$ of β; equivalently, $\hat\beta$ is the M estimator of β.
Hogg (1979) gives a detailed account of how $\hat\beta$ can be computed iteratively, starting from OLS, using weighted least squares (WLS). This is summarized in the following steps:
1. Begin with initial estimates $\hat\beta_0$ and $\hat\sigma_0$. (It is convenient to take the OLS estimate of β as the initial estimate $\hat\beta_0$ and, following this, $\hat\sigma_0 = \operatorname{median}_i |Y_i - X_i\hat\beta_0|$.)
2. Calculate the scaled residuals $r_i = (Y_i - X_i\hat\beta_0)/\hat\sigma_0$, $i = 1, 2, \ldots, n$.
3. Calculate the weights $w_i = \psi(r_i)/r_i$ and form the $n \times n$ diagonal matrix of weights W whose diagonal elements are $w_i$.
4. Carry out weighted least squares (WLS) to yield a new estimate $\hat\beta_1 = (X^t W X)^{-1} X^t W Y$.
5. Iterate Steps 2 through 4, each time with the latest estimate in place of $\hat\beta_0$, until convergence.
A few pertinent remarks are in order:
(i) An approach slightly different from the above is to estimate β and σ simultaneously.
Dutter (1977) has described how this can be done.
(ii) The choice of an appropriate weighting function is critical in M-regression. Hogg (1979) has given some tips to guide the selection of an appropriate function from among those commonly used in practice. Huber's and Tukey's biweight functions are used in the present article.

3. The Bootstrap in Robust Regression
Robust estimators such as the estimator $\hat\beta$ of β in model (2.1) are not maximum likelihood estimators in the classical sense, because the form of the distribution of ε is not known: the distribution function $F(\varepsilon)$ is not specified and, by the same token, $F(\hat\beta)$ is unknown. M estimators are therefore essentially nonparametric, and we venture to say that this nonparametric environment provides a proper setting for applying the bootstrap methodology.
Let $\hat\varepsilon = (\hat\varepsilon_1, \hat\varepsilon_2, \ldots, \hat\varepsilon_n)^t = Y - X\hat\beta$ denote the residuals from the fitted (robust) regression. The bootstrap sample $\varepsilon_1^*, \varepsilon_2^*, \ldots, \varepsilon_n^*$ is generated by sampling $\hat\varepsilon_1, \hat\varepsilon_2, \ldots, \hat\varepsilon_n$ with replacement. Thus, a bootstrap sample leaves out some elements of $(\hat\varepsilon_1, \hat\varepsilon_2, \ldots, \hat\varepsilon_n)$ but may include others two, three, four or more times. Defining the bootstrap observations as
$$Y_i^* = X_i\hat\beta + \varepsilon_i^*, \qquad i = 1, 2, \ldots, n,$$
we can obtain $\hat\beta^*$ as the solution to
$$\sum_{i=1}^{n} \psi\left(\frac{Y_i^* - X_i\hat\beta^*}{\hat\sigma}\right) x_{ij} = 0, \qquad j = 1, 2, \ldots, p. \qquad (2.4)$$
Notice the similarity of (2.3) and (2.4): applying the robust estimator to the original sample (Y, X) yields $\hat\beta$, and applying the same estimator to $(Y^*, X)$ yields $\hat\beta^*$, namely the bootstrap estimate. As indicated earlier, $F(\hat\beta)$ represents the true but unknown distribution function of $\hat\beta$, whereas $\hat F(\hat\beta^*)$ denotes the observed distribution function of $\hat\beta^*$, which is known because it is obtained through many Monte Carlo repetitions of the bootstrap sampling process described above. That is, if we draw bootstrap samples a large number of times, B times say, then the B values of $\hat\beta^*$ yield $\hat F(\hat\beta^*)$, which approximates a maximum likelihood estimate of $F(\hat\beta)$. In particular, the variance of the B bootstrap values (the bootstrap variance) estimates the true but unknown variance of $\hat\beta$.
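One common way to compute the bootstrap variance from the B replicates, assumed here rather than taken from the article and standard in Efron and Tibshirani (1993), is the sample variance of the replicates. Writing $\hat\beta_j^{*(b)}$ for the jth element of the solution of (2.4) in the bth bootstrap sample (notation introduced only for this display), the bootstrap standard error of $\hat\beta_j$ is

$$\bar\beta_j^{*} = \frac{1}{B}\sum_{b=1}^{B}\hat\beta_j^{*(b)}, \qquad \widehat{\mathrm{se}}_B(\hat\beta_j) = \left\{\frac{1}{B-1}\sum_{b=1}^{B}\left(\hat\beta_j^{*(b)} - \bar\beta_j^{*}\right)^{2}\right\}^{1/2}, \qquad j = 1, 2, \ldots, p.$$

For instance, with B = 3 replicates 1.0, 2.0 and 3.0, the mean is 2.0 and the standard error is $\{(1 + 0 + 1)/2\}^{1/2} = 1$.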
In practical terms, there are two ways to carry out bootstrapping in regression analysis when one has data (Y, X) following the model in (2.1): one is to resample the residuals from the fitted model, and the other is to resample the data (Y, X) themselves.

3.1 Bootstrapping Regression via Residual Resampling
Residual bootstrapping proceeds using the following steps:
i. Perform the regression with the original sample (Y, X) to calculate predicted values $\hat Y$ and residuals r.
ii. Randomly resample the residuals with replacement, but leave X and $\hat Y$ unchanged. Let the bootstrap residuals be denoted by $r^*$.
iii. Construct new $Y^*$ values by adding $r^*$ to the original predicted values, to yield $Y^* = \hat Y + r^*$.
iv. Regress $Y^*$ on the original X variable(s).
v. Repeat steps (ii) to (iv) B times.
We then study the distribution of the bootstrap estimate $\hat\beta^*$ across the B bootstrap samples.
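The residual-resampling steps above, combined with Hogg's IRLS Steps 1 to 5 of Section 2, can be sketched in Python as follows. This is a minimal illustration assuming NumPy, and it uses Huber's weight function (c = 1.345) only. Two choices are assumptions rather than statements from the article: $\hat\sigma$ is the median absolute deviation of the OLS residuals divided by 0.6745, and this $\hat\sigma$ is held fixed in the bootstrap refits, as the $\hat\sigma$ in (2.4) suggests. The function names `huber_weights`, `m_regression` and `residual_bootstrap` are hypothetical, and the sketch is not the code behind Table 4.1.

```python
import numpy as np


def huber_weights(u, c=1.345):
    """Huber weights w(u) = psi(u)/u = min(1, c/|u|) for scaled residuals u."""
    a = np.abs(u)
    return np.where(a <= c, 1.0, c / np.maximum(a, 1e-12))


def m_regression(X, y, c=1.345, scale=None, tol=1e-8, max_iter=100):
    """Huber M-regression by iteratively reweighted least squares.

    Returns (beta_hat, sigma_hat). If `scale` is given it is used as a fixed
    sigma-hat; otherwise sigma-hat is estimated once from the OLS residuals.
    """
    # Step 1: OLS starting value for beta.
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    if scale is None:
        r0 = y - X @ beta
        # Assumption: normalized MAD of the OLS residuals as sigma-hat.
        scale = np.median(np.abs(r0 - np.median(r0))) / 0.6745
    if scale <= 0:
        return beta, scale  # exact fit: nothing to reweight
    for _ in range(max_iter):
        u = (y - X @ beta) / scale                    # Step 2: scaled residuals
        w = huber_weights(u, c)                       # Step 3: diagonal of W
        XtW = X.T * w                                 # X^t W without the n x n matrix
        beta_new = np.linalg.solve(XtW @ X, XtW @ y)  # Step 4: WLS update
        converged = np.max(np.abs(beta_new - beta)) < tol * (1.0 + np.max(np.abs(beta)))
        beta = beta_new
        if converged:                                 # Step 5: stop at convergence
            break
    return beta, scale


def residual_bootstrap(X, y, B=1000, c=1.345, seed=None):
    """Residual resampling (steps i-v) around the Huber M-fit.

    Returns beta_hat and a B x p array of bootstrap replicates.
    """
    rng = np.random.default_rng(seed)
    beta_hat, scale = m_regression(X, y, c)           # (i) fit the original sample
    fitted = X @ beta_hat
    resid = y - fitted
    boot = np.empty((B, X.shape[1]))
    for b in range(B):                                # (v) repeat B times
        r_star = rng.choice(resid, size=resid.size, replace=True)  # (ii) resample residuals
        y_star = fitted + r_star                      # (iii) Y* = Y-hat + r*
        boot[b], _ = m_regression(X, y_star, c, scale=scale)      # (iv) refit on (Y*, X)
    return beta_hat, boot
```

For example, `beta_hat, boot = residual_bootstrap(X, y, B=1000, seed=1)` followed by `boot.std(axis=0, ddof=1)` gives bootstrap standard errors as the standard deviation of the B replicates. Multiplying `X.T` by the weight vector forms $X^tW$ without building the $n \times n$ matrix W, so the update is the same WLS step as Step 4 at lower cost.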
3.2 Data Bootstrapping
Data resampling, otherwise called the model-free bootstrap, bootstraps a regression without assuming fixed X or identically distributed errors. It proceeds as follows:
i. Randomly choose samples of size n, sampling complete cases (Y, X) from the original data with replacement.
ii. Within each bootstrap sample, regress $Y^*$ on the $X^*$ variable(s) as usual.
Unlike residual resampling, data resampling does not, as noted above, assume independent and identically distributed errors. Because it allows for other possibilities, and also admits random X values as a new source of sample-to-sample variation, data resampling often yields results quite different from those expected under the usual regression assumptions. Stine (1990) recommends basing the choice between residual and data resampling on how the data were collected. Residual resampling would be preferred if the fixed-X assumption is realistic; otherwise, if X varies as randomly as Y, data resampling should be the choice. In either case, we want the process of bootstrap resampling to mimic the way in which the sample was originally selected from the population.

4. Application
4.1 Description of Data
When oil prices rose during the 1970s, wood stoves came back into fashion for heating in parts of the country. Although wood is often cheaper than other sources of heat, wood burning pollutes both outdoor and indoor air. The following table gives measures of the peak carbon monoxide (CO) levels during tests of wood-burning stoves. Robust methods are particularly appropriate here because of two unusual tests (9 and 10): the stove F overheated, possibly due to overfilling with wood, and experimenters reduced airflow by using a damper, which caused the house to fill with smoke. Such incidents are common with non-airtight stoves, especially with inexperienced operators (see Hamilton 1992).

Table 6. Data on indoor carbon monoxide pollution from wood-burning stoves
Test; Stove Type; Burning Time (hours); Amount of Wood Burned (kg); Peak House CO (ppm)
1 Airtight 4.8 37.3.8
2 Airtight 8.8 38.4.
3 Airtight 3...6
4 Airtight 3.7 7..
5 Airtight 8.5 4.6.
6 Airtight 8. 43..4
7 Airtight 6. 4. 3.8
8 Non-airtight 8.74.4 7.7
9 Non-airtight .4 3.4 35.
10 Non-airtight 5.4 3. 43.
11 Non-airtight 9.5 38.6 3.5
$X_1$ = Burning Time, $X_2$ = Amount of Wood Burned and, as mentioned above, Y = CO.

4.2 Regression Model for the Data
The following regression model is proposed for the data:
$$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \varepsilon_i, \qquad i = 1, 2, \ldots, 11 \qquad (4.1)$$
where $\beta_0, \beta_1, \beta_2$ are regression parameters and $\varepsilon_i$ is the random error in $Y_i$, assumed to have constant variance, that is, $V(\varepsilon_i) = \sigma^2$. Furthermore, for classical inference purposes, it is necessary to assume that $\varepsilon_i \sim N(0, \sigma^2)$.
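Data (case) resampling from Section 3.2 can be sketched in the same hedged way for a model of the form (4.1). The sketch assumes NumPy and a design matrix `X` whose first column is ones; the robust fit is a compact Huber IRLS with c = 1.345 whose normalized-MAD scale is re-estimated in each resample (an assumption, not a statement from the article), and `huber_irls` and `pairs_bootstrap` are hypothetical names. It does not reproduce Table 4.1.

```python
import numpy as np


def huber_irls(X, y, c=1.345, n_iter=50):
    """Compact Huber M-regression: OLS start, normalized-MAD scale, fixed IRLS iterations."""
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    r = y - X @ beta
    s = np.median(np.abs(r - np.median(r))) / 0.6745  # assumed scale estimate
    if s <= 0:
        return beta
    for _ in range(n_iter):
        a = np.abs(y - X @ beta) / s
        w = np.where(a <= c, 1.0, c / np.maximum(a, 1e-12))  # Huber weights psi(u)/u
        XtW = X.T * w
        beta = np.linalg.solve(XtW @ X, XtW @ y)             # WLS step
    return beta


def pairs_bootstrap(X, y, B=1000, c=1.345, seed=None):
    """Data resampling: draw n whole cases (Y_i, X_i) with replacement and refit, B times."""
    rng = np.random.default_rng(seed)
    n = len(y)
    boot = np.empty((B, X.shape[1]))
    for b in range(B):
        idx = rng.integers(0, n, size=n)          # (i) case indices, with replacement
        boot[b] = huber_irls(X[idx], y[idx], c)   # (ii) regress Y* on X*
    return boot
```

Because every replicate refits on a different set of rows, the design itself varies across replicates; this is the additional source of variability that, according to Section 3.2, can make data-resampling results differ from residual-resampling results. With a sample as small as n = 11, a resampled design can also be singular or nearly so, in which case `np.linalg.solve` may fail or return unstable values, so a practical version would need to redraw such samples.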
For fitting the model in (4.1) to the data, we used three robust regression techniques, namely robust biweight regression on the one hand and two bootstrap-based robust procedures on the other. The results of the regression fits are presented in Table 4.1.

Table 4.1 Regression Estimates and their Corresponding Standard Errors in Brackets
Methods of Estimation; Estimates
WLS based on Huber's weight with c = 1.345 .497 (.779)
WLS based on biweight weight with c = 4.875.53 (.93)
Robust M-regression via Model Bootstrap with B=5 53.46 (.8)
Robust M-regression via Data Bootstrap with B=5 35.6 (.5)
$\hat\beta_0$ $\hat\beta_1$ $\hat\beta_2$ -.977 (.68) -.347 (.) -.65 (.8) -.456 (.53) -.66 (.3) -.65 (.5) -.8 (.6) -. (.4)

Relevant entries in the table show that the estimates from the bootstrap robust regression fitting methods have uniformly smaller standard errors than those from the biweight regression. This result indicates that bootstrapping can serve as an instrument for boosting the efficiency of robust regression, which in essence is the main aim of this research. However, we are surprised at the large differences in the magnitudes of the estimated coefficients, although each estimate keeps the same sign across the three methods under consideration. As for the two bootstrap robust regression models, it is hardly surprising that their results differ: as noted in Section 3.2 above, data resampling does not necessarily assume that the design X is fixed; instead, it can admit random X, which introduces greater variability into the estimation data. Consequently, data resampling often yields results quite different from those of residual resampling, the latter depending on the usual least squares assumptions for its validity.

References
Andrews et al. (1972), Robust Estimates of Location: Survey and Advances. Princeton University Press, Princeton.
Breiman, P. (1996), On Robust Estimation. Annals of Statistics, 9, 96-7.
Freedman, D. A. (1981), Bootstrapping Regression Models. Annals of Statistics, 9, 1218-1228.
Freedman, D. A. and Peters, S. C. (1984), Bootstrapping a Regression Equation: Some Empirical Results. Journal of the American Statistical Association, 79, 97-106.
Efron, B. (1979), Bootstrap Methods: Another Look at the Jackknife. Annals of Statistics, 7, 1-26.
Efron, B. (1983), Estimating the Error Rate of a Prediction Rule: Improvement on Cross-Validation. Journal of the American Statistical Association, 78, 316-331.
Efron, B. (1987), Better Bootstrap Confidence Intervals (with discussion). Journal of the American Statistical Association, 82, 171-185.
Efron, B. and Tibshirani, R. (1993), An Introduction to the Bootstrap. Chapman and Hall, International Thomson Publishing, New York.
Hamadu, D. (2003), Bootstrapping Heteroscedastic Regression Models. Unpublished PhD Thesis, Department of Mathematics, University of Lagos, Lagos, Nigeria.
Hamilton, L. C. (1992), Regression with Graphics: A Second Course in Applied Statistics. Duxbury Press, California.
Hampel, F. R. (1974), The Influence Curve and its Role in Robust Estimation. Journal of the American Statistical Association, 69, 383-394.
Huber, P. J. (1967), The Behavior of Maximum Likelihood Estimates under Nonstandard Conditions. Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, 221-233.
Huber, P. J. (1981), Robust Statistics. John Wiley and Sons, New York.
Stine, R. A. (1985), Bootstrap Prediction Intervals for Regression. Journal of the American Statistical Association, 80, 1026-1031.
Tibshirani, R. and Knight, K. (1999), Model Search by Bootstrap "Bumping". Journal of Computational and Graphical Statistics, 8, 671-686.
Wilcox, R. R. Fundamentals of Modern Statistical Methods. Springer-Verlag, New York.
Wu, C. F. J. (1986), Jackknife, Bootstrap and Other Resampling Methods in Regression Analysis. The Annals of Statistics, 14, 1261-1294.