A Bootstrap Approach to Robust Regression


Dr. Hamadu Dallah
Department of Actuarial Science and Insurance
University of Lagos, Akoka, Lagos, Nigeria

Abstract

We focus on the derivation of consistent estimates of the standard deviations of the estimates of the parameters of a multiple regression model fitted via a robust procedure, namely the so-called M (M for maximum likelihood) regression fitting method. M-regression is mostly actualized by way of weighted least squares (WLS). It is common knowledge that most commonly used statistical packages offering WLS assume that the weights are fixed. In this scenario M-regression yields standard errors that are inconsistent and unstable, the more so if the underlying sample is small. The alternative approach on offer in this article is the bootstrap. Using the re-sampling mechanism inherent in bootstrapping, it is demonstrated empirically that bootstrap standard errors are smaller than their M-regression counterparts.

Key words: M-regression, WLS, standard errors, bootstrap methods, bootstrap standard errors.

1. Introduction

Bootstrapping was first introduced into regression by Efron (1979). Since then much research has gone into investigating the performance of the bootstrap method in regression. Freedman (1981) offers an early theoretical analysis of the asymptotic theory of the bootstrap for regression and correlation models. Specifically, the author showed that the bootstrap approximation to the distribution of least squares parameter estimates is valid. Freedman's work was extended by Wu (1986), whose intervention was itself extensively discussed by Efron and Tibshirani (1993) and Wilcox (2001). Freedman and Peters (1984) present the bootstrap in the context of an econometric regression model describing the demand for energy by industry. The main finding is that, for generalized least squares with an estimated covariance matrix, the asymptotic formula for standard errors can be too optimistic, sometimes by quite large factors. Thus, the bootstrap procedure is appreciably better than the conventional asymptotic approach when applied to the finite-sample situation.

Stine (1985) uses the bootstrap to set prediction intervals in regression. These intervals approximate the nominal coverage probability in small samples without requiring specific assumptions about the sampling distribution. The asymptotic properties of the intervals do not depend upon the sampling distribution, and Monte Carlo results suggest that invariance approximately holds for relatively small samples. Furthermore, Stine notes that the use of the bootstrap does, however, require certain assumptions; for example, that the specified model be the correct model. In the same vein, Efron (1983, 1986) extended the problem of the prediction rule to general exponential families, with emphasis on logistic regression. After establishing a general theory for prediction rules, Efron uses the bootstrap to estimate the error rate of a prediction rule and also to determine how biased the apparent error rate is.

Breiman (1996) demonstrates the use of the bootstrap for the more primary purpose of producing efficient estimates of regression parameters. Tibshirani and Knight (1999) have proposed a bootstrap-based method for enhancing a search through a space of models, including applications to regression models. Finally, Hamadu (2003) has extensively studied the use of bootstrapping under a variety of regression settings. This article reports yet another contribution to the kinds of research efforts described above, that is, research efforts directed towards the study of the performance of the bootstrap in regression.
Specifically, we demonstrate empirically that the bootstrap is a veritable instrument for enhancing the efficiency of robust (M) regression. We briefly review M-regression in Section 2. Section 3 describes the critical steps of the bootstrap in regression. We present an empirical example in Section 4. The article is concluded with a summary and some comments in Section 5.

2. Review of M-Regression

The usual multiple regression model, in matrix notation, is

    Y = X\beta + \varepsilon,    (2.1)

where Y is an n×1 vector of observations of the response variable, X is an n×p (design) matrix of known constants, β is a p×1 vector of unknown regression coefficients, and ε is an n×1 vector of random errors. It is assumed that the elements of ε are independent and identically distributed with V(ε) = σ²I_n, where I_n is the n×n identity matrix and σ² (> 0) is a constant. For the estimation of β by ordinary least squares (OLS) it is further required that the data at hand be well behaved, that is, that the data are devoid of outliers. Robust regression, specifically M-regression, is a good alternative to OLS in the event that there are outliers in the data. M-regression is described as follows. Consider the function

    \sum_{i=1}^{n} \rho\left( \frac{Y_i - X_i\beta}{\hat{\sigma}} \right),    (2.2)

where Y_i is the ith element of Y, X_i is the ith row of X, and σ̂ is a robust estimate of σ. The function is to be minimized with respect to the elements of β. Thus, differentiating (2.2) partially with respect to the elements of β, say β_j, and equating the derivatives to zero, we have

    \sum_{i=1}^{n} \psi\left( \frac{Y_i - X_i\beta}{\hat{\sigma}} \right) x_{ij} = 0,  \quad j = 1, 2, \ldots, p,    (2.3)

where ψ(·) is the derivative of ρ(·) and x_ij is the (i, j)th element of X. The minimizing values β̂_1, β̂_2, ..., β̂_p associated with the p equations are called the M-estimators of the elements β_1, β_2, ..., β_p of β; or we can simply say that β̂ is the M-estimator of β. Hogg (1979) gives a detailed account of how β̂ can be computed using weighted least squares (WLS). This is summarized in the following steps:

1. Begin with initial estimates β̂ and σ̂. (It is convenient to take the OLS estimate of β as the initial estimate β̂ and, following this, σ̂ = median|Y_i − X_i^t β̂|.)
2. Calculate the residuals r_i = Y_i − X_i β̂, i = 1, 2, ..., n.
3. Calculate the weights w_i = ψ(r_i)/r_i, and form the n×n diagonal matrix of weights W whose diagonal elements are the w_i.
4. Carry out weighted least squares (WLS) to yield the new estimate β̂ = (X^t W X)^{-1} X^t W Y.
5. Iterate Steps 2 through 4 until convergence.

A few pertinent remarks are in order: (i) an approach that is slightly different from the above is to estimate β and σ simultaneously; Dutter (1977) has described how this can be done. (ii) The choice of an appropriate weighting function is critical in M-regression. Hogg (1979) gives some tips to guide the selection of an appropriate function from among those that are commonly used in practice; Huber's function and Tukey's biweight function are used in the present article.
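To make Steps 1-5 concrete, here is a minimal Python/NumPy sketch of the iteratively reweighted least squares cycle, assuming Huber's ψ with the conventional tuning constant c = 1.345 and a MAD-type scale estimate; the function names and these defaults are choices made for the illustration, not settings taken from the paper.

```python
import numpy as np

def huber_psi(u, c=1.345):
    """Huber's psi function: identity for |u| <= c, clipped at +/-c beyond."""
    return np.clip(u, -c, c)

def m_regression_irls(X, y, c=1.345, tol=1e-8, max_iter=100):
    """M-regression via iteratively reweighted least squares (Steps 1-5).

    X: n x p design matrix (include a column of ones for an intercept).
    y: length-n response vector.
    """
    beta = np.linalg.lstsq(X, y, rcond=None)[0]          # Step 1: OLS starting value
    for _ in range(max_iter):
        resid = y - X @ beta                             # Step 2: residuals
        sigma = np.median(np.abs(resid)) / 0.6745        # MAD-type robust scale (an assumption)
        if sigma <= 0:                                   # perfect fit: nothing to reweight
            break
        u = resid / sigma
        w = np.ones_like(u)                              # Step 3: weights w_i = psi(u_i)/u_i
        nonzero = u != 0
        w[nonzero] = huber_psi(u[nonzero], c) / u[nonzero]
        XtW = X.T * w                                    # Step 4: WLS update
        beta_new = np.linalg.solve(XtW @ X, XtW @ y)     #   beta = (X'WX)^{-1} X'Wy
        if np.max(np.abs(beta_new - beta)) < tol:        # Step 5: iterate to convergence
            return beta_new
        beta = beta_new
    return beta
```

Swapping Tukey's biweight ψ in place of huber_psi in the same loop would give a biweight fit of the kind referred to in remark (ii).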

3. The Bootstrap in Robust Regression

Robust estimators such as β̂ of β in model (2.1) are not maximum likelihood estimators in the classical sense. This is because the form of the distribution of ε is not known; specifically, the distribution function F(ε) is not specified. By the same token F(β̂) is unknown, which goes to show that M-estimators are essentially non-parametric. We venture to say that this non-parametric environment provides a proper setting for the bootstrap methodology to be applied.

Let ε̂ = (ε̂_1, ε̂_2, ..., ε̂_n)^t = Y − Xβ̂ denote the residuals from the fitted (robust) regression. The bootstrap sample ε_1*, ε_2*, ..., ε_n* is generated by sampling ε̂_1, ε̂_2, ..., ε̂_n with replacement. Thus, the bootstrap sample leaves out some elements of (ε̂_1, ..., ε̂_n) but may include other elements two, three, four or more times. Now, defining the bootstrap observations as

    Y_i^* = X_i \hat{\beta} + \hat{\varepsilon}_i^*,  \quad i = 1, 2, \ldots, n,

we can obtain β̂* as the solution to

    \sum_{i=1}^{n} \psi\left( \frac{Y_i^* - X_i\beta}{\hat{\sigma}} \right) x_{ij} = 0,  \quad j = 1, 2, \ldots, p.    (2.4)

Notice the similarity of (2.3) and (2.4); it simply shows that applying the robust estimator to the original sample (Y, X) yields β̂, while applying the same estimator to (Y*, X) yields β̂*, namely the bootstrap estimate. As indicated earlier, F(β̂) represents the true but unknown distribution function of β̂, and F̂(β̂*) denotes the observed distribution function of β̂*, which is known by virtue of the fact that it is obtained via many Monte Carlo repetitions of the bootstrap sampling process described earlier. That is, if we draw bootstrap samples a large number of times, B times say, then the B values of β̂* will yield F̂(β̂*), which approximates a maximum likelihood estimate of F(β̂). The bootstrap variance estimates the true but unknown variance of β̂.

In practical terms, there are two ways to carry out bootstrapping in regression analysis where one has data (Y, X) following the model in (2.1). One way is to resample the residuals from the fitted model and the other is to resample the data (Y, X).

3.1 Bootstrapping Regression via Residual Resampling

Residual bootstrapping proceeds using the following steps:

i. Perform the regression with the original sample (Y, X) to calculate the predicted values Ŷ and residuals r.
ii. Randomly resample the residuals with replacement, but leave the X and Ŷ values unchanged. Let the bootstrap residuals be denoted by r*.
iii. Construct new Y* values by adding r* to the original predicted values, to yield Y* = Ŷ + r*.
iv. Regress Y* on the original X variable(s).
v. Repeat steps (ii) to (iv) B times.

We then study the distribution of the bootstrap estimate β̂* across the B bootstrap samples.
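The following is a minimal sketch, in the same illustrative Python/NumPy style, of steps i-v; it treats the fitting routine as a black box (for example the m_regression_irls sketch above) and reports the standard deviation of the B bootstrap coefficient vectors as the bootstrap standard error. The helper name and the choice B = 500 are assumptions of the sketch, not values taken from the paper.

```python
import numpy as np

def residual_bootstrap(X, y, fit, B=500, seed=0):
    """Residual-resampling bootstrap for a regression fit (steps i-v of Section 3.1).

    fit(X, y) must return a coefficient vector, e.g. the illustrative
    m_regression_irls sketched in Section 2.
    """
    rng = np.random.default_rng(seed)
    beta_hat = fit(X, y)                                  # i.   fit the original sample
    y_hat = X @ beta_hat
    resid = y - y_hat
    n = len(y)
    boot = np.empty((B, X.shape[1]))
    for b in range(B):
        r_star = rng.choice(resid, size=n, replace=True)  # ii.  resample residuals
        y_star = y_hat + r_star                           # iii. rebuild Y* = Y_hat + r*
        boot[b] = fit(X, y_star)                          # iv.  refit with X unchanged
    se = boot.std(axis=0, ddof=1)                         # v.   spread of the B estimates
    return beta_hat, se
```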

3.2 Data Bootstrapping

Data resampling, otherwise called the model-free bootstrap, bootstraps the regression without assuming fixed X or identically distributed errors. It proceeds as follows:

i. Randomly choose samples of size n, sampling complete cases (Y_i, X_i) from the original data with replacement.
ii. Within each bootstrap sample, regress Y* on the X* variable(s) as usual.

Unlike residual resampling, data resampling, as noted above, does not assume independent and identically distributed errors. Since it allows for other possibilities, and also admits random X values as a new source of sample-to-sample variation, data resampling often yields results quite different from those expected under the usual regression assumptions. Stine (1990) recommends basing the choice of residual versus data resampling on how the data were collected. Residual resampling would be preferred if the fixed-X assumption is realistic. Otherwise, if X varies as randomly as Y, then data resampling should be the choice. In either case we want the process of bootstrap resampling to mimic the way in which the sample was originally selected from the population.

4. Application

4.1 Description of the Data

When oil prices rose during the 1970s, wood stoves came back into fashion for heating in parts of the country. Although it is often cheaper than other sources, wood burning pollutes both outdoor and indoor air. The table below gives measures of the peak carbon monoxide (CO) levels during tests of wood-burning stoves. Robust methods are particularly appropriate here because of two unusual tests (9 and 10): in one the stove overheated, possibly owing to overfilling with wood, and in the other the experimenters reduced airflow by using a damper, which caused the house to fill with smoke. Such incidents are common with non-airtight stoves, especially with inexperienced operators (see Hamilton 1992).

Table 6. Data on indoor carbon monoxide pollution from wood-burning stoves

Test   Stove Type     Burning Time (hours)   Amount of Wood Burned (kg)   Peak CO (ppm)
1      Airtight       4.8                    37.3                         .8
2      Airtight       8.8                    38.4                         .
3      Airtight       3..                    .6
4      Airtight       3.7                    7..
5      Airtight       8.5                    4.6                          .
6      Airtight       8.                     43.                          .4
7      Airtight       6.                     4.                           3.8
8      Non-airtight   8.7                    4.4                          7.7
9      Non-airtight   .4                     3.4                          35.
10     Non-airtight   5.4                    3.                           43.
11     Non-airtight   9.5                    38.6                         3.5

Here X_1 = burning time, X_2 = amount of wood burned and, as mentioned above, Y = CO.

4.2 Regression Model for the Data

The following regression model is proposed for the data:

    Y_i = \beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} + \varepsilon_i,  \quad i = 1, 2, \ldots, 11,    (4.1)

where β_0, β_1, β_2 are regression parameters and ε_i is the random error in Y_i, assumed to have constant variance, that is, V(ε_i) = σ². Furthermore, for inference purposes, it is necessary to assume that ε_i ~ N(0, σ²).
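Before turning to the regression fits, here is a minimal sketch of how the data (cases) bootstrap of Section 3.2 could be applied to a model of the form (4.1), again treating the robust fitting routine as a black box. The helper name, the column layout in the comments, and B = 500 are assumptions of the illustration, not the settings behind Table 4.1.

```python
import numpy as np

def data_bootstrap(X, y, fit, B=500, seed=0):
    """Case-resampling ("model-free") bootstrap of Section 3.2."""
    rng = np.random.default_rng(seed)
    n = len(y)
    boot = np.empty((B, X.shape[1]))
    for b in range(B):
        idx = rng.integers(0, n, size=n)   # draw n complete cases (Y_i, X_i) with replacement
        boot[b] = fit(X[idx], y[idx])      # refit on the resampled cases
    return boot.std(axis=0, ddof=1)        # bootstrap standard errors

# Hypothetical use for model (4.1): columns = intercept, burning time, wood burned.
# X = np.column_stack([np.ones(len(peak_co)), burning_time, wood_burned])
# se = data_bootstrap(X, peak_co, m_regression_irls)
```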

For fitting the model in (4.1) to the data, we used three robust regression techniques, namely robust biweight regression on the one hand and two bootstrap-based robust procedures on the other. The results of the regression fits are presented in Table 4.1.

Table 4.1 Regression estimates and their corresponding standard errors (in brackets)

Method of estimation                                     β̂_0            β̂_1            β̂_2
WLS based on Huber's weight with c = 1.345               .497 (.779)    -.977 (.68)    -.66 (.3)
WLS based on the biweight with c = 4.875                 .53 (.93)      -.347 (.)      -.65 (.5)
Robust M-regression via model (residual) bootstrap, B=5  53.46 (.8)     -.65 (.8)      -.8 (.6)
Robust M-regression via data bootstrap, B=5              35.6 (.5)      -.456 (.53)    -. (.4)

The relevant entries in the table show that the estimates of the regression coefficients from the bootstrap robust regression fitting methods have uniformly smaller standard errors than those from the biweight regression. This result is an indication that bootstrapping can serve as an instrument for boosting the efficiency of robust regression, which in essence is the main aim of this research. However, we are surprised at the large differences in the magnitudes of the estimated coefficients, although each estimate maintains the same sign across the three methods under consideration. As for the two bootstrap robust regression models, it is hardly surprising that their results differ. It is not a surprise because, as noted in Section 3.2 above, data resampling does not necessarily assume that the design X is fixed; instead, it admits random X, which occasions greater variability in the estimation data. Consequently, data resampling often yields results that are quite different from those of residual resampling, the latter depending on the usual least squares assumptions for its validity.

References

Andrews, D. F., et al. (1972), Robust Estimates of Location: Survey and Advances. Princeton University Press, Princeton.
Breiman, P. (1996), On Robust Estimation. Annals of Statistics, 9, 96-7.
Efron, B. (1979), Bootstrap Methods: Another Look at the Jackknife. Annals of Statistics, 7, 1-26.
Efron, B. (1983), Estimating the Error Rate of a Prediction Rule: Improvement on Cross-Validation. Journal of the American Statistical Association, 78, 316-331.
Efron, B. (1987), Better Bootstrap Confidence Intervals (with discussion). Journal of the American Statistical Association, 82, 171-185.
Efron, B. and Tibshirani, R. (1993), An Introduction to the Bootstrap. Chapman and Hall/International Thomson Publishing, New York.
Freedman, D. A. (1981), Bootstrapping Regression Models. Annals of Statistics, 9, 1218-1228.
Freedman, D. A. and Peters, S. C. (1984), Bootstrapping a Regression Equation: Some Empirical Results. Journal of the American Statistical Association, 79, 97-106.
Hamadu, D. (2003), Bootstrapping Heteroscedastic Regression Models. Unpublished PhD thesis, Department of Mathematics, University of Lagos, Lagos, Nigeria.
Hamilton, L. C. (1992), Regression with Graphics: A Second Course in Applied Statistics. Duxbury Press, California.
Hampel, F. R. (1974), The Influence Curve and its Role in Robust Estimation. Journal of the American Statistical Association, 69, 383-394.
Huber, P. J. (1967), The Behaviour of Maximum Likelihood Estimates under Nonstandard Conditions. Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, 221-233.
Huber, P. J. (1981), Robust Statistics. John Wiley and Sons, New York.
Stine, R. A. (1985), Bootstrap Prediction Intervals for Regression. Journal of the American Statistical Association, 80, 1026-1031.
Tibshirani, R. and Knight, K. (1999), Model Search by the Bootstrap Bumping. Journal of Computational and Graphical Statistics, 8, 671-686.
Wilcox, R. R. (2001), Fundamentals of Modern Statistical Methods. Springer-Verlag, New York.
Wu, C. F. J. (1986), Jackknife, Bootstrap and Other Resampling Methods in Regression Analysis. The Annals of Statistics, 14, 1261-1295.