Estimating Regression Coefficients using Weighted Bootstrap with Probability


Norazan M. R., Habshah Midi and A. H. M. R. Imon

Faculty of Computer and Mathematical Sciences, Universiti Teknologi MARA, 40450 Shah Alam, Selangor, MALAYSIA
Laboratory of Applied and Computational Statistics, Institute for Mathematical Research, Universiti Putra Malaysia, 43400 Serdang, Selangor, MALAYSIA
Department of Mathematical Sciences, Ball State University, Muncie, IN 47306, USA
E-mail: norazan@tmsk.uitm.edu.my, habshahmidi@gmail.com, imon_ru@yahoo.com

Abstract: In this paper we propose a new Weighted Bootstrap with Probability (WBP). The basic idea of the proposed bootstrap technique is to do re-sampling with probabilities. These probabilities become the control mechanism for getting good estimates when the original data set contains multiple outliers. Numerical examples and a simulation study are carried out to evaluate the performance of the WBP estimates as compared to the Bootstrap and Diagnostic-Before Bootstrap estimates. The results of the study signify that the WBP method is more efficient than the other two methods.

Key-Words: regression, outliers, weighted bootstrap with probability, weighting function

1 Introduction

The bootstrap is a procedure that can be used to obtain inference, such as confidence intervals, for the regression coefficient estimates. The bootstrap method was proposed by Efron, with the basic idea of generating a large number of sub-samples by randomly drawing observations with replacement from the original data set [4, 5]. These sub-samples are termed bootstrap samples and are used to recalculate the estimates of the regression coefficients. The bootstrap method has been successful in attracting practitioners in many areas, as its usage does not rely on the normality assumption. Kun and Yan, for example, analysed the bullwhip effect in a supply chain model using bootstrap techniques [9]. An interesting property of the bootstrap method is that it can provide the standard errors of any complicated estimator without requiring any theoretical calculations.

It is now evident that the presence of outliers has an undue effect on the bootstrap estimates. Outliers are observations that are markedly different from the bulk of the data or from the pattern set by the majority of the observations. In a regression problem, observations corresponding to excessively large residuals are treated as outliers. There is a possibility that the bootstrap samples contain more outliers than the original sample, because the bootstrap re-sampling procedure is carried out with replacement [12]. As a consequence, the variance estimates and the confidence intervals are affected, and the bootstrap distribution breaks down. We may use a robust estimator to deal with possible outliers, but this may not be enough, since robust estimation is expected to perform well only up to a certain percentage of outliers.

In this paper, we propose a modification of the bootstrap procedure proposed by Imon and Ali [12]. The main idea is to form each bootstrap sample by re-sampling with probabilities, so that the more outlying observations have smaller probabilities of selection. We organize this paper as follows: we discuss and summarize several existing bootstrap procedures in Section 2; in Section 3 we present the newly proposed bootstrap method and examine its performance; and finally, some conclusions are made in Section 4.

2 Some Bootstrap Techniques

In this paper, the bootstrap techniques are applied to multiple linear regression models. These models are considered as they are among the most popular ones and are widely used in various areas, especially for forecasting or prediction [3, 8, 18]. Let a general linear regression model be of the form

Y = X\beta + \varepsilon    (1)

where Y is an (n x 1) vector of the continuous response variable, X is an (n x p) data matrix that includes the intercept, \beta is a (p x 1) vector of unknown parameters to be estimated from the data, and \varepsilon is an (n x 1) vector of unobservable random errors, normally and independently distributed with mean zero and constant variance \sigma^2. For the ith observation, equation (1) can be written as

y_i = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_k x_{ik} + \varepsilon_i    (2)

It is generally well known that fixed-x re-sampling and random-x re-sampling are the two bootstrapping techniques most commonly used for the linear regression model [6]. For clarity we summarize the procedures for the two techniques in the following sections.

2.1 Fixed-x Re-sampling

In fixed-x re-sampling, we generate bootstrap replications while the model matrix X is kept fixed. We treat the fitted values ŷ_i from the model as giving the expectation of the response for the bootstrap samples. The fixed-x re-sampling can be summarized as follows:

Step 1: Fit a model to the original sample of n observations to get β̂ and the fitted values ŷ_i = f(x_i, β̂).
Step 2: Get the residuals ε̂_i = y_i - ŷ_i.
Step 3: Draw ε_i* from ε̂ and attach it to ŷ_i to get a fixed-x bootstrap value y_i*, where y_i* = f(x_i, β̂) + ε_i*.
Step 4: Regress the bootstrapped values y* on the fixed X to obtain β̂*.
Step 5: Repeat Step 3 and Step 4 B times to get β̂*1, ..., β̂*B.

2.2 Random-x Re-sampling

On the other hand, random-x re-sampling offers a different approach to bootstrapping. Assume that we want to fit a regression model with response y and predictors x, which form a sample of n observations z_i' = (y_i, x_i). The following summarizes the random-x re-sampling procedure:

Step 1: The bootstrap data (y_1*, x_1*), ..., (y_n*, x_n*) are taken independently, with equal probabilities 1/n, from the original cases (y_1, x_1), ..., (y_n, x_n).
Step 2: Compute β̂* for the bootstrap data set (y_1*, x_1*), ..., (y_n*, x_n*).
Step 3: Repeat Step 1 and Step 2 B times to get β̂*1, ..., β̂*B.

These two re-sampling methods are also known by other names. Some authors refer to fixed-x re-sampling as bootstrapping the residuals of the linear regression model, while random-x re-sampling is also known as bootstrapping pairs or case re-sampling of the linear regression estimates [6, 15, 16].
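The two re-sampling plans above translate almost line for line into code. The following is a minimal numpy sketch of both schemes; it is not the authors' implementation (the paper's computations were carried out in S-Plus), and the function names and the default of B = 500 replications are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)

def ols(X, y):
    """Least squares coefficients for the model y = X beta + error."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

def fixed_x_bootstrap(X, y, B=500):
    """Fixed-x (residual) re-sampling: keep X fixed, re-sample residuals."""
    beta_hat = ols(X, y)                 # Step 1: fit the original sample
    fitted = X @ beta_hat
    resid = y - fitted                   # Step 2: residuals
    betas = np.empty((B, X.shape[1]))
    for b in range(B):
        eps_star = rng.choice(resid, size=len(y), replace=True)  # Step 3
        betas[b] = ols(X, fitted + eps_star)                      # Step 4
    return betas                         # Step 5: B re-computed estimates

def random_x_bootstrap(X, y, B=500):
    """Random-x (case) re-sampling: draw whole cases (y_i, x_i) with prob 1/n."""
    n = len(y)
    betas = np.empty((B, X.shape[1]))
    for b in range(B):
        idx = rng.choice(n, size=n, replace=True)   # equal probabilities 1/n
        betas[b] = ols(X[idx], y[idx])
    return betas
```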

2.3 Diagnostic-Before Bootstrap

A new way of bootstrapping in linear regression was proposed by Imon and Ali [12]. The method is called the Diagnostics-Before Bootstrap. In this procedure, the suspected outliers are identified and omitted from the analysis before the bootstrap is performed with the remaining set of observations, so that the bootstrap estimates of the parameters involve only good observations. Outliers are identified using the robust re-weighted least squares (RLS) residuals as proposed by Rousseeuw and Leroy [14]. In order to compute the RLS residuals, a regression line is fitted without the observations identified as outliers by the least median of squares (LMS) technique [13, 14]. The matrices X and Y are partitioned as follows:

X = \begin{pmatrix} X_R \\ X_D \end{pmatrix}, \qquad Y = \begin{pmatrix} Y_R \\ Y_D \end{pmatrix}    (3)

where R represents the set of cases remaining in the analysis and D the set of deleted cases. If β̂(-D) represents the vector of the estimated parameters after the deletion of d cases, then the Diagnostics-Before Bootstrap can be summarized as follows:

Step 1: Fit a model to the remaining observations to get β̂(-D) and the fitted values ŷ_i(-D) = f(x_i(R), β̂(-D)).
Step 2: Get the residuals ε̂_i(-D) = y_i - f(x_i(R), β̂(-D)).
Step 3: Draw ε_i* from ε̂(-D) and attach it to ŷ_i(-D) to get fixed-x bootstrap values y_i*(-D), where y_i*(-D) = f(x_i(R), β̂(-D)) + ε_i*.
Step 4: Regress the bootstrapped values y*(-D) on the fixed X_R to get β̂*(-D).
Step 5: Repeat Step 3 and Step 4 B times to get β̂*1(-D), ..., β̂*B(-D).

When outliers are present in our data, both the fixed-x and random-x re-sampling methods are expected to break down. This can happen because there is no mechanism to control the presence of outliers in the bootstrap samples produced by these methods; consequently, the possibility of obtaining bootstrap samples with a larger percentage of outliers than in the original data set is high. The Diagnostics-Before Bootstrap, on the other hand, accommodates the influence of the outliers by first identifying them based on the robust re-weighted least squares residuals, as proposed in [14], by applying the weight function written in equation (4),

w_i = \begin{cases} 0, & \text{if } |r_i| > 2.5\, s \\ 1, & \text{otherwise} \end{cases}    (4)

The s in equation (4) is the robust scale estimate, defined as

s = 1.4826 \left[ 1 + \frac{5}{n - p} \right] \sqrt{\operatorname{median}(r_i^2)}    (5)

Each observation therefore receives a weight of either 0 or 1, depending on its outlyingness. Due to this crude weight assignment, the deletion set may not be very accurate, and this may affect the resulting bootstrap estimates. Thus, in the next section, a proposed bootstrap method that is expected to accommodate this problem is presented.
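Sketched in the same numpy style, the Diagnostics-Before Bootstrap amounts to the 0/1 weighting of equations (4)-(5) followed by a fixed-x bootstrap on the retained cases. The residuals of a preliminary robust (LMS) fit are taken as an input here rather than recomputed, so this is only an outline of the procedure under that assumption; the cutoff 2.5, the scale of equation (5) and B = 500 follow the text, everything else is illustrative.

```python
import numpy as np

def rls_weights(prelim_resid, n, p, cutoff=2.5):
    """0/1 weights of equation (4), using the robust scale of equation (5).
    `prelim_resid` are residuals from a preliminary robust (e.g. LMS) fit."""
    s = 1.4826 * (1.0 + 5.0 / (n - p)) * np.sqrt(np.median(prelim_resid**2))
    return (np.abs(prelim_resid) <= cutoff * s).astype(float)

def diagnostics_before_bootstrap(X, y, prelim_resid, B=500, seed=0):
    """Delete suspected outliers first, then run a fixed-x bootstrap on set R."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    keep = rls_weights(prelim_resid, n, p) > 0        # remaining set R
    XR, yR = X[keep], y[keep]
    beta_D = np.linalg.lstsq(XR, yR, rcond=None)[0]   # beta_hat(-D), Step 1
    fitted = XR @ beta_D
    resid = yR - fitted                               # Step 2
    betas = np.empty((B, p))
    for b in range(B):
        eps_star = rng.choice(resid, size=len(yR), replace=True)          # Step 3
        betas[b] = np.linalg.lstsq(XR, fitted + eps_star, rcond=None)[0]  # Step 4
    return betas                                      # Step 5
```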

3 Weighted Bootstrap with Probability (WBP)

Many researchers use a mechanism so that the re-sampling plan is not so much affected by the outlying observations. For example, Amado and Pires used an influence function to compute the selection probabilities and applied the procedure to obtain confidence intervals for the univariate location and for the correlation coefficient, and for the selection of variables in two-group linear discriminant analysis [1]. Other authors have addressed the problem in slightly different ways for different applications. Stromberg, for example, recommended using a 50% breakdown S-estimate of variability instead of the sample variance for the computation of the bootstrap variance estimate [21]. Robustifying the bootstrap method by applying winsorization for certain L and M estimators was proposed by Singh [20].

Our proposed bootstrap method also attempts to protect the bootstrap procedure against a given number of arbitrary outliers. We propose several modifications of the Diagnostic-Before Bootstrap procedure. Hampel's weighting (psi) function is used to determine the weight assigned to each observation. These weights are calculated from the least median of squares (LMS) standardized residuals. If we let r_i represent the LMS residuals (where i = 1, 2, ..., n), then the standardized LMS residuals are u_i = r_i / MAD(r_i). The Hampel weighting function shown in equation (7), with tuning constants a, b and c, is used to compute the weights for all cases of the original sample. If w_i denotes the weight for the ith observation, then this weight is defined as

w_i = \frac{\psi(u_i)}{u_i}    (6)

where n is the sample size and p is the number of regression coefficients, and

\psi_{\text{Hampel}}(u) = \begin{cases} u, & |u| \le a \\ a \operatorname{sign}(u), & a < |u| \le b \\ a \dfrac{c - |u|}{c - b} \operatorname{sign}(u), & b < |u| \le c \\ 0, & c < |u| \end{cases}    (7)

Based on these weights, we expect that outliers in the original sample will receive proper weights according to their outlyingness. We expect that only the very bad outliers will receive weight 0 and be included in the deleted set D. To protect the whole procedure against outliers, we propose to do the bootstrap re-sampling with probabilities. Thus, the ith observation gets the selection probability p_i, where

p_i = \frac{w_i}{\sum_{i=1}^{n} w_i}    (8)

for 0 \le p_i \le 1 and i = 1, 2, ..., n. These probabilities become the control mechanism whereby the bad observations are ascribed less importance than the good ones and are thus attributed lower probabilities for re-sampling.

Assigning probabilities p_1, p_2, ..., p_n to {(y_1, x_1), (y_2, x_2), ..., (y_n, x_n)}, we are now ready to present our newly proposed bootstrap method, which we call the Weighted Bootstrap with Probability (WBP). For simplicity, most of the notations used in [8] are adopted in this paper. We let R represent the set of remaining cases and D the set of deleted cases. We propose that the remaining set R should contain the observations with p_i > 0, thus allowing more observations to be involved in the bootstrapping process. The matrices X and Y are as defined in equation (3). Let β̂(-D) denote the vector of the estimated parameters after the deletion of d cases; β̂(-D) is estimated by fitting a linear model to the remaining observations only, namely X_R and Y_R. The following steps describe the procedure:

Step 1: Fit the original data with LMS. Apply Hampel's weighting function to identify outliers based on the LMS residuals. Fit a model to the remaining observations (those with p_i > 0) to get β̂(-D) and the fitted values ŷ_i(-D) = f(x_i, β̂(-D)).
Step 2: Get the residuals ε̂_i(-D) = y_i - f(x_i, β̂(-D)).
Step 3: Draw ε_i* from ε̂(-D) by re-sampling with the probabilities shown in equation (8). Attach ε_i* to ŷ_i(-D) to get fixed-x bootstrap values y_i*(-D), where y_i*(-D) = f(x_i(R), β̂(-D)) + ε_i*.
Step 4: Regress the bootstrapped values y*(-D) on the fixed X_R to get β̂*(-D).
Step 5: Repeat Step 3 and Step 4 B times to get β̂*1(-D), β̂*2(-D), ..., β̂*B(-D).

In this study, the re-sampling with probability in Step 3 above was done by making use of the available S-Plus procedure called sample.
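To make the weighting and re-sampling scheme concrete, here is a small numpy sketch of the WBP core. It is not the authors' S-Plus code: the LMS residuals are passed in rather than computed, the Hampel tuning constants a, b, c are left as arguments, the selection probabilities of equation (8) are renormalized over the retained cases before being handed to the sampler, and `numpy.random.Generator.choice` with its `p` argument stands in for the S-Plus `sample` procedure.

```python
import numpy as np

def hampel_weights(lms_resid, a, b, c):
    """Weights w_i = psi(u_i)/u_i of equations (6)-(7), with u_i = r_i / MAD(r)."""
    mad = np.median(np.abs(lms_resid - np.median(lms_resid)))
    u = lms_resid / mad
    au = np.abs(u)
    w = np.ones_like(u)                                   # |u| <= a: psi(u)/u = 1
    band = (au > a) & (au <= b)
    w[band] = a / au[band]                                # psi(u) = a*sign(u)
    desc = (au > b) & (au <= c)
    w[desc] = a * (c - au[desc]) / ((c - b) * au[desc])   # descending part
    w[au > c] = 0.0                                       # hard rejection of bad outliers
    return w

def wbp_bootstrap(X, y, lms_resid, a, b, c, B=500, seed=0):
    """Weighted Bootstrap with Probability: residuals re-sampled with p_i of (8)."""
    rng = np.random.default_rng(seed)
    w = hampel_weights(lms_resid, a, b, c)
    keep = w > 0                                          # remaining set R (p_i > 0)
    XR, yR = X[keep], y[keep]
    prob = w[keep] / w[keep].sum()                        # equation (8), renormalized on R
    beta_D = np.linalg.lstsq(XR, yR, rcond=None)[0]       # Step 1
    fitted = XR @ beta_D
    resid = yR - fitted                                   # Step 2
    betas = np.empty((B, X.shape[1]))
    for k in range(B):
        eps_star = rng.choice(resid, size=len(yR), replace=True, p=prob)  # Step 3
        betas[k] = np.linalg.lstsq(XR, fitted + eps_star, rcond=None)[0]  # Step 4
    return betas                                          # Step 5
```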

3.1 Examples using real data sets

It is generally known that least squares estimates are very sensitive to outliers and can thus lead to misleading inference. Similarly, as mentioned earlier, not all existing bootstrapping techniques remain efficient when outliers are present. We assess the goodness of our proposed bootstrap method compared to the Bootstrap and the Diagnostics-Before Bootstrap; the remaining re-sampling variant of Section 2 is not included, as its performance was already found to be very poor. Two real data sets, namely the Hawkins-Bradu-Kass data and the Stackloss data, which are commonly used by other researchers for validating their robust methods, were used as numerical examples. It was reported that the first ten observations in the Hawkins-Bradu-Kass data set are outliers. Meanwhile, the Stackloss data set contains 4 outliers [14].

For each bootstrap method, 500 bootstrap sub-samples were drawn and least squares estimates were computed for each sub-sample. For simplicity, let the term β̂^B represent the estimate from the Bth bootstrapped sample and β̂ the vector of estimates from the original sample. To check the stability of the bootstrapped estimates, we constructed 95% confidence intervals for the bootstrapped regression parameters based on the variance of the bootstrapped re-calculated estimates. The 95% standard confidence interval for β_j is defined as

\left( \hat{\beta}_j \pm z_{0.025}\, s_{\hat{\beta}_j^*} \right)    (9)

where s_{\hat{\beta}_j^*} is the sample standard deviation of the β̂_j^B. To graphically illustrate the stability of the proposed bootstrap procedure, we also display the scatter plots of β̂_j^B - β̂_j (where B = 1, 2, ..., 500 and j = 0, 1, ..., p). We expect a bootstrap procedure to be stable when β̂_j^B - β̂_j is close to zero. For all the bootstrapping techniques discussed earlier, the estimate of β_j is defined as

\hat{\beta}_j^{*} = \frac{1}{500} \sum_{B=1}^{500} \hat{\beta}_j^{B}    (10)

Hence, the residual for each bootstrap method can be written as

\hat{\varepsilon}_i = y_i - x_i^{T} \hat{\beta}^{*}    (11)

Figure 1 - Figure 3 exhibit the plots of (β̂_j^B - β̂_j) versus (β̂_k^B - β̂_k) for the Hawkins-Bradu-Kass data. Plots for the other regression coefficients are not displayed here due to space constraints; however, their results are consistent. These figures show that the proposed WBP procedure gives the most stable estimates, followed by the Bootstrap and the Diagnostic-Before Bootstrap estimates. The plot of Figure 3 clearly indicates that the WBP estimates are the most stable even in the presence of multiple outliers, evidenced by the values of the bootstrap biases, which are close to zero. On the other hand, the Diagnostic-Before Bootstrap and the Bootstrap methods fail to provide stable estimates, as can be observed from Figure 1 and Figure 2.

Bootstrapped residual estimates for the Hawkins-Bradu-Kass data are presented in Table 1. For comparison purposes, we also include the least squares (LS) and re-weighted least squares (RLS) residuals. Many authors generally agree that for this data set the robust RLS can generate estimates that are most likely to be very close to the true error values [13, 14]. In this respect, we would expect the more robust method to be the one with residuals closest to the RLS residuals. Table 1 reveals that the WBP method is appreciably the most robust method, since its residuals are very close to the RLS residuals. The performances of the OLS and the Bootstrap are fairly close and not encouraging; their residuals are very far from the RLS residuals. It is interesting to note that both the proposed WBP and the Diagnostic-Before Bootstrap methods can easily detect and identify all the outliers in the given data set. Unfortunately, the least squares and Bootstrap methods not only fail to identify the correct outliers but also suffer from the masking problem.

Figure 1: Plots of (β̂_j^B - β̂_j) versus (β̂_k^B - β̂_k) for the Hawkins-Bradu-Kass data using the Bootstrap method.

Figure 2: Plots of (β̂_j^B - β̂_j) versus (β̂_k^B - β̂_k) for the Hawkins-Bradu-Kass data using the Diagnostic-Before Bootstrap method.

Figure 3: Plots of (β̂_j^B - β̂_j) versus (β̂_k^B - β̂_k) for the Hawkins-Bradu-Kass data using the Weighted Bootstrap with Probability (WBP) method.
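Equation (9), the averaged estimate of equation (10) and the deviations plotted in Figures 1-3 are all simple summaries of the B re-computed estimates. The short helper below computes them with numpy; centring the interval on the original-sample estimate follows our reading of equation (9), and the function name is illustrative.

```python
import numpy as np

def bootstrap_summary(betas, beta_orig):
    """betas: (B, p) array of re-computed estimates; beta_orig: original-sample fit."""
    beta_star = betas.mean(axis=0)                      # equation (10)
    s = betas.std(axis=0, ddof=1)                       # bootstrap standard deviation
    z = 1.96                                            # z_{0.025}
    ci = np.column_stack((beta_orig - z * s, beta_orig + z * s))   # equation (9)
    deviations = betas - beta_orig                      # scatter-plotted in Figures 1-3
    return beta_star, ci, deviations
```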

[Table 1: Bootstrap residuals for the Hawkins-Bradu-Kass data. Columns: observation, LS, BOOT, RLS and WBP residuals.]

Table 2 presents the least squares (LS) and the robust RLS coefficient estimates obtained from the original Stackloss data, while Table 3 - Table 4 illustrate the least squares coefficient estimates of the Stackloss bootstrapped sub-samples. It is worth mentioning here that the least squares estimates of the original sample are sensitive to multiple outliers, but the robust RLS estimates are not. We clearly observe that the WBP again repeats its excellent performance. The results of Table 3 indicate that the WBP always outperforms the other two bootstrap methods (see Table 4 and Table 5): the confidence intervals of the WBP estimates have the narrowest average interval length for all of the regression coefficients. On the other hand, the confidence intervals for the Bootstrap and the Diagnostic-Before Bootstrap give bad results; their average confidence interval lengths are prominently large.

[Table 2: True coefficient estimates obtained from the original Stackloss data. Columns: coefficient, LS, robust RLS.]

[Table 3: 95% confidence intervals for the WBP bootstrap estimates using the Stackloss data. Columns: coefficient, WBP CI, WBP CI length.]

[Table 4: 95% confidence intervals for the Diagnostic-Before Bootstrap estimates using the Stackloss data. Columns: coefficient, Diagnostic CI, Diagnostic CI length.]

[Table 5: 95% confidence intervals for the Bootstrap estimates using the Stackloss data. Columns: coefficient, Bootstrap CI, Bootstrap CI length.]

3.2 Examples using simulated data sets

The examples from the real data sets in Section 3.1 have shown that the WBP coefficient estimates are in general the most stable bootstrapped estimates, with the shortest confidence interval lengths. In this section we further investigate the robustness of our proposed bootstrap method by conducting a simulation study. The simulation study was performed using a multiple linear model with three predictors. Data sets of three sample sizes n, with residual outliers at two contamination levels α%, were created based on the adapted simulation design used by Sebert et al. [17]. The observations for the predictor variables were selected at random from a uniform distribution. For both good and bad observations, the random errors were generated from a normal distribution with mean zero and constant standard deviation σ. All outliers were placed away from the good observations by a fixed distance measured in error standard deviations.

We now illustrate the procedure for creating an artificial data set for the case of a multiple linear model with a single response and three predictor variables. Our approach is to randomly generate n regression observations. These n observations include the n_c clean observations and the n_o outliers (where n_o = α% of n); thus altogether we have n_c + n_o = n observations. The n_c clean observations were generated according to the model

y_{ic} = \beta_0 + \beta_1 x_{i1c} + \beta_2 x_{i2c} + \beta_3 x_{i3c} + \varepsilon_{ic}    (12)

where i = 1, 2, ..., n_c and \beta_0 = \beta_1 = \beta_2 = \beta_3 = 5. The x_{ic} and \varepsilon_{ic} are drawn from the uniform and normal distributions described above, respectively. The n_o outlying observations were generated from the model

y_{io} = \beta_0 + \beta_1 x_{i1o} + \beta_2 x_{i2o} + \beta_3 x_{i3o} + y_{\text{shift}} + \varepsilon_{io}    (13)

where i = 1, 2, ..., n_o. The residual outliers were created with y_shift representing the number of standard deviations by which the outliers are placed away from the good observations. For any contaminated data set of size n, the residual outliers are placed as the last α% of the observations.
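The data-generating scheme of equations (12)-(13) can be sketched as follows. The exact parameter values of the study (uniform range, error standard deviation, size of the shift, sample sizes) are only partly recoverable from the text, so they appear here as arguments with placeholder defaults; only the structure, clean cases followed by the last α% of cases shifted in y, follows the paper.

```python
import numpy as np

def make_contaminated_data(n, alpha, beta, y_shift, sigma=1.0, x_high=5.0, seed=0):
    """Generate one data set following equations (12)-(13): n_c clean cases plus
    n_o = alpha*n residual outliers placed as the last cases and shifted by y_shift."""
    rng = np.random.default_rng(seed)
    n_out = int(round(alpha * n))                       # n_o outliers
    X = rng.uniform(0.0, x_high, size=(n, 3))           # three uniform predictors
    Xd = np.column_stack((np.ones(n), X))               # add the intercept column
    eps = rng.normal(0.0, sigma, size=n)                # N(0, sigma^2) errors
    y = Xd @ beta + eps                                 # equation (12) for all cases
    y[n - n_out:] += y_shift                            # equation (13): shift the last alpha%
    return Xd, y

# Example call with beta_0 = ... = beta_3 = 5; shift and size are purely illustrative.
Xd, y = make_contaminated_data(n=100, alpha=0.20, beta=np.full(4, 5.0), y_shift=10.0)
```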

Using each of the bootstrap methods, we generated bootstrapped random samples. For each kth bootstrapped sample, a least squares bootstrapped estimate was computed and denoted β̂_j^k (j = 0, 1, 2, 3). Based on these re-computed β̂_j^k, we calculated their standard deviations. The bootstrapped standard errors of the WBP procedure will be compared to the Bootstrap and Diagnostics-Before Bootstrap standard errors.

Table 6 - Table 8 present the bootstrapped estimates using the simulated data. In the contaminated data sets, bad performance is observed for both the Diagnostic-Before Bootstrap and the Bootstrap estimators: their bootstrapped coefficient estimates are far from the true values. The WBP method gives very appealing results, with the lowest values of the standard errors. More serious consequences are observed when the outlier percentage is increased to the higher of the two contamination levels; increasing the percentage of outliers beyond the lower level results in a significant increase in the bootstrap standard errors. In other words, we would suggest that, generally, the reliability of these bootstrap estimates decreases once the percentage of outliers exceeds the lower level. This is noticeable in both the Diagnostic-Before Bootstrap and the Bootstrap estimates, but not so apparent in the WBP estimates.

Figure 4 - Figure 9 provide density plots for the bootstrapped estimates. The plots graphically represent the summarized performance of the three bootstrap methods for data sets at both outlier levels. It seems that the estimates from the WBP are not so much affected as compared to those of the Bootstrap and the Diagnostic-Before Bootstrap methods. The advantages of the WBP over the other methods are more apparent in data sets with big sample sizes and for outliers exceeding the lower contamination level. In summary, the results from the experiments indicate that the WBP procedure performs well in most of the given situations.

[Table 6: Bootstrapped estimates, with their respective standard errors in brackets, for the first simulated sample size at both outlier percentages. Columns: coefficient, WBP, Bootstrap and Diagnostic-Before Bootstrap.]

[Table 7: Bootstrapped estimates, with their respective standard errors in brackets, for the second simulated sample size at both outlier percentages.]

[Table 8: Bootstrapped estimates, with their respective standard errors in brackets, for the largest simulated sample size at both outlier percentages.]

Figure 4: Density plots of bootstrapped coefficient estimates for a contaminated data set with the first sample size and the first level of residual outliers.

Figure 5 - Figure 9: Density plots of bootstrapped coefficient estimates for contaminated data sets with the remaining combinations of sample size and residual outlier level.

4 Conclusion

In this paper, we propose a new bootstrap method to reduce the effect of outliers on the bootstrap estimates. The numerical studies suggest that the Bootstrap performs poorly in the presence of outliers. The Diagnostic-Before Bootstrap is more efficient than the Bootstrap, but it is not sufficiently robust because it is not very stable and has relatively large confidence interval lengths. The WBP method consistently outperformed the Bootstrap and Diagnostic-Before Bootstrap methods. It emerges that Hampel's weighting function and the re-sampling probability scheme introduced in the WBP procedure help to improve the performance of the bootstrapped estimates. The results of the study clearly indicate that the WBP is the best estimator, as it consistently provides stable estimates, the residuals closest to the true error values, and the shortest average confidence interval lengths. Hence, it should provide a robust alternative to other existing bootstrap methods.

References:
[1] Amado, C. and Pires, A. M., Robust Bootstrap with Non Random Weights Based on the Influence Function, Communications in Statistics, 2004, pages 77-96.
[2] Azami, Z., Ibrahim, M., Shahrum, A. and Mohd Sahari, Y., An Evaluation of Test Statistics for Detecting Level Change in BL(1,1,1,1) Models, WSEAS Transactions on Mathematics, Volume 7, 2008, pages 67-7.
[3] Darmesah, G., Zainodin, H. J., Kamsiah, B. and Suriani, H., Multiple Linear Regression in Forecasting the Number of Asthmatics, WSEAS Transactions on Information Science and Applications, Issue 6, Volume 5, 2008, pages 97-977.
[4] Efron, B., Bootstrap Methods: Another Look at the Jackknife, The Annals of Statistics, Volume 7, 1979, pages 1-26.
[5] Efron, B. and Tibshirani, R. J., An Introduction to the Bootstrap, Chapman & Hall, 1998.
[6] Fox, J., Bootstrapping Regression Models: An Appendix to An R and S-Plus Companion to Applied Regression. Available online: http://cran.r-project.org/doc/contrib/Fox-Companion/appendix-bootstrapping.pdf
[7] Hampel, F. R., Ronchetti, E. M., Rousseeuw, P. J. and Stahel, W. A., Robust Statistics: The Approach Based on Influence Functions, John Wiley and Sons, 1986.
[8] Kamsiah, B., Zainodin, J., Darmesah, G., Noraini, A. and Amran, A., Effect of Water ... on Ephemeroptera Abundance in Telipok River, Sabah, Malaysia, WSEAS Transactions on Environment and Development, Issue 5, Volume 4, 2008, pages 447-45.
[9] Kun, L. H. and Yan, K. C., Analysis of Bullwhip Effect in Supply Chain Model by Using Bootstrap Technique, WSEAS Transactions on Information Science and Applications, 2006.
[10] Midi, H., Bootstrap Methods in a Class of Non-Linear Regression Models, Pertanika Journal of Science & Technology, 8, pages 75-89, 2000.
[11] Rahmatullah Imon, A. H. M., Identifying Multiple High Leverage Points in Linear Regression, Journal of Statistical Studies, Special Volume, pages 7-8.
[12] Rahmatullah Imon, A. H. M. and Ali, M. M., Bootstrapping Regression Residuals, Journal of the Korean Data & Information Science Society, 16, pages 665-68, 2005.
[13] Rousseeuw, P. J., Least Median of Squares Regression, Journal of the American Statistical Association, 79, pages 871-880, 1984.
[14] Rousseeuw, P. J. and Leroy, A. M., Robust Regression and Outlier Detection, Wiley, 1987.
[15] Salibian-Barrera, M. and Zamar, R. H., Bootstrapping Robust Estimates of Regression, The Annals of Statistics, 30(2), pages 556-582, 2002.
[16] Salibian-Barrera, M., Bootstrapping MM-Estimates for Linear Regression with Fixed Designs, Statistics and Probability Letters, pages 59-66, 2006.
[17] Sebert, D. M., Montgomery, D. C. and Rollier, D. A., A Clustering Algorithm for Identifying Multiple Outliers in Linear Regression, Computational Statistics and Data Analysis, 27, pages 461-484, 1998.
[18] Seung, W. L. and Jun, Y. S., Construction and Operation of Knowledge Base on Intelligent Machine Tools, WSEAS Transactions on Systems, Volume 7, 2008, pages 48-55.
[19] Shao, J. and Tu, D., The Jackknife and Bootstrap, Springer-Verlag, 1995.
[20] Singh, K., Breakdown Theory for Bootstrap Quantiles, The Annals of Statistics, 26, pages 1719-1732, 1998.
[21] Stromberg, A. J., Robust Covariance Estimates Based on Resampling, Journal of Statistical Planning and Inference, 57, pages 321-334, 1997.
[22] Ulmanis, J. and Kolyshkin, A., The Impact of ICT on the Development of Latvia as a New Member of the EU, WSEAS Transactions on Business and Economics, Volume 4, 2007, pages 5-59.
[23] Venables, W. N. and Ripley, B. D., Modern Applied Statistics with S-Plus, 3rd edition, Springer-Verlag.
[24] Willems, G. and Van Aelst, S., Fast and Robust Bootstrap for LTS, Computational Statistics and Data Analysis, 48, pages 703-715, 2005.