Robust Mean Squared Error Estimation for ELL-based Poverty Estimates under Heteroskedasticity - An Application to Poverty Estimation in Bangladesh

Statstcs and Applcatons {ISSN 45-795 (onlne)} Volume 6 Nos., 08 (New Seres), pp 75-97 Robust Mean Squared Error Estmaton for ELL-based Poverty Estmates under Heteroskedastcty - An Applcaton to Poverty Estmaton n Bangladesh Sumonkant Das, Hukum Chandra and Ray Chambers Shahjalal Unversty of Scence and Technology, Bangladesh ICAR-Indan Agrcultural Statstcs Research Insttute, Inda Unversty of Wollongong, Australa Receved: February, 08; Revsed: June 4, 08; Accepted: June 5, 08 Abstract The ELL poverty mappng method (Elbers, Lanjouw and Lanjouw, 00) has been crtczed because of the rsk of underestmatng the mean squared error (MSE) of the correspondng poverty estmates when the area homogenety assumpton s volated. Das and Chambers (07) descrbe a robust ELL-based MSE estmaton approach that assumes unt-level homoscedastcty. However, applcatons of the ELL approach are typcally based on an assumpton of unt-level heteroskedastcty. Ignorng ths behavor can lead to underestmaton of the MSE of the poverty estmates generated by the ELL approach. Ths paper extends the dea of Das and Chambers to ELL-based MSE estmaton assumng unt-level heteroskedastcty. The proposed method s then appled to poverty estmaton n Bangladesh n order to evaluate ts usefulness n a realstc data scenaro. Key words: Homoskedastcty, Small Area Estmaton, Poverty Mappng, Stratfcaton. Introducton. Poverty estmaton n Bangladesh Snce 98-84 Bangladesh poverty rates have been estmated at natonal and dvsonal levels usng data collected n the Household Income and Expendture Survey. To show the actual varaton n poverty ncdence (HCR) between local admnstratve unts, the Bangladesh Bureau of Statstcs (BBS) n conjuncton wth Unted Naton World Food Program (UNWFP) conducted a poverty mappng study usng the Bangladesh 00 Populaton and Housng Census (hereafter referred as Census 00) and the Bangladesh 000 Household Income and Expendture Survey (hereafter referred as HIES 000) datasets (BBS and UNWFP, 004). Ths poverty map was updated usng the HIES 005 data (WB, BBS and WFP, 009). Though the natonal level poverty ncdence was about 40 percent n 005 (BBS, 0), sub-dstrct level poverty ncdence vared from about 0 to 55 percent (WB, BBS and UNWFP, 009). The ELL method developed by the World Bank was used to obtan these sub-dstrct poverty estmates. Both poverty maps show that areas close to the captal cty Dhaka have lower poverty rates but the actual sze of the poor populaton n these areas s large. In comparson, the sub-dstrcts n the Chttagong Hll Tracks (south-eastern part of Bangladesh) have hgh poverty ncdence but ther populaton szes are relatvely small. On Correspondng Author: Sumonkant Das Emal: sumonkant-sta@sust.edu

76 SUMONKANTI DAS, HUKUM CHANDRA AND RAY CHAMBER [Volume 6 Nos. the other hand, the sub-dstrcts n the northern part (well known for ther seasonal food nsecurty) have large populaton szes as well as hgher poverty rates (WB, BBS, and WFP, 009).. Data sources and model specfcatons In the frst poverty mappng study of Bangladesh (BBS and UNWFP, 004), hereafter referred as the BBS-004 study, the BBS systematcally sampled 5% of the enumeraton areas (EAs) from each sub-dstrct of Census 00 nstead of usng the full census data. Ths 5% of Census 00 covers 5 dvsons, 64 dstrcts (Zla), 507 sub-dstrcts (Upzla), 908 EAs, 5840 households (HHs), and 6,56,000 ndvduals. The target small domans are the sub-dstrcts whch are not consdered n the survey samplng desgn. The structures of the full Census 00 and the 5% of Census 00 are detaled n BBS and UNWFP (004). The HIES 000 s used as the accompanyng economc survey dataset requred by the ELL methodology to ft models that are then used to smulate poverty data based on the 5% of Census 00. Ths survey dataset covers 95 out of 507 sub-dstrcts. The sample for HIES 000 s drawn followng a standard two-stage stratfed samplng desgn, where 44 EAs (clusters) are drawn from 6 strata at the frst stage and 748 HHs are drawn from the selected EAs (0-0 HHs per PSU) at the second stage. It should be noted that about two-thrds of sampled sub-dstrcts ( out of 95) had just a sngle sample cluster and so sub-dstrct sample szes are very small. The samplng desgn and the structure of HIES 000 data are detaled n BBS-004 study. In ths paper we wll use the HIES 000 and the 5% of Census 00 datasets to examne the emprcal performance of the estmators proposed n what follows. The man task when applyng ELL methodology for poverty mappng s to dentfy approprate explanatory varables at dfferent levels n the herarches present n the populaton data n order to reduce overall resdual varaton gven the model specfcaton. In the BBS-004 study, 7 HH specfc and sub-dstrct specfc explanatory varables are used to ft a two-level random effects model wth HH and cluster as levels and respectvely. The between-area varaton s gnored, most lkely because about 75% of the sampled sub-dstrcts have only a sngle sampled cluster. In ths study, we examne the contrbutons to overall varablty due to the presence of cluster and sub-dstrct herarches n the survey data by fttng both -level (L) and -level (L) models to these data wth sub-dstrct correspondng to level-three n the data herarchy. We note that about 0% of the varaton n HH expendture s due to between cluster and between sub-dstrct varablty (see set- n Table ). In partcular, the contrbuton of sub-dstrct level varablty s neglgble (about 5%) but statstcally sgnfcant (p-value < 0.000). The presence of neglgble between-area varaton and also lack of suffcent survey data to ft an approprate -level model essentally enforce mplementaton of the standard -level model-based ELL method, whch gnores subdstrct level random effects.

08 POVERTY ESTIMATION IN BANGLADESH 77 Table : Estmate of varance component parameters obtaned by method of moment estmaton under -level (L) and -level (L) homoskedastc models Data Set Model DF u,% u e e,% MR CR p-value of LRT Set- L 0. 0.05-8.8-59.84 67.8 All L 4 0. 0.09 0.006.8 4.46 59.97 67.9 <0.00005 Set- L 0.09 0.067-9.67-64.7 7. - Multple Clusters L 4 0.09 0.086 0.008.69 6.0 64.50 7.50 0.0000 Set- Rural L 7 0. 0.0-6.5-47.8 56.45 - Clusters Set-4 L 7 0.4 0.08-9.77-6.07 70.7 - Urban Clusters L 8 0.4 0.05 0.0066 5. 4.68 6.4 70.60 0.00065 Note: MR Margnal R-squared, CR Condtonal R-squared Scrutny of the HIES 000 data set out n Table suggests that the sampled sub-dstrcts wth a sngle sampled cluster are manly rural (97 out of rural clusters), whle sampled sub-dstrcts wth multple sampled clusters are manly urban (8 out of 0 urban clusters). In order to nvestgate the presence of sgnfcant between area varablty n the urban parts of Bangladesh, a new sample dataset (herenafter, set-) was created contanng only sampled sub-dstrcts wth multple clusters. Ths allowed both -level and - level random effects models to be ftted and the sgnfcance of between-area varablty to be tested. The results for set- n Table confrm the presence of statstcally sgnfcant sub-dstrct level random effects n the urban parts of Bangladesh. Both survey and census datasets ndcate that a sgnfcant number of sub-dstrcts have both urban and rural parts (Table ) and so t s reasonable to partton each dataset nto rural and urban sub-sets, and then estmate between area varablty usng data from resdental areas. In the rural sample data (hereafter set-), only 5 out of sub-dstrcts have multple clusters, whle 7 out of 96 sub-dstrcts n the urban sample of HIES 000 data (hereafter set-4) have multple clusters. The results for set-4 n Table confrm the sgnfcance of between area varablty n the urban sample. These results therefore questons regardng the sutablty of applyng naïve ELL methodology (.e. based on a -level model) to the data set as a whole snce ths mght not capture the actual between area varablty present n the HIES 000 data. Table : Dstrbuton of household (HH), cluster and sub-dstrcts (area) by type of resdence n HIES 000 and Census 00 Type of Resdence Census 00 HIES 000 Overall Sngle Cluster Multple Cluster HH Cluster Area HH Cluster Area HH Area HH Cluster Area Urban 44849 506 6 775 08 96 489 5 86 8 7 Rural 04 040 455 465 4 99 97 74 7 5 Total 58090 908 507 748 44 95 448 00 0 7 Note: Census EA has both rural () and urban (68) HHs. Suggested alternatves to ELL methodology Note that the ELL method of poverty estmaton s typcally based on an assumpton of cluster-heterogenety. If ths assumpton s volated, the method produces approxmately unbased poverty estmates, but wth MSE estmates that are based low, whch n turn leads to poor coverage rates for the poverty estmates (Tarozz and Deaton, 009). The MSE estmates help to prortze the small areas accordng to ther correspondng poverty

78 SUMONKANTI DAS, HUKUM CHANDRA AND RAY CHAMBER [Volume 6 Nos. estmates. An area wth an underestmated MSE wll receve less prorty compared to an area wth the same value of the assocated poverty estmate, but wth MSE estmated correctly. Estmaton of MSE depends on several ssues ncludng specfcaton of the ftted model, unexplaned varaton n the response varable after accountng for the explanatory varables, estmated varaton at hgher levels (e.g. cluster/area levels), and the populaton sze of a small area. In typcal applcatons of the ELL method, HH specfc random errors are usually assumed to be heteroskedastc (HT) whle the cluster random effects are usually assumed to be homoskedastc (HM), n large part because there are often few clusters per sampled area n the survey data. Ignorng level one heteroskedastcty generally leads to based estmates of dstrbuton functons and hence based estmates of the Foster, Greer and Thorbecke (984, hereafter FGT) measures of poverty ncdence. In these cases the HM cluster-specfc varance component s estmated assumng heteroskedastcty at HH-level, usng a HH level heteroskedastcty model (typcally referred as an alpha model) for between HH varances whch s a functon of the potental explanatory varables. Under ether HM or HT level-one errors, a -level nested-error regresson model s then ftted gnorng area-specfc random effects. It s well known that f between area varaton remans sgnfcant after ncorporatng area-level contextual varables n the regresson model, then the ELL method leads to estmated MSEs that are based low, and hence under coverage of the true populaton values of the poverty measures. The modfed ELL (MELL) method of Das and Chambers (07) wll perform better n such stuatons. But the MELL s based on an assumpton of HM level one errors. Consequently the MELL needs to be modfed n order to account for heteroskedastc HH-level errors. Alternatvely, one could consder mplementng ether the optmstc or the conservatve ELL methods (Elbers, et al., 008; World Bank, 0) assumng HH-level heteroskedastcty at HH-level. Unfortunately, however, both these methods assume HM random effects for both -level and -level random effects models ftted to the survey data. Furthermore, both these approaches perform poorly when there s between area varablty (Das and Chambers, 07). In ths artcle, we develop and test a modfcaton to the MELL approach assumng heteroskedastcty at HH-level. In the next Secton we revew the ELL method and ts modfcatons (ncludng the modfed MELL) for capturng potental between area varablty under both HM and HT household-level errors. In Secton we then demonstrate the applcaton of these methods usng the HIES 000 data of Bangladesh. Secton 4 contans a dscusson of our emprcal results and summarzes our major fndngs. Fnally, n Secton 5 we provde some concludng remarks and ndcate further research avenues.. The ELL Methodology and Its Extensons To start, let E and m denote the per capta household expendture (.e. welfare th measure) and the number of famly members (.e. famly sze) respectvely of the k th household (HH) belongng to the j cluster n the th small area. Let y log( E ) denote the log transformed per capta household expendture. The area-specfc FGT poverty ; 0,, C N ndcators are then calculated as F = M m E t I E t where and CI N C j k j M m M and clusters n th area; M j k k C are respectvely the total number of ndvduals N m and N are respectvely the total number of th ndvduals (populaton) and households (HHs) n the j cluster of the th area. When HHspecfc weghts are gnored or equal, the FGT ndcator becomes

08 POVERTY ESTIMATION IN BANGLADESH 79 wth N E t k C = j F N I E t N C j N. Here, t s the threshold for E under whch a person/hh s consdered as beng n poverty. In the standard ELL approach, a -level nested error regresson model s consdered assumng HHs at level-one and clusters at level-two as T T y xβ u xβ u, () where and are dentcally and ndependently dstrbuted cluster-specfc and HH-specfc random errors,,,..., D; j,,..., C ; k,,..., N. Here, ~ N 0,. The subscrpt l s used to ndcate any parameter under a perfectly specfed l-level model. If the HH-specfc random errors are assumed to be HT, the model () can be expressed as T ht T ht y xβ u xβ u,, () where ; and. Here the superscrpt ht stands for heteroskedastcty. When an addtonal area-specfc random effect s also assumed n the above two models, the correspondng -level models can be expressed T T y xβ u xβ u, () wth and, and T ht T ht, y x β u x β u, (4) wth and. If the model () s true but the correspondng -level model () s used to mplement the standard -level ELL, the resultng MSE estmates wll be underestmates. A smlar problem wll occur f the HT -level model () s used nstead of HT -level model (4) n the ELL. Snce the area homogenety assumpton of the ELL method s volated n both stuatons, t s necessary to capture the level-three varablty n order to prevent subsequent MSE underestmaton. The MELL method of Das and Chambers (07) s desgned to account for HH-level heteroskedastcty. Note that estmaton of varance components s dffcult when the HH random errors are assumed to be HT. Estmaton methods for HM varance components and for HT error varances are frst descrbed n what follows and then the ELL based methods are dscussed under both -level and -level workng models assumng HT level-one errors. Fnally the -level model-based ELL type estmators are modfed to account for the gnored between area varablty of a -level true model... Varance component estmaton under heteroskedastcty The varance components are estmated va the method of moments (MM) approach under both homoscedastcty and heteroskedastcty. The varance components under -level and -level HM models can be easly estmated, whch s not the case under heteroskedastcty. Under the -level HT model (), the MM estmator of ht can be obtaned under the assumpton of known HH-level error varances as ht u w w w e. e... w w s s s u

80 SUMONKANTI DAS, HUKUM CHANDRA AND RAY CHAMBER [Volume 6 Nos. where w n n,. n k ht..., n n n. n e. n e k, e n e jks, k n and e y y. In the smlar manner, the MM estmators of ht and u under () can be obtaned as and......... w e e w e e w w w w ht s s s s u w w w s js w w w e.. e... w w ht s s w w w w w s s js w w w e. e... w w w w w w w s s js s js s where w n n, n, and e n.. n e. Negatve values of the js. estmator ht jks wll be treated as zero. The dervatons of these estmators wth ther propertes are gven n Appendces A. and A. respectvely. The HH-level error varances are estmated by fttng a heteroskedastcty model known as the alpha model n the ELL method. MM estmates of HH-level random errors are utlzed to develop a logstc-type regresson model to estmate the parameters of ths alpha model. These estmated alpha parameters are then used to obtan estmated HH-level error varances usng the estmator ( ) ELL AD AD D, v r D D where =.05 maxmum ê parameters and the estmated mean squared error { }, = exp( â) and = ( ). The estmated alpha v r are obtaned from the ftted alpha model. The procedure s detaled n Elbers et al. (00) under -level HT workng model. Under the -level model (4), the estmates of can be estmated by the ELL estmator, usng the correspondng estmated HH-level resduals... The ELL Method After fttng the regresson model and obtanng the correspondng parameters, the second stage of the ELL method s to conduct ether a parametrc bootstrap (PB) or a non-parametrc bootstrap (NPB) procedure to obtan the area-specfc poverty estmates of

08 POVERTY ESTIMATION IN BANGLADESH 8 nterest and ther correspondng estmated mean square errors (ESMEs). For both the PB or NPB procedures the basc steps are: () generate regresson parameters * β from a sutable samplng dstrbuton, say the multvarate normal dstrbuton gls, gls N β v β ; () generate level-specfc random errors usng an approprate parametrc dstrbuton or by resamplng va smple random samplng wth replacement (SRSWR) from the estmated level-specfc sample resduals; () generate bootstrap ncome values y usng the generated regresson parameters and the level-specfc random errors. The generated ncome values are used to estmate the area-specfc parameter of nterest say F * N C N I exp y * t j k for a specfc poverty lne t. These steps are terated a large number of tmes say B=500 and then the mean and varance of the B estmates are consdered as the fnal estmates and ther MSEs respectvely... Modfcaton of the ELL Method The ELL approach based on a -level HM workng model may produce underestmated MSE f the area varablty s gnored. However, the approach can be modfed to capture the potental area varablty followng the MELL approach of Das and Chambers (07). Ths MELL approach s adapted here to allow for HT level-one random errors. The bass of the MELL methodology s the adjustment to the varance estmator of a C N weghted area mean Y M m y under an ncorrect -level model to make t j k unbased under the correspondng -level model. Under the -level HM model (), the varance of Y and ts plug-n estmator can be expressed Var Y umu m and U V Y umu m U where m M C M U j and m M C N m. Under an ncorrect -level U j k model (), the varance of Y and ts plug-n estmator can be wrtten as Var m Y u U m U V Y m u U m. U and The expectaton of the varance estmator V V U U becomes u * Y under the true -level model E Y R m m whch always underestmates the true Var Y snce m and R n n n n. varance U 0 0 An unbased plug-n estmator of Var( ) s dffcult to obtan under a multlevel model wth HT level- random errors. However, a plug-n consstent estmator of ths varance can be obtaned f a consstent estmator of the HT error varances s avalable. ( ) Under the HT models (4) and (), we have Var ( ) ( ) = s h ( ) +s ( ) ( ) ( ) + x ( ) and ( ) ( ) Var ( ) ( ) = s ( ) + x ( ), where C N M m j k, and M C N m j k,. It can be shown that ht, ht, and u ht are unbased and consstent estmators u under an assumpton of known HH-level error varances (see Appendces A. and A.). Now

8 SUMONKANTI DAS, HUKUM CHANDRA AND RAY CHAMBER [Volume 6 Nos. suppose that the HH error varance estmators and,, Var Y are V and,, are consstent estmators of respectvely. Then consstent plug-n estmators of Var ( ) ( ) = ŝ h M C N m j k, ( ) ( ) +ŝ ( ) + ( ) x and V ( ) ( ) = ŝ Y and ( ) ( ) + ( ) x respectvely where and C N. Under the assumpton of M m j k, known, t can be shown that ht ht ht, E u n n u 0 n n0. It follows that when the HT -level model (4) holds, ht ht ht ht E V Y R m U mu E V u u Y under the assumpton of E E. That s, the estmator V ( ) ( ) mght underestmate the true varance Var ( ) ( ) n both the HM and HT cases. Two area-specfc adjustments or robustfcatons of V ( ) ( ) that lead to an unbased or approxmately unbased estmator of Var and Y are V V ( ) ( ) = ( ( ) )ŝ { h( ) +ŝ ( )} ( ( ) ( ) )ŝ h +ŝ ( ) { } ( ) +ŝe ( ) ( ) ( ) ( ) ( ) +ŝe, ( ) ( ) = ( ) ( ) ( ) both of whch are approxmately unbased under the true -level model. Note that n ether case the varance estmator V ( ) ( ) would be robust under model msspecfcaton, snce mght be very small (close to zero) under a true -level model, and hence the frst term of V M Y mght be neglgble. Under the MELL approach, the man task s to adjust the cluster-level varance component n order to capture the potental area-level varablty. As n Das and Chambers (07), the adjustment factors for the cluster-varance component can then be defned as follows: D s ht ht ht k u Ds m s u, D ht ht ht k u D m U u, and D p ht ht ht k D m U ; p,..., P p u p u where C s j m m m, D s s the number of sampled areas, sampled ndvduals n the th sampled area and th j sampled cluster of the m s the total number of m s the total number of sampled th ndvduals n the sampled area. Note that when no household sze nformaton s avalable we set household szes to one so that the adjustment factors then depend on the number of HHs. Snce the adjustment factors are based on the area-specfc

08 POVERTY ESTIMATION IN BANGLADESH 8 sample and populaton szes, there are no basc dfference between the adjusted factors under homoscedastcty and heteroskedastcty except for the estmated varance components. The MELL can be mplemented va both PB and NPB procedures. In PB procedure, the cluster level resduals are generated from a sutable parametrc dstrbuton wth the adjusted cluster-varance component, say M u M u N 0, where s the adjusted cluster-specfc varance component. In the NPB procedure, the cluster-level scaled resduals rescaled so that the rato of bootstrap varatons need to be approxmates the rato of correspondng estmated varance components M, where the scaled resduals are u and wth. The procedure s smlar for all varatons of these proposed modfcatons. For the stratfcaton-based MELL, the generaton of cluster-specfc resduals or resamplng of sample cluster-specfc resduals s carred out usng stratum-specfc adjusted cluster-varance components, say M h k, h,.., H. These bootstrap procedures are explaned n more detal n the uh. u next secton. u. Implementaton of ELL Method and Its Alternatves.. Frst stage regresson modellng The frst step when applyng the ELL method s to ft a regresson model utlzng the survey dataset. It s strongly recommended that ths model ncorporates a large number of explanatory varables at dfferent levels of the data herarchy n order to capture the potental between-cluster and between-area varabltes. However, overfttng ths regresson model can then become a problem. Furthermore, the ncluson of more ndvdual level explanatory varables and contextual varables does not guarantee a sgnfcant reducton of the betweenarea varablty. In order to avod ths overfttng problem, the logarthm of HH per capta monthly consumpton expendture s frst regressed on the same 0 explanatory varables that were used n BBS-004 study. For the frst two datasets (Set- and Set-), these explanatory varables are next used to ft -level and -level models. The estmated varance-components from these fts are then used to calculate the generalzed least squares (GLS) estmates of the regreson model parameters and ther assocated estmated varance-covarance matrx. Margnal and condtonal R-squared values are calculated followng Nakagawa and Schelzeth (0) n order to compare the two multlevel models. Estmated values of parameters and level-specfc random errors are stored for each model for use n the bootstrap procedure. Note that these models are ftted assumng both HM and HT varances for the HHspecfc random errors... Heteroskedastcty Modellng To examne the heteroskedastcty of the HH-level random errors, the squared least squares (LS) resduals obtaned from both Set- and Set- are plotted aganst the correspondng predcted values (Fgure ). The plots suggest neglgble monotone heteroskedastcty due to less nformaton at the extreme tals. The alpha model s ftted wth the explanatory varables that were used n the BBS-004 study to avod any extensve exploraton of potental explanators of heteroskedastcty. Fgure shows that the HT error. varances estmated by the ELL parametrc approach under the -level model ( ) fluctuate more than those under the -level model (. ELL, ELL, ), whch may be due to -level

84 SUMONKANTI DAS, HUKUM CHANDRA AND RAY CHAMBER [Volume 6 Nos. model gnorng the area varablty n the survey dataset. The HT error varances are slghtly hgher and wth less varaton when estmated under a -level model compared wth a -level model. The estmated area-level and cluster-level HM varance components under the assumpton of HT HH-level errors are set out n Table A n the Annex. The varance components estmated by MOM under the HT approaches are close to those estmated under the assumpton of HM HH-level errors. Ths suggests that level-one heteroskedastcty has a neglgble effect on estmaton of hgher level varance components... Bootstrappng Both the PB and NPB bootstrap procedures have been used to calculate the FGT poverty estmates and ther MSEs. Under the HT model, the NPB procedure s a semparametrc bootstrap (SPB) snce the HH-level error varances are generated on the bass of a parametrc specfcaton for the alpha parameters. In both types of bootstrap procedure, the regresson parameters are generated from a multvarate gls gls N β., v β. dstrbuton. For the PB procedure, level specfc random errors are generated from a normal dstrbuton wth zero mean and the correspondng estmated varances as varance. In the NPB and SPB procedures, the LS raw resduals are used to generate level-specfc random errors va SRSWR. Under a -level model, the resduals can be drawn ether from sample raw resduals or scaled resduals f the bootstrap varaton s same as the estmated varance component. Under a -level model, the moment-based cluster-specfc resduals and are-specfc resduals are same for the 75% sub-dstrcts and so all the resduals need to be re-scaled. We therefore use scaled resduals n all bootstrappng under both the -level and -level models. Under HH-level heteroskedastcty there s no recommended way to re-scale the HH-level resduals when they are used for bootstrappng. In ths paper we therefore scale the HH-specfc HT resduals by the HH-level varance component, estmated by under a -level model, where e s the MSE of the ntal sngle-level lnear model ftted by -. the LS method. The mean of the estmated HH-level resdual varances (say, å ŝ e ( ) ), or the HM varance component can also be used for scalng. SRSWR resamplng of the level-specfc resduals can be mplemented ether uncondtonally or condtonally when mplementng the ELL (00) method. Under uncondtonal samplng, the level-specfc errors are assgned to census unts from the full set of sample resduals. Under condtonal samplng, the level-specfc resduals are drawn followng a nested approach. In ths case, suppose a cluster-specfc sample resdual s randomly assgned to a census cluster. Then census HHs nested wthn that census cluster are assgned HH-level random errors from the subset of sample HHs nested wthn the selected sample cluster. Under a -level workng model, condtonal approach can be used f a suffcent number of sampled areas have multple clusters. In our study, we have followed a condtonal approach under a -level model and an uncondtonal approach under a -level model. We note that both the uncondtonal and condtonal approaches behave smlarly under a -level model, but the condtonal approach produces unstable results under a -level model. Snce ELL-based methods can be mplemented n dfferent ways based on () the adopted bootstrap procedure (PB/NPB/SPB), () heteroskedastcty vs, homoskedastcty of HH-level random errors (HM/HT), and () the assumed workng model (L/L); the resultng estmators are denoted dfferently n Table A n the Annex. For example PELL.HM.L stands for PB based ELL estmator under a -level HM workng model. e ht u

08 POVERTY ESTIMATION IN BANGLADESH 85.4. Applcaton of modfed ELL methods Das and Chambers (07) found that the stratfcaton-based adjustment (Adjustment ) works well, and so ths approach has been used when mplementng the MELL methods under both HM and HT level-one errors. Sub-dstrcts are grouped nto sx strata accordng to 0 th, 5 th, 50 th, 65 th, and 80 th percentle of the dstrbuton of ther populaton sze. The cluster-level varance component s adjusted wthn each stratum accordng to ( ) = ( )ŝ s ( ) ŝ (, =,...,6. Note that the same stratfcaton has been used under both the ) ( ) HM and HT workng models. As a consequence, stratum specfc bootstrap procedures based on the stratum-specfc adjusted cluster-level varance component are mplemented when applyng the ELL method. Fgure : ELL-based estmates of unt-level heteroskedastc error varances under - level and -level models 4. Results and Dscusson In the HIES 000, lower (LPL) and upper (UPL) poverty lnes were developed n order to calculate FGT poverty ndces. These poverty lnes were defned on the bass of the 6 strata that were created for the survey samplng desgn. Snce the HIES 000 was based on the Bangladesh Populaton and Housng Census 99, these poverty lnes were therefore mapped to the 00 Census n the poverty mappng study based on BBS-004. However, t should also be noted that the poverty lnes used n the BBS study are not exactly mantaned n the study reported here because some sub-dstrcts lacked nformaton. Sub-dstrct level poverty ncdences n BBS-004 were calculated usng the ELL method wth SPB va a

86 SUMONKANTI DAS, HUKUM CHANDRA AND RAY CHAMBER [Volume 6 Nos. condtonal approach under a -level workng model. In ths study the smlar approach s followed usng scaled resduals to estmate poverty ndcators under a -level model (SPELL.HT.L). The consstency of the resultng estmated FGT poverty ndces was checked by comparng summary statstcs for the estmated poverty measures and ther MSEs wth those obtaned va BBS and UNWFP (004, page 4). Table shows that means of the estmated FGT measures are slghtly lower wth slghtly hgher standard devaton (SD) compared to those obtaned n the BBS study, whle means and SDs of the estmated MSEs are margnally smaller. The reasons of these dfferences may be () a dfferent procedure was used to ft the regresson model, () a dfferent approach to heteroskedastcty modellng, () a dfferent bootstrap procedure, (v) slghtly dfferent poverty lnes for some sub-dstrcts, and (v) the number of resamples n the bootstrap procedure (500 nstead of 00 bootstrap). We also notced that two of the model covarates that were created for our study behaved dfferently n the ftted regresson model. Summary values of the estmated MSEs generated by the -level model-based ELL (SPELL.HT.L), and the modfed ELL (MSPELL.HT) estmators are also shown n Table. It can be seen that the SPELL.HT.L estmator produces underestmated MSEs n comparson to the SPELL.HT.L estmator, whle the MSPELL.HT estmator corrects ths downward bas under an assumpton of between area varablty. We therefore conclude that the SPELL.HT.L estmator mght provde better accuracy f real between area varablty s absent. In the BBS-004 poverty mappng study, MSEs were calculated for the FGT estmates as well as the number of households. For examnng the relatonshp between these quanttes, sub-dstrct specfc populaton szes, estmated HCR at UPL, and estmated MSE (SPELL.HT.L and MSPELL.HT) are plotted n Fgure. The frst three maps of Fgure suggest that sub-dstrcts wth large populatons and closer to the captal (Dhaka) or port ctes (Chttagong) have lower HCRs wth lower MSEs. Conversely, sub-dstrcts wth smaller populatons n coastal and hlly regons have hgher HCRs wth hgher MSEs. Some sub-dstrcts n the north-western part of Bangladesh wth large populatons (Rangpur and Rajshah dvsons) have relatvely hgher HCRs wth lower MSEs. The maps (c) and (d) n Fgure show that the values of the SPELL.HT.L MSE estmator are generally lower than comparable MSE values generated by the MSPELL.HT estmator. In general, ths good performance of SPELL.HT.L s due to the populaton szes of sub-dstrcts. The hgher the populaton sze, the lower the MSE f there s no other potental source of varablty. Ignorng between area varablty s the man reason why the SPELL.HT.L estmator has low MSEs. If real between area varablty exsts, ths false accuracy wll mslead polcy makers. Under the assumpton of between area varablty, t s clear that the SPELL.HT.L estmator leads to underestmated MSEs partcularly for large ctes (e.g., blue ponts) because t gnores between area varablty whle the MSPELL.HT estmator corrects for ths underestmaton by allowng for potental between area varablty. The SPELL.HT.L and MSPELL.HT estmators lead to smlar MSEs only for the sgnfcantly smaller sub-dstrcts, partcularly those n the south-eastern regons. The varaton n the estmated MSEs s more explct when Map (c) s compared to Map (d). The reason for ths may be the tendency of lower MSEs for sub-dstrcts wth a modest populaton szes (more than,000) and hghest MSEs for the smaller sub-dstrcts when usng the SPELL.HT.L estmator, compared wth the MSPELL.HT estmator, whch provdes comparatvely hgher weght to the smaller sub-dstrcts and comparatvely lower weght to the larger sub-dstrcts when accountng for between area varablty. We conclude that when there s neglgble between area varablty, SPELL.HT.L s hghly optmstc whle MSPELL.HT s reasonably conservatve. More detaled comparsons are gven n the followng paragraphs.

08 POVERTY ESTIMATION IN BANGLADESH 87 Table : Summary statstcs of the estmated HCR, PG, and PS at lower (LPL) and upper (UPL) poverty lnes wth ther estmated MSEs by dfferent estmators assumng unt-level heteroskedastcty (HT) wth those of BBS-004 study Parameter Poverty lne Estmated by Mn. Max. Mean SD HCR PG PS Estmated MSE of HCR Estmated MSE of PG Estmated MSE of PS LPL UPL LPL UPL LPL UPL LPL UPL LPL UPL LPL UPL BBS-004 0.004 0.55 0.90 0.06 SPELL.HT.L 0.0089 0.5898 0.80 0.74 BBS-004 0.0049 0.708 0.4 0.8 SPELL.HT.L 0.045 0.79 0.464 0.80 BBS-004 0.000 0.68 0.0679 0.097 SPELL.HT.L 0.00 0.7 0.06 0.05 BBS-004 0.0006 0.44 0.4 0.04 SPELL.HT.L 0.007 0.46 0.08 0.044 BBS-004 0.0000 0.0650 0.08 0.0 SPELL.HT.L 0.000 0.0676 0.00 0.0 BBS-004 0.000 0.09 0.049 0.08 SPELL.HT.L 0.009 0.079 0.09 0.090 BBS-004 0.005 0.45 0.088 0.046 SPELL.HT.L 0.0047 0.047 0.040 0.0 SPELL.HT.L 0.0070 0.5 0.067 0.059 MSPELL.HT 0.046 0.5 0.055 0.046 BBS-004 0.006 0.08 0.046 0.04 SPELL.HT.L 0.07 0.07 0.07 0.05 SPELL.HT.L 0.06 0.47 0.0680 0.08 MSPELL.HT 0.004 0.549 0.0596 0.05 BBS-004 0.000 0.046 0.07 0.0054 SPELL.HT.L 0.0008 0.0 0.006 0.0045 SPELL.HT.L 0.00 0.040 0.096 0.007 MSPELL.HT 0.007 0.0549 0.06 0.006 BBS-004 0.0009 0.058 0.069 0.0066 SPELL.HT.L 0.007 0.044 0.044 0.005 SPELL.HT.L 0.0045 0.058 0.069 0.007 MSPELL.HT 0.0085 0.070 0.070 0.0066 BBS-004 0.000 0.00 0.0055 0.006 SPELL.HT.L 0.000 0.05 0.0044 0.00 SPELL.HT.L 0.000 0.00 0.008 0.005 MSPELL.HT 0.00 0.050 0.00 0.005 BBS-004 0.000 0.000 0.008 0.006 SPELL.HT.L 0.0009 0.009 0.0068 0.008 SPELL.HT.L 0.004 0.068 0.06 0.0044 MSPELL.HT 0.00 0.086 0.049 0.004

88 SUMONKANTI DAS, HUKUM CHANDRA AND RAY CHAMBER [Volume 6 Nos. Fgure : Bangladesh maps of sub-dstrct specfc populaton, estmated poverty ncdence at UPL and ther estmated MSEs (EMSE) by SPELL.HT.L and MSPELL.HT estmators In what follows FGT poverty ndcators at UPL are estmated assumng both HM and HT HH-level errors under -level and -level workng models for data Set-. The HCRs estmated by dfferent ELL estmators are plotted aganst the HCRs estmated by the standard -level model-based ELL estmator wth PB procedure (PELL.HM.L). Fgure shows that the NPB-based ELL estmators of HCR under a -level workng model (NPELL.HM.L) lead to almost same results as PELL.HM.L estmator under homoskedastcty. The HT estmator SPELL.HT.L leads to slghtly larger HCRs compared to the PELL.HM.L estmator partcularly for the areas wth hgher HCRs. These values are most lkely overestmates when -level model s taken to be the workng model. However, under homoscedastcty, the - level model-based estmator performs smlarly to the -level model-based estmator wth some fluctuatons. Ths suggests that the -level model-based estmators provde comparable unbased poverty estmates to the -level model-based estmators. The observed dfferences n the performances of the estmators under homoscedastcty and heteroskedastcty ndcate that the observed heteroskedastcty mght have nfluence on the estmated poverty measures. Ths nfluence of level-one heteroskedastcty has already been noted n the MSE estmates dsplayed n Fgure 4. Under both the -level and the -level model based approaches, estmators wth heteroskedastc level- errors have slghtly hgher MSEs than ther homoscedastc counterparts.

08 POVERTY ESTIMATION IN BANGLADESH 89 Fgure : NPELL and SPELL estmates of HCR at UPL under -level and -level workng model wth homoskedastc (HM) and heteroskedastc (HT) level-one errors Fgure 4: Comparson of estmated MSEs (EMSE) of the estmated HCRs at UPL under -level and -level workng model wth homoskedastc (HM) and heteroskedastc (HT) level-one errors for the NPELL and SPELL estmates The estmated MSEs of the estmated FGT ndcators (HCR, PG, PS) obtaned va the ELL estmators and ther modfed versons under the assumpton of both HM and HT level-one errors are plotted aganst area-specfc populaton szes n Fgure 5 for the data Set-. Ths shows a declnng trend of estmated MSE as populaton sze ncreases. As expected the -level model-based ELL MSE estmators are lower than the -level model-based MSE estmators, wth the modfed -level MSE estmators fxng ths underestmaton problem for all three ndcators. For HCR, the modfed estmators have estmated MSEs close to the naïve -level estmator for areas wth smaller populaton, but have lower estmated MSEs for areas wth larger populaton. The dfference between the estmated MSEs calculated by the - level and -level model-based estmators ncreases wth populaton sze. For PG and PS, the

90 SUMONKANTI DAS, HUKUM CHANDRA AND RAY CHAMBER [Volume 6 Nos. performance of the modfed estmators does not vary as much. Fgure 5 shows that the modfed estmators lead to estmated MSEs very close to those generated by the -level model-based estmators for PG and estmated MSEs that are slghtly hgher for PS. Fgure 5: Estmated MSE (EMSE) of HCR at UPL under -level and -level models wth homoskedastc (HM) and heteroskedastc (HT) level-one errors by ELL estmators wth ther modfed versons for full data (set-) When between area varablty s clear n the dataset Set-, the modfed estmators based on the PB procedure behave n the same way as n the dataset Set-. See Fgure 6. In partcular, the modfed ELL estmators of HCR exhbt some downward bas n estmated MSE when compared to the -level ELL estmator but stll perform much better than the naïve -level ELL estmators. Ths downward bas dsappears when the modfed procedures are used to estmate PG and PS.

08 POVERTY ESTIMATION IN BANGLADESH 9 Fgure 6: Estmated MSE (EMSE) of HCR at UPL under -level and -level models wth homoskedastc (HM) level-one errors by PELL and NPELL estmators wth ther modfed versons for sampled sub-dstrcts wth multple clusters 5. Concludng Remarks The emprcal evdence presented n ths paper shows that the ELL estmators based on a -level workng model fal to capture the true MSE f the workng regresson model volates the area homogenety assumpton. In the Bangladesh dataset that s our focus, the area homogenety assumpton s clearly volated n urban small areas (sub-dstrcts). Snce use of ELL estmators based on a -level model s not realstc for ths dataset (most sampled sub-dstrcts have sngle sampled clusters), use of -level model-based ELL estmator leads to stable FGT estmates but underestmated MSEs. The proposed modfed verson of the - level ELL estmator overcomes ths underestmaton problem, and seems to work well under both HM and HT level-one errors. We explore the emprcal behavor of ths modfed -level approach where between area varablty s obvous (set-) and when t s neglgble (set-). In both stuatons, the modfed estmators performed better than the nave -level ELL estmators. When used to estmate the HCR, the modfed estmators appear to be slghtly downward based n terms of estmated MSE when compared to the -level estmators but much less based than the naïve -level estmators. In partcular, we noted that when there s sgnfcant between area varablty, as occurs n the urban areas of Bangladesh, the -level model-based naïve MSE estmators provde an naccurate mpresson of accuracy as far as estmaton of FGT measures s concerned. In contrast, although the modfed MSE estmators seem conservatve because they allow for between area varablty, they also consderably

9 SUMONKANTI DAS, HUKUM CHANDRA AND RAY CHAMBER [Volume 6 Nos. reduce ths problem of MSE underestmaton. Ths s n accord wth smulaton results reported n Das and Chambers (07). Snce the cost of underestmatng the MSE (.e. ncorrectly clamng hgher precson) may be much hgher than the premum for obtanng conservatve precson (.e. a slghtly overestmated MSE), proper care and approprate nvestgaton should be undertaken before usng a naïve -level model-based MSE estmator to obtan FGT estmates. References BBS (00). 00 Populton Census: Prlmnary Report. Mnstry Plannng, Government of Bangladesh, Statstcs Dvson. Dhaka: Bangladesh Bureau of Statstcs. BBS (0). 0 Populaton and Housng Census: Prlmnary Results. Mnstry Plannng, Government of Bangladesh, Statstcs Dvson. Dhaka: Bangladesh Bureau of Statstcs. BBS & UNWFP (004). Local Estmaton of Poverty and Malnutrton n Bangladesh. Dhaka: The Bangladesh Bureau of Statstcs and The Unted Natons World Food Programme. Das, S. (06). Robust nference n poverty mappng. Doctor of Phlosophy thess. Unversty of Wollongong, Australa. Das, S. and Chambers, R. (07). Robust mean-squared error estmaton for poverty estmates based on the method of Elbers, Lanjouw and Lanjouw. Journal of Royal Statstcal Socety, Seres A, 80, 7 6. Elbers, C., Lanjouw, J.O., and Lanjouw, P. (00). Mcro-level estmaton of welfare. World Bank Polcy Research Workng Papaer, 9. Washngton DC: The World Bank. Elbers, C., Lanjouw, J.O. and Lanjouw, P. (00). Mcro-level estmaton of poverty and nequalty. Econometrca, 7(), 55 64. Elbers, C., Lanjouw, P. and Lete, P.G. (008). Brazl wthn Brazl : Testng the Poverty Map Methodology n Mnas Geras. World Bank Polcy Research Workng Paper, 45. Washngton, DC: The World Bank. Nakagawa, S. and Schelzeth, H. (0). A general and smple method for obtanng R from generalzed lnear mxed effects models. Methods n Ecology and Evoluton, 4(), - 4. Tarozz, A. and Deaton, A. (009). Usng census and survey data to estmate poverty and nequalty for small areas. The Revew of Economcs and Statstcs, 9(4), 77-79. WB, BBS, and WFP (009). Updatng Poverty Maps of Bangladesh. Washngton, DC & Dhaka: The World Bank, Bangladesh Bureau of Statstcs and World Food Program. World Bank (0). Nepal Small Area Estmaton of Poverty 0. Volume. Washngton DC: The World Bank. World Bank (05). Poverty and Equalty Databank. http://povertydata.worldbank.org.

08 POVERTY ESTIMATION IN BANGLADESH 9 Annexure Table A: Varance components under -level and -level models wth homoskedastc (HM) and heteroskedastc (HT) level-one errors by method of moments (MOM) and stratfed MOM for dfferent datasets Skedastcty at HH-level Cluster Model DF u u e e Sngle L 0.5 0.08-6.00 - L 0.09 0.067-9.67 - Multple HM: MOM L 4 0.09 0.086 0.008.69 6.0 L 0. 0.05-8.8 - All L 4 0. 0.09 0.006.8 4.46 Sngle L 0.6 0.00-5.90 - L 0. 0.068-9.5 - Multple HT: MOM* L 4 0. 0.087 0.008.65 5.95 L 0.7 0.054-8.6 - All L 4 0.76 0.095 0.0059 4.0 4.5 Sngle L 0.55 0.05-6. - L 0. 0.088-0.4 - Multple HT: Stratfed MOM (IGLS)* L 4 0. 0.00 0.0085 5.9 6.04 L 0.7 0.058-8.50 - All L 4 0.76 0.000 0.0058 4.5 4.04 *Under heteroskedastcty, e Table A: ELL estmators of FGT poverty ndcators and ther MSE based on dfferent bootstrap procedures under -level (L) and -level (L) models wth homoskedastc (HM) and heteroskedastc (HT) level-one errors Estmator Descrpton Parameter PELL.HM.L PB-based ELL estmator under L HM model FGT & MSE PELL.HM.L PB-based ELL estmator under L HM model FGT & MSE PELL.HT.L PB-based ELL estmator under L HT model FGT & MSE PELL.HT.L PB-based ELL estmator under L HT model FGT & MSE NPELL.HM.L NPB-based ELL estmator under L HM model FGT & MSE NPELL.HM.L NPB-based ELL estmator under L HM model FGT & MSE SPELL.HT.L SPB-based ELL estmator under L HT model FGT & MSE SPELL.HT.L SPB-based ELL estmator under L HT model FGT & MSE MPELL.HM Modfed PB-based ELL estmator under L HM model MSE MPELL.HT Modfed PB-based ELL estmator under L HT model MSE MNPELL.HM Modfed NPB-based ELL estmator under L HM model MSE MSPELL.HT Modfed SPB-based ELL estmator under L HT model MSE

94 SUMONKANTI DAS, HUKUM CHANDRA AND RAY CHAMBER [Volume 6 Nos. Annexure A. Varance Component Estmaton under -level Model wth HT Level- Errors Under the populaton model (), the cluster and HH level random effects can be estmated at frst by the moment estmators u n e and e u respectvely ks usng the LS resduals e y y. Under the consdered HT -level model, t can be shown that Var Var ht e n. u, ks ht, Cov, u u n I j, ht E e n u,, s s / / j u, ks ht e... n nu n,, s e ht u and ne ht. s E n, s s. E n n u, s s k Then the expectaton of HH and cluster level sample resdual varances can be wrtten as n n0 ht u where E s E e e..., n s n n s and E s as E s n E e e.... Cs s Under the assumpton of known s and n n n 0 s n n C C n n,. 0 ht u s s s k, the estmator ht, u ht u Cs s, n n 0 s n n k. Puttng w n n, the terms nvolved n ht u n w w n s n n n n n s s s can be obtaned from can be expressed as nn 0 n n w w n, and, s C n w e e. Then the estmator of cluster level varance component s.... becomes ht.... u w w w e e w w n s s where n. Thus the estmator depends on a functon of cluster-level ks, components whch tself a functon of HH-level error varances. An unbased, estmator of s n n. ks n n e e. ks where

08 POVERTY ESTIMATION IN BANGLADESH 95. It s easy to show that E n,. n ks ks snce E e k s e. n n k s,. Thus the ultmate plug-n estmator of consderng heteroskedastcty at the level-one s u.... s u s s ht max w w w e e w w,0. A. Varance Component Estmaton: -level Model wth HT Level- Errors Under the populaton model (4), the moment-based estmates of area, cluster and HH level random effects are calculated as n e, u n e, and e u. Under (4), t can be shown that Var Var ht e ht. n u, ks ht ht e... n n n u, s s s jks ks, ht ht Var e.. n n u n,, ht ht E ne. n u,, s s n ks ht ht E ne.. n n0 u n s s s jks js jks ht ht u, s s, E e n, and, ht u. ht, s s s s E e n n Now the expectaton of the sample resdual varances become n n ht n n ht ht ht, u u E s n a a a 0 0 s n n E s C n n C n n C n n ht ht a au a, and n0 n0 s ht n n0 ht E s D n n, u D D ht ht s, s 0 u s 0 s ks s ht ht a au a where n n n. 0 s The estmators u ht jks ht, and can be derved drectly from the second and thrd ht ht smultaneous equatons, as s a u a a ht ht s a a u a respectvely. Puttng ht nto u ht, we can obtan and

96 SUMONKANTI DAS, HUKUM CHANDRA AND RAY CHAMBER [Volume 6 Nos. ht aa aa as as aa aa. In smlar way, puttng new ht n ht we can obtan u ht u aa aa as as aa aa. The terms nvolved n ht and ht can be smplfed as u a a 0 0 0 0 0 n n n n n n n n n n 0 n n 0 s s a a, n n n aa aa n n0 n, 0 n0,, n s n n jks s s n n ks n n 0 aa aa,, n s n n. ks s n n jks C D. Then the ultmate estmators of cluster and area level varance where n s components become D n n 0 s Cs n n0 s ht Cs D u n n 0 n n 0 n n 0,, s Cs D s n n ks s n n jks C,, s s D s n n 0 s n n, ks s n n jks ht s n n n n n n n n n n 0 0, 0, 0 s s ks s jks n n0 s Cs n0 n0 s D n n s 0 s w The complex known terms nvolved n ht and ht u can be expressed n terms of n n and w n n as n n n w w w s s js s s s n n n n n w w n, n n0 n w w, 0, C s n e. e... n w e. e..., s s..... D s n e e n w e.. e..., n w w n s and s s, s n n n w w, 0 s

08 POVERTY ESTIMATION IN BANGLADESH 97 s n n 0 0 n s w. ht u w js ht The estmators and shown n Secton. can be obtaned by replacng the complex terms wth these expressons. The detal dervatons of these two ndces are gven n Das (06).