A Comparison of RAS and Entropy Methods in Updating IO Tables

A Comparson of RAS and Entropy Methods n Updatng IO Tables S. Amer Ahmed 1 and Paul V. Preckel 2 PRELIMINARY VERSION PLEASE DO NOT DISTRIBUTE OR CITE WITHOUT PERMISSION COMMENTS WELCOME Selected Paper prepared for presentaton at the Amercan Agrcultural Economcs Assocaton Annual Meetng Portland, Oregon, July 29-August 1, 2007 1 S. Amer Ahmed (contact author), Ph.D. Canddate, Department of Agrcultural Economcs, Purdue Unversty, 403 W. State St., West Lafayette, IN 47907. Emal: saahmed@purdue.edu 2 Paul V. Preckel, Professor, Department of Agrcultural Economcs, Purdue Unversty, 403 W. State St., West Lafayette, IN 47907.

Abstract Snce the frst half of the 20 th century, the nput-output (IO) table has been the backbone of much emprcal work used to support polcy analyss and develop economywde models. The need for accurate, up-to-date IO tables s thus essental for establshng the valdty of the emprcal work that follows from them. However, the constructon of an IO table for any gven country s an expensve and tme-consumng endeavor. Current and accurate IO tables for many countres are thus often dffcult to obtan on a regular bass. Once an ntal IO table has been constructed, a common workaround s to collect partal nformaton for subsequent perods, such as fnal demands for commodtes wthn the economy, and then employ a Bayesan parameter estmaton technque to determne values for a new IO matrx usng the prevous perod IO table as a pror. Two such technques to acheve ths are RAS and Mnmum Cross Entropy (CE). The lterature has largely gnored the queston of the relatve merts of these two methods. Ths paper uses the actual IO tables for South Korea from two dstnct tme perods to compare the accuracy of the RAS and CE methods. The 1995 IO table for Korea s updated to 2000 usng column and row totals from the true 2000 IO table usng both RAS and CE methods. The estmated IO tables are then compared to the actual 2000 IO table n order to make some observatons on the relatve accuracy of the methods. The sums of squared devatons of the estmates tables from the true tables are used as the man nstrument to measure devatons of the updated matrces from the true year 2000 IO matrx. It s found that the CE approach s more accurate than the RAS approach, based on the lower summed squared devatons of the elements of the CE estmated 2000 matrx from the elements of the true 2000. 2

The maxmum absolute dfferences between the true and estmated tables were also calculated. It was found that the maxmum absolute dfference between CEestmated table and the true posteror table was smaller than the dfference between the RAS-estmated table and the true posteror. 3

1. Introducton Snce the frst half of the 20 th century, the nput-output (IO) table has been the backbone of much emprcal work used to support polcy analyss and develop economywde models. The need for accurate, up-to-date IO tables s thus essental for establshng the valdty of the emprcal work that follows from them. However, the constructon of an IO table for any gven country s an expensve and tme-consumng endeavor. Current and accurate IO tables for many countres are thus often dffcult to obtan on a regular bass. A common workaround s to collect partal nformaton for subsequent perods, such as fnal demands for commodtes wthn the economy, and then employ a parameter estmaton technque to determne values for a new IO matrx usng the prevous perod IO table as a pror. Two such technques to acheve ths are RAS and Mnmum Cross Entropy (CE). Ths paper wll examne the lterature assocated wth these two parameter estmaton technques wthn the context of updatng IO tables usng partal nformaton. We wll address the theoretcal underpnnngs of each method and then provde an applcaton of each based on real world data. A comparson of an actual IO table from a subsequent tme perod to a table updated from a benchmark dataset would reveal the effcacy of these parameter estmaton technques. Usng the 1995 IO table for Korea and partal nformaton from the 2000 IO table of the same country, RAS and CE wll be used to obtan estmated IO tables for 2000. The two estmated tables wll then be compared to the true table for that year n an attempt to provde nsght as to whch method was better suted to ths task. 3 3 GAMS code for ths paper avalable upon request. 4

2. The Structure of the IO Table Ths secton brefly descrbes the generalzed IO table. An understandng of the structure of the IO table wll be relevant for the dscusson of the Korean data to be used n Secton 4. The precursor to the modern IO table was Franços Quesnay s tableau économque 4 (Pressman, 1994). Ths economc table used a matrx framework to provde nformaton on the sale-purchase relatonshps between dfferent producers prmarly n agrculture wthn an economy. The key underlyng assumpton to ths framework was that nputs used by a gven ndustry were related to the output by a lnear and fxed coeffcent producton functon. Ths bascally means that the nput-output relatonshps descrbed by the rows and columns of the table are representatve of a producton technque (UN Handbook, 1999). The ndustres ndcated by the rows are thus producng a commodty (output) that s the nput n an ndustry ndcated by the columns. The row totals thus sgnfy total sales of ndustres, whle the column totals are the total costs. So, IO tables can be thought of as n x n matrces, wth n beng the number of ndustres n the economy. They are often represented as 2n x n, wth there beng twce the number of ndustres provdng nputs. The frst n rows are often for domestc nputs, whle the second n are for mports. In addton to the 2n rows and n columns, there are addtonal columns (to ndcate fnal demand) and rows (to ndcate value added and taxes). The column totals together wth the value added must always equal the sum of the row totals and fnal demand n order for the IO table to balance. 4 Quesnay s Tableau économque was frst publshed n 1758 and s now publshed n facsmle by the Brtsh Economc Assocaton, London. 5

3. Theoretcal Foundatons 3.1 Updatng wth RAS The RAS method s an teratve method of bproportonal adustment of rows and columns that has been ndependently developed by varous researchers, such as Kruthoff and Shelekhovsk n the 1930s. In 1961, Stone adapted the technque for use n updatng IO tables from the work of Demng and Stephan (UN Handbook, 1999). RAS s bascally an teratve scalng method whereby a non-negatve matrx, M, of dmenson x s adusted untl ts column sums and row sums equal gven vectors u * and v * (Schneder and Zenos, 1990). Ths adustment s acheved by multplyng each row by a postve constant so that the row total equals the target row total. Ths operaton would alter the column totals. The columns would then be multpled by constants to make ther totals correspond to the target column totals. Ths sequence of row and column multplcaton would contnue untl both the column and row totals converge to the target vectors. To llustrate, let us consder the matrx M, where M, s the vector of column totals. From ths matrx we can obtan ts matrx of coeffcents 5, A 0, as seen below. A 0 = M / M EQ (1) By pre and post multplyng ths matrx by the vectors r and s, the new matrx of coeffcents A 1 s obtaned. These two vectors are the vectors of target row and column totals. Ths matrx of coeffcents s now ready to undergo a sequence of teratve multplcatons, whch can be seen from equatons (2) through (4f). The multplcaton of 5 To avod confuson, matrx of coeffcents wll refer to a matrx wth elements dvded by column totals n ths paper, whle nput-output table wll refer to the table of nput-output value flows. 6

the ntal coeffcent matrx by the row and column multplers gves ths method ts name. A 1 = r A s EQ (2) 0 The teratve process s as follows, and can be seen below. The orgnal matrx of coeffcents s multpled by the row of target column totals, M * to obtan the matrx F. F 0 * = A M EQ (3) The row totals of ths matrx are represented n the vector u. The rato of u * to u s the multpler r. Multplyng r and F, we obtan a new F. Row vector v of column totals s obtaned and used to calculate the multpler s. F and s are then multpled. The entre sequence of operatons can be seen n equatons (4a-f) u = F EQ (4a) * r = u / u EQ (4b) F = r F EQ (4c) v = F EQ (4d) * s = v / v EQ (4e) F = s F EQ (4f) The teratve process n equatons (4a-f) then contnues untl the condtons u = u * and v = v * are met. At that pont, the matrx F s assumed to be the best estmate of the true posteror matrx M *. 3.2 Updatng wth Mnmum Cross Entropy The CE technque based on nformaton theory due to Shannon (1948) descrbes the entropy of a probablty dstrbuton as the measure H, where p k s the probablty of an event k occurrng (Equaton (5)): 7

H p) ( p k LNp k EQ (5) = k As reported by Robnson et al. (2001), from the work of Thel (1967), the entropy theory can be appled as follows. Wth cross entropy there s a set of K events wth probabltes q of occurrng. These probabltes form a pror. When confronted wth a new set of nformaton, we may desre to fnd a set of posteror probabltes p that are close to the pror whle satsfyng the restrctons emboded n the new nformaton. Kullback and Lebler (1951) develop one measure of closeness, namely, the crossentropy of the probablty dstrbuton descrbed by p, wth respect to q. Ths measure s denoted by Equaton (7a): I ( p : q) = p LN ( p q ) EQ (7a) Golan et al. (1994) and Golan et al. (1996) apply the Kullback-Lebler CE measure to estmate coeffcents n nput-output tables wth the dea beng to mnmze I subect to data consstency, normalzaton-addtvty, and new nformaton constrants (X and y) n Equatons (7b-c). y = Xp EQ (7b) p 1' = 1 EQ (7c) Ths partcular paper wll follow the above method as used by Robnson et al. (2001) to estmate posteror IO table coeffcents. After dvdng the elements of the pror IO table by ts column totals, a matrx of pror probabltes s obtaned, q. Then, q and p the matrx of posteror probabltes are then used n the Kullback-Lebler CE measure as the obectve functon to be mnmzed (Equaton (8) on the next page). 8

MIN : p LN ( p q ) EQ (8) The obectve s mnmzed subect to constrants. One of them s the addtvty constrant that says that all the posteror probabltes column totals sum to one. X and Y are known column and row totals of the posteror IO tables and they form the data consstency constrants. All the constrants can be seen below n Equatons (9a-c). p = 1 EQ (9a) p X = Y EQ (9b) p X = X EQ (9c) Once the problem has been solved, the values of the p X are the values of the estmated posteror IO table. 3.3 RAS versus CE as Parameter Estmaton Technques The lterature has largely gnored the queston of the relatve merts of these two methods. Robnson et al. (2001) addresses ths gap by comparng RAS and CE as parameter estmaton technques n the context of a 1994 SAM for Mozambque. The paper conducts smulatons startng wth the balanced SAM before randomly changng some row and column totals. The SAM s then updated usng both the RAS and the CE methods. From the comparson of these two methods, they found that f the focus s on column coeffcents, then the CE method appears to be superor to RAS. However, f the focus s on SAM flows, then the two methods are very smlar, wth RAS performng slghtly better. As mentoned n ther paper and n McDougall (1999), the RAS and CE are equvalent measures RAS beng entropy theoretc f the CE method uses as an 9

obectve a sngle cross-entropy measure nstead of attemptng to use the sum of column cross-entropes. Also, more data can be ncorporated n the CE formulaton, pckng up changes n the flows across the matrx, and thus provdng a more accurate update or estmate. RAS on the other hand s confned to usng ust column and row totals for the estmaton technque, and would not be able to use addtonal nformaton should t be avalable. 4. Comparson of RAS and CE wth Korean IO Tables 4.1 Korea as an Applcaton Whle Robnson et al. (2001) used Monte Carlo smulatons to obtan a perturbed SAM to update, we use the actual IO tables for South Korea from two dstnct tme perods to compare the accuracy of the RAS and CE methods. The 1995 IO table for Korea s updated to 2000 usng column and row totals from the true 2000 IO table usng both RAS and CE methods. The estmated IO tables are then compared to the actual 2000 IO table n order to make some observatons on the relatve accuracy of the methods. The Korean IO tables for 1995 and 2000 are obtaned from the Global Trade Analyss Proect (GTAP). The 1995 IO table was used n the GTAP Database Verson 5, whle the 2000 table was used Verson 6 of the database 6. The IO tables from GTAP have the advantage of beng unform n ther format and structure whch makes t unnecessary to worry about cross-comparablty of the two. Often IO tables used n the fnal GTAP Database must undergo some form of matrx balancng or data constructon n order to provde more complete nformaton n the fnal product (McDougall, 2002). These tables however were obtaned before any of 6 Documentaton on these IO tables can be found at Dmaranan and McDougall (2002) and Dmaranan and McDougall (2006). 10

those methods were used. For more nformaton on the format of the GTAP IO tables please refer to Huff et al. (2000) and Walmsley et al. (2002). Two IO matrces for each year are obtaned one wth taxes ncluded, and the other wthout. Through addton and subtracton of column totals between the two matrces, taxes are removed from the matrx core, added together and placed n a separate row. Asde from the rows wth nformaton on value added, there are 2n rows for commodtes, wth one beng for domestc goods and the other for mports. For a partcular column (use ), the domestc and mported quanttes of the commodtes are added together, to collapse the 2n x n core matrx nto an n x n matrx. Once that s done, the total mports for each of the rows was subtracted from the row total. At ths pont, the sum of all costs, value added, and taxes - the column totals s equal to the sum of all the sales and fnal demand mnus mports the row totals. The IO table row and column totals equal each other. Also, n the GTAP-ready IO tables that we use, there s a commodty known as dwellngs (dwe) whch s bascally the mputed rent for resdences. Ths faux-commodty s absent n the 1995 tables, but was ncluded n the 2000 tables. We exclude ths commodty from the 2000 table, to have a far comparson between the tables from the two perods. The focus of ths paper s only on updatng the core IO table,.e. the ffty-sx by ffty-sx table of transactons between ndustres that s left after the fnal demand, value added, taxes, and negatve mports rows and columns are removed. The IO core represents the ntermedate demand and supply relatonshps between ndustres. 11

4.2 Data The pror n ths paper s the mport and tax adusted 1995 core IO table. Ths table, as mentoned before s ffty-sx by ffty-sx, for ffty-sx commodtes, wth all values beng reported n mllons of Korean won. About 42.9% of the cells are zeroes. Some of these blocks of zero values can be explaned by ndustral structure. For example there s no ol produced n Korea, and so the entre column for ol s zeroed out. Smlarly, agrcultural products lke food gran are not used by the constructon ndustry. Examnng the empty values n the matrx, we can see that there are three commodtes that are not produced n Korea. The 2000 core IO table the posteror exhbts smlar features. About 40.2% of the cells are empty. Between 1995 and 2000, about 5% of cells n the IO matrx underwent a change from ether a zero to a postve value, or vce versa. Some ndustres thus underwent technology changes such that ther choces of ntermedate nputs changed. 4.3 Formulaton The formulaton for each method s as was descrbed n sectons 3.1 and 3.2 7. The pror matrx s taken as gven and the elements of each cell are dvded by the column totals to obtan a matrx of coeffcents. As posteror or target nformaton, the column and row totals of the 2000 IO table are used. After the estmaton methods are used on the coeffcent matrces to update the 1995 IO matrx to be consstent wth 2000 row and column totals, the elements of each cell are multpled by the target column totals to obtan posteror matrces that are the best estmates of the true IO table. 7 A summary of the CE formulaton s avalable n the Appendx. 12

4.4 Results The RAS-estmated and CE-estmated matrces are compared to the true 2000 core IO matrx. As was noted before the nputs used n some ndustres can change dramatcally between years, wth some nputs droppng out of some producton processes whle appearng n others. So, the estmated tables may have zero elements where the true IO table has postve values. As a result, when measurng devaton between the estmated tables and the true table, t s possble that we wll get some results that are very naccurate, especally f the true table s values for the correspondng zero cells s very large. Several dfferent metrcs are used for measurement of the methods accuracy, the frst of whch s the sum of squared errors (SSE). For each estmaton method, two SSE values were calculated, comparng the sum of the squared dfferences between the matrx of coeffcents for the true and the estmated matrces of coeffcents. For any gven estmaton method, SSE1 s the comparson of only the cells whch were nonzero n both the pror and the true IO tables. SSE2 measures the devaton for all cells n the pror and true tables. The matrces of coeffcents are compared frst. As wll be recalled, these matrces are bascally the full matrces wth the cells n a column dvded by the column total. By dong ths, we normalze wth respect to the columns to measure the relatve devaton. The values can be seen n Table 1. 13

Table 1: Comparson of SSE Values from Matrces of Coeffcents for Each Estmaton Technque 8 SSE1 SSE2 RAS 1.017 1.113 CE 0.792 0.796 Lookng down the column SSE1, t can be seen that the CE approach s superor to the RAS approach. The SSE1 measure for the CE approach s 22% smaller than the SSE1 value for the RAS approach. The results of Table 1 thus support the fndngs of Robnson et al. (2001) n that the CE approach provdes a posteror table much closer to a true table than a posteror estmated wth a RAS technque, when the matrces of coeffcents are beng compared. The more general SSE2 metrc consders all cells n the matrx ncludng those that undergo a change from zero to non-zero or vce versa between the two perods. The CE measure s agan superor to the RAS method. The conclusons to be drawn from Table 1 do not change when we calculate the SSE1 and SSE2 values for not ust the matrces of coeffcents, but rather the true and estmated nput-output tables. Table 2 compares these values and shows that the CE approach s once agan superor to the RAS approach. When the nput-output tables are compared, the SSE measure values for the CE approach are found to be smaller by 34%- 38% than the values obtaned from the RAS method. 8 The SSE between the RAS estmated posteror and the CE estmated posteror s 0.172. 14

Table 2: Comparson of SSE Values from Input-Output Tables for Each Estmaton Technque SSE1 SSE2 RAS 3.774 X 10 14 4.016 X 10 14 CE 2.488 X 10 14 2.488 X 10 14 In addton to the SSE1 and SSE2 measures, the maxmum absolute dfferences between the true posteror and the estmated posteror tables were calculated, and can be found n Table 3. As can be seen, the CE approach yelds smaller maxmum absolute dfferences when consderng both the matrces of coeffcents and the estmated nputoutput tables. Table 3: Comparson of Maxmum Absolute Dfferences Between True and Estmated Posteror Tables Matrx of Coeffcents Input-Output Table RAS 8088577.010 0.280 CE 5231717.586 0.275 4.5 Dscusson on Specfcaton Improvements We mght have expected the varaton between the RAS-estmated and the CEestmated tables to be smaller, snce RAS s an entropy-based technque, and the two methods should provde smlar results (McDougall, 1999). One possble reason that the CE method performed better than the RAS technque s that the RAS method of bproportonal adustment s less able to deal wth the cells that were zero n the pror but postve n the true posteror, and vce versa. A common strategy employed to work around ths problem s to smear the data. 15

Ths data smear conssts of takng a small percentage of the value from the nonzero elements of the row, and then redstrbutng the collected value across the cells wth zero values, mantanng row totals. Ths redstrbuton of value s equvalent to a tax on the ndustres usng the nput, and as a subsdy on the ndustres that dd not use pror to the smear, but do so now. Ths paper dd not use any data smearng technque on the data snce the purpose of ths exercse was to examne the power of each estmaton technque gven a real world IO matrx. However, f the zero-value cells had been smeared, then t s possble that both the estmaton methods mght have produced posteror tables would values closer to the true IO matrx. In addton to data-smearng, there are several dfferent ways of specfyng the RAS and CE methods. Ths paper used a very smple teratve specfcaton of the RAS. Most nput-output analysts employ more creatve specfcatons of the bproportonal adustment method. 5. Conclusons RAS and CE are two parameter estmaton technques that have a long hstory n updatng and estmatng IO data. Although they share theoretcal smlartes, they are of varyng accuracy. It s the consensus n the lterature that RAS may be better than CE when flows n updated IO tables are beng compared. Conversely, CE s supposed to appear more accurate than RAS when coeffcent matrces of updated IO tables are consdered. CE also has greater flexblty n ncorporatng more nformaton n the estmaton process. Ths paper compares coeffcent matrces of IO tables updated by both methods to the true posteror IO table, and fnds that CE s more accurate than CE. The SSE values 16

of CE-estmated matrx were found to be sgnfcantly smaller than the SSE values of the RAS-estmated matrx. The maxmum absolute dfference between the true nput-output table and the CE-estmated table was also found to be smaller than the maxmum absolute dfference between the true posteror table and the RAS-estmated table. The accuracy of these methods are senstve to creatve strateges used by nputoutput analysts n dealng wth these maor structural shfts n the economy. Changng the cells wth no value n the pror to a very small postve value whle mantanng the row total n a process know as data smearng may have a sgnfcant effect on the accuracy of the technques. Alternatve specfcatons of the estmaton methods may also provde more accurate results, and wll be the focus of future work. 17

REFERENCES Dmaranan, B. and R.A McDougall (eds) (2002) Chapter 11E, Global Trade, Assstance, and Producton: The GTAP 5 Database, Center for Global Trade Analyss, Purdue Unversty. Dmaranan, B. and R.A McDougall (eds) (2006) Chapter 11D, Global Trade, Assstance, and Producton: The GTAP 6 Database, Center for Global Trade Analyss, Purdue Unversty. Golan, A., G. Judge, and S. Robnson (1994) Recoverng Informaton from Incomplete or Partal Multsectoral Economc Data. Revew of Economcs and Statstcs, 76, pp185-193. Golan, A., G. Judge, and D. Mller (1996) Maxmum Entropy Econometrcs, Robust Estmaton wth Lmted Data. John Wley and Sons. Huff, K., R.A. McDougall, and T. Walmsley (2000) Contrbutng Input-Output Tables to the GTAP Data Base. GTAP Techncal Paper No. 1, Release 4.2. Center for Global Trade Analyss, Purdue Unversty. Kulback, S and R.A. Leber (1951) On nformaton and suffcency. Annals of Mathematcal Statstcs, 4, pp.99-111. Robnson, S A. Cattaneo, and M. El-Sad. (2001) Updatng and Estmatng a Socal Accountng Matrx Usng Cross-Entropy Methods. Economc Systems Research, Vol 13., No. 1. McDougall, R.A. (1999) Entropy Theory and RAS are Frends. GTAP Workng Paper No. 6. Center for Global Trade Analyss, Purdue Unversty. McDougall, R.A. (2002) V5 Documentaton - Chapter 19: Updatng and Adustng the Regonal Input-Output Tables. Center for Global Trade Analyss, Purdue Unversty. Pressman, S. (1994) Quesnay s Tableau économque : A Crtque and Reassessment. Frst ed., Farfeld, NJ: A.M. Kelley. Schneder, M.H. and S.A. Zenos (1990) A Comparatve Study of Algorthms for Matrx Balancng. Operatons Research, Vol. 38, No.3, pp. 439-455. Shannon, C.E. (1948) A mathematcal theory of nformaton. Bell System Techncal Journal, 27, pp. 379-423. Thel, H. (1967) Economcs and Informaton Theory (Amsterdam, North Holland). 18

UN Department of Economcs and Socal Affars. (1999) Handbook of Input-Output Table Complaton and Analyss. Studes n Methods Handbook of Natonal Accountng: Seres F. Unted Natons, NY Walmsley, T. and R. McDougall (2002) V5 Documentaton - Chapter 11.A: Overvew of the Regonal Input-Output Tables. Center for Global Trade Analyss, Purdue Unversty. 19

CE Formulaton APPENDIX p matrx of coeffcents of the posteror to be estmated q matrx of coeffcents of the pror X column totals of true posteror Y row totals of true posteror, commodtes p LN( p q ) the Kullback-Leber entropy measure to be mnmzed p = 1 sum of column coeffcents equal one. p X = Y row totals of posteror IO table equal row totals of true table. p X = X column totals of posteror IO table equal column totals of true table. 20