A Simple and Efficient Goal Programming Model for Computing of Fuzzy Linear Regression Parameters with Considering Outliers

Journal of Uncertain Systems, Vol.5, No.1, pp.62-71, 2011
Online at: www.jus.org.uk

H. Omrani 1,2,*, S. Aabdollahzadeh 2, M. Alinaghian 1
1 Department of Industrial Engineering, Iran University of Science and Technology, Tehran, Iran
2 Department of Industrial Engineering, Urmia University of Technology, Uromiyeh, Iran

Received 13 July 2009; Revised 2 August 2010

Abstract

Fuzzy linear regression has been studied by several researchers over the past three decades. Outliers in the data set cause the results of fuzzy linear regression to be incorrect. In this paper, a simple and efficient model is suggested for computing fuzzy linear regression in the presence of outliers. The proposed method is based on the Goal Programming technique, and two separate linear programming models are solved to estimate the upper and lower fuzzy bands. The proposed method minimizes the estimation error between the observed and estimated values and performs better than previous approaches. The proposed model is less sensitive to outliers, and no parameters need to be selected beforehand. The performance of the proposed model is illustrated by solving several examples and comparing the results with previous studies.

© 2011 World Academic Press, UK. All rights reserved.

Keywords: fuzzy linear regression, fuzzy linear programming, outlier

1 Introduction

The purpose of a regression model is to analyze the relationship between dependent and independent variables based on the given data. In 1982, Tanaka et al. [20] introduced Fuzzy Linear Regression (FLR). They modeled parameter estimation as a linear programming problem in which the inputs are crisp and the output is a fuzzy number. To estimate the regression parameters, they minimized the total spread of the fuzzy parameters subject to the estimated values covering the observed values.
Although their approach has been improved by many researchers [3, 8, 15, 17], it is still one of the simplest and most frequently used methods for estimating the parameters of fuzzy regression.

Generally, there are two approaches in fuzzy regression analysis. The first is based on linear programming, which tries to minimize the fuzziness of the model by minimizing the total spread of its fuzzy coefficients, subject to including the data points of each sample within a specified feasible data interval [19, 20, 21]. Ge and Wang [7] tried to determine the relationship between the threshold value and the input data when the data contain a considerable level of noise or uncertainty. They used the threshold value to measure the degree of fit in fuzzy linear regression and showed that the parameter $H$ is inversely proportional to the input noise. Many researchers have also recommended combining fuzzy regression models with other approaches, such as Monte Carlo methods [1], to improve the results obtained from ordinary FLR.

The second is the least squares method, which minimizes the sum of squared errors in the estimated values, based on their specifications [1, 4, 6, 7, 22]. This approach is a fuzzy extension of ordinary least squares: it obtains the best fit to the data under a distance measure defined for fuzzy numbers, using the information contained in the input-output data set.

One important problem with the Tanaka approach is the influence of outliers on the predicted upper and lower fuzzy bands. The Tanaka model is very sensitive to outliers, which prevent the fuzzy linear regression from predicting correctly. Many studies discuss how to handle the outlier problem [2, 3, 9, 11, 15]. Most of these models require some parameters to be selected beforehand. As there is no systematic method for defining these parameters, the difficulty with these models is selecting them in advance.

This paper presents a simple model for computing fuzzy linear regression with and without outliers. The paper is organized as follows. In Section 2, fuzzy linear regression is introduced. Section 3 explains the proposed model. The numerical examples and results are reported in Section 4. Finally, conclusions are given in the last section.

* Corresponding author. Email: omrani57@iust.ac.ir (H. Omrani); Tel.: +98-217724129, Fax: +98-217724482.

2 Fuzzy Linear Regression

Tanaka et al. [20] proposed the fuzzy linear regression (FLR) model for crisp input and fuzzy output data as follows:

\[
\hat{Y} = A_0 + A_1 x_1 + A_2 x_2 + \dots + A_k x_k \tag{1}
\]

where $A_j = (\alpha_j, c_j)$, $j = 0, 1, \dots, k$, is assumed to be a symmetric triangular fuzzy number with center $\alpha_j$ and half-width $c_j \ge 0$. To estimate $A_j$, Tanaka et al. [20] applied the following model:

\[
\begin{aligned}
\min\ & \sum_{j=0}^{k} c_j \\
\text{subject to: } & \sum_{j=0}^{k} \big(\alpha_j + (1-H)c_j\big)x_{ij} \ge y_i + (1-H)e_i, && i = 1, 2, \dots, n \\
& \sum_{j=0}^{k} \big(\alpha_j - (1-H)c_j\big)x_{ij} \le y_i - (1-H)e_i, && i = 1, 2, \dots, n \\
& \alpha_j \text{ free},\quad c_j \ge 0, && j = 0, 1, \dots, k.
\end{aligned}
\tag{2}
\]

In model (2), $n$ is the number of observations, $(y_i, e_i)$ is the center and spread of the $i$th observed output, $x_{i0} = 1$, and $H \in [0,1]$ is the threshold level to be chosen by the decision maker. Later, model (2) was modified by other researchers [16, 19], who suggested the objective function $\min \sum_{i=1}^{n}\sum_{j=0}^{k} c_j x_{ij}$ to prevent the $c_j$'s from being zero.

As mentioned, the approach of Tanaka et al. [20] is very sensitive to outliers. In other words, if outliers exist in the data set, the Tanaka model cannot predict the upper and lower fuzzy bands correctly. Chen [3] and Peters [15] proposed models to handle the outlier problem. Chen [3] addressed it with the following model:

\[
\begin{aligned}
\min\ & \sum_{i=1}^{n}\sum_{j=0}^{k} c_j x_{ij} \\
\text{subject to: } & \sum_{j=0}^{k} \big(\alpha_j + (1-H)c_j\big)x_{ij} \ge y_i + (1-H)e_i, && i = 1, 2, \dots, n \\
& \sum_{j=0}^{k} \big(\alpha_j - (1-H)c_j\big)x_{ij} \le y_i - (1-H)e_i, && i = 1, 2, \dots, n \\
& \sum_{j=0}^{k} c_j x_{ij} - e_i \le K, && i = 1, 2, \dots, n \\
& \alpha_j \text{ free},\quad c_j \ge 0, && j = 0, 1, \dots, k
\end{aligned}
\tag{3}
\]

where $K$ is a limiting value that should be assigned by the decision maker. The difficulty with the approach of Chen [3] is how to choose $K$. Although Chen [3] proposed some methods for selecting $K$, defining a suitable value remains problematic.
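To make the objective and the inclusion constraints of model (2) concrete, the following Python sketch evaluates the total spread $\sum_i \sum_j c_j x_{ij}$ and checks both constraints for a fixed set of coefficients. It does not solve the linear program. The function names, the tolerance argument and the convention $x_{i0} = 1$, $x_{i1} = i$ are assumptions of this sketch; the data are the Case A values of Table 1 as read here, and the coefficients are the rounded Tanaka values listed in Table 2.

```python
def total_spread(c, xs):
    """Sum over observations i and coefficients j of c_j * x_ij."""
    return sum(cj * xij for row in xs for cj, xij in zip(c, row))


def violated(alpha, c, xs, ys, es, H=0.0, tol=0.0):
    """Indices i whose inclusion constraints in model (2) fail by more than tol."""
    bad = []
    for i, (row, y, e) in enumerate(zip(xs, ys, es)):
        upper = sum((a + (1 - H) * cj) * x for a, cj, x in zip(alpha, c, row))
        lower = sum((a - (1 - H) * cj) * x for a, cj, x in zip(alpha, c, row))
        if upper < y + (1 - H) * e - tol or lower > y - (1 - H) * e + tol:
            bad.append(i)
    return bad


# Case A data (assumed reading of Table 1), with x_i0 = 1 and x_i1 = i.
ys = [8.0, 6.4, 9.5, 13.5, 13.0, 15.2, 17.0, 19.3, 20.1, 24.3]
es = [1.8, 2.2, 2.6, 2.6, 2.4, 2.3, 2.2, 4.8, 1.9, 2.0]
xs = [[1.0, float(i)] for i in range(1, 11)]

# Rounded Tanaka coefficients from Table 2: centers alpha_j and spreads c_j.
alpha, c = [4.43, 1.86], [3.67, 0.14]

spread = total_spread(c, xs)
# The tolerance absorbs the rounding of the published coefficients.
bad = violated(alpha, c, xs, ys, es, H=0.0, tol=0.05)
print(round(spread, 2), bad)
```

With these rounded inputs the spread evaluates to about 44.4, close to the 44.4573 listed for the Tanaka model in Table 2, which suggests that the tabulated quantity is the double sum over observations and coefficients.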
The Peters [15] model is presented as follows:

\[
\begin{aligned}
\max\ & \bar{\lambda} \\
\text{subject to: } & \sum_{j=0}^{k} (\alpha_j + c_j)x_{ij} \ge y_i - (1-\lambda_i)e_i, && i = 1, 2, \dots, n \\
& \sum_{j=0}^{k} (\alpha_j - c_j)x_{ij} \le y_i + (1-\lambda_i)e_i, && i = 1, 2, \dots, n \\
& \bar{\lambda} = (\lambda_1 + \lambda_2 + \dots + \lambda_n)/n \\
& \sum_{i=1}^{n}\sum_{j=0}^{k} c_j x_{ij} \le P(1-\bar{\lambda}) \\
& 0 \le \lambda_i \le 1, && i = 1, \dots, n \\
& \alpha_j \text{ free},\quad c_j \ge 0, && j = 0, 1, \dots, k.
\end{aligned}
\tag{4}
\]

The difficulty with the Peters [15] model is likewise the selection of $P$: different values of $P$ lead to different outcomes.

3 Proposed Model

In this section, the proposed model is explained. The model applies the Goal Programming (GP) technique to estimate the parameters of the fuzzy linear regression. Let $y_{Ui}$, $y_i$ and $y_{Li}$ be the upper, center and lower points of the $i$th observed fuzzy datum, respectively, and let $\hat{y}_{Ui}$ and $\hat{y}_{Li}$ be the upper and lower points of the $i$th predicted interval. Moreover, $\hat{y}_L$ and $\hat{y}_U$ are the predicted lower and upper fuzzy bands, which are shown in Figure 1. In this model, $\hat{y}_L$ is allowed to be larger than $y_L$ but must be smaller than $y$, and $\hat{y}_U$ is allowed to be smaller than $y_U$ but must be greater than $y$. In other words, $y$ is treated as the upper bound of $\hat{y}_L$ and the lower bound of $\hat{y}_U$ simultaneously.

The objective of the proposed model is to minimize the sum of the deviations of $\hat{y}_L$ from $y$ and $y_L$, and the sum of the deviations of $\hat{y}_U$ from $y$ and $y_U$. That is, to obtain $\hat{y}_L$, $y$ is selected as the upper point (instead of $y + (1-H)e$ as in the other models), and to obtain $\hat{y}_U$, $y$ is selected as the lower point (instead of $y - (1-H)e$). Note that the fuzzy bands $\hat{y}_L$ and $\hat{y}_U$ are calculated separately: first, a GP model is solved to find the lower fuzzy band $\hat{y}_L$; then another model is solved to obtain the upper fuzzy band $\hat{y}_U$.

In previous studies, the estimated FLR parameters are affected by outliers because the upper and lower points of the fuzzy data are used simultaneously. Since the proposed model uses $y$, which is less sensitive to outliers, in place of one of these points, it can estimate the FLR parameters with a smaller error.
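Thus, the main difference between the proposed model and previous models is that outliers are handled without selecting any parameters in advance. To obtain the $\hat{y}_L$ band, model (5) is solved:

\[
\begin{aligned}
\min\ & \sum_{i=1}^{n} \big(d_{Ui}^{+} + d_{Ui}^{-} + d_{Li}^{+} + d_{Li}^{-}\big) \\
\text{subject to: } & \sum_{j=0}^{k} \big(\alpha_j + (1-H)c_j\big)x_{ij} + d_{Ui}^{-} - d_{Ui}^{+} = y_i, && i = 1, 2, \dots, n \\
& \sum_{j=0}^{k} \big(\alpha_j - (1-H)c_j\big)x_{ij} + d_{Li}^{-} - d_{Li}^{+} = y_i - (1-H)e_i, && i = 1, 2, \dots, n \\
& d_{Ui}^{+}, d_{Ui}^{-}, d_{Li}^{+}, d_{Li}^{-} \ge 0, && i = 1, 2, \dots, n \\
& \alpha_j \text{ free},\quad c_j \ge 0, && j = 0, 1, \dots, k.
\end{aligned}
\tag{5}
\]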
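The role of the deviational variables in a goal constraint of the form "value + $d^-$ - $d^+$ = target" can be sketched as follows. This is only an illustration under stated assumptions: the function names and the two-point data set are hypothetical, and the code evaluates the objective of model (5) for fixed coefficients rather than solving the goal program with an LP solver.

```python
def deviations(value, target):
    """Return (d_minus, d_plus), both >= 0, with value + d_minus - d_plus == target."""
    gap = target - value
    return (gap, 0.0) if gap >= 0 else (0.0, -gap)


def model5_objective(alpha, c, xs, ys, es, H=0.0):
    """Objective of model (5) for fixed coefficients (alpha_j, c_j)."""
    total = 0.0
    for row, y, e in zip(xs, ys, es):
        upper = sum((a + (1 - H) * cj) * x for a, cj, x in zip(alpha, c, row))
        lower = sum((a - (1 - H) * cj) * x for a, cj, x in zip(alpha, c, row))
        # First goal: the upper line should hit the center y_i.
        total += sum(deviations(upper, y))
        # Second goal: the lower line should hit y_i - (1 - H) * e_i.
        total += sum(deviations(lower, y - (1 - H) * e))
    return total


# Hypothetical two-point data set with x_i0 = 1.
xs = [[1.0, 1.0], [1.0, 2.0]]
ys = [3.0, 5.0]
es = [1.0, 1.0]
value = model5_objective([1.0, 2.0], [0.5, 0.0], xs, ys, es)
print(value)
```

In an actual solution of model (5) the deviations are decision variables; because both appear with positive weight in the objective, at most one of each pair is nonzero at an optimum, which is what `deviations` reproduces for a fixed line.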

In the first constraint, $y_i + (1-H)e_i$ is replaced by $y_i$ so that the $y_i$ values act as the upper points when predicting $\hat{y}_L$. In model (5), $d_{Ui}^{+} + d_{Ui}^{-}$ is the distance between $y$ and $\hat{y}_L$, and $d_{Li}^{+} + d_{Li}^{-}$ is the distance between the lower point of the $H$-certain observed interval and $\hat{y}_L$. Thus, the sum of the two deviations should be minimized. The $\hat{y}_U$ band is obtained by solving the GP model below:

\[
\begin{aligned}
\min\ & \sum_{i=1}^{n} \big(d_{Ui}^{+} + d_{Ui}^{-} + d_{Li}^{+} + d_{Li}^{-}\big) \\
\text{subject to: } & \sum_{j=0}^{k} \big(\alpha_j + (1-H)c_j\big)x_{ij} + d_{Ui}^{-} - d_{Ui}^{+} = y_i + (1-H)e_i, && i = 1, 2, \dots, n \\
& \sum_{j=0}^{k} \big(\alpha_j - (1-H)c_j\big)x_{ij} + d_{Li}^{-} - d_{Li}^{+} = y_i, && i = 1, 2, \dots, n \\
& d_{Ui}^{+}, d_{Ui}^{-}, d_{Li}^{+}, d_{Li}^{-} \ge 0, && i = 1, 2, \dots, n \\
& \alpha_j \text{ free},\quad c_j \ge 0, && j = 0, 1, \dots, k.
\end{aligned}
\tag{6}
\]

Each of models (5) and (6) yields one estimated line for the lower points and one for the upper points. The upper line of model (5) (lower band) and the lower line of model (6) (upper band) are located around the $y_i$'s and can be eliminated to decrease the prediction error (see Figure 1). Thus, the lower and upper fuzzy bands are:

\[
\hat{y}_L = (\alpha_{0L} - c_{0L}) + (\alpha_{1L} - c_{1L})x \tag{7}
\]
\[
\hat{y}_U = (\alpha_{0U} + c_{0U}) + (\alpha_{1U} + c_{1U})x \tag{8}
\]

where $\alpha_{jL}$ and $c_{jL}$ are the values estimated for $\hat{y}_L$, and $\alpha_{jU}$ and $c_{jU}$ are the values estimated for $\hat{y}_U$.

Figure 1: The graphical explanation of the proposed model

There are three cases in fuzzy linear regression analysis: (a) constant spread; (b) increasing spread; (c) decreasing spread.
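Equations (7) and (8) can be evaluated directly once the two goal programs have been solved. The sketch below does so for the Case A coefficients listed in Table 2; the function names are hypothetical, and reading the tabulated slope term "(1.85)" as a center of 1.85 with zero spread is an assumption of this sketch.

```python
def lower_band(alpha_L, c_L, x):
    """Eq. (7): lower line of model (5), (alpha_0L - c_0L) + (alpha_1L - c_1L) * x."""
    return (alpha_L[0] - c_L[0]) + (alpha_L[1] - c_L[1]) * x


def upper_band(alpha_U, c_U, x):
    """Eq. (8): upper line of model (6), (alpha_0U + c_0U) + (alpha_1U + c_1U) * x."""
    return (alpha_U[0] + c_U[0]) + (alpha_U[1] + c_U[1]) * x


# Case A values as listed in Table 2 (slope spread assumed to be zero).
alpha_L, c_L = [2.65, 1.85], [1.15, 0.0]
alpha_U, c_U = [5.55, 1.85], [1.15, 0.0]

for x in (1, 5, 10):
    lo, hi = lower_band(alpha_L, c_L, x), upper_band(alpha_U, c_U, x)
    print(x, round(lo, 2), round(hi, 2), round(hi - lo, 2))
```

Because both slopes are equal here, the width of the predicted interval stays constant over $x$, which is the behaviour expected in the constant-spread case.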

In case (a), the predicted interval should lie between two parallel lines. Since the slopes of the $\hat{y}_L$ and $\hat{y}_U$ lines may differ in the proposed model, the means of the $\alpha_j$'s and $c_j$'s ($j \ge 1$) can be used as the slopes of both lines. These means are calculated as follows:

\[
\alpha_j = \tfrac{1}{2}\,(\alpha_{jL} + \alpha_{jU}), \qquad j = 1, \dots, k, \tag{9}
\]
\[
c_j = \tfrac{1}{2}\,(c_{jL} + c_{jU}), \qquad j = 1, \dots, k, \tag{10}
\]

where $\alpha_{jL}$ and $c_{jL}$ are the values estimated for $\hat{y}_L$, and $\alpha_{jU}$ and $c_{jU}$ are the values estimated for $\hat{y}_U$. In cases (b) and (c), the slopes of the $\hat{y}_L$ and $\hat{y}_U$ lines need not be equal. Hence, the estimated $\alpha_j$'s and $c_j$'s ($j \ge 1$) are used as they are.

4 Results

To illustrate the capability of the proposed model, three examples are solved. The first two examples contain outliers and the last one does not.

Example 1: Table 1 lists the numerical values used by Chen [3]. In this example, three cases A, B and C, each with one outlier point, are considered.

Table 1: Outliers with constant, increasing and decreasing spread; entries are $(y_i, e_i)$

  x_i   A: Constant spread   B: Increasing spread   C: Decreasing spread
  1     (8.0, 1.8)           (11, 2)                (11, 12)
  2     (6.4, 2.2)           (13, 2)                (13, 12)
  3     (9.5, 2.6)           (21, 4)                (21, 10)
  4     (13.5, 2.6)          (29, 4)                (24, 10)
  5     (13.0, 2.4)          (29, 6)                (31, 8)
  6     (15.2, 2.3)          (34, 6)                (34, 8)
  7     (17.0, 2.2)          (45, 15) a             (42, 4)
  8     (19.3, 4.8) a        (44, 8)                (44, 15) a
  9     (20.1, 1.9)          (48, 12)               (51, 2)
  10    (24.3, 2.0)          (54, 12)               (54, 2)
  a Indicates the outlier.

In all cases, models (5) and (6) are applied with $H = 0$ to find $\hat{y}_L$ and $\hat{y}_U$. The results are shown in Table 2. Note that the slopes of the $\hat{y}_L$ and $\hat{y}_U$ lines are modified by using equations (7) and (8). As shown, the value of $\sum_{i=1}^{n}\sum_{j=0}^{k} c_j x_{ij}$ in the proposed model is smaller than in the Tanaka et al. [20] and Chen [3] models. For the proposed model, this value is obtained by combining models (5) and (6) as follows:

\[
\sum_{i=1}^{n}\sum_{j=0}^{k} c_j x_{ij} = \sum_{i=1}^{n}\sum_{j=0}^{k} c_j^{S_1} x_{ij} + \sum_{i=1}^{n}\sum_{j=0}^{k} c_j^{S_2} x_{ij} \tag{11}
\]

where $S_1$ and $S_2$ denote the $c_j$'s obtained from models (5) and (6), respectively. Figures 2, 3 and 4 show the results of the above models graphically.

Table 2: Comparison between Tanaka et al. [20], Chen [3] and the proposed model

  Case   Model            \sum_i \sum_j c_j x_ij   Results
  (A)    Tanaka           44.4573                  y = (4.43, 3.67) + (1.86, 0.14)x
         Chen             32.75                    y = (4.75, 4.55) + (1.85, 0.15)x
         Proposed model   23                       \hat{y}_L = (2.65 - 1.15) + (1.85)x   a
                                                   \hat{y}_U = (5.55 + 1.15) + (1.85)x   a
  (B)    Tanaka           123.75                   y = (4.51, 0.65) + (5.7, 2.13)x
         Chen             95.0                     y = (5.76, 0.9) + (4.95, 1.38)x
         Proposed model   69.68                    \hat{y}_L = (5.81 - 0.48) + (4.19 - 0.52)x
                                                   \hat{y}_U = (6.7 + 0.42) + (5.29 + 0.58)x
  (C)    Tanaka           144.469                  y = (4.76, 13.1) + (4.9, 0.24)x
         Chen             89.375                   y = (5, 16.5) + (4.875, 1.375)x
         Proposed model   80                       \hat{y}_L = (-1.14 - 3.8) + (5.4)x
                                                   \hat{y}_U = (11.4 + 4.2) + (4.6)x
  a The means of $\alpha_{1L}$, $\alpha_{1U}$ and $c_{1L}$, $c_{1U}$ are used, respectively.

Figure 2: Case A - Comparison between Chen [3], Tanaka et al. [20] and the proposed model (plot of $y$ against the observations 1 to 10; legend: yL, yU, Tanaka L, Tanaka U, Chen L, Chen U, $\hat{y}_L$, $\hat{y}_U$)
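The averaging in Eqs. (9)-(10) and the combined spread in Eq. (11) amount to a few lines of arithmetic. The sketch below is illustrative only: the function names are hypothetical, the inputs to `mean_slopes` are made-up numbers, and the spread check uses the Case A spreads from Table 2 with $x_{i0} = 1$ and $x_{i1} = i$ assumed.

```python
def mean_slopes(coef_L, coef_U):
    """Eqs. (9)-(10): average the j >= 1 coefficients of models (5) and (6)."""
    return [0.5 * (l + u) for l, u in zip(coef_L[1:], coef_U[1:])]


def combined_spread(c_S1, c_S2, xs):
    """Eq. (11): total spread of model (5) plus total spread of model (6)."""
    s1 = sum(cj * xij for row in xs for cj, xij in zip(c_S1, row))
    s2 = sum(cj * xij for row in xs for cj, xij in zip(c_S2, row))
    return s1 + s2


# Hypothetical centers from the two models; index 0 is the intercept.
slopes = mean_slopes([1.0, 1.8], [2.0, 1.9])

# Case A spreads from Table 2: c_0 = 1.15 in both models, slope spread taken as 0.
xs = [[1.0, float(i)] for i in range(1, 11)]
total = combined_spread([1.15, 0.0], [1.15, 0.0], xs)
print(slopes, round(total, 2))
```

For Case A the combined spread comes out as 23 (ten observations times 1.15 + 1.15), which agrees with the value listed for the proposed model in Table 2.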

Figure 3: Case B - Comparison between Chen [3], Tanaka et al. [20] and the proposed model (plot of $y$ against the observations 1 to 10; legend: yL, yU, tanL, tanU, chenL, chenU, $\hat{y}_L$, $\hat{y}_U$)

Figure 4: Case C - Comparison between Chen [3], Tanaka et al. [20] and the proposed model (plot of $y$ against the observations 1 to 10; legend: yL, yU, tanL, tanU, chenL, chenU, $\hat{y}_L$, $\hat{y}_U$)

Example 2: This example also has the outlier problem. It differs from the previous example in that it contains a point with a small $e$ and a large $y$. These data were used by Nasrabadi et al. [13]. In this case, the values of the $\alpha_1$'s need to be modified. The results of the Nasrabadi et al. [13] model and the proposed model are shown in Table 3 and Figure 5. As shown, the value of $\sum_{i=1}^{n}\sum_{j=0}^{k} c_j x_{ij}$ for the proposed model is smaller than for the Nasrabadi et al. [13] model.

Example 3: This example has no outliers and was used by Kim and Bishu [10]; it illustrates how the proposed method performs on clean data. We compare the results of our method with methods in the literature. To evaluate the performance of a fuzzy regression model, Kim and Bishu [10] used the ratio of the difference between the membership values to the observed membership values, with $\mu_{\hat{y}}$ and $\mu_y$ denoting the membership functions of the estimated and observed responses:

\[
\text{error} = \frac{\displaystyle\int_{S_{\hat{y}} \cup S_y} \big|\mu_{\hat{y}}(t) - \mu_y(t)\big|\,dt}{\displaystyle\int_{S_y} \mu_y(t)\,dt} \tag{12}
\]

where $S_{\hat{y}}$ and $S_y$ are the supports of $\hat{y}$ and $y$, respectively. To compare the performance of the FLR models, Eq. (12) is applied to calculate the errors in estimating the observed responses. The data for this example were used by Tanaka et al. [20]. The data and results are shown in Table 4. By solving models (5) and (6) (at $H = 0$) and modifying the values of the $\alpha_1$'s and $c_1$'s, the following two fuzzy bands are calculated:

\[
\hat{y}_L = (5.55, 1.2) + (1.325, 0)x, \tag{13}
\]
\[
\hat{y}_U = (7.2, 1.2) + (1.325, 0)x. \tag{14}
\]

Table 3: Numerical data and comparison of the Nasrabadi et al. [13] model and the proposed model

  x_i   Observed data (y_i, e_i)   Nasrabadi model   Proposed model
  1     (6.4, 2.2)                 (5.66, 9.82)      (4.82, 8.57)
  2     (8.0, 1.8)                 (7.21, 11.61)     (6.38, 10.27)
  3     (16.5, 2.6) a              (8.76, 13.4)      (7.94, 11.97)
  4     (11.5, 2.6)                (10.31, 15.19)    (9.5, 13.67)
  5     (13.0, 2.4)                (11.86, 16.98)    (11.06, 15.37)
  \sum_i \sum_j c_j x_ij   -       16.97             11.9
  a Indicates the outlier.

Table 4: Comparison between different methods (errors in estimation)

  i   x_i   (y_i, e_i)    Tanaka et al.   Diamond   Savic-Pedrycz   Kim-Bishu   Modarres et al.   Proposed method
  1   1     (8.0, 1.8)    1.86            1.23      1.54            1.22        1.35              0.32
  2   2     (6.4, 2.2)    1.30            1.39      1.52            1.38        1.27              1.62
  3   3     (9.5, 2.6)    0.58            0.42      0.70            0.40        0.23              0.59
  4   4     (13.5, 2.6)   0.86            1.09      1.16            1.12        1.25              1.13
  5   5     (13.0, 2.4)   1.00            0.40      0.86            0.36        0.13              0.16
  Total error             5.60            4.53      5.78            4.48        4.23              3.82

The right half of Table 4 shows the errors of the five observations for the different methods. The total error of the proposed method is 3.82, which is lower than the total error of every other method.

5 Conclusion

There are two approaches in fuzzy linear regression: linear programming and least squares. In this paper, a simple model based on the first approach is presented for computing fuzzy linear regression. The model is based on Goal Programming. Since outliers in the data set cause incorrect results, the proposed model is designed to be less sensitive to outliers. Furthermore, unlike previous models, it is not necessary to select any parameters beforehand.
In this model, the upper and lower fuzzy bands are computed separately by two linear goal programming models. Several examples, with and without outliers, are solved using the proposed model, and the results are compared with previous models. The results illustrate that the goodness of fit of the proposed model depends on both the observations and the fuzzy bands.
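As a numerical illustration of the error measure in Eq. (12) that underlies Table 4, the following sketch approximates the ratio for two symmetric triangular fuzzy numbers with a midpoint rule. The function names, the step count and the example numbers are assumptions of this sketch; how an estimated membership function is derived from the two bands of the proposed model is not specified here, so `est` is simply a hypothetical triangular estimate.

```python
def tri(t, center, spread):
    """Membership of a symmetric triangular fuzzy number (center, spread)."""
    if spread <= 0:
        return 1.0 if t == center else 0.0
    return max(0.0, 1.0 - abs(t - center) / spread)


def kim_bishu_error(est, obs, steps=20000):
    """Midpoint-rule approximation of Eq. (12); obs must have a positive spread."""
    (ce, se), (co, so) = est, obs
    lo = min(ce - se, co - so)   # left end of the union of supports
    hi = max(ce + se, co + so)   # right end of the union of supports
    h = (hi - lo) / steps
    diff = area = 0.0
    for k in range(steps):
        t = lo + (k + 0.5) * h
        diff += abs(tri(t, ce, se) - tri(t, co, so)) * h
        area += tri(t, co, so) * h
    return diff / area


same = kim_bishu_error((8.0, 1.8), (8.0, 1.8))      # identical numbers
apart = kim_bishu_error((10.0, 1.0), (0.0, 1.0))    # disjoint supports
print(same, round(apart, 3))
```

Identical fuzzy numbers give an error of zero, while two unit-spread triangles with disjoint supports give an error close to 2, since the numerator then equals the sum of both areas.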

Figure 5: Comparison between the Nasrabadi et al. [13] model and the proposed model (plot of $y$ against the observations 1 to 5; legend: yL, yU, NasrL, NasrU, $\hat{y}_L$, $\hat{y}_U$)

References

[1] Abdallah, A., and J.J. Buckley, Monte Carlo methods in fuzzy linear regression, Soft Computing, vol.12, pp.463-468, 2008.
[2] Chan, K.Y., C.K. Kwong, and T.C. Fogarty, Modeling manufacturing processes using a genetic programming-based fuzzy regression with detection of outliers, Information Sciences, vol.180, no.4, pp.506-518, 2010.
[3] Chen, Y.S., Outliers detection and confidence interval modification in fuzzy regression, Fuzzy Sets and Systems, vol.119, pp.259-272, 2001.
[4] Coppi, R., P. D'Urso, P. Giordani, and A. Santoro, Least squares estimation of a linear regression model with LR fuzzy response, Computational Statistics & Data Analysis, vol.51, pp.267-286, 2006.
[5] Diamond, P., Fuzzy least squares, Information Sciences, vol.46, pp.141-157, 1988.
[6] D'Urso, P., and T. Gastaldi, A least-squares approach to fuzzy linear regression analysis, Computational Statistics & Data Analysis, vol.34, pp.427-440, 2000.
[7] Ge, H.W., and S.T. Wang, Dependency between degree of fit and input noise in fuzzy linear regression using non-symmetric fuzzy triangular coefficients, Fuzzy Sets and Systems, vol.158, pp.2189-2202, 2007.
[8] Hojati, M., C.R. Bector, and K. Smimou, A simple method for computation of fuzzy linear regression, European Journal of Operational Research, vol.166, pp.172-184, 2005.
[9] Hung, W.L., and M.S. Yang, An omission approach for detecting outliers in fuzzy regression models, Fuzzy Sets and Systems, vol.157, pp.3109-3122, 2006.
[10] Kim, B., and R.R. Bishu, Evaluation of fuzzy linear regression models by comparison membership function, Fuzzy Sets and Systems, vol.100, pp.343-352, 1998.
[11] Lee, E.S., and P.T. Chang, Fuzzy linear regression analysis with spread unconstrained in sign, Computational Mathematical Applications, vol.28, no.4, pp.61-70, 1994.
[12] Modarres, M., E. Nasrabadi, and M.M. Nasrabadi, Fuzzy linear regression models with least square errors, Applied Mathematics and Computation, vol.163, pp.977-989, 2005.
[13] Nasrabadi, M.M., E. Nasrabadi, and A.R. Nasrabady, Fuzzy linear analysis: a multi-objective programming approach, Applied Mathematics and Computation, vol.163, pp.245-251, 2005.
[14] Özelkan, E.C., and L. Duckstein, Multi-objective fuzzy regression: a general framework, Computers & Operations Research, vol.27, pp.635-652, 2000.
[15] Peters, G., Fuzzy linear regression with fuzzy intervals, Fuzzy Sets and Systems, vol.63, pp.45-55, 1994.
[16] Redden, D.T., and W.H. Woodall, Properties of certain fuzzy linear regression methods, Fuzzy Sets and Systems, vol.64, pp.61-375, 1994.
[17] Sakawa, M., and H. Yano, Multiobjective fuzzy linear regression analysis for fuzzy input-output data, Fuzzy Sets and Systems, vol.47, pp.173-181, 1992.

[18] Savic, D.A., and W. Pedrycz, Evaluation of fuzzy linear regression models, Fuzzy Sets and Systems, vol.39, pp.51-63, 1991.
[19] Tanaka, H., I. Hayashi, and J. Watada, Possibilistic linear regression analysis for fuzzy data, European Journal of Operational Research, vol.40, pp.389-396, 1989.
[20] Tanaka, H., S. Uejima, and K. Asai, Linear regression analysis with fuzzy model, IEEE Trans. Systems, Man Cybernet, vol.12, pp.903-907, 1982.
[21] Tanaka, H., and J. Watada, Possibilistic linear systems and their application to the linear regression model, Fuzzy Sets and Systems, vol.27, pp.275-289, 1988.
[22] Yang, M.S., and T.S. Lin, Fuzzy least-squares linear regression analysis for fuzzy input-output data, Fuzzy Sets and Systems, vol.126, pp.389-399, 2002.