ВІСНИК ХНТУ (Herald of Kherson National Technical University) 3(6), 2017, Vol.

UDC 004.4:59.37

S.B. PRYKHODKO, N.V. PRYKHODKO, L.M. MAKAROVA, O.O. KUDIN, T.G. SMYKODUB
Admiral Makarov National University of Shipbuilding

CONSTRUCTING NON-LINEAR REGRESSION EQUATIONS ON THE BASIS OF BIVARIATE NORMALIZING TRANSFORMATIONS

Techniques are proposed for constructing equations, confidence and prediction intervals of non-linear regressions on the basis of bivariate normalizing transformations for non-Gaussian data. Application of the techniques is considered for a bivariate non-Gaussian data set: actual effort (hours) and size (adjusted function points) from 33 maintenance and development software projects.

Keywords: non-linear regression equation, confidence interval, prediction interval, normalizing transformation, bivariate non-Gaussian data

Problem formulation
A normalizing transformation is often a good way to construct equations, confidence and prediction intervals of non-linear regressions [1-5], and it is often used for that purpose in empirical software engineering, information technology, biometry, ecology, finance, etc. However, the well-known techniques for constructing equations, confidence and prediction intervals of non-linear regressions are based on univariate normalizing transformations, which do not take into account the correlation between random variables when bivariate non-Gaussian data are normalized. This leads to the need for bivariate normalizing transformations, which do take that correlation into account, to construct equations, confidence and prediction intervals of non-linear regressions.

Analysis of recent research and publications
Transformations are an extremely important part of regression analysis, but the use of transformations can be somewhat tricky []. According to [], transformations are made for essentially four purposes, two of which are: first, to obtain approximate normality for the distribution of the error term (residuals) or of the dependent random variable; second, to transform the response and/or the predictor in such a way that the linear relationship between the new (normalized) variables is stronger than the linear relationship between the dependent and independent random variables. Well-known normalizing transformations are now used to construct the equations, confidence and prediction intervals of non-linear regressions.
For that purpose, for example, such normalizing transformations as the decimal logarithm transformation [1-6], the Box-Cox transformation [, 4] and the Johnson translation system [7, 8] have been applied. However, the known techniques for constructing equations, confidence and prediction intervals of non-linear regressions are based on univariate normalizing transformations, which do not take into account the correlation between random variables when bivariate non-Gaussian data are normalized.
Purpose of the study
The purpose of the study is to propose techniques for constructing the equations, confidence and prediction intervals of non-linear regressions for bivariate non-Gaussian data in the general case, when it is necessary to take into account the correlation between the response and the predictor (the dependent and independent random variables) during the normalization of these variables.

Presentation of the main research material
We propose techniques for constructing the equations, confidence and prediction intervals of non-linear regressions for bivariate non-Gaussian data. As in [9, 10], the techniques consist of three steps. In the first step, a set of bivariate non-Gaussian data is normalized using a bijective bivariate normalizing transformation. In the second step, the equation, confidence and prediction intervals of linear regression for the normalized data are built. In the third step, the equations, confidence and prediction intervals of non-linear regressions for the bivariate non-Gaussian data are constructed on the basis of the equation, confidence and prediction intervals of linear regression for the normalized data and the normalizing transformation.

The techniques. Consider a bijective multivariate normalizing transformation of a non-Gaussian random vector $P=(Y,X)^T$ to a Gaussian random vector $T=(Z_Y,Z_X)^T$, given by

$$T=\psi(P), \qquad (1)$$

and the inverse transformation for (1), given by

$$P=\psi^{-1}(T). \qquad (2)$$

The linear regression equation for the normalized data according to (1) will have the form []

$$\hat Z_Y=\hat b_0+\hat b_1 Z_X, \qquad (3)$$

where $\hat Z_Y$ is the prediction result of the linear regression equation for values of $Z_X$; $\hat b=(\hat b_0,\hat b_1)^T$ is the estimator of the linear regression equation parameter vector $b=(b_0,b_1)^T$. The non-linear regression equation will have the form

$$\hat Y=\psi_Y^{-1}\left(\hat b_0+\hat b_1 Z_X\right), \qquad (4)$$

where $\psi_Y^{-1}$ denotes the component of the inverse transformation (2) that returns $Y$, and $Z_X$ is the normalized value of $X$.

The technique for constructing the confidence interval is based on transformation (1) and the equation

$$Z_{CI}=\hat Z_Y\pm t_{\alpha/2,\nu}\,S_{Z_Y}\left[\frac{1}{N}+\frac{(Z_X-\bar Z_X)^2}{S_{Z_X}}\right]^{1/2}, \qquad (5)$$

where $t_{\alpha/2,\nu}$ is a quantile of Student's t-distribution with $\nu$ degrees of freedom and $\alpha/2$ significance level; $N$ is the number of data points; $\nu=N-2$; $S_{Z_Y}^2=\frac{1}{\nu}\sum_{i=1}^{N}\left(Z_{Y_i}-\hat Z_{Y_i}\right)^2$; $S_{Z_X}=\sum_{i=1}^{N}\left(Z_{X_i}-\bar Z_X\right)^2$; $\bar Z_X=\frac{1}{N}\sum_{i=1}^{N}Z_{X_i}$.

The technique consists of three steps.
In the first step, the non-Gaussian data are normalized using the bijective normalizing transformation (1), and the linear regression equation (3) is built on the basis of the normalized data. In the second step, the confidence interval for the linear regression is determined by (5). In the third step, the confidence interval for the non-linear regression is built on the basis of the confidence interval for the linear regression and the normalizing transformation. With $\psi_Y^{-1}$ as in (4), the confidence interval for the non-linear regression will have the form

$$Y_{CI}=\psi_Y^{-1}\left(\hat Z_Y\pm t_{\alpha/2,\nu}\,S_{Z_Y}\left[\frac{1}{N}+\frac{(Z_X-\bar Z_X)^2}{S_{Z_X}}\right]^{1/2}\right). \qquad (6)$$

The technique for constructing the prediction interval is based on transformation (1) and the equation []
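$$Z_{PI}=\hat Z_Y\pm t_{\alpha/2,\nu}\,S_{Z_Y}\left[1+\frac{1}{N}+\frac{(Z_X-\bar Z_X)^2}{S_{Z_X}}\right]^{1/2}. \qquad (7)$$

As before, the technique consists of three steps, with the difference that prediction intervals are determined instead of confidence intervals. The prediction interval for the non-linear regression will have the form

$$Y_{PI}=\psi_Y^{-1}\left(\hat Z_Y\pm t_{\alpha/2,\nu}\,S_{Z_Y}\left[1+\frac{1}{N}+\frac{(Z_X-\bar Z_X)^2}{S_{Z_X}}\right]^{1/2}\right). \qquad (8)$$

Equations (4), (6) and (8) are used for constructing the equations, confidence and prediction intervals of non-linear regressions for bivariate non-Gaussian data. The lines of the equations, confidence and prediction intervals of non-linear regressions can also be built by applying the inverse transformation (2) to the values of the variables $\hat Z_Y$, $Z_{CI}$ and $Z_{PI}$ from equations (3), (5) and (7), respectively.

Bivariate normalizing transformations. Several normalizing transformations have been proposed for normalizing bivariate non-Gaussian data, such as transformations based on the Box-Cox transformation, the Johnson translation system and others. However, only a few normalizing transformations are bijective. One such bijective transformation is the transformation of the $S_U$ family of the Johnson translation system. The Johnson normalizing translation is given by [11]

$$T=\gamma+\eta\,h\left[\lambda^{-1}(P-\varphi)\right]\sim N_m(\mathbf{0}_m,\Sigma), \qquad (9)$$

where $\gamma$, $\eta$, $\varphi$ and $\lambda$ are parameters of the Johnson normalizing translation; $\gamma=(\gamma_Y,\gamma_X)^T$; $\eta=\mathrm{diag}(\eta_Y,\eta_X)$; $\varphi=(\varphi_Y,\varphi_X)^T$; $\lambda=\mathrm{diag}(\lambda_Y,\lambda_X)$; $h(\mathbf y)=\{h_Y(y_Y),h_X(y_X)\}^T$; $h_j(\cdot)$ is one of the translation functions

$$h_j(y)=\begin{cases}\ln y & \text{for the } S_L \text{ (lognormal) family},\\ \ln\left[y/(1-y)\right] & \text{for the } S_B \text{ (bounded) family},\\ \operatorname{Arsh} y & \text{for the } S_U \text{ (unbounded) family},\\ y & \text{for the } S_N \text{ (normal) family};\end{cases}$$

$\Sigma$ is the covariance matrix, and $m=2$ in the bivariate case. Here $y=(x-\varphi)/\lambda$ for each component; $\operatorname{Arsh} y=\ln\left(y+\sqrt{y^2+1}\right)$. The inverse transformation for the Johnson normalizing translation (9) is given by [11]

$$P=\varphi+\lambda\,h^{-1}\left[\eta^{-1}(T-\gamma)\right], \qquad (10)$$

where $h^{-1}$ is applied componentwise; for example, $h^{-1}(z)=1/(1+e^{-z})$ for the $S_B$ family.

Example. We consider the example of constructing the equation, confidence and prediction intervals of non-linear regression for the bivariate non-Gaussian data set: actual effort (hours) and size (adjusted function points) from 33 maintenance and development projects [12] after the cutoff of outliers by the technique for detecting bivariate outliers on the basis of normalizing transformations for non-Gaussian data [13]. In Fig. 1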
the linear regression (solid line) and the borders of the confidence (dot-dash lines) and prediction (dotted lines) intervals (α = 0.05) of the linear regression for the normalized data (points in the form of circles) from 33 projects are presented.
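The three steps behind Fig. 1 and the back-transformation to (4), (6) and (8) can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the $S_B$ parameters of (9) have already been estimated (the estimation itself, where the correlation between the variables enters, is not shown), it fits (3) by ordinary least squares, and it expects the caller to supply the quantile $t_{\alpha/2,\nu}$ (for example from `scipy.stats.t.ppf(1 - alpha / 2, nu)` if SciPy is available). The names `SBComponent` and `nonlinear_regression_intervals` are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Dict, Sequence


@dataclass
class SBComponent:
    """One component of a Johnson S_B translation (hypothetical helper).

    Assumes eta > 0 and lam > 0, so both directions are increasing functions.
    """
    gamma: float
    eta: float
    phi: float
    lam: float

    def forward(self, x: float) -> float:
        # z = gamma + eta * ln(y / (1 - y)), y = (x - phi) / lam; needs phi < x < phi + lam
        y = (x - self.phi) / self.lam
        return self.gamma + self.eta * math.log(y / (1.0 - y))

    def inverse(self, z: float) -> float:
        # x = phi + lam * h^{-1}((z - gamma) / eta), with h^{-1}(u) = 1 / (1 + e^{-u})
        u = (z - self.gamma) / self.eta
        return self.phi + self.lam / (1.0 + math.exp(-u))


def nonlinear_regression_intervals(
    xs: Sequence[float],
    ys: Sequence[float],
    tx: SBComponent,
    ty: SBComponent,
    t_quantile: float,
    x_new: float,
) -> Dict[str, object]:
    """Fit (3) on normalized data, evaluate (5) and (7), map back as in (4), (6), (8)."""
    # Step 1: normalize both variables component-wise, as in translation (9)
    zx = [tx.forward(x) for x in xs]
    zy = [ty.forward(y) for y in ys]
    n = len(zx)
    nu = n - 2  # degrees of freedom
    zx_bar = sum(zx) / n
    zy_bar = sum(zy) / n

    # Step 2: least-squares line Z_Y = b0 + b1 * Z_X and the residual standard error S_{Z_Y}
    s_zx = sum((a - zx_bar) ** 2 for a in zx)  # S_{Z_X}
    b1 = sum((a - zx_bar) * (b - zy_bar) for a, b in zip(zx, zy)) / s_zx
    b0 = zy_bar - b1 * zx_bar
    s_zy = math.sqrt(sum((b - b0 - b1 * a) ** 2 for a, b in zip(zx, zy)) / nu)

    z0 = tx.forward(x_new)
    z_hat = b0 + b1 * z0
    leverage = 1.0 / n + (z0 - zx_bar) ** 2 / s_zx
    half_ci = t_quantile * s_zy * math.sqrt(leverage)        # half-width in (5)
    half_pi = t_quantile * s_zy * math.sqrt(1.0 + leverage)  # half-width in (7)

    # Step 3: map the line and both interval borders back with the increasing inverse
    return {
        "coef": (b0, b1),
        "y_hat": ty.inverse(z_hat),
        "ci": (ty.inverse(z_hat - half_ci), ty.inverse(z_hat + half_ci)),
        "pi": (ty.inverse(z_hat - half_pi), ty.inverse(z_hat + half_pi)),
    }
```

Because the $S_B$ inverse is increasing, the back-transformed borders keep their order, but unlike the borders in the normalized space they are generally not symmetric around $\hat Y$.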
Fig. 1. Equation, confidence and prediction intervals of linear regression for normalized data from 33 projects

These data are normalized by the $S_B$ family of transformation (9). In this case the point estimates of the eight parameters of transformation (9) are: .88055, .7334, 0.793776, 0.95403, 3.890, 96.5557, 768.509 and 870.7. The sample covariance matrix of the normalized data $T$ is used as the approximate moment-matching estimator of the covariance matrix:

$$\hat\Sigma=\begin{pmatrix}0.9948 & 0.848\\ 0.848 & 0.9948\end{pmatrix}.$$

In Fig. 2 the non-linear regression (solid line) and the borders of the confidence (dot-dash lines) and prediction (dotted lines) intervals (α = 0.05) of the non-linear regression for the non-Gaussian data (points in the form of circles) from 33 projects are presented. This non-linear regression and its confidence and prediction intervals were built on the basis of transformations (9) and (10). The non-linear regression and its confidence and prediction intervals were also built on the basis of the decimal logarithm transformation; for that transformation, the borders of the prediction interval (dotted lines with short dashes) are also shown in Fig. 2. We note that in this case (the decimal logarithm transformation), at the maximum value of the independent variable the prediction interval is 60 percent wider than the prediction interval constructed on the basis of transformation (9).

Fig. 2. Equation, confidence and prediction intervals of non-linear regression for non-Gaussian data from 33 projects

In our opinion, such a big difference is due to poor bivariate normalization of the data by the decimal logarithm transformation. We note that under bivariate normality Mardia's multivariate kurtosis [14] equals p(p + 2) = 8 for our case of p = 2 variables. The point estimates of the kurtosis equal 8.00 for the data normalized by transformation (9) (Fig. 1) and 6.93 for the data normalized by the decimal logarithm transformation. These values indicate that the necessary condition for bivariate normality is practically satisfied only for the data normalized by transformation (9).
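The kurtosis comparison above relies on Mardia's estimator $b_{2,p}=\frac{1}{N}\sum_{i=1}^{N}\left[(\mathbf t_i-\bar{\mathbf t})^T S^{-1}(\mathbf t_i-\bar{\mathbf t})\right]^2$, where $S$ is the sample covariance matrix with divisor $N$. A minimal stdlib-only sketch for the bivariate case follows; the function name `mardia_kurtosis_2d` is hypothetical, and the explicit 2×2 inverse replaces a general matrix inversion.

```python
from typing import Sequence


def mardia_kurtosis_2d(u: Sequence[float], v: Sequence[float]) -> float:
    """Mardia's multivariate kurtosis b_{2,2} for a bivariate sample.

    Uses the covariance with divisor N; under bivariate normality the
    population value is p(p + 2) = 8 for p = 2.
    """
    n = len(u)
    if n != len(v) or n < 3:
        raise ValueError("need two equally long samples with at least 3 points")
    mu, mv = sum(u) / n, sum(v) / n
    du = [a - mu for a in u]
    dv = [b - mv for b in v]
    suu = sum(a * a for a in du) / n
    svv = sum(b * b for b in dv) / n
    suv = sum(a * b for a, b in zip(du, dv)) / n
    det = suu * svv - suv * suv
    if det <= 0.0:
        raise ValueError("sample covariance matrix is singular")
    total = 0.0
    for a, b in zip(du, dv):
        # squared Mahalanobis distance d^T S^{-1} d using the explicit 2x2 inverse
        d2 = (svv * a * a - 2.0 * suv * a * b + suu * b * b) / det
        total += d2 * d2
    return total / n
```

Applied to the two components of a normalized sample, a value near 8 is consistent with bivariate normality; as noted above, this is only a necessary condition. The estimator is affine invariant, so rescaling or mixing the two variables linearly does not change it.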
At the same time, the non-linear regressions built on the basis of transformation (9) and of the decimal logarithm transformation are approximately similar: the values of the coefficient of determination equal 0.5664 and 0.5759, respectively.

Conclusions
From the example we conclude that the proposed techniques for constructing the equations, confidence and prediction intervals of non-linear regressions for bivariate non-Gaussian data are promising. Application of the techniques is considered for the bivariate non-Gaussian data set: actual effort (hours) and size (adjusted function points) from 33 maintenance and development software projects. Taking into account the correlation between random variables when normalizing these bivariate non-Gaussian data leads to a reduction in the width of the confidence and prediction intervals of the non-linear regression compared to the same intervals constructed on the basis of the decimal logarithm transformation. In the future, we intend to try other bivariate non-Gaussian data sets.

References
1. D.M. Bates and D.G. Watts. Nonlinear Regression Analysis and Its Applications. Wiley, 1988, 384 p.
2. T.P. Ryan. Modern Regression Methods. Wiley, 1997, 59 p.
3. G.A.F. Seber and C.J. Wild. Nonlinear Regression. John Wiley & Sons, Inc., 2003, 79 p.
4. R.A. Johnson and D.W. Wichern. Applied Multivariate Statistical Analysis. Pearson Prentice Hall, 2007, 800 p.
5. Ian Pardoe. Applied Regression Modeling. Wiley, 0, 35 p.
6. S. Chatterjee and J.S. Simonoff. Handbook of Regression Analysis. John Wiley & Sons, Inc., 2013, 36 p.
7. S.B. Prykhodko and A.V. Pukhalevich, "Developing PC Software Project Duration Model based on Johnson transformation," in Modern Problems of Radio Engineering, Telecommunications and Computer Science, Proceedings of the th International Conference, Lviv-Slavske, Ukraine, 5 February - March, 2014, pp. 4-6.
8. S.B. Prykhodko and A.V. Pukhalevich, "Confidence interval estimation of PC software project duration regression based on Johnson transformation," Radioelectronic and Computer Systems, No, Vol. 66, pp. 04-07, 2014.
9. S.B. Prykhodko, "Statistical anomaly detection techniques based on normalizing transformations for non-Gaussian data," in Computational Intelligence (Results, Problems and Perspectives), Proceedings of the International Conference, Kyiv-Cherkasy, Ukraine, May -5, 2015, pp. 86-87.
10. S.B. Prykhodko, "Developing the software defect prediction models using regression analysis based on normalizing transformations," in Modern Problems in Testing of the Applied Software (PTTAS-2016), Abstracts of the Research and Practice Seminar, Poltava, Ukraine, May 5-6, 2016, pp. 6-7.
11. P.M. Stanfield, J.R. Wilson, G.A. Mirka, N.F. Glasscock, J.P. Psihogios and J.R. Davis, "Multivariate input modeling with Johnson distributions," in Proceedings of the 8th Winter Simulation Conference WSC'96, December 8-, 1996, Coronado, CA, USA, ed. S. Andradóttir, K.J. Healy, D.H. Withers and B.L. Nelson, IEEE Computer Society, Washington, DC, USA, 1996, pp. 457-464.
12. B. Kitchenham, S.L. Pfleeger, B. McColl and S. Eagan, "An empirical study of maintenance and development estimation accuracy," The Journal of Systems and Software, 64, pp. 57-77, 2002.
13. S. Prykhodko, N. Prykhodko, L. Makarova, O. Kudin, T. Smykodub and A. Prykhodko, "Detecting bivariate outliers on the basis of normalizing transformations for non-Gaussian data," in Advanced Information Systems and Technologies, Proceedings of the V International Scientific Conference, Sumy, Ukraine, May 7-9, pp. 95-97, 2017.
14. K.V. Mardia, "Measures of multivariate skewness and kurtosis with applications," Biometrika, 57, pp. 519-530, 1970.