
Proceedings of the 2013 Federated Conference on Computer Science and Information Systems, pp. 343-348

A method for selecting environments for software compatibility testing

Łukasz Pobereżnik
AGH University of Science and Technology, Cracow, Poland

Abstract: Modern software is developed to work with multiple software and hardware architectures, to cooperate with various peer components, and to be installed in many different configurations. To test it, all possible working environments need to be created. This requires software and hardware resources such as servers, networks and software licenses and, most importantly, man-hours of qualified engineers who have to configure and maintain them. Because resources are usually limited, we have to choose the set of configurations with the highest impact on the quality of the software under test. In this paper we present a method of measuring the effectiveness of a given software environment for discovering defects in software by introducing an environment sensitivity measure. We also show how it can be used in a simple algorithm that selects the best configurations by running only a selected subset of them and progressively modifying it throughout the software development process.

I. INTRODUCTION AND PROBLEM DESCRIPTION

Software usually does not work alone. It must have an environment that it works in. This environment can be composed of many components, such as servers, operating systems, databases and remote services. Those components can in turn rely on other components; e.g. a database might need an operating system to run on. Those dependencies create a Component Dependency Graph (CDG) that describes an environment for the Software under Test (SUT). An example of such a graph is given in Figure 1. This graph shows only the general structure of the environment. Each component may also have a set of properties such as type, version number, architecture type, permissions, locales etc.

[Figure 1 diagram. Node labels: Database, AppServer, Middleware, Client, Computer A, Computer B. Edge labels: "installed on", "connected to". Trees are marked A and B.]

Fig. 1: Example of a Component Dependency Graph (CDG) that shows dependencies between resources.
Tree A represents the environment for the server application. Tree B is the environment for the client application.

Let's take a simple use case: an application working on two operating systems, with three database servers and two application servers. This gives about 2 × 3 × 2 = 12 different environments to test. If we add another variable, 32-bit or 64-bit architecture, it doubles the number of possible environment configurations to at most 24. Adding a new configurable element to the environment tends to increase the number of possible setups exponentially. Not all configurations may be possible to create (for example, some middleware may not be available for all operating systems), but this is still a significant number of variants to test.

Problems of generating test environments and possible solutions were discussed in our other article [1]. There have been efforts to automate the process of creating those environments based on a semantic description of the CDG in [2] and [3]. The authors of those articles proposed using virtualization to construct environments and then using snapshots to clone and modify them to build other environments. This technique and additional simplifications made it possible to reduce the number of separate configurations from about 200 to 60. However, this is still too many environments to be built and maintained for everyday regression tests or continuous builds. In the dissertation [4] the same author also came to this conclusion and proposed a manual way to select a subset of configurations based on the tester's preferences. The decision on which configurations to test the software on is in that case based solely on the tester's expert knowledge, without the support of any analytical tools.

In our research we tried to establish a method to measure how good a given configuration is for testing and an algorithm to choose the best of them.

II. SELECTING BEST ENVIRONMENTS FOR TESTING

As shown above, the number of possible environments can be quite high. This means that with limited resources we can only choose a subset of them. One of the most popular methods is to use the configurations that are most widely used by customers.
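The growth in the number of environments described in the introduction can be illustrated with a minimal sketch. The component names and the availability restriction below are hypothetical placeholders chosen only to mirror the use case (two operating systems, three databases, two application servers, two architectures); they do not come from the experiments.

```python
from itertools import product

# Hypothetical component choices mirroring the use case from the introduction.
OPERATING_SYSTEMS = ["os_a", "os_b"]
DATABASES = ["db_a", "db_b", "db_c"]
APP_SERVERS = ["app_a", "app_b"]
ARCHITECTURES = ["32-bit", "64-bit"]


def enumerate_environments(*dimensions, is_available=lambda env: True):
    """Return every combination of component choices that can actually be built."""
    return [env for env in product(*dimensions) if is_available(env)]


def app_b_not_on_os_b(env):
    """Assumed restriction, for illustration only: app_b is not shipped for os_b."""
    return not (env[0] == "os_b" and env[2] == "app_b")


base = enumerate_environments(OPERATING_SYSTEMS, DATABASES, APP_SERVERS)
with_arch = enumerate_environments(
    OPERATING_SYSTEMS, DATABASES, APP_SERVERS, ARCHITECTURES
)
restricted = enumerate_environments(
    OPERATING_SYSTEMS, DATABASES, APP_SERVERS, ARCHITECTURES,
    is_available=app_b_not_on_os_b,
)

print(len(base), len(with_arch), len(restricted))  # 12 24 18
```

Each added dimension multiplies the count, while availability restrictions remove only a fraction of the combinations, so the total remains large.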
However, when the number of software users is high, the diversity of configurations may also be too high and must be limited as well. We have to define what it means that one configuration is better for testing purposes than another and then create an algorithm for choosing the best of them.

In our research we followed a common phenomenon observed by testers: some software environments cause more problems than others; basically, they fail more tests (or fail them more often). If configuration A is more problematic than configuration B, this usually means that if we run tests on configuration A and they pass, then they will also pass on configuration B (with high probability). This means that we do not need to conduct tests on configuration B as frequently as on configuration A. The conclusion is that configuration A is better for testing than configuration B, because it allows us to detect more environment-related defects.

978-1-4673-4471-5/$25.00 © 2013, IEEE
PROCEEDINGS OF THE FEDCSIS. KRAKÓW, 2013

In order to compare environments it is good to have a numerical metric that allows us to evaluate the effectiveness of a given configuration. Many optimization algorithms (especially evolutionary ones) also require a fitness function to compare solutions.

A. Measure of environment sensitivity

To compare two environments for software testing we need to establish a metric that tells which configuration is better. Let $T$ be a set of tests (test suite) consisting of $n$ single tests $t_i$. Let $T_k$ be the vector of test results obtained in iteration $k$. A test result can be either 1 (pass) or 0 (fail).

$$T_k = (t_1, t_2, t_3, \ldots, t_n), \quad t_i \in \{0, 1\} \tag{1}$$

Let $E_j$ denote environment $j$ (written $C_j$ below when it is treated as a configuration). The testing function $F_T$ assigns, for each iteration $k$, a vector of test results $T_k$ to environment $E_j$.

$$F_T(k, E_j) = (t_1, t_2, t_3, \ldots, t_n) \tag{2}$$

We can describe $F_T$ for a single iteration $k$ more conveniently as a matrix in which columns are tests and rows are environments (let $m$ denote the number of configurations); $t_{ji}$ is the result of test $t_i$ on environment $C_j$.

$$F_T(k) = \begin{pmatrix} t_{11} & t_{12} & t_{13} & \cdots & t_{1n} \\ t_{21} & t_{22} & t_{23} & \cdots & t_{2n} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ t_{m1} & t_{m2} & t_{m3} & \cdots & t_{mn} \end{pmatrix} \tag{3}$$

The first step in calculating sensitivity is to remove those tests that do not bring any information about environment differences. We remove every column $p$ that satisfies the condition:

$$\exists p \in [1, n] \;\; \forall x \in [1, m] \;\; \forall y \in [1, m] : t_{xp} = t_{yp} \tag{4}$$

This means that only those tests are removed that passed or failed in all configurations (columns of all 1s or all 0s). Then for each row vector we calculate how many times a test failed and normalize it by the number of tests in the vector (after the removal in the first step; from here on $n$ denotes this reduced number of tests).
$$\mathrm{Sens}(C_j) = \frac{\sum_{x=1}^{n} (1 - t_{jx})}{n} \tag{5}$$

$$\mathrm{Sens}(C_j) \in [0, 1] \tag{6}$$

A sensitivity value close to 0 means that the given environment is not good for finding defects, because all tests pass on it. A sensitivity of 1 means that the configuration is a good candidate for finding software errors, because all tests fail on it whereas on at least one other configuration they pass. Of course, if all tests fail on a given configuration we have to check whether the problem lies in the tests themselves, for example a defect in the testing code.

B. Properties of environment sensitivity

Let's define environment domination: environment A dominates environment B if:

$$\exists x : t_{Ax} < t_{Bx} \;\wedge\; \forall y \neq x : t_{Ay} \le t_{By} \tag{7}$$

In other words: there is at least one test that failed on configuration A but passed on configuration B, and no test that passed on A but failed on B. This means that configuration A found a defect that was not discovered by configuration B. Environment sensitivity has the property that:

$$A \text{ dominates } B \;\Rightarrow\; \mathrm{Sens}(C_A) > \mathrm{Sens}(C_B) \tag{8}$$

This property follows from the definition of environment sensitivity. Let us sum the inequalities in the second part of the domination definition, writing $i$ for the test index $x$ from the first part:

$$\sum_{k \neq i} t_{Ak} \le \sum_{k \neq i} t_{Bk} \tag{9}$$

If we add $t_{Ai} < t_{Bi}$, the weak inequality becomes a strict inequality:

$$\sum_{k=1}^{n} t_{Ak} < \sum_{k=1}^{n} t_{Bk} \tag{10}$$

Note that the sensitivity calculation requires removing tests that failed or passed in every configuration. This also converts the weak inequality into a strict one. If we multiply both sides by $-1$ and add $n$:

$$n - \sum_{k=1}^{n} t_{Ak} > n - \sum_{k=1}^{n} t_{Bk} \tag{11}$$

Because $n = \sum_{k=1}^{n} 1$, we can rewrite the inequality as:

$$\sum_{k=1}^{n} 1 - \sum_{k=1}^{n} t_{Ak} > \sum_{k=1}^{n} 1 - \sum_{k=1}^{n} t_{Bk} \tag{12}$$

$$\sum_{k=1}^{n} (1 - t_{Ak}) > \sum_{k=1}^{n} (1 - t_{Bk}) \tag{13}$$

Now divide both sides by $n$:

$$\frac{\sum_{k=1}^{n} (1 - t_{Ak})}{n} > \frac{\sum_{k=1}^{n} (1 - t_{Bk})}{n} \tag{14}$$

And now, using the environment sensitivity definition:

$$\mathrm{Sens}(C_A) > \mathrm{Sens}(C_B) \tag{15}$$

The introduction of environment domination allows us to use existing multi-criteria optimization techniques to find Pareto-efficient solutions. This proof can also be used to quickly compare the results of tests run on two configurations without calculating sensitivity itself. Of course, it only introduces an order on the set of configurations and gives no idea of how much they differ.
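The sensitivity definition (5) and the domination relation (7) can be sketched in a few lines of Python. This is a minimal illustration, not the implementation used in the experiments: the function names and the small result matrix are hypothetical, and returning 0 when no informative test remains is an assumption, since the definition leaves that case open.

```python
def informative_columns(results):
    """Indices of tests whose outcome differs between at least two environments."""
    n_tests = len(results[0])
    return [p for p in range(n_tests) if len({row[p] for row in results}) > 1]


def sensitivity(results):
    """Sens(C_j) for each row of a 0/1 result matrix (1 = pass, 0 = fail)."""
    keep = informative_columns(results)
    if not keep:
        # Assumption: with no informative tests, every environment scores 0.
        return [0.0 for _ in results]
    return [sum(1 - row[p] for p in keep) / len(keep) for row in results]


def dominates(a, b):
    """True if a fails at least one test that b passes and never passes where b fails."""
    never_better = all(x <= y for x, y in zip(a, b))
    strictly_worse_once = any(x < y for x, y in zip(a, b))
    return never_better and strictly_worse_once


# Hypothetical results: rows are environments A, B, C; columns are five tests.
results = [
    [1, 0, 0, 1, 0],  # A
    [1, 1, 0, 1, 0],  # B
    [1, 1, 1, 1, 0],  # C
]

print(sensitivity(results))               # [1.0, 0.5, 0.0]
print(dominates(results[0], results[1]))  # True
```

Tests 1, 4 and 5 have the same outcome everywhere and are dropped, so only two columns contribute. A dominates B and, as property (8) predicts, also has the higher sensitivity.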

ŁUKASZ POBEREŻNIK: A METHOD FOR SELECTING ENVIRONMENTS

C. Algorithm to generate configurations

An algorithm that generates and evaluates environments must have several important properties:
1) It works on discrete solution spaces.
2) It is able to search an unknown solution space (we have no additional information about local optima).
3) It has an iterative scheme of work, similar to iterative test execution.

Generating a new environment and maintaining it is a costly operation. This means that the algorithm must work with small data sets, typically 8-12 configurations. Tests should be run frequently: on every code change or at least daily. This means that we may have enough iterations to reach an optimal solution. However, we have to remember that each iteration means adding new configurations, which is a costly operation.

Algorithm 1: Simple algorithm for selecting the most sensitive environments.

    E ← GenerateAvailableEnvironments()    {initial configuration pool}
    P ← RandomSubset(E, n)                 {P represents the current working set}
    P' ← ∅                                 {P' represents the next set}
    repeat
        P'' ← P'                           {P'' is the set from the previous iteration}
        E ← E \ P
        R ← RunTests(P)
        S ← CalculateSensitivity(R)
        S ← SortBySensitivity(S)
        P' ← ∅
        for i = 0 to k − 1 do
            P' ← P' ∪ {S[i]}
        end for
        P ← P' ∪ RandomSubset(E, n − k)
    until E = ∅ or P' = P''

GenerateAvailableEnvironments() creates the set of all possible environments we want to execute tests on. The function RandomSubset(X, n) generates a random subset of size n from set X. To summarize the algorithm above: in each iteration we run the test suite on the selected environments (the working set). Using the test results we calculate the sensitivity of each of them. After that we select the k best of them. From the initial pool of environments we choose random ones to fill up the working set so that it has n configurations again. The procedure is repeated until all configurations have been used (the initial pool is empty) or the k best configurations are the same in two consecutive iterations. This algorithm tends to go through different solutions until it converges to the optimal one.
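The selection loop described above can be sketched as follows. This is an illustration under stated assumptions rather than the code used in the experiments: `select_environments`, `make_results` and the synthetic pool are hypothetical, test results are fixed per environment instead of coming from real test runs, and ties in sensitivity are broken by environment name.

```python
import random


def sensitivity(results):
    """Map each environment to its share of failed informative tests (1 = pass, 0 = fail)."""
    envs = list(results)
    n_tests = len(results[envs[0]])
    keep = [p for p in range(n_tests) if len({results[e][p] for e in envs}) > 1]
    if not keep:
        return {e: 0.0 for e in envs}  # assumption: nothing informative -> 0
    return {e: sum(1 - results[e][p] for p in keep) / len(keep) for e in envs}


def select_environments(pool, run_tests, n, k, rng):
    """Keep the k most sensitive environments, refilling the working set from the pool."""
    remaining = set(pool)
    working = set(rng.sample(sorted(remaining), n))
    best, iterations = [], 0
    while True:
        previous = best
        remaining -= working  # environments under test leave the pool
        scores = sensitivity({e: run_tests(e) for e in working})
        ranked = sorted(working, key=lambda e: (-scores[e], e))
        best = ranked[:k]
        iterations += 1
        # Stop when the pool is exhausted or the top k did not change.
        if not remaining or set(best) == set(previous):
            return best, iterations
        refill = rng.sample(sorted(remaining), min(n - k, len(remaining)))
        working = set(best) | set(refill)


def make_results(index, n_tests=12):
    """Synthetic data: environment number `index` fails the first `index` tests."""
    return [0] * index + [1] * (n_tests - index)


POOL = {f"env{i:02d}": make_results(i) for i in range(12)}
best, iterations = select_environments(
    POOL, POOL.__getitem__, n=4, k=2, rng=random.Random(0)
)
print(best, iterations)
```

Because the loop may stop as soon as the top k repeats, it is not guaranteed to visit every environment in the pool; that is the trade-off between cost and coverage discussed in the text.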
For environment compatibility testing, the fact that the algorithm passes through many different solutions means that before we reach the optimal set of configurations we will test many more of them, and there is a possibility that we will find even more software defects than by using the optimal set from the beginning.

III. EXPERIMENTS

A. Direct application of environment sensitivity measure

To verify the properties of environment sensitivity, an experiment was conducted. We used a simple configuration with one operating system and a web browser installed on it. The operating systems used were Linux, Windows and Mac OS X. The browsers under test were Internet Explorer, Firefox, Chrome, Safari and Opera. Each browser was available in several different versions (depending on the browser type).

The system used for the experiment was built using a small web server (Jetty) that served a static web page. This web page was based on a popular HTML5 compatibility test site (www.html5test.com) and served both as the test suite and as the application under test. The existing scripts were modified to send test results to a data collection servlet running on the same web server (originally they were displayed on the screen). A schematic diagram of the system used for the experiment is shown in Figure 2. When the test page was loaded, it executed 242 true/false tests and sent the results in JSON format back to the server, where they were stored in a file along with information about the browser and operating system type. The same page was run on each environment, and the test results were sent by a separate script back to the server, which stored them for further analysis. The different configurations were provided by a web browser compatibility testing cloud service (browsershots.org).

We collected results for 46 different configurations. The expected number of environments was higher, because some of the configurations were not executed at all by the cloud (which is a defect in the cloud service). However, the amount of collected data is sufficient for analysis. In real-life testing setups there are usually no more than several environments in constant use.
The sensitivity measure is by definition calculated relative to the other environments in the tested configuration set. In most cases there are not enough resources (computing power, time, machines) to perform tests (and calculate sensitivity) for every environment in a given configuration space. We wanted to check how much the sensitivity measure differs when it is calculated for a small subset of the configuration space rather than for the full configuration space. From all 46 environments we randomly chose subsets of 5, 8 and 10 environments and calculated the sensitivity of each configuration in them. We observed that the sensitivity measure does not change by more than 2 percent when it is calculated for a reasonable subset of the initial test results (see Table I). If we use more than 8 environments, the average difference appears to be less than 10 percent. We suppose that this behavior of the measure is possible because the way an environment reacts to tests does not depend on other environments. This makes the sensitivity measure a good candidate for a fitness function in evolutionary programming.

[Figure 2 diagram. Labels: Web server, Test page, Web page with tests, Web testing cloud, Web browser 1, Web browser 2, Web browser n, Results, Data collection script, Test results.]

Fig. 2: Architecture of the system used for the environment compatibility experiment.

TABLE I: Standard deviation of environment sensitivity calculated from random subsets of the initial data.

                          5 environments   8 environments   10 environments
    100 random subsets    0.077            0.045            0.04
    1000 random subsets   0.0              0.075            0.06
    10000 random subsets  0.2              0.09             0.08

B. Sensitivity measure as a fitness function

As stated before, the sensitivity measure can be used as a fitness function, so the next step was to check whether the proposed algorithm allows us to quickly find the best environments. Populations of sizes 8, 10 and 12 were tested. Each time the algorithm was run 1000 times and the results were averaged. They are presented in Table II. For a population size of 10 the algorithm delivered a stable set of environments in fewer than 5 iterations.

TABLE II: Averaged results after 1000 executions of the environment selection algorithm. The difference is calculated against the sensitivity calculated for all configurations together.

    Population size   Max difference   Max iterations
    n = 8, k = 4      5%               4.32 iterations
    n = 10, k = 5     4.5%             4.09 iterations
    n = 12, k = 6     3.8%             3.3 iterations

In Table III we can see the browsers selected by the algorithm in more than 10% of cases, along with their average sensitivity. The columns Operating system, Browser type and Browser version define the environment. Frequency shows the percentage of runs in which a given configuration was chosen among the top k results in 1000 runs of the algorithm (e.g. 75% means that it was chosen 750 times). Average sensitivity is the arithmetic mean of the sensitivity values calculated for the environment over all runs. If we compare this table with the sensitivity calculated for all environments in Appendix A, Table IV, they basically match each other.

C. Strategies for selecting environments for tests

The results also confirmed the common-sense view that it is better to test on older versions of software, because newer versions have a lot of compatibility problems already fixed. This can be seen in Figures 3 and 4, where the sensitivity of the environment is presented versus browser version. We had to normalize version numbering to [0, 1] because of the different numbering schemes used by browser vendors.

[Figure 3 plot. Vertical axis: environment sensitivity. Horizontal axis: normalized browser version, from oldest to newest.]

Fig. 3: Sensitivity of environment by browser version for Mozilla Firefox. Version numbers were normalized to range from 0 (oldest) to 1 (latest). You can see a sudden improvement in HTML5 compatibility after the third consecutive version.

We can consider several strategies to reduce the number of environments used for tests. The simplest one is to establish a cut-off point below which every environment is discarded. In Figure 3 we see that a good cut-off point is a sensitivity of 0.5, because it clearly separates the set. Another good strategy is to discard some of the environments that have similar sensitivity values. From Figure 3 and Table IV we see that Firefox versions 6 to 15 have a sensitivity between 0.3 and 0.4. This means that we can choose one or several of them based on our own preference (or at random), because they behave more or less similarly during tests.

Another important aspect is that environment sensitivity can provide an order of tests. If we start with the environments with the highest sensitivity and some tests fail, we can stop, fix the defect and start over again. In our test case, testing complicated web pages on the latest browser versions will likely be successful, because they are more HTML5 compatible. So a better strategy is to test on older versions first and, if they pass the tests, then check on the latest versions.

TABLE III: Average sensitivity for environments using the proposed algorithm (averaged over 1000 runs) for a configuration set of size 10. Frequency shows how often a given environment was chosen by the algorithm among the top k. Only configurations with a frequency of more than 10% are shown.

    Operating system   Browser type   Browser version   Frequency   Average sensitivity
    LINUX              FIREFOX        2.0.0.7           77%         0.960
    LINUX              KONQUEROR      4.8               77%         0.889
    MAC OS X           CAMINO2        2..2              76%         0.802
    LINUX              FIREFOX        .5.0.2            76%         0.983
    WINDOWS            FIREFOX        2.0.0.2           74%         0.963
    MAC OS X           SAFARI         4.0.5             50%         0.766
    WINDOWS            CHROME         3.0.82.2          3%          0.777
    WINDOWS            CHROME         4.0.223.          24%         0.609
    WINDOWS            OPERA          0.00              6%          0.388
    LINUX              FIREFOX        7.0.              2%          0.323
    LINUX              FIREFOX        6.0.              0%          0.326

[Figure 4 plot. Vertical axis: environment sensitivity. Horizontal axis: normalized browser version, from oldest to newest.]

Fig. 4: Sensitivity of environment by browser version for Chrome. Version numbers were normalized to range from 0 (oldest) to 1 (latest). You can see an improvement in HTML5 compatibility in the latest versions. However, it is not as steep as in the Firefox browser.

IV. CONCLUSIONS AND FUTURE WORK

It seems that the introduced environment sensitivity measure is a good way of measuring the usefulness of an environment for testing purposes. It provides an analytical way to compare configurations and allows us to use existing optimization techniques. For more complicated environments (that have several nodes in their CDG) we plan to use evolutionary algorithms. For the presented browser testing case, cross-over and mutation operations were not feasible because they produced configurations that were not available in the testing cloud. The introduction of environment domination (in the Pareto sense) will allow us to use existing methods from multi-criteria optimization. Automated tests are usually run frequently in order to find regression defects introduced during development.
This causes tests to repeatedly oscillate between pass and fail states. We are now extending the sensitivity model by introducing a timeline to take those changes into consideration and to utilize historical information for more precise results. We are also investigating the possibility of using machine learning to correlate changes in the application code base with historical test results in order to predict the best configurations and test order. This way, when a new change is introduced to the software, we can decide in which environment it should be tested first. In our research we are planning to use multi-agent systems (see [5] and [6]) that will automatically deploy environments and optimize them for the most efficient testing in terms of quality and resource consumption. Sensitivity is also a useful measure for algorithms that detect unusual behaviors, like those mentioned in [7] and [8]. We are also considering introducing a second measure, based on probability, that will complement environment sensitivity and allow us to better describe environment behavior and compare environments in more than one category.

REFERENCES

[1] L. Pobereznik, "Automatic generation and configuration of test environments," in Information Systems Architecture and Technology - Web Information Systems Engineering, Knowledge Discovery and Hybrid Computing. Oficyna Wydawnicza Politechniki Wrocławskiej, Wrocław, 2011, p. 303.
[2] I.-C. Yoon, A. Sussman, A. Memon, and A. Porter, "Direct-dependency-based software compatibility testing," in Proceedings of the twenty-second IEEE/ACM international conference on Automated software engineering, ser. ASE '07. New York, NY, USA: ACM, 2007, pp. 409-412. [Online]. Available: http://doi.acm.org/10.1145/1321631.1321696
[3] I.-C. Yoon, A. Sussman, A. Memon, and A. Porter, "Effective and scalable software compatibility testing," in Proceedings of the 2008 international symposium on Software testing and analysis, ser. ISSTA '08. New York, NY, USA: ACM, 2008, pp. 63-74. [Online]. Available: http://doi.acm.org/10.1145/1390630.1390640
[4] I. Yoon, "Compatibility testing for component-based systems," Ph.D.
dissertation, University of Maryland, 2010, hdl.handle.net/903/294.

[5] K. Cetnarowicz and V. Gruer, P. af Hilare, "A formal specification of m-agent architecture," in Proc. Multi-Agent Systems CEEMAS 2001, Keplicz B., Nawarecki E., Eds., Lecture Notes in Artificial Intelligence, Springer-Verlag, Berlin, Heidelberg, 2002, pp. 62-72.
[6] K. Cetnarowicz, "From algorithm to agent," in Computational Science - ICCS 2009, LNCS 5545, Springer Verlag, 2009, pp. 825-834.
[7] K. Cetnarowicz and G. Rojek, "Behavior based detection of unfavorable resources," in Proc. Computational Science - ICCS 2004, Bubak M., Van Albada G. et al., Eds., Lecture Notes in Computer Science, Springer-Verlag, Berlin, Heidelberg, 2004, pp. 607-614.
[8] K. Cetnarowicz and G. Rojek, "Behavior evaluation with actions sampling in multi-agent system," in Proc. Multi-Agent Systems and Applications CEEMAS 2005, Pechoucek M., Petta P., Eds., Lecture Notes in Computer Science, Springer-Verlag, Berlin, Heidelberg, 2005, pp. 490-499.

APPENDIX

In this section we present a table with sensitivity values calculated for different versions of popular browsers running on various operating systems. Sensitivity was calculated at once, based on all test results from all available configurations. In production it is usually not possible to keep so many testing environments, so only a small subset of them is used for daily testing and more are added when needed (for example before a product release). You can compare the results from this table with the values from Table III.

TABLE IV: Sensitivity values calculated for all configurations (only non-zero values are shown). In this case sensitivity was calculated for all environments at once.

    System      Browser      Browser version   Sensitivity
    LINUX       FIREFOX      .5.0.2            1.000
    LINUX       FIREFOX      2.0.0.7           0.97
    WINDOWS     FIREFOX      2.0.0.2           0.97
    LINUX       KONQUEROR    4.8               0.902
    MAC OS X    CAMINO2      2..2              0.82
    MAC OS X    SAFARI       4                 0.202
    WINDOWS     CHROME       3.0.82.2          0.798
    WINDOWS     CHROME       4.0.223.          0.659
    WINDOWS     OPERA        0.00              0.445
    LINUX       FIREFOX      7.0.              0.405
    LINUX       FIREFOX      6.0.              0.405
    MAC OS X    FIREFOX      .0                0.364
    MAC OS X    FIREFOX      2.0               0.364
    MAC OS X    FIREFOX      3.0.              0.364
    MAC OS X    FIREFOX      4.0.              0.364
    MAC OS X    FIREFOX      5.0.              0.358
    WINDOWS     FIREFOX      .0                0.358
    WINDOWS     FIREFOX      6.0               0.34
    LINUX       SAFARI       5.0               0.32
    WINDOWS     CHROME       5.0.375.25        0.306
    WINDOWS     CHROME       6.0.453.          0.243
    LINUX       CHROME       6.0.472.63        0.24
    MAC OS X    SAFARI       6.0.              0.208
    WINDOWS     CHROME       7.0.57.44         0.73
    LINUX       CHROME       20.0.32.47        0.075
    LINUX       CHROME       22.0.229.94       0.052
    MAC OS X    CHROME       22.0.229.94       0.046
    WINDOWS     SAFARI       5.0               0.046
    WINDOWS     OPERA        .64               0.07