Performance and Cost Optimization for Multiple Large-scale Grid Workflow Applications

Rubing Duan, Radu Prodan, Thomas Fahringer
Institute of Computer Science, University of Innsbruck

ABSTRACT
Scheduling large-scale applications on the Grid is a fundamental challenge and is critical to application performance and cost. Large-scale applications typically contain a large number of homogeneous and concurrent activities. These activities are the main bottlenecks, but they also open great potential for optimization. This paper presents a new formulation of the well-known NP-complete problems and two novel algorithms that address them. The optimization problems are formulated as sequential cooperative games among workflow managers. Experimental results indicate that we have successfully devised and implemented a group of effective, efficient, and feasible approaches. They produce solutions with significantly better performance and cost than traditional algorithms. Our algorithms have considerably low time complexity and can assign 1,000,000 activities to 10,000 processors within 0.4 second on one Opteron processor. Moreover, workflow managers can practically perform the solutions, and violations of QoS can be easily detected, which is critical to fault tolerance.

1. INTRODUCTION
The Grid is a heterogeneous and geographically distributed computing environment with different access cost models and dynamically varying load and availability conditions [3]. The execution of workflows in a Grid environment must take such resource variability and economic factors into account. This makes dynamic performance and cost optimization one of the most essential issues in attaining high performance. Most current Grid workflow execution environments [, 2, 5, 6, 9,, 6, 2] focus on improving performance by redistributing workload. However, they provide relatively simple system models and lack effective functionality to support large-scale Grid applications. From an end user's perspective, minimizing both cost and execution time is desirable, whereas from the system's perspective, fairness is an important goal.
To the best of our knowledge, no existing scheme addresses all of these goals in an integrated and effective manner: most current workflow execution environments do not consider performance, fairness, user cost, and optimization together. Since potentially many workflows on the Grid compete for the available resources, several issues arise and ought to be dealt with:
(1) efficient resource allocation for different workflows, taking into account their different needs and performance requirements;
(2) the notion of fairness;
(3) the ability to implement the allocation scheme in a distributed manner with minimal makespan;
(4) the cost incurred when workflows are assigned to resources according to (1) and (2).

In this paper, we address these four issues by proposing two optimization schemes for a class of scientific Grid workflows characterized by a large number of homogeneous activities. The first scheme aims to minimize the expected execution time of workflow applications, which we formulate as an NP-complete makespan minimization problem [23]. For this scheme we propose a more efficient, effective, and feasible solution than existing algorithms [8, 2, 4, 9]. The second scheme minimizes the cost of execution while guaranteeing the user-specified deadline, which we solve in three steps: deadline assignment, workflow partitioning, and cost optimization. In Section 4 we compare the performance of our algorithms with six heuristics: Opportunistic Load Balancing (OLB) [4, 9], Minimum Execution Time (MET) [4, 9], Minimum Completion Time (MCT) [9], Suffrage [8, 2], Min-min [8, 2], and Max-min [8, 2].

This work is partially funded by the European Union through the IST-346 edutain@grid and FP CoreGRID projects.
Experimental results indicate that our algorithms are superior to these heuristics in efficiency, performance, fairness, and cost. However, our algorithms may not work well when a scheduling problem cannot be properly formulated as a typical and solvable game, i.e., a game comprised of phases that can be specifically defined so that game players can bargain without dependences between them. Our contributions are both theoretical and experimental:
- More efficient algorithms: our novel algorithms can assign one million activities to ten thousand processors within 0.4 seconds on one Opteron 2.4GHz machine (see Section 4);
- More effective solutions: the solutions are closer to the optimal schedule, and in simple cases optimal solutions can sometimes be obtained (see the example in Section 3);
- More realistic Grid models: existing approaches [8, 2, 4, 9] assume direct access to individual processors when making scheduling decisions, which conflicts with the independent local administration policies of each site. Our approach uses the local queuing system as the entry negotiation point to each site;
- More feasible resource management scheme: the workflow managers can easily execute the scheduling and rescheduling solutions.

The rest of this paper is structured as follows. In Section 2, we present the abstract workflow and Grid models, and define the multi-workflow performance and cost optimization problems. Section 3 describes the solutions for the performance and cost optimization schemes. Section 4 presents a comparative study of our algorithms with the related work. In Section 5 we present a review of related work. Section 6 concludes the paper with a summary and a short discussion of future research.

SC07 November 10-16, 2007, Reno, Nevada, USA. (c) 2007 ACM

2. MODEL
This section describes our abstract workflow and Grid models, and defines the optimization problems addressed.

2.1 Workflow model
We focus our work on large-scale workflow applications. These are characterized by a high number (thousands to millions) of independent, concurrent, and homogeneous activities that dominate application performance. For example, Figure 1 depicts three real applications that we use as case studies in our work: ASTRO (astronomy) [5], WIEN2k (chemistry) [3], and MeteoAG (meteorology) [29]. The details of these applications are described in Section 4. Their performance bottlenecks come from exactly this kind of activity, for example:
- poten and pgroups in ASTRO;
- lapw1 and lapw2 in WIEN2k;
- CaseInit, RamsMakevfile, RamsInit, or Raver in MeteoAG.

Currently, most related work only considers workflows with tens or a few hundred activities, which are not realistic large-scale applications. In ASTRO, for instance, the number of pgroups and poten activities equals the number of grid cells of a real simulation. In WIEN2k, the number of parallel lapw1 and lapw2 activities may reach several thousand for a good density of states. In MeteoAG, the number of parallel activities (e.g., CaseInit) is practically unbounded, because there is no lower limit to the domain or the mesh cell size of the model's finite difference grid. In contrast, sequential activities are relatively trivial in these applications. They can be served and scheduled on demand on the fastest available processor, since a few sequential activities do not significantly affect the performance of large-scale workflows. Based on this motivation, we propose the following abstract workflow definition to fulfill the requirements of our pilot large-scale workflows.

Definition 2.1 (Workflow). Let $WF = (AC, CFD, DFD)$ denote a workflow application, where $AC = \{AC^{(k)} \mid k = 1, \dots, K\}$ is the set of so-called activity classes. We define an activity class $AC^{(k)}$ ($k = 1, \dots, K$) as a set of activities that have the same activity type and can be executed concurrently.
The term activity type refers to a functional description of activities, such as matrix multiplication, a Fast Fourier Transform, or poten, pgroups, lapw1, lapw2, etc., as shown in Figure 1. In other words, $AC^{(k)}$ is a set of homogeneous and parallel activities $AC^{(k)} = \{A^{(k,j)} \mid j = 1, \dots, n^{(k)}\}$, where $n^{(k)}$ is the number of activities in activity class $AC^{(k)}$ and $N = \sum_{k=1}^{K} n^{(k)}$. Each activity $A^{(k,j)} \in AC^{(k)}$ ($j = 1, 2, \dots, n^{(k)}$) is distinguished by the activity class $AC^{(k)}$ and an identifier $j$ within the class. An atomic or sequential activity is an activity class of cardinality one. Finally, $CFD = \{(AC_{source} \xrightarrow{c} AC_{sink}) \mid AC_{source}, AC_{sink} \in AC\}$ is the set of control flow dependences, and $DFD = \{(AC_{source} \xrightarrow{d} AC_{sink}) \mid AC_{source}, AC_{sink} \in AC\}$ is the set of data flow dependences.

2.2 Optimization problem
The multi-workflow performance and cost optimization problems can be transformed into a classical NP-complete problem: scheduling independent jobs on heterogeneous computational resources. In the following definitions, we assume that the expected execution time of activities is available from a performance prediction service [8]. This service is based on a training phase and statistical methods. The expected execution time consists of two components, data preparation time and CPU execution time. For our pilot applications, the data preparation time is usually small compared to the CPU execution time.

Figure 1: Real world workflow examples (MeteoAG, ASTRO, WIEN2k). Legend: activity class (performance bottleneck); activity class of cardinality 1.
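Definition 2.1 can be captured by a small data structure. The sketch below is a minimal illustration only, not the authors' implementation. It assumes Python dataclasses, and every name in it (`ActivityClass`, `Workflow`, `ready_classes`) is hypothetical. `ready_classes` is our reading of how control flow dependences decide which classes are scheduled next: a class becomes ready once all of its control-flow predecessors have completed.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple


@dataclass
class ActivityClass:
    """AC^(k): n homogeneous activities of one activity type that can run concurrently."""
    activity_type: str
    n: int  # n^(k); n == 1 models an atomic (sequential) activity


@dataclass
class Workflow:
    """WF = (AC, CFD, DFD) of Definition 2.1; edges are (source, sink) class names."""
    classes: Dict[str, ActivityClass]
    cfd: Set[Tuple[str, str]] = field(default_factory=set)  # control flow dependences
    dfd: Set[Tuple[str, str]] = field(default_factory=set)  # data flow dependences

    def total_activities(self) -> int:
        # N = sum over k of n^(k)
        return sum(ac.n for ac in self.classes.values())

    def ready_classes(self, completed: Set[str]) -> List[str]:
        # Classes not yet completed whose control-flow predecessors have all completed.
        return [name for name in self.classes
                if name not in completed
                and all(src in completed for src, dst in self.cfd if dst == name)]
```

Consider a hypothetical fragment loosely modeled on WIEN2k: one kgen activity, followed by 2000 lapw1 activities and then 2000 lapw2 activities. Here `total_activities()` gives N = 4001, and only kgen is ready before anything has run.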
Definition 2.2 (Performance (Makespan) Optimization Problem). Suppose we have $W$ workflows consisting of a set of $N$ activities, categorized into $K$ different activity classes. The expected execution time of the activities in class $k$ is $p^{(k)} = \{p_1^{(k)}, \dots, p_S^{(k)}\}$, where $p_i^{(k)}$ is the expected execution time of activity class $k$ on site $i$, $i \in \{1, \dots, S\}$, $k \in \{1, \dots, K\}$. Suppose we have a set of $S$ Grid sites, where each site $i$ has $m_i$ processors and the processors on one Grid site are homogeneous. The objective is to find a solution $x$ that assigns the jobs to the Grid sites so that the overall makespan $t(x)$ of all workflows is minimized.

Definition 2.3 (Cost Optimization Problem). Suppose we have the same input as in Definition 2.2, plus one extra parameter: each site $i$ has a price $\varphi_i$. The objective is to find a solution $x$ that assigns the jobs so that the cost is minimized and the deadlines are guaranteed:
- guarantee the deadline of each workflow $w \in \{1, \dots, W\}$: $t_w(x) < Deadline_w$;
- minimize the overall cost
$$Cost(x) = \sum_{k=1}^{K} \sum_{i=1}^{S} t_i^{(k)} \varphi_i,$$
where $t_i^{(k)}$ is the remaining execution time of activity class $k$ on site $i$.

2.3 Grid model
In this section, we define a more realistic Grid model, which makes our resource and workflow management scheme more controllable and monitorable. Grid sites are not fully controllable by outside users, because jobs are submitted to local resource managers such as Portable Batch System (PBS), Load Sharing Facility (LSF), Condor, or WS-GRAM. However, most scheduling algorithms simply map jobs to processors, which is not realistic and leads to impractical solutions. In contrast, we schedule and manage jobs based on the processing rates of activities on each site, which are controlled through appropriate job submission at runtime. Characterizing the Grid resource access behavior of workflow managers in this way has four advantages:
- First and most importantly, it allows the user or workflow manager to effectively control workflow execution on the Grid. The workflow manager controls the processing rate by submitting to a Grid site as many activities as resources have been allocated there.
- Second, this in turn enables the adaptation of workflows to the allocated resources, especially when the computing environment changes and a rescheduled solution must be applied. A rescheduled solution can easily be applied by adjusting the number of job submissions.
- Third, it allows the execution of workflows that have other constraints on performance, cost, and resource requirements. For example, our approaches allow users to filter out unwanted Grid sites or to set deadlines for some workflows.
- Last, the precise mathematical characterization of solutions in the model allows more accurate prediction of workflow execution.

Figure 2: System model.

We consider a Grid environment as shown in Figure 2. The Grid has $S$ sites connected by a communication network. Activities arriving at each site $i$ ($i = 1, 2, \dots, S$) may belong to $W$ workflows. Workflow managers, which control the execution of workflows, are distributed on the Grid and compete with each other for resources. Each Grid site $i$ ($i = 1, 2, \dots, S$) has $m_i$ processors, and the total number of processors of all sites is $M = \sum_{i=1}^{S} m_i$. In this model, each workflow manager has $S$ queues, one per Grid site. Each workflow manager uses these queues to control the processing of the activities scheduled on each site. We use terminology and notations similar to [4] and also introduce some additional notations as follows.
- $\delta_i^{(k)}$: current length (number of activities) of the queue of activity class $k$ on site $i$;
- $\delta^{(k)} = \sum_{i=1}^{S} \delta_i^{(k)}$: current length (number of activities) of the queue of activity class $k$;
- $p_i^{(k)}$: expected execution time of activity class $k$ on site $i$;
- $\beta_i^{(k)} = \theta_i^{(k)} / p_i^{(k)}$: job processing rate of activity class $k$ on site $i$, where $\theta_i^{(k)}$ is the number of available processors for activity class $k$ on site $i$;
- $\beta^{(k)} = \sum_{i=1}^{S} \beta_i^{(k)}$: job processing rate of activity class $k$;
- $t_i^{(k)} = \delta_i^{(k)} / \beta_i^{(k)}$: remaining execution time of activity class $k$ on site $i$;
- $t_i$: remaining execution time on site $i$, determined by the queue lengths $\delta_i^{(k)}$ and processing rates $\beta_i^{(k)}$ of the activity classes on that site;
- $t^{(k)} = \max\{t_1^{(k)}, t_2^{(k)}, \dots, t_S^{(k)}\}$: remaining execution time of activity class $k$;
- $\lambda_{a,b}$: mean bandwidth from site $a$ to site $b$.

3. OPTIMIZATION ALGORITHM
In this section we describe our performance and cost optimization algorithms.

3.1 Performance optimization
First, in Section 3.1.1, we formulate the performance optimization problem as a cooperative game among the workflow managers, which can theoretically generate the optimal solution. However, the optimal solution is hard to achieve due to the high problem complexity. We therefore further formulate and solve the problem as a sequential cooperative game, and we present the complete algorithm and a simple example in Section 3.1.2.

3.1.1 Formulation and solution
Before formulating our problem, we recall the most important definition in game theory, namely what a game is: game = players + strategies + specification (of payoff). When players, strategies, and the specification of payoff are properly defined, the final solution can be obtained smoothly. In the following paragraphs, we first formulate the problem as a cooperative game. We consider a $K$-player game in which the players are the $K$ workflow managers. Each manager attempts to minimize the execution time $t^{(k)}$ of its own activity class. This time depends on the number of activities in the activity class ($\delta^{(k)}$) and on the processing rate of the activity class ($\beta^{(k)}$). For clarity, we assume that each workflow manager handles the execution of one activity class.
The objective function for each manager can be expressed as:
$$f^{(k)}(\Delta) = t^{(k)} = \frac{\delta^{(k)}}{\beta^{(k)}} = \frac{\delta^{(k)}}{\sum_{i=1}^{S} \theta_i^{(k)} / p_i^{(k)}}, \qquad k = 1, 2, \dots, K,$$
where $\Delta = (\delta_i^{(k)})_{K \times S}$ is the matrix of activity distribution, whose entries are the strategies. Here $\theta_i^{(k)}$ is the number of processors allocated to activity class $k$ on site $i$, which embodies the payoff in our cooperative game:
$$\theta_i^{(k)} = m_i \, \frac{\delta_i^{(k)} p_i^{(k)} w_i^{(k)}}{\sum_{x=1}^{K} \delta_i^{(x)} p_i^{(x)} w_i^{(x)}}, \tag{1}$$
where $m_i$ is the number of processors on site $i$, and $w_i^{(k)}$ is the weight of Grid site $i$ for activity class $k$:
$$w_i^{(k)} = \frac{\min_{x \in \{1,\dots,S\}} p_x^{(k)} \,/\, p_i^{(k)}}{\sum_{y=1}^{S} \min_{x \in \{1,\dots,S\}} p_x^{(k)} \,/\, p_y^{(k)}} = \frac{1/p_i^{(k)}}{\sum_{y=1}^{S} 1/p_y^{(k)}}. \tag{2}$$
We use this weight $w_i^{(k)}$ to enhance the fairness of allocation, because a Grid site is suited differently to different activities for many reasons, for example, data locality or memory size. Intuitively, suppose the execution time $t_i^{(k)}$ of activity class $k$ on Grid site $i$ is much shorter than on other Grid sites. Then activity class $k$ should get a higher priority on site $i$ and be allocated more resources there. The specific use of this weight is explained when we introduce the notion of a sequential game. When the ideal load balance of activity class $k$ is achieved, the objective function can also be defined as:
$$f^{(k)}(\Delta) = t^{(k)} = \frac{\delta_i^{(k)} p_i^{(k)}}{\theta_i^{(k)}} \quad \text{for every site } i \text{ with } \theta_i^{(k)} > 0.$$
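The weight of eq. (2), the allocation of eq. (1), and the objective $t^{(k)} = \delta^{(k)}/\beta^{(k)}$ can be evaluated directly. Below is a minimal sketch under our own assumptions: matrices are plain lists indexed `[k][i]` (class, site), processor counts are kept fractional, and the function names are hypothetical. Because each row of $w$ sums to 1, and eq. (1) splits every site's $m_i$ processors proportionally, the columns of the resulting allocation always sum to $m_i$.

```python
from typing import List

Matrix = List[List[float]]  # indexed [k][i]: activity class k, Grid site i


def site_weights(p: Matrix) -> Matrix:
    """Eq. (2): w[k][i] = (1/p[k][i]) / sum_y (1/p[k][y]); each row sums to 1."""
    w = []
    for row in p:
        total = sum(1.0 / x for x in row)
        w.append([(1.0 / x) / total for x in row])
    return w


def allocate_processors(delta: Matrix, p: Matrix, w: Matrix, m: List[float]) -> Matrix:
    """Eq. (1): share the m[i] processors of site i among the K classes in
    proportion to their weighted workload delta[k][i] * p[k][i] * w[k][i]."""
    K, S = len(p), len(m)
    theta = [[0.0] * S for _ in range(K)]
    for i in range(S):
        loads = [delta[k][i] * p[k][i] * w[k][i] for k in range(K)]
        total = sum(loads)
        if total > 0:  # a site with no queued work keeps an all-zero column
            for k in range(K):
                theta[k][i] = m[i] * loads[k] / total
    return theta


def class_time(delta_k: List[float], p_k: List[float], theta_k: List[float]) -> float:
    """Objective f^(k) = t^(k) = delta^(k) / beta^(k), with beta^(k) = sum_i theta/p."""
    beta_k = sum(th / pk for th, pk in zip(theta_k, p_k))
    return sum(delta_k) / beta_k
```

For example, with `p = [[2.0, 4.0], [3.0, 3.0]]` the first class gets weights of about 2/3 and 1/3, because it runs twice as fast on site 0. `allocate_processors` then gives it a correspondingly larger share of that site. Rounding these fractional shares to whole processors is not covered by this sketch.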

The activity distribution follows from the allocation of resources and from the ratio of the processing rate on site $i$ to the total processing rate of the activity class:
$$\delta_i^{(k)} = \delta^{(k)} \frac{\beta_i^{(k)}}{\beta^{(k)}} = \delta^{(k)} \frac{\theta_i^{(k)} / p_i^{(k)}}{\sum_{y=1}^{S} \theta_y^{(k)} / p_y^{(k)}}. \tag{3}$$
Accordingly, we have the following definition.

Definition 3.1 (The cooperative optimization game). The cooperative optimization game consists of:
- the managers of the $K$ activity classes as players;
- the set of strategies defined by the constraints of eq. (4). These include non-negativity of the distribution, $\delta_i^{(k)} \ge 0$; conservation of activities, $\sum_{i=1}^{S} \delta_i^{(k)} = \delta^{(k)}$; and the processor capacity of each site, $\sum_{k=1}^{K} \theta_i^{(k)} = m_i$;
- for each player $k = 1, 2, \dots, K$, the objective function $f^{(k)}(\Delta)$; the goal is to minimize all $f^{(k)}(\Delta)$ simultaneously;
- for each player $k = 1, 2, \dots, K$, the initial value of the objective function $f^{(k)}(\Delta^{0})$, where $\Delta^{0}$ is a matrix with $K$ rows and $S$ columns filled with the initial distribution of activities $(\delta_i^{(k)})_{K \times S}$.

For the cooperative optimization game defined above, the solution is determined by solving the following problem: minimize $f^{(k)}(\Delta)$ subject to the constraints of eq. (4). To show the high complexity of this game, we introduce the Lagrangian of the optimization problem. The Lagrangian is a standard method for finding the extremum of a function of several variables subject to one or more constraints. Let $L(\delta, \eta, \varsigma, \iota)$ denote the Lagrangian, where $\eta$, $\varsigma$, and $\iota$ are the Lagrange multipliers:
$$L(\delta, \eta, \varsigma, \iota) = f^{(k)}(\Delta) + \eta\Big(\sum_{k=1}^{K} \theta_i^{(k)} - m_i\Big) + \varsigma\Big(\sum_{i=1}^{S} \delta_i^{(k)} - \delta^{(k)}\Big) + \iota\, \delta_i^{(k)}. \tag{5}$$
Unfortunately, the exact and direct solution (which is also the optimal one) of this optimization problem is in general difficult to obtain, because the problem has high complexity and $K \times S$ variables. The solution depends both on the distribution of activities of the same class across different sites and on the distribution of activities of different classes on the same site. In other words, changing one variable affects the values of all other variables.
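Taken on its own, eq. (3) is easy to evaluate; the difficulty comes from its coupling with eq. (1). The minimal sketch below uses hypothetical helper names and list-of-lists matrices indexed `[k][i]`. It shows two properties that follow directly from eq. (3). First, it preserves each class's total $\delta^{(k)}$. Second, it makes the per-site times $t_i^{(k)} = \delta_i^{(k)} p_i^{(k)} / \theta_i^{(k)}$ equal to $\delta^{(k)}/\beta^{(k)}$ on every site with $\theta_i^{(k)} > 0$, which is the ideal-balance form of the objective given earlier.

```python
from typing import List, Optional

Matrix = List[List[float]]  # indexed [k][i]: activity class k, Grid site i


def redistribute(delta: Matrix, p: Matrix, theta: Matrix) -> Matrix:
    """Eq. (3): delta_i^(k) = delta^(k) * beta_i^(k) / beta^(k), with beta_i^(k) = theta/p."""
    new_delta = []
    for d_row, p_row, th_row in zip(delta, p, theta):
        rates = [th / pk for th, pk in zip(th_row, p_row)]  # beta_i^(k) for each site
        beta_k = sum(rates)                                # beta^(k)
        total_k = sum(d_row)                               # delta^(k) is conserved
        new_delta.append([total_k * r / beta_k for r in rates])
    return new_delta


def site_times(delta: Matrix, p: Matrix, theta: Matrix) -> List[List[Optional[float]]]:
    """t_i^(k) = delta_i^(k) * p_i^(k) / theta_i^(k); None where no processors are allocated."""
    return [[d * pk / th if th > 0 else None for d, pk, th in zip(dr, pr, tr)]
            for dr, pr, tr in zip(delta, p, theta)]
```

As an example, take $p^{(1)} = (2, 4)$, $\theta^{(1)} = (1, 3)$ and 20 activities. Eq. (3) moves them to $(8, 12)$, and both sites then finish after 16 time units.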
To circumvent this difficulty, we derive an approximate solution by further formulating this problem as a sequential game [27]. In a sequential game, players select strategies in a predefined order, and some players can observe the moves of the players who preceded them. Although the optimal solution is not directly achievable from eq. (5), we can derive an intermediate solution. It consists of a set of game stages based on the following decreasing sequence:
$$f^{t(0)}(\Delta^{t(0)}) \ge f^{t(1)}(\Delta^{t(1)}) \ge \dots \ge f^{t(l)}(\Delta^{t(l)}) \ge f(\Delta^{*}),$$
where $t$ denotes the stage of the sequential game, $t(l)$ is the $l$-th stage game, and $*$ denotes the optimal solution. At each stage, the players (managers of activity classes) provide a set of strategies, i.e., the distribution of their own activities, based on the allocation of resources of the last stage. The new allocation of resources is then generated using eq. (1).

Figure 3: Data flow of input and output using the sequential game-based allocation strategy.

The first step in the sequential game is to initialize the distribution of activities $\Delta^{t(0)}$. At the initial stage $t(0)$, every activity class assumes that all processors are available to it. It distributes its activities in proportion to its processing rate on each site, using the following equation:
$$\delta_i^{(k)} = \delta^{(k)} \frac{\beta_i^{(k)}}{\beta^{(k)}} = \delta^{(k)} \frac{m_i / p_i^{(k)}}{\sum_{y=1}^{S} m_y / p_y^{(k)}}.$$
The resource allocation of the $l$-th stage, $\Theta^{t(l)}$, where $\Theta = (\theta_i^{(k)})_{K \times S}$ is the matrix of resource allocation, is calculated from the activity distribution of the last stage, $\Delta^{t(l-1)}$. Accordingly, the activity distribution of the $l$-th stage, $\Delta^{t(l)}$, is calculated from the resource allocation of the $l$-th stage, $\Theta^{t(l)}$. These steps fully embody the idea of a sequential cooperative game. From eqs. (1) and (3), as shown in Figure 3, we have:
$$\Theta^{t(l)} = \Theta(\Delta^{t(l-1)}); \tag{6}$$
$$\Delta^{t(l)} = \Delta(\Theta^{t(l)}). \tag{7}$$
In the following, we explain why the weight defined in eq. (2) is important and how we use it. The main idea of our method is to accumulate the optimization effects over many game stages until a certain load balance among the activity classes is achieved.
Because the effects are cumulative, we need a weight that has a positive impact on the results of every game stage. The weight of an activity must be comparable with two other kinds of weight: those of activities of the same class on different sites, and those of activities of different classes on the same site. Based on this notion, we define the importance weight in eq. (2) as the normalized reciprocal of the expected execution times. The way the importance weight is used is also innovative. The cumulative effects must be carried over to the next stage and influence the final optimization result, so we need an intermediate variable to accept, preserve, and transfer them. In our case, this intermediate variable is the resource allocation matrix $\Theta$. It accepts the effects from the importance weights and transfers them to the activity distribution matrix $\Delta$.
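The stage-by-stage exchange between $\Theta$ and $\Delta$ can be sketched in a few dozen lines. The function below is our own reading of eqs. (1)-(3), (6) and (7), not the authors' code, and the name `game_quick` and its parameters are hypothetical. It keeps processors fractional and covers only the initialization and the search loop. It omits constraint handling, site filtering, and the removal of completed classes. It stops when the summed improvement of $t^{(k)}$ over all classes is at most `eps`.

```python
from typing import List, Tuple

Matrix = List[List[float]]  # indexed [k][i]: activity class k, Grid site i


def game_quick(p: Matrix, m: List[float], n: List[float],
               eps: float = 0.0, max_stages: int = 10_000) -> Tuple[Matrix, Matrix, int]:
    """Sequential-game sketch. p[k][i]: expected time of class k on site i;
    m[i]: processors on site i; n[k] > 0: activities of class k.
    Returns (delta, theta, number of stages played)."""
    K, S = len(p), len(m)
    # Importance weights, eq. (2).
    w = []
    for row in p:
        total = sum(1.0 / x for x in row)
        w.append([(1.0 / x) / total for x in row])
    # Initial distribution: each class assumes all m[i] processors are free.
    delta = []
    for k in range(K):
        rates = [m[i] / p[k][i] for i in range(S)]
        beta_k = sum(rates)
        delta.append([n[k] * r / beta_k for r in rates])

    prev_t = None
    for stage in range(1, max_stages + 1):
        # Eq. (6): allocation from the previous distribution, computed with eq. (1).
        theta = [[0.0] * S for _ in range(K)]
        for i in range(S):
            loads = [delta[k][i] * p[k][i] * w[k][i] for k in range(K)]
            total = sum(loads)
            if total > 0:
                for k in range(K):
                    theta[k][i] = m[i] * loads[k] / total
        # Eq. (7): distribution from this stage's allocation, computed with eq. (3).
        t = []
        for k in range(K):
            rates = [theta[k][i] / p[k][i] for i in range(S)]
            beta_k = sum(rates)
            delta[k] = [n[k] * r / beta_k for r in rates]
            t.append(n[k] / beta_k)  # t^(k) = delta^(k) / beta^(k)
        # Stop once the total improvement over all classes is no larger than eps.
        if prev_t is not None and sum(a - b for a, b in zip(prev_t, t)) <= eps:
            return delta, theta, stage
        prev_t = t
    return delta, theta, max_stages
```

As a hypothetical example, take two classes that are each four times faster on a different site: `game_quick([[1.0, 4.0], [4.0, 1.0]], [4.0, 4.0], [40.0, 40.0])`. Stage by stage, nearly all of class 0 moves to site 0 and class 1 to site 1. This is the winner/loser dynamic described for the sub-games in Section 3.1.2.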

Algorithm 1 Performance optimization algorithm (Game-quick scheduling algorithm)
Input: $WF$, $p_i^{(k)}$, $\delta^{(k)}$, $m_i$, constraints
Output: $\Delta^{t(l)}$ - distribution of activities, $\Theta^{t(l)}$ - allocation of resources
Step 1. Initialize $\Delta^{t(0)}$ and the weights of the activity classes, and apply constraints.
   1. For each $wf \in WF$ do
   2.   For each $AC^{(k)} \in wf$ to be scheduled next according to the control flow dependences do
   3.     add $AC^{(k)}$ to the set of game players
   4.     For each Grid site $i$ do
   5.       calculate $w_i^{(k)}$ by applying eq. (2)
   6.       calculate $\delta_i^{(k)}$ by applying eq. (3) to build $\Delta^{t(0)}$
   7. End for; End for; End for
Step 2. Search for the final distribution of activities and allocation of resources.
   8. do
   9.   For each Grid site $i$ do
  10.     calculate $\theta_i^{(k)}$ by applying eq. (6) to build $\Theta^{t(l)}$
  11.   End for
  12.   For each activity class $k$ do
  13.     calculate $\delta_i^{(k)}$ by applying eq. (7) to build $\Delta^{t(l)}$
  14.   End for
  15. While $\sum_{k=1}^{K} \big(t^{(k)}(\Delta^{t(l-1)}) - t^{(k)}(\Delta^{t(l)})\big) > \varepsilon$
Step 3. Verify the constraints, increase the resource allocation for unsatisfied activity classes, and repeat Steps 1 and 2.
Step 4. Remove completed activity classes from the queues, and repeat Steps 1 and 2 until all $wf \in WF$ are completed.

To explain why we achieve better performance, we now relate the aggregated execution time (AET) to the makespan. When load balance is achieved, the makespan can be approximated by dividing the AET of all activities by the total number of processors:
$$makespan = \frac{\sum_{i=1}^{S} t_i m_i}{\sum_{i=1}^{S} m_i},$$
where $t_i$ denotes the remaining execution time on site $i$ and $m_i$ the number of processors on site $i$. At each stage, our algorithm produces even load distributions of activities. Therefore, whenever our algorithm decreases the AET, the makespan decreases proportionally. In other words, our algorithm indirectly reduces the makespan by reducing the AET, as our experimental results also show.

3.1.2 Game-quick Algorithm
In the following, we use a simple example to explain our optimization algorithm, called Game-quick. Figure 4 presents a scenario in which Game-quick outperforms other heuristics. The first matrix gives the expected execution times of four activities $\{A_0, A_1, A_2, A_3\}$ on four machines $\{M_0, M_1, M_2, M_3\}$.
In this particular case, Game-quick gives a makespan of 8, which is also optimal. Min-min gives a makespan of 2, Max-min and one other heuristic give a makespan of 9, and the worst-performing heuristic shown gives a makespan of 25. MET assigns all tasks to the fastest machine $M_3$ and gives the worst makespan of 49, so we do not show its mapping. Figure 5 presents the intermediate data generated by Game-quick for this scenario. Algorithm 1 shows the pseudo-code for planning the execution of a workflow.

Step 1. After acquiring the information about activities and resources (e.g., the matrix of expected execution times in Figure 4), we generate an initial distribution of activities and a weight matrix (see lines 1-7), as shown in Figure 5. In this simple case, the two matrices are identical because each cluster has one processor and each activity class has one activity. In Step 1, users may also set performance constraints or filter out unwanted Grid sites, which our algorithm supports efficiently and effectively. To filter out unwanted Grid sites for certain workflows, we simply set the weights of these workflows on those sites to zero and distribute no activities there; no further scheduling steps are required. The constraints are verified again in Step 3 to ensure that all of them are satisfied.

Step 2. Every iteration of the While loop (lines 8-15) is one game stage. Every stage consists of $S$ sub-games, one per Grid site. In each sub-game, all activity classes compete for resource allocation. The activity classes with relatively heavier weights win the sub-game on a site and obtain more resources at the next stage. These activity classes, however, cannot win everywhere, because by the definition of the weight, the weights of one activity class sum to 1. Therefore, the winners of the sub-game on one Grid site must be losers on other Grid sites. Similarly, the losers on one Grid site become winners on other sites and receive compensation there.
This process is repeated until no more performance can be gained. The further processing of the algorithm depends on the evaluation at line 15, $\sum_{k=1}^{K} \big(t^{(k)}(\Delta^{t(l-1)}) - t^{(k)}(\Delta^{t(l)})\big) > \varepsilon$. Here $\varepsilon$ controls the number of stages and the degree of optimization; for the experiments in this study, we set $\varepsilon$ to zero. Figure 3 shows the input and output data flow of each game stage: every stage takes the results of the last stage and passes new results to the next stage for further optimization. Specifically, as shown in Figure 5, we apply eq. (6) at line 10 to generate the first resource allocation matrix $\Theta^{t(1)}$. Based on $\Theta^{t(1)}$, we use eq. (7) at line 13 to generate the first activity distribution $\Delta^{t(1)}$. Thereafter, we repeat the iteration until we reach the upper limit of optimization. For example, in this case the algorithm completes at stage 29 when $\varepsilon$ is set to a small positive value, and at stage 73 when $\varepsilon$ is set to zero.

Step 3. In this step, we verify the deadline constraints and increase the allocated resources for unsatisfied activity classes.

Step 4. In this step, we eliminate the earliest completed activity classes. To use the resources released by these classes, we repeat Steps 1 and 2 to recompute the distribution of the remaining uncompleted classes, until all workflows are completed. Since the environment parameters change over time, this iterative recomputation of the distribution and allocation must be run periodically.

In terms of feasibility and predictability, the solution provided by our approach can be practically performed, and the performance of workflows can easily be predicted. This is because our approach explicitly specifies the number of processors allocated to each activity class. The workflow manager can therefore easily control the processing rate of each activity class and precisely predict the execution time based on it.
Moreover, the workflow manager can easily detect and resolve load imbalance. This is an important feature for Grid computing, since load imbalance is a main source of performance bottlenecks. In contrast, other algorithms only produce an execution sequence of activities, which is neither practical nor predictable. On the one hand, Grid sites are not fully controllable by outside users (including workflow managers), and we cannot decide which processor will be used for the next activity; hence, the execution cannot follow the scheduled plan. On the other hand, the execution time of workflows is hard to predict, since the execution is a mixture of all workflows (or activity classes). Hence, our approach is better at achieving feasibility and predictability.

Figure 4: A simple example that illustrates the situation where Game-quick outperforms other algorithms (panels: expected execution time of $A_0$-$A_3$ on $M_0$-$M_3$; mappings produced by Game-quick (optimal), Min-min, Max-min, and two further heuristics).

Figure 5: Intermediate data of Game-quick (weight matrix, initial distribution, and the results of successive stage games up to stage 73).

3.1.3 Algorithm analysis
The time complexity of the Game-quick algorithm is $O(l \cdot K \cdot S)$ and its space complexity is $O(K \cdot S)$. Here $l$ is the number of stages of the sequential game, $K$ is the number of activity classes, and $S$ is the number of Grid sites. Table 1 lists the measured number of game stages $l$ and the execution times of Game-quick and Min-min for different problem sizes, measured on a machine with Dual Core Opteron processors. The table shows that Game-quick scales well, since the number of game stages does not increase in proportion to the number of processors and activities. Even with $10^6$ activities and $10^4$ processors, the algorithm needs just 593 stages and 0.36 seconds to complete the optimization.

Performance optimization converges very fast, as shown in Figure 6. In this experiment, we randomly generated five examples that assign $2 \times 10^4$ activities to $2 \times 10^2$ processors. Within about 3-4 stages, more than 90% of the optimization is completed, and the entire optimization process needs about 6 stages for this problem size. Convergence is fast because, to some extent, every activity class is a winner on certain sites, so all of them can improve their performance. Specifically, at the beginning of a game, every activity class moves its workload to the sites that are more efficient for it and bargains for resources. If a class fails to obtain more resources through bargaining, it moves workload to the sites that are less important to it.
Finally, all activity classes reach a balance point, and no further improvement can be achieved.

Figure 6: Convergence process of performance optimization (execution time in seconds versus game stage, Examples 1-5).

3.2 Cost optimization
We introduce a cost optimization algorithm, named Game-cost, based on an idea similar to that of the Game-quick algorithm. The first step is to assign deadlines to activity classes and partition workflows into sub-workflows according to the assigned deadlines. We then apply our cost optimization algorithm to each sub-workflow.

3.2.1 Workflow partitioning
The two key steps in the design of time-constrained algorithms are partitioning a workflow into smaller sub-workflows and assigning them to different games for further optimization. Our partitioning is based on the deadlines of activity classes. In this paper we use a static deadline assignment method called Effective Deadline (ED), introduced in [25]. Under ED, the deadline of any activity is the overall workflow deadline minus the total expected execution time of its subsequent activities. Figure 7 presents an example of deadline assignment and partitioning of a workflow consisting of four activity classes. Based on the user-specified deadline and the work amount of each activity class, the ED method assigns the four deadlines $\{d_1, d_2, d_3, d_4\}$ to the partitions $\{P_1, P_2, P_3, P_4\}$. Thereafter, we sort the deadlines and identify one game phase between each pair of adjacent deadlines. In this example, the optimization process is divided into three game phases, each associated with a color in Figure 7:
- phase one, between 0 and $d_1$;
- phase two, between $d_1$ and $d_2$;
- phase three, between $d_2$ and $d_3$ ($d_3 = d_4$).

We apply our cost optimization algorithm to the cost optimization problem of each game phase; different game phases are independent of each other.

3.2.2 Formulation and solution
To optimize the cost of workflows, we consider a $K$-player sequential cooperative game, in which $K$ workflow managers try to minimize their costs while guaranteeing a deadline.
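The deadline assignment and phase construction described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the ED implementation of [25]. We assume each activity class has a single expected execution time. For branching workflows, "the total expected execution time of its subsequent activities" is read as the longest path through successor classes, which for a chain reduces to the plain sum. `effective_deadlines` and `game_phases` are hypothetical names.

```python
from functools import lru_cache
from typing import Dict, List, Tuple


def effective_deadlines(exec_time: Dict[str, float], succ: Dict[str, List[str]],
                        deadline: float) -> Dict[str, float]:
    """Deadline of each class = workflow deadline minus the expected time still
    needed after it (assumed: the longest path through its successors)."""

    @lru_cache(maxsize=None)
    def tail(v: str) -> float:
        # Expected time of the work that must follow v (0.0 for exit classes).
        return max((exec_time[s] + tail(s) for s in succ.get(v, [])), default=0.0)

    return {v: deadline - tail(v) for v in exec_time}


def game_phases(deadlines: Dict[str, float]) -> List[Tuple[float, float]]:
    """One game phase between each pair of adjacent distinct deadlines, starting at 0."""
    bounds = [0.0] + sorted(set(deadlines.values()))
    return list(zip(bounds[:-1], bounds[1:]))
```

Consider a hypothetical instance shaped like Figure 7: $P_1 \to P_2 \to \{P_3, P_4\}$ with expected times 10, 20, 30, 30 and a workflow deadline of 100. The sketch yields deadlines 50, 70, 100, 100, and so three game phases, since $d_3 = d_4$.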
[Table 1: Game stages and algorithm execution times.]

[Figure 7: Partition of cost optimization game (game phases 1 to 3, with phase deadlines, on processors P1 to P4).]

Each manager controls the execution of one activity class. The game is comprised of a set of stages, where $t$ denotes the stage of the sequential game and $t(l)$ is the $l$-th stage game. The objective function for each manager $i$ ($i = 1, 2, \ldots, K$) can be expressed as:

$$f_i(\mathcal{S}) = c_i(\mathcal{S}) = \sum_{k=1}^{S} p_i(k)\,\delta_i(k)\,\phi_k, \tag{8}$$

where $c_i(\mathcal{S})$ is the cost of activity class $i$ and $\phi_k$ is the price of site $k$. When we achieve the best price/performance ratio, the following deadline constraint on the distribution of activity class $i$ on site $k$ ($\delta_i(k)$) and the resource allocation of activity class $i$ on site $k$ ($\theta_i(k)$) is supposed to be satisfied:

$$d_{phase} \ge \frac{\delta_i(k)\,p_i(k)}{\theta_i(k)}, \tag{9}$$

where $d_{phase}$ is the deadline of the current phase. The resource allocation for each activity class is defined by the following equation:

$$\theta_i(k) = m_k \cdot \frac{\delta_i(k)\,p_i(k)\,cw_i(k)}{\sum_{x=1}^{K} \delta_x(k)\,p_x(k)\,cw_x(k)}, \tag{10}$$

where $cw_i(k)$ is the importance weight of site $k$ for activity class $i$:

$$cw_i(k) = \frac{\phi_k\,p_i(k)}{\sum_{y=1}^{S} \phi_y\,p_i(y)}. \tag{11}$$

We use this importance weight to improve the fairness of resource allocation. For this cooperative cost optimization game, the solution is determined by solving the following optimization problem:

$$\text{minimize} \quad \sum_{i=1}^{K} c_i(\mathcal{S})$$

subject to the constraint:

$$\max_i \{t_i(\mathcal{S})\} = \max_{i,k} \frac{\delta_i(k)\,p_i(k)}{\theta_i(k)} \le d_{phase},$$

where $t_i(\mathcal{S})$ is the completion time of activity class $i$. Let $L(\delta, \eta)$ denote the Lagrangian, where $\eta$ denotes the Lagrange multipliers. The Lagrangian is:

$$L(\delta, \eta) = \sum_{i=1}^{K}\sum_{k=1}^{S} p_i(k)\,\delta_i(k)\,\phi_k + \sum_{i=1}^{K}\sum_{k=1}^{S} \eta_i(k)\left(\frac{\delta_i(k)\,p_i(k)}{\theta_i(k)} - d_{phase}\right).$$

Unfortunately, the direct and exact (optimal) solution to this problem is also difficult to obtain. Based on a similar idea as in the Game-quick algorithm, we construct the following decreasing sequence:

$$c^{t(0)}(\mathcal{S}^{t(0)}) \ge c^{t(1)}(\mathcal{S}^{t(1)}) \ge c^{t(2)}(\mathcal{S}^{t(2)}) \ge \cdots \ge c^{t(l)}(\mathcal{S}^{t(l)}) \ge c(\mathcal{S}^{*}), \tag{12}$$

where $\mathcal{S}^{*}$ denotes the optimal distribution. The termination condition of the sequential cooperative game is judged as follows:

$$c^{t(l+1)}(\mathcal{S}^{t(l+1)}) \ge c^{t(l)}(\mathcal{S}^{t(l)}), \tag{13}$$

which means that Game-cost cannot reduce costs any more.

According to the analysis above, a new allocation $\Theta^{t(l)}$ can be obtained from the distribution of the last stage, $\mathcal{S}^{t(l-1)}$. With the new allocation $\Theta^{t(l)}$ in the same stage, the new distribution $\mathcal{S}^{t(l)}$ can be generated for evaluation. From eqs. (9) and (10), we obtain:

$$\Theta^{t(l)} = \Theta(\mathcal{S}^{t(l-1)}); \tag{14}$$

$$\mathcal{S}^{t(l)} = \mathcal{S}(\Theta^{t(l)}), \quad \text{with} \quad \delta_i(k) = \frac{d_{phase}\,\theta_i(k)}{p_i(k)}. \tag{15}$$

Algorithm 2: Game-cost optimization algorithm
Input: $subw_F$, $p_i(k)$, $\delta_i(k)$, $m_k$, $\phi_k$, $d_{phase}$, constraints
Output: $\mathcal{S}^{t(l)}$: distribution of activities; $\Theta^{t(l)}$: allocation of resources
Step 1. Sort the Grid sites for each activity class by increasing performance/price ratio.
Step 2. Initialize $\mathcal{S}^{t(0)}$ and the weights of activity classes, and apply constraints.
    For each $wf \in subw_F$ do
        For each $AC(i) \in wf$ in this game phase to be scheduled next according to the workflow partitions do
            add $AC(i)$ to the set of game players
            For each sorted Grid site $k$ for $AC(i)$ do
                calculate $cw_i(k)$ by applying eq. (11)
                assign $\delta_i(k)$ by applying $m_k\,d_{phase}/p_i(k)$ to build $\mathcal{S}^{t(0)}$, according to the sort order of resources
            End for
        End for
    End for
Step 3. Search for the final distribution of activities and allocation of resources.
    do
        For each activity class $i$ do
            For each sorted Grid site $k$ do
                calculate $\theta_i(k)$ by applying eq. (14) to build $\Theta^{t(n)}$
                calculate $\delta_i(k)$ by applying eq. (15) to build $\mathcal{S}^{t(n)}$
                If all activities are allocated then
                    break
                End if
            End for
        End for
    While $\sum_{i=1}^{K} \left( c_i(\mathcal{S}^{t(n-1)}) - c_i(\mathcal{S}^{t(n)}) \right) > \varepsilon$
Step 4. If the deadlines are not satisfied, apply Game-quick and repeat Step 3.

Algorithm 2 shows the pseudo-code of the algorithm for planning the workflow execution. After acquiring the information about activities and resources, we sort the resources for every activity class and generate an initial distribution of activities and an initial allocation of resources. At the beginning, activity classes compete for the site with the highest price/performance ratio. After one stage of competition, the winners get more processors from that resource. In the next stage, the losers compete for the resources with the second-highest price/performance ratio. The difference between Game-quick and Game-cost is that the competitors of Game-quick contend for resources on all Grid sites, whereas the competitors of Game-cost compete for resources from the Grid sites with the highest price/performance ratio down to those with the lowest. This process is repeated until costs can no longer be reduced, which is determined by the While condition in Step 3.
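To make the interplay of eqs. (10), (11), (14), and (15) concrete, the following Python sketch runs a simplified version of Steps 2 and 3 of Algorithm 2 for a single game phase. It is a sketch under stated assumptions, not the authors' implementation: activity counts are treated as continuous (no rounding), each class visits sites in increasing per-activity cost $p_i(k)\,\phi_k$ as one possible reading of Step 1, all function names are hypothetical, and the Game-quick fallback of Step 4 is only signaled (by returning `None`) rather than implemented.

```python
from typing import List, Optional, Tuple

Matrix = List[List[float]]


def importance_weights(p: Matrix, phi: List[float]) -> Matrix:
    """cw_i(k) = phi_k p_i(k) / sum_y phi_y p_i(y), following eq. (11) above."""
    weights = []
    for row in p:
        total = sum(phi[y] * row[y] for y in range(len(row)))
        weights.append([phi[k] * row[k] / total for k in range(len(row))])
    return weights


def allocate(delta: Matrix, p: Matrix, cw: Matrix, m: List[int]) -> Matrix:
    """Eq. (10): share the m_k processors of site k among activity classes
    in proportion to their weighted workload delta_i(k) p_i(k) cw_i(k)."""
    K, S = len(p), len(m)
    theta = [[0.0] * S for _ in range(K)]
    for k in range(S):
        load = sum(delta[x][k] * p[x][k] * cw[x][k] for x in range(K))
        if load > 0:
            for i in range(K):
                theta[i][k] = m[k] * delta[i][k] * p[i][k] * cw[i][k] / load
    return theta


def game_cost(p: Matrix, m: List[int], phi: List[float], n: List[int],
              d_phase: float, eps: float = 1e-9,
              max_stages: int = 100) -> Optional[Tuple[float, Matrix, Matrix]]:
    """Returns (total cost, delta, theta) or None if no stage placed every activity.
    Assumes all execution times p_i(k) are positive."""
    K, S = len(p), len(m)
    cw = importance_weights(p, phi)
    # Assumption: each class visits sites in increasing per-activity cost p_i(k) * phi_k.
    order = [sorted(range(S), key=lambda k, i=i: p[i][k] * phi[k]) for i in range(K)]

    def fill(capacity: Matrix) -> Matrix:
        """Place the n_i activities of each class on its sorted sites, with at
        most capacity[i][k] activities on site k (continuous relaxation)."""
        delta = [[0.0] * S for _ in range(K)]
        for i in range(K):
            remaining = float(n[i])
            for k in order[i]:
                if remaining <= 0:
                    break  # all activities of class i are allocated
                delta[i][k] = min(remaining, capacity[i][k])
                remaining -= delta[i][k]
        return delta

    def cost(delta: Matrix) -> float:
        # Sum over classes of eq. (8): c_i = sum_k p_i(k) delta_i(k) phi_k.
        return sum(p[i][k] * delta[i][k] * phi[k] for i in range(K) for k in range(S))

    # Step 2: initial distribution, as if every class could use all m_k processors.
    delta = fill([[m[k] * d_phase / p[i][k] for k in range(S)] for i in range(K)])
    best, prev = None, float("inf")
    for _ in range(max_stages):
        theta = allocate(delta, p, cw, m)                        # eq. (14)
        delta = fill([[d_phase * theta[i][k] / p[i][k] for k in range(S)]
                      for i in range(K)])                         # eq. (15)
        current = cost(delta)
        complete = all(abs(sum(delta[i]) - n[i]) < 1e-9 for i in range(K))
        if complete and (best is None or current < best[0]):
            best = (current, delta, theta)
        if prev - current <= eps:  # no further cost reduction: the game stops
            break
        prev = current
    # None means the phase deadline could not be met for every activity;
    # Step 4 would then fall back to Game-quick (not sketched here).
    return best
```

With two classes that each run faster on a different site, the sketch keeps each class on its cheaper site: `game_cost([[2.0, 1.0], [1.0, 2.0]], [4, 4], [1.0, 1.0], [4, 4], 10.0)` places the four activities of class 0 on site 1 and those of class 1 on site 0, at a total cost of 8.0. Because every returned $\delta_i(k)$ is capped at $d_{phase}\,\theta_i(k)/p_i(k)$, the returned pair always satisfies constraint (9).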
In addition, due to tight deadlines and the non-backtracking nature of the algorithm, it might sometimes not be possible to meet the deadlines for all activity classes. If this happens, Game-cost invokes Game-quick to meet the deadline first, and then optimizes costs based on the results of Game-quick. This has proved very effective when deadlines are tight, because other heuristics are not able to return a complete schedule in these cases. The time complexity of the cost optimization algorithm is O(lKS) and the space complexity is O(KS), where l is the number of game stages, K is the number of activity classes, and S is the number of Grid sites. The convergence process of cost optimization is very fast, as shown in Figure 8. The number of stages for cost optimization is much smaller than for performance optimization, because the deadline constraints limit the convergence process. In addition, the initialization process in Game-cost differs from that in Game-quick, hence the convergence process is much shorter. In this experiment, we randomly generate 5 examples to assign 2 4 activities to 2 2 processors. After about 2 3 stages, optimization has almost been completed, and the entire optimization process needs about 5 stages for this problem size.

[Figure 8: Convergence process of cost optimization (cost versus game stage for Examples 1 to 5).]

4. EXPERIMENTAL RESULTS
In this section, we first compare the time and space complexity of different approaches, and then show the scheduling results of two real applications on the Austrian Grid to explain the advantages of our algorithms. To ensure the completeness of our experiments, we also evaluate and compare different algorithms over a complex simulated system and a large number of activities based on different machine and activity heterogeneity. All measurements were performed on a machine with Dual Core Opteron GHz processors and GB of RAM.

4.1 Complexity and execution times
The computational time complexity is an important measure for comparing different algorithms. We have implemented [4, 9], MET [4, 9], [9], [8, 2], Min-min [8, 2], and Max-min [8, 2] in our system, and modified them to work on classes of activities instead of individual activities. The execution time of the Game-quick and Game-cost algorithms is distinctly lower than that of all other algorithms. Their time complexity depends only on the number of activity classes (K) and the number of clusters (S). When we assign 5 activities to 3 processors, the execution time of our algorithm is less than 0.4 seconds, as shown in Table 2, while other algorithms may need several hours to generate comparable solutions. MET, which has an asymptotic complexity of O(M + N), where M is the number of processors and N is the number of activities, executes in less than 1 second. However, the results of MET have serious problems, because MET schedules most activities to the fastest Grid sites. and have an asymptotic complexity of O(MN), but their results are much worse than those of our algorithms. , with an asymptotic complexity of O(MNω) (ω ≤ N), executes for an average of 2 3 seconds. There is no performance difference between -C and . Min-min and Max-min (and Duplex) have an asymptotic complexity of O(MN^2) and an average execution time of 2 3 seconds.
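The weakness of MET described above (it ignores how much work a site already has) can be seen on a tiny ETC matrix. The sketch below uses the textbook per-activity definitions, not the class-based variants the authors implemented; MCT (minimum completion time) is added only as a load-aware point of comparison, and all function names are illustrative.

```python
from typing import List

Matrix = List[List[float]]  # etc[t][j]: expected time of activity t on machine j


def met(etc: Matrix) -> List[int]:
    """Minimum Execution Time: each activity goes to its fastest machine,
    ignoring how much work that machine already has."""
    return [min(range(len(row)), key=lambda j: row[j]) for row in etc]


def mct(etc: Matrix) -> List[int]:
    """Minimum Completion Time (shown for contrast): each activity, in order,
    goes to the machine on which it would finish earliest given current load."""
    ready = [0.0] * len(etc[0])
    assignment = []
    for row in etc:
        j = min(range(len(row)), key=lambda m: ready[m] + row[m])
        ready[j] += row[j]
        assignment.append(j)
    return assignment


def makespan(etc: Matrix, assignment: List[int]) -> float:
    """Finish time of the most loaded machine."""
    ready = [0.0] * len(etc[0])
    for t, j in enumerate(assignment):
        ready[j] += etc[t][j]
    return max(ready)
```

With four identical activities that take 1.0 time unit on machine 0 and 2.0 on machine 1, MET piles all four onto machine 0 (makespan 4.0), whereas MCT sends the third activity to machine 1 and finishes at 3.0.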
There are some other algorithms to which we do not compare, such as Work Queue (WQ) [2], Heterogeneous Earliest Finish Time (HEFT) [7], Genetic Algorithms (GA) [7, 22], or A* [3]. WQ, however, is only intended for homogeneous parallel machines. HEFT degrades to Min-min for large-scale applications. GA-based solutions and A* scale poorly as the number of activities and processors increases, and their execution times are significantly higher than those of other algorithms, though they can decrease the makespans of Min-min by 5%-10% according to related work [7]. Other algorithms are either similar to the implemented algorithms or not practical for large-scale workflows for the reasons mentioned. Therefore, we did not implement them or compare them with our algorithms.

Table 2: Comparison of time complexity and execution time of algorithms when we assign 5 activities to 3 processors.
Algorithm | Time complexity | Time (seconds) | Space complexity
Game-quick, Game-cost | O(lKS) | < 0.4 | O(KS)
MET | O(M + N) | < 1 | O(M + N)
 , | O(MN) | 2 3 | O(M + N)
 , -C | O(MNω) | 2 3 | O(M + N)
Min-min, Max-min | O(MN^2) | | O(M + N)
Duplex, HEFT | O(MN^2) | | O(M + N)
GA-based solutions | scales poorly | >> 2 3 | O(M + N)
A* | exponential | >> 2 3 | exponential

Although space complexity is not as important as time complexity, because most scheduling algorithms are not memory-intensive, it is still worth mentioning that for large-scale applications (usually M >> K) the space complexity of Game-quick and Game-cost, O(KS), is much lower than that of the other algorithms, which is O(M + N) or exponential. Compared with existing algorithms, our Game-quick and Game-cost algorithms are the most efficient for large-scale applications characterized by a large number of homogeneous activities. For such applications, the scheduling problem can easily be formulated as a typical and solvable game, although we cannot exclude the possibility that there will be large-scale applications with tens of thousands of different types of activities. Solving this latter problem requires further research on game partitioning techniques.
4.2 Real applications
In the following, we evaluate our proposed methods using two real-world scientific workflow applications executed in a national Grid infrastructure. WIEN2k [3] is a program package for performing electronic structure calculations of solids using density functional theory, based on the full-potential (linearized) augmented plane-wave ((L)APW) and local orbital (lo) method. We have ported the application onto the Grid by splitting the monolithic code into several coarse-grained activities coordinated in a workflow, as already illustrated in Figure 1. The lapw1 and lapw2 activity classes can be solved in parallel by a fixed number of homogeneous k-points. A final activity, converged, applied to several output files, tests whether the problem convergence criterion is fulfilled. AstroGrid [5] is an astronomical application, illustrated in Figure 1, which performs numerical simulations of the movements and interactions of galaxy clusters using an N-body system. The computation starts with the state of the universe at some time in the past and proceeds to the current time. Galaxy potentials are computed for each time step, and then the hydrodynamic behavior and processes are calculated and described.

We executed these applications on a subset testbed of the Austrian Grid infrastructure, consisting of a set of parallel computers and workstation networks accessible through the Globus toolkit and local job queuing systems as separate Grid sites. For the sake of clarity, our experimental testbed consists of two clusters, one at the University of Innsbruck and the other at the University of Linz, and we use only 4 processors on each Grid site. The characteristics of the machines are given in Table 3.

Table 3: The Austrian Grid testbed.
Site | Size | GHz | Architecture | Mgr. | Location
hc-ma.uibk | 4 (8) | 2.2 | EM64, COW | SGE | Innsbruck
altix.jku | 4 (64) | 1.6 | I2, ccNUMA | PBS | Linz

In this experiment, we evaluate the performance of Min-min (the best of the other heuristics) and Game-quick by comparing the makespan, AET (see Section 3.1.1), and fairness of these two applications. We quantify fairness using Jain's fairness index [24]:

$$\text{fairness} = \frac{\left(\sum_{w=1}^{W} T_w\right)^2}{W \sum_{w=1}^{W} T_w^2},$$

where W is the number of workflows and T_w is the execution time of workflow w. The fairness value lies between 1/W and 1, where lower values indicate worse fairness and fairness = 1 indicates the best fairness.

[Figure 9: Scheduling results of two real applications (WIEN2k and ASTRO) on processors P1 to P4 of hc-ma and altix, where Game-quick gives a shorter makespan and AET, and better fairness than Min-min. (a) Min-min: sum = 622.5, makespan = 240; (b) Game-quick: sum = 585, makespan = 225. Horizontal axis: time (seconds).]

Figure 9 presents a scenario in which Game-quick outperforms Min-min. In this particular case, as shown in Figure 9(a), Min-min gives a makespan of 240, an AET of 622.5, and a fairness of 0.862. Comparing the results of Game-quick (see Figure 9(b)) with those of Min-min, we notice that Game-quick improved the makespan of the workflows by 6.67%, the AET by 2.36%, and the fairness by 6%. Notice that the fairness of Game-quick is almost perfect. Moreover, we can intuitively observe that it is hard to predict the execution time of each workflow for the execution plan in Figure 9(a), because the workflows are interleaved and form a mixture of activities of different workflows. In contrast, Game-quick yields an execution plan in which each workflow is executed based on the control of processing rates, hence the activities of one workflow can be considered to be executed on dedicated processors. Therefore, Game-quick achieves better makespans and AET, obtains almost perfect fairness, and allows the workflow execution times to be predicted more precisely.

4.3 Performance optimization
For the completeness of our experiments and the generality of the experimental results, we evaluate and compare different algorithms for different scenarios. We first introduce our simulation environment, and then compare the makespans, fairness, and AET of different algorithms based on the classification of activity and machine heterogeneity.

[Figure 10: Algorithm execution times (ms) for assigning more than 5 activities to about 3 processors, for MET, Min-min, Max-min, and Game-quick, among others, in the HH, HLo, LoH, and LoLo scenarios.]

[Table 4: Consistent computing environment (scenarios HH, HLo, LoH, LoLo; columns: No. of Procs., No. of Clusters, No. of Activities, No. of Activity Classes, Activity Heterog., Machine Heterog.).]

We use the ETC (expected time to compute) matrix [7] to evaluate all algorithms. The ETC matrix categorizes activities and resources by the degree of their heterogeneity. Machine heterogeneity represents the variation that is possible among the execution times of a given activity class across all the machines, while activity heterogeneity is defined as the amount of variance among the execution times of activity classes for a given machine [7]. We assume that activities in one activity class are homogeneous, and that machines in one cluster are homogeneous. To simulate a real computing environment, we use consistent and inconsistent matrices. A matrix is consistent if, whenever a machine a executes any activity faster than machine b, machine a executes all activities faster than machine b; inconsistent matrices characterize the situation where machine a may be faster than machine b for some activities and slower for others [7]. We evaluate the algorithms for four different scenarios: high activity and high resource heterogeneity (HH), high activity and low resource heterogeneity (HLo), low activity and high resource heterogeneity (LoH), and low activity and low resource heterogeneity (LoLo). Tables 4 and 5 present the details of the simulated computing environment. Expected execution times of activities are generated based on activity and machine heterogeneity, whose values are drawn from a uniform distribution in the specified ranges. High machine heterogeneity (in the range of [, ]) causes a significant difference in an activity's execution time among Grid sites. High activity heterogeneity (in the range of [, ]) indicates that the expected execution times of different activities differ greatly. We assume that the numbers of activities are randomly generated from a uniform distribution in the range of [, 2], and the number of processors on one Grid site in the range of [64, 128]. The algorithm execution times for consistent and inconsistent matrices are similar, and are shown in Figure 10.
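A minimal sketch of range-based ETC generation in the spirit of the methodology described above: each activity class draws a baseline from [1, r_act] (activity heterogeneity), each cluster multiplies it by a factor drawn from [1, r_mach] (machine heterogeneity), and sorting every row yields a consistent matrix. The function names and the ranges in the example are assumptions for illustration, not the values of Tables 4 and 5.

```python
import random
from typing import List

Matrix = List[List[float]]


def etc_matrix(n_classes: int, n_clusters: int, r_act: float, r_mach: float,
               consistent: bool, seed: int = 0) -> Matrix:
    """Range-based ETC: entry [i][j] is the expected time of one activity of
    class i on a machine of cluster j."""
    rng = random.Random(seed)
    etc = []
    for _ in range(n_classes):
        baseline = rng.uniform(1.0, r_act)            # activity heterogeneity
        row = [baseline * rng.uniform(1.0, r_mach)     # machine heterogeneity
               for _ in range(n_clusters)]
        if consistent:
            row.sort()  # cluster 0 is fastest for every class, cluster 1 next, ...
        etc.append(row)
    return etc


def is_consistent(etc: Matrix) -> bool:
    """True if no pair of clusters swaps its speed order between activity classes."""
    n = len(etc[0])
    for a in range(n):
        for b in range(a + 1, n):
            faster = any(row[a] < row[b] for row in etc)
            slower = any(row[a] > row[b] for row in etc)
            if faster and slower:
                return False
    return True
```

Calling `etc_matrix(..., consistent=False)` skips the sort, so the speed order of clusters generally differs between classes, which is the inconsistent case.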
The relative execution times of the algorithms from the best to the worst were: (1) Game-quick, (2) MET, (3) , (4) , (5) Min-min, (6) Max-min (sometimes ), and (7) (sometimes Max-min). Not only is the time complexity of Game-quick much lower than that of the other algorithms, but Game-quick also gives the best makespans and machine utilization. For all cases, the relative performance order of the algorithms from the best to the worst was: (1) Game-quick, (2) Min-min, (3) , (4) , (5) Max-min, (6) , and (7) MET. In terms of fairness, Game-quick always achieved almost perfect fairness (see Figure 11(g) and Figure 12(g)). On average, the fairness value of Game-quick is 0.99.
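Jain's index, as used in Section 4.2, is computed directly from the per-workflow execution times T_w. The sketch below (function name illustrative) shows the calculation and its bounds.

```python
from typing import Sequence


def jain_fairness(times: Sequence[float]) -> float:
    """Jain's index: (sum of T_w)^2 / (W * sum of T_w^2) over W workflows."""
    if not times:
        raise ValueError("need at least one workflow")
    total = sum(times)
    return total * total / (len(times) * sum(t * t for t in times))
```

Equal execution times give exactly 1.0; the index drops toward 1/W as a single workflow dominates (for T = [1.0, 3.0] it is 0.8).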

[Table 5: Inconsistent computing environment (scenarios HH, HLo, LoH, LoLo; same columns as Table 4).]

Consistent heterogeneity
Table 4 presents the four input scenarios and Figure 11 illustrates our results. We discuss the performance of all algorithms in order from the slowest to the fastest. For all four consistent cases, MET gives the worst results, because it maps all activities to the fastest machine; thus, MET does not appear in the figures. usually performs the second worst. This is because there is no cooperation between different activity classes, and the resources are selected based on their availability without considering the activity execution time. In many cases, maps activities to the worst machines. Max-min gives a poor result because it only fits the situation in which some activities are much larger than the others. At the beginning of execution, Max-min favors the execution of larger activities, while smaller activities are ignored. This special situation is seldom encountered; thus, Max-min is not a good option. In addition, there is no fairness for smaller activities, hence Max-min performs worse than most algorithms. performs quite well for high machine heterogeneity scenarios, because there is a higher likelihood that it selects the fastest machine for activities, especially for larger activities. This algorithm performs poorly for low machine heterogeneity scenarios because it does not compare the activity execution times and only considers the completion time. Moreover, in these situations the differences between faster and slower machines are blurred. Therefore, it is less likely to select the fastest machine for an activity and cannot achieve better performance for low machine heterogeneity scenarios. performs quite similarly to for high machine heterogeneity, but performs 5%-10% better than for low machine heterogeneity scenarios. This is because makes more intelligent decisions by considering the activity execution time.
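The opposite priorities of Min-min and Max-min discussed above can be reproduced with one greedy loop that differs only in how the next activity is chosen. This is a sketch of the textbook per-activity heuristics (function name and example matrix are hypothetical): both compute each unassigned activity's earliest completion time; Min-min commits the activity whose earliest completion is smallest, Max-min the one whose earliest completion is largest.

```python
from typing import Callable, List, Tuple

Matrix = List[List[float]]  # etc[t][j]: expected time of activity t on machine j


def batch_greedy(etc: Matrix, choose: Callable) -> Tuple[List[int], float]:
    """Min-min when choose=min, Max-min when choose=max.
    Returns (machine index per activity, makespan)."""
    n_machines = len(etc[0])
    ready = [0.0] * n_machines          # time at which each machine becomes free
    assignment = [-1] * len(etc)
    unassigned = list(range(len(etc)))
    while unassigned:
        candidates = []
        for t in unassigned:
            # Earliest-completion machine for activity t given current loads.
            m = min(range(n_machines), key=lambda j: ready[j] + etc[t][j])
            candidates.append((ready[m] + etc[t][m], t, m))
        # min/max return the first candidate on ties, so the result is deterministic.
        finish, t, m = choose(candidates, key=lambda c: c[0])
        assignment[t] = m
        ready[m] = finish
        unassigned.remove(t)
    return assignment, max(ready)
```

On one 4-unit activity plus four 1-unit activities on two identical machines, Min-min ends with makespan 6.0 because the large activity is scheduled last, whereas Max-min places it first and finishes at 4.0; as the text notes, such inputs are uncommon, which is why Max-min usually loses.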
Theoretically, performs better than Min-min if the execution time of one activity on a certain machine is much longer than anywhere else (i.e., the activity suffers if not run on a specific machine). For the heterogeneous environments considered in this study, this type of special case never occurs. Min-min performs well, giving the second-best results in each case. In contrast to Max-min, at the beginning of execution it only handles the smallest activities and ignores larger ones. In the midst of execution, Min-min still handles smaller activities more frequently than larger activities, which implies that smaller activities have higher priorities than larger ones. There is not enough fairness among activity classes, and that is why Min-min loses performance. The Game-quick scheduling algorithm provides the best performance for all four scenarios, because it can make the best globally intelligent decisions. It performs about % better than Min-min for the LoH scenario, and 5% better for the other three scenarios. We can observe that when fairness is ensured, efficiency is improved. According to related work [7], GA was able to improve upon the Min-min solution by 5%-10%, which means that Game-quick performs as well as GA, but with a much shorter algorithm execution time.

Inconsistent heterogeneity
Table 5 presents the four input scenarios and Figure 12 illustrates the results. For all four inconsistent cases, MET gives the worst results, because it maps most activities to a few of the fastest clusters. MET could perform better than when the fastest clusters for different activity classes are distributed evenly in the computing environment, but this special case rarely occurs. Therefore, MET is still the worst heuristic. , Max-min, , and perform worse for inconsistent than for consistent scenarios, because inconsistent computing environments are more complex than consistent ones: faster machines do not always perform better than slower machines. Given their designs, most existing algorithms cannot effectively handle the high heterogeneity of machines, which results in poor makespans.
Therefore, it is more likely that , and assign more activities to slower machines. In contrast, Min-min performs better for inconsistent than for consistent scenarios, because the fastest machines are distributed evenly; thus, Min-min is able to assign more activities to the fastest machines, though it does not intentionally handle the change of environment either. Game-quick still provides the best mapping for the inconsistent cases. For the same reason as in the consistent scenarios, Game-quick achieves better performance than the others.

4.4 Cost optimization
Figure 13 shows, in percent, the cost of each algorithm normalized against the cost of Game-cost. We do not present as many results as in the previous experiments for Game-quick, because most existing algorithms are not comparable to Game-cost, even if we optimize the heuristics for this problem. From Figure 13(a) it can be observed that all algorithms need more than twice the cost of Game-cost. Consequently, we modified the original heuristics to incorporate cost and deadline control into , , Min-min, Max-min, and . The optimized algorithms are marked with *, for example, *, *, Min-min*, Max-min*, and *, as shown in Figure 13(b). MET is not shown in the figures because it cannot meet the deadline. The relative cost order of the algorithms from the best to the worst was: (1) Game-cost, (2) *, (3) *, (4) Min-min*, (5) Max-min*, (6) Game-quick, and (7) *. In this case, Game-cost finds mappings whose costs are better than * by 27%, * by 45%, and better than the other algorithms by at least 5%. Moreover, Game-cost can be combined with Game-quick to avoid the problem that deadlines cannot be met. This problem frequently occurs when we set a relatively short deadline, because if

[Figure 13: Cost optimization results. (a) Percentage of cost; (b) Percentage of cost (optimized heuristics).]
