A Genetic Algorithm Based Dynamic Load Balancing Scheme for Heterogeneous Distributed Systems

Proceedings of the International Conference on Parallel and Distributed Processing Techniques and Applications, PDPTA 2008, Las Vegas, Nevada, USA, July 14-17, 2008, 2 Volumes. CSREA Press 2008, ISBN 1-60132-084-1

Bibhudatta Sahoo 1, Sudipta Mohapatra 2, and Sanjay Kumar Jena 1
1 Department of Computer Science & Engineering, NIT Rourkela, Orissa, India
2 Department of Electronics & Electrical Communication Engineering, IIT Kharagpur, India

Abstract - Load balancing is a crucial issue in parallel and distributed systems to ensure fast processing and optimum utilization of computing resources. Load balancing strategies try to ensure that every processor in the system does almost the same amount of work at any point of time. This paper investigates a dynamic load-balancing algorithm for heterogeneous distributed systems where half of the processors have double the speed of the others. Two job classes are considered for the study: jobs of the first class are dedicated to fast processors, while jobs of the second class are generic in the sense that they can be allocated to any processor. The performance of the scheduler has been verified under scalability. Some simulation results are presented to show the effectiveness of genetic algorithms for dynamic load balancing.

Keywords: Heterogeneous distributed system, dynamic load balancing, makespan, genetic algorithm.

1 Introduction

Distributed heterogeneous computing is being widely applied to a variety of large-size computational problems. These computational environments consist of multiple heterogeneous computing modules that interact with each other to solve the problem. In a heterogeneous distributed computing system (HDCS), processing loads arrive from many users at random time instants. A proper scheduling policy attempts to assign these loads to available computing nodes so as to complete the processing of all loads in the shortest possible time.
The resource manager schedules the processes in a distributed system so that resource usage, response time, network congestion, and scheduling overhead are optimized. There are a number of techniques and methodologies for scheduling the processes of a distributed system: task assignment, load balancing, and load sharing [7, 9, 10]. Due to the heterogeneity of computing nodes, jobs encounter different execution times on different processors. Therefore, research should address scheduling in heterogeneous environments. In the task assignment approach, each process submitted by a user is viewed as a collection of related tasks, and these tasks are scheduled to suitable nodes so as to improve performance. The load sharing approach simply attempts to conserve the ability of the system to perform work by assuring that no node is idle while processes wait to be processed. In the load balancing approach, processes submitted by the users are distributed among the nodes of the system so as to equalize the workload among the nodes at any point of time. Processes might have to be migrated from one machine to another, even in the middle of execution, to ensure equal workload. Load balancing strategies may be static or dynamic [1, 3, 7].

To improve the utilization of the processors, parallel computations require that processes be distributed to processors in such a way that the computational load is spread among the processors. Dynamic load distribution (also called load balancing, load sharing, or load migration) can be applied to restore balance [7]. In general, load-balancing algorithms can be broadly categorized as centralized or decentralized, dynamic or static, periodic or non-periodic, and those with thresholds or without thresholds [3, 7, 11]. We have used a centralized load-balancing framework, as it imposes fewer overheads on the system than a decentralized algorithm.

The load-balancing problem aims to compute the assignment with the smallest possible makespan (i.e.
the completion time at the most heavily loaded computing node). The load distribution problem is known to be NP-hard [4, 5] in most cases and is therefore intractable once the number of tasks and/or computing nodes exceeds a few units. Here, load balancing is a job scheduling policy that takes a job as a whole and assigns it to a computing node [2]. This paper considers the problem of finding an optimal solution for
load balancing in a heterogeneous distributed system. The rest of the paper is organized as follows. The next section discusses the structure of the heterogeneous distributed computing system (HDCS) and the load-balancing problem. Section 3 describes the different dynamic load distribution algorithms. We have simulated the behavior of the different load balancing algorithms with a simulator we developed in Matlab, in which each task $t_j$ has an expected execution time $e_{ij}$ and an expected completion time $c_{ij}$ on machine $M_i$. The results of the simulation, with scaling of both computing nodes and tasks, are presented in Section 4. Finally, conclusions and directions for future research are discussed in Section 5.

2 System and problem model

2.1 Heterogeneous distributed computing system

A heterogeneous distributed computing system (HDCS) utilizes a distributed suite of different high-performance machines, interconnected with high-speed links, to perform different computationally intensive applications that have diverse computational requirements. Distributed computing provides the capability to utilize remote computing resources and allows for increased levels of flexibility, reliability, and modularity. In a heterogeneous distributed computing system the computational power of the computing entities may differ from processor to processor, as shown in Figure 1 [1, 3, 4]. A large HDCS consists of potentially millions of heterogeneous computing nodes connected by the global Internet. The applicability and strength of HDCS derive from their ability to match computing needs to appropriate resources [2, 3, 9]. The resource management subsystems of the HDCS are designated to schedule the execution of the tasks that arrive for service. HDCS environments are well suited to meet the computational demands of large, diverse groups of tasks. The problem of optimally mapping tasks to machines is also referred to as matching and scheduling.
[Figure 1: Distributed computing system. Jobs arrive at rate $\lambda$ at the resource manager, which dispatches them to computing nodes with service rates $\mu_1, \mu_2, \dots, \mu_m$.]

We consider a heterogeneous distributed computing system (HDCS) consisting of a set $M = \{M_1, M_2, \dots, M_m\}$ of $m$ independent, heterogeneous, uniquely addressable computing entities (computing nodes). Let there be $n$ jobs, where job $j$ has processing time $t_j$, to be processed in the HDCS with $m$ nodes. The generalized load-balancing problem is then to assign each job to one of the nodes $M_i$ so that the loads placed on all machines are as balanced as possible [5].

2.2 Mathematical model for load balancing

This section presents a mathematical model for the load balancing problem based on the minimax criterion. The objective of this formulation is to minimize the load at the most heavily loaded processor. Let $A(i)$ be the set of jobs assigned to machine $M_i$; machine $M_i$ then needs total computing time

\[ T_i = \sum_{j \in A(i)} t_j , \]

which is otherwise known as the load $L_i$ on machine $M_i$. The basic objective of load balancing is to minimize the makespan [11], defined as the maximum load on any machine, $T = \max_i T_i$. This problem can be expressed as a linear program with discrete assignment variables, with the objective of minimizing $L$ (the load of the corresponding assignment):

\[
\begin{aligned}
\text{Minimize } & L \\
\text{subject to } & \sum_{i} x_{ij} = t_j && \text{for all jobs } j \\
& \sum_{j} x_{ij} \le L && \text{for all machines } i \\
& x_{ij} \in \{0, t_j\} && \text{for all jobs } j,\ i \in M_j \\
& x_{ij} = 0 && \text{for all jobs } j,\ i \notin M_j
\end{aligned}
\]

where $x_{ij}$ is the processing time that job $j$ contributes to machine $i$, and $M_j \subseteq M$ is the set of machines to which job $j$ can be assigned (in the constraints, $M_j$ denotes this set for job $j$, not a single machine).

The problem of finding an assignment of minimum makespan is NP-hard [5]. Solutions can be obtained using a dynamic programming algorithm running in $O(nL^m)$ time, where $L$ is the minimum makespan. Due to the complexity of the load balancing problem, most researchers have proposed heuristic algorithms, while optimal algorithms have been developed only for restricted cases or for small problems [4]. Genetic algorithms (GAs) are evolutionary optimization approaches that offer an alternative to traditional optimization methods. GAs are most appropriate for complex non-linear models where locating the global optimum is a difficult task. Hence genetic algorithms have been used to solve hard optimization problems.
In this paper we analyze the performance of an HDCS in which half of the processors have double the speed of the others.
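As an illustration of the load and makespan definitions in Section 2.2, the following minimal Python sketch computes the per-machine loads and the makespan of a given assignment. The function names and the example data are hypothetical and are not taken from the paper's Matlab simulator.

```python
from typing import List


def machine_loads(times: List[float], assignment: List[int], m: int) -> List[float]:
    """Return the load T_i of each of the m machines.

    times[j] is the processing time of job j and assignment[j] is the
    index of the machine that job j is assigned to.
    """
    loads = [0.0] * m
    for job, machine in enumerate(assignment):
        loads[machine] += times[job]
    return loads


def makespan(times: List[float], assignment: List[int], m: int) -> float:
    """Return the makespan: the load of the most heavily loaded machine."""
    return max(machine_loads(times, assignment, m))


# Hypothetical example: four jobs on two machines.
times = [4.0, 3.0, 2.0, 1.0]
balanced = [0, 1, 1, 0]    # jobs 0 and 3 on machine 0, jobs 1 and 2 on machine 1
unbalanced = [0, 0, 0, 1]  # only the smallest job goes to machine 1
```

In this example the balanced assignment places a load of 5 on each machine, while the unbalanced one places 9 on the first machine, so its makespan is almost twice as large.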

3 System Model and Methodology

3.1 System and Workload Models

Typically, a load distributing algorithm has four components: (i) a transfer policy that determines whether a node is in a suitable state to participate in a task transfer, (ii) a selection policy that determines which task should be transferred, (iii) a location policy that determines to which node a task selected for transfer should be sent, and (iv) an information policy that is responsible for triggering the collection of system state information [1, 3, 7, 13]. When a new job arrives at the node (Figure 3.1), the transfer policy looks at the node's job queue length. The job is allowed to execute at the node if the job queue length is less than a predetermined threshold. Otherwise the job is assigned to the central scheduler.

[Figure 3.1: Job flow at a computing node. Inputs: job arrivals and jobs transferred from other nodes. Outputs: completed jobs and jobs transferred to other nodes.]

Scheduling of tasks in a load balancing distributed system involves deciding not only when to execute a process, but also where to execute it. Accordingly, scheduling in a distributed system is accomplished by two components: the allocator and the scheduler. The allocator decides where a job will execute, and the scheduler decides when a job gets its share of the computing resource at the node. In this paper we have used the computing resource model discussed in [6].

[Figure 3.2: Central scheduler queuing model]

Each heterogeneous computing node is multitasking and can accommodate a maximum of K jobs for some acceptable QoS. The heterogeneous distributed computing system addressed here can be expressed in Kendall notation [14] as M/M/m/K/n, where: (i) the first M represents exponentially distributed inter-arrival times between jobs (tasks), i.e. a Poisson arrival process, (ii) the second M represents exponentially distributed job execution times, (iii) m represents the number of heterogeneous computing nodes, (iv) K represents the maximum number of tasks that can be in a computing node under multitasking, and (v) n represents the number of jobs.
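The transfer policy and the two "M" components of the M/M/m/K/n notation above can be sketched in a few lines of Python. This is only an illustrative sketch: the function names, the use of a fixed random seed, and the example rates are assumptions, not details of the authors' simulator.

```python
import random
from typing import List, Tuple


def accept_locally(queue_length: int, threshold: int) -> bool:
    """Transfer policy: run the job locally only while the node's queue
    is shorter than the predetermined threshold; otherwise the job goes
    to the central scheduler."""
    return queue_length < threshold


def sample_workload(n_jobs: int, arrival_rate: float, service_rate: float,
                    seed: int = 0) -> List[Tuple[float, float]]:
    """Sample n_jobs (arrival_time, service_time) pairs.

    Inter-arrival gaps are exponential with the given arrival rate
    (a Poisson arrival process) and service times are exponential with
    the given service rate, matching the two M's in M/M/m/K/n.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    clock = 0.0
    jobs = []
    for _ in range(n_jobs):
        clock += rng.expovariate(arrival_rate)      # next arrival instant
        service = rng.expovariate(service_rate)     # execution time of this job
        jobs.append((clock, service))
    return jobs


# Hypothetical example: 100 jobs, arrival rate 2.0, service rate 1.0.
jobs = sample_workload(100, arrival_rate=2.0, service_rate=1.0)
```

A fast node in the paper's setting could be modelled by doubling `service_rate`, since half of the nodes execute at twice the speed of the others.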
Let $\lambda_i$ be the arrival rate of jobs at computing node $i$. The arrival rate at the resource manager is then $\lambda$, where

\[ \lambda = \frac{\lambda_1 + \lambda_2 + \lambda_3 + \dots + \lambda_m}{m} . \]

We have assumed that the service rates of the $m$ heterogeneous computing nodes are different, i.e. $\mu_i \neq \mu_k$ for any two computing nodes $i \neq k$. In this paper we use a heterogeneous distributed computing system with two different types of computing nodes connected via a high-speed network, as shown in Figure 1. Half of the computing nodes execute at double the speed of the others. The jobs assigned for execution are assumed to be highly independent: once a job is scheduled for execution, it never waits idly for communication with any other job. This system can be modeled as an open queuing network [1, 6]. Let $M_F$ and $M_S$ be the numbers of fast and slow computing nodes (machines), so that $M_F = M_S = m/2$. We have assumed that the jobs are classified into two types, dedicated and generic, with arrival rates $\lambda_D$ and $\lambda_G$ respectively. Jobs of the first class are dedicated to fast processors, and jobs of the second class are generic in the sense that they can be allocated to any processor. There is one arrival stream for dedicated jobs and one for generic jobs. The model of the system is shown in Figure 3.2. The generic jobs arrive at rate $\lambda_G$ and can be processed by any of the computing nodes. We assume that all arrival streams are Poisson processes. All jobs have identically distributed service requirements. Once allocated to a particular computing node, a job cannot be reassigned and must be processed to completion by that node. The dedicated jobs are mostly the local loads of the computing nodes; if a
computing node is loaded above a threshold, it is not available for generic jobs for a period of time.

3.2 Dynamic load distribution algorithms

A dynamic load distribution algorithm must be general, adaptive, stable, fault tolerant and transparent to applications. Load balancing algorithms can be classified as (i) global vs. local, (ii) centralized vs. decentralized, (iii) non-cooperative vs. cooperative, and (iv) adaptive vs. non-adaptive [7, 13]. In this paper we have used a centralized load balancing algorithm: a central node collects the load information from the other computing nodes in the HDCS. The central node communicates the assimilated information to all individual computing nodes, so that the nodes are kept up to date about the system state. This updated information enables the nodes to decide whether to migrate their processes or accept new processes for computation. The computing nodes may depend upon the information available at the central node for all allocation decisions.

The scheduling policies can be probabilistic, deterministic or adaptive. In the probabilistic case, the dedicated jobs are dispatched randomly to the fast processors with equal probability, while the generic jobs are randomly dispatched to the slow processors. In the deterministic case the routing decision is based on the system state. Two different policies are examined for this case. In both policies, the dedicated jobs join the shortest of the fast processor queues. However, the first policy requires the generic jobs to join the shortest queue of the slow processors, while the second policy assigns generic jobs to the (slow or fast) processor expected to offer the least job response time. When a generic job is assigned to a fast processor, the job start time depends on an aging factor.

In the adaptive case, job migration from slow to fast processors is employed. This is a receiver-initiated load sharing method employed to improve the performance of generic jobs. The policy is initiated when a generic job is queued on a slow processor and a fast processor becomes idle. Only the migration of non-executing jobs is considered.
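The receiver-initiated migration step described above can be sketched as follows. The paper does not say which slow queue or which waiting job is chosen, so this sketch makes two labelled assumptions: the donor is the slow queue with the most jobs, and the job moved is the first waiting job. The function name is hypothetical, and migration cost is ignored.

```python
from collections import deque
from typing import Deque, List, Optional


def migrate_to_idle_fast(fast_queue: Deque[str],
                         slow_queues: List[Deque[str]]) -> Optional[str]:
    """Move one waiting job from a slow queue to an idle fast processor.

    Convention (assumption): position 0 of a queue is the executing job,
    positions 1 and above are waiting jobs. Executing jobs never move.
    Returns the migrated job, or None if no migration takes place.
    """
    if fast_queue:
        return None  # the fast processor is not idle: policy not triggered
    # Only queues that hold at least one waiting (non-executing) job qualify.
    donors = [q for q in slow_queues if len(q) > 1]
    if not donors:
        return None
    donor = max(donors, key=len)  # assumption: take from the longest slow queue
    job = donor[1]                # assumption: take the first waiting job
    del donor[1]
    fast_queue.append(job)
    return job


# Hypothetical example: one idle fast processor and two slow processors.
fast = deque()
slow = [deque(["a", "b", "c"]), deque(["d"])]
moved = migrate_to_idle_fast(fast, slow)
```

Here job "a" is executing and stays where it is, while the waiting job "b" is moved to the idle fast processor.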
Executing jobs are not eligible for transfer because of complexity issues. When a job is transferred from a slow to a fast processor for remote processing, the job incurs additional communication cost. Only jobs that are waiting in the queues are transferred. The benefit of migration depends on the migration cost [6, 13]. We have adopted a workload model that is characterized by three parameters: the distribution of job arrivals, the distribution of processor service times, and the distribution of the migration overhead.

3.3 Job Scheduling Policies

Here we examine only non-preemptive scheduling policies, under the assumption that the scheduler has perfect information on (i) the length of all processor queues, and (ii) the queuing time of dedicated jobs in the fast processor queues. We have used the scheduling strategy of Karatza et al. [6]. The scheduling strategies used for load balancing decisions are:

Least expected response time for generic jobs, maximum wait for dedicated jobs (LERT-MW)
LERT-MW with migration, having an idea about execution times (LERT-MWM)

LERT-MW: In this policy, dedicated jobs are dispatched to the fast processor with the least queue length, and a generic job is sent to the fast or slow processor expected to offer the least job response time. The minimum job response time (makespan) is based on the user's view of how to improve performance. This algorithm needs global information on queue lengths for the generic and dedicated jobs, and it also requires additional information about the time dedicated jobs have been waiting in a queue.

LERT-MWM: In the above method we do not have prior knowledge of the execution times, so we cannot evenly distribute the load among all the nodes. As a result some processors remain idle while others are overloaded. This requires the migration of jobs from overloaded processors to idle processors. In this process the migration overhead may be high for small jobs and result in lower processor utilization. We therefore turn to a GA, which uses LERT-MW in the scheduling phase.
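A minimal sketch of the LERT-MW dispatch rule follows. The paper does not give the response-time estimator, so the sketch assumes a simple one: (queue length + 1) divided by the node speed, with fast nodes at speed 2 and slow nodes at speed 1. The maximum-wait (aging) rule for dedicated jobs is omitted, and the class and function names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    fast: bool
    queue: List[str] = field(default_factory=list)

    @property
    def speed(self) -> float:
        # Fast nodes run at double the speed of slow nodes.
        return 2.0 if self.fast else 1.0


def expected_response(node: Node) -> float:
    """Assumed estimator: jobs ahead plus the new job, scaled by speed."""
    return (len(node.queue) + 1) / node.speed


def dispatch(job: str, dedicated: bool, nodes: List[Node]) -> Node:
    """Place a job and return the node chosen for it."""
    if dedicated:
        # Dedicated jobs: fast processor with the shortest queue.
        fast_nodes = [n for n in nodes if n.fast]
        target = min(fast_nodes, key=lambda n: len(n.queue))
    else:
        # Generic jobs: any processor with the least expected response time.
        target = min(nodes, key=expected_response)
    target.queue.append(job)
    return target


# Hypothetical example: a busy fast node and a lightly loaded slow node.
nodes = [Node(fast=True, queue=["a", "b", "c", "d"]), Node(fast=False, queue=["w"])]
generic_target = dispatch("g1", dedicated=False, nodes=nodes)
dedicated_target = dispatch("d1", dedicated=True, nodes=nodes)
```

With four jobs already queued on the fast node, the estimator gives 2.5 for the fast node and 2.0 for the slow node, so the generic job goes to the slow node, while the dedicated job must still join the fast node.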
3.4 GA based Load Balancing Method

In this section, we detail our scheduling algorithm, which utilizes a GA for load balancing in HDCS. Genetic algorithms work with a population of potential solutions of the candidate problem, represented in the form of chromosomes. Each chromosome is composed of variables called genes. Each chromosome (genotype) maps to a fitness value (phenotype) on the basis of the objective function of the candidate problem. The algorithm we have developed is based upon one developed by Zomaya et al. [11, 12]. Jobs arrive at unknown intervals for processing and are placed in the queue of unscheduled tasks, from which tasks are assigned to processors. Each task has a task number and a size.

GAs follow the concept of solution evolution by stochastically developing generations of solution populations using a given fitness statistic. They are particularly applicable to problems that are large, non-linear and possibly discrete in nature, features that traditionally add to the degree of complexity of solution. Due to the probabilistic development of the solution, GAs do not guarantee optimality even when it may be reached. However, their solutions are likely to be close to the global optimum. This probabilistic nature is also the reason they are not trapped by local optima.

The proposed algorithm for load balancing is presented in Figure 3.3. A fixed number of tasks, each having a task number and a size, is randomly generated and placed in a central task pool from which tasks are assigned to different computing nodes (processors). As load balancing is performed by the centralized GA-based method, the first thing to do is to initialize a population of possible solutions [11, 12]. This can be achieved using the sliding window technique. The window size is fixed, with the number of elements in each string equal to the size of the window. Every time a job arrives at the queue of unscheduled tasks (task pool), the job is scheduled using the LERT-MW method and placed in the corresponding queue. After an interval of time we apply the GA and assign the jobs to the corresponding processors. Applying the GA at every task arrival would incur too much overhead, so we apply the GA after a random interval of time. The jobs in the queues form a two-dimensional array; to facilitate the crossover operation, the tasks with their sizes are represented as a one-dimensional array. The initial population is created by randomly swapping the task order a fixed number of times. Here we generate a population of 6 strings for our problem. After generating the population we perform the selection operation, which uses the fitness function.

ALGORITHM: GA_Loadbalancing
[1] Initialization()
[2] Load_checking()
[3] Repeat through step 6 until task queue is empty.
[4] String_evaluation()
[5] Genetic_operation
    a. Mutation()
    b. Reproduction()
    c. Crossover()
[6] request_message_evaluation()
[7] End
Figure 3.3: Genetic algorithm framework for load balancing

An objective function is the most important component of any optimization method, including a GA, as it is used to evaluate the quality of the solutions. The objective here is to arrive at task assignments that achieve minimum execution time, maximum processor utilization, and a well-balanced load across all processors. The objective function is then incorporated into the fitness function of the GA, which is used to measure the performance of the strings in relation to the objectives of the algorithm. The first objective function for the proposed algorithm is the makespan, as described in Section 2.2. Considering that a computing node $M_i$ may not always be idle, the total task completion time can be expressed as the sum of the current load of $M_i$ ($CL_i$) and the new load of $M_i$ ($NL_i$):

\[ T_i = CL_i + NL_i \]

For simplicity the computing nodes are referred to as single processors; however, a single node may have more than one processor as a dedicated computing unit. We use average node (processor) utilization as a second metric to study the performance of the load balancing algorithm, since high average processor utilization implies that the load is well balanced across all nodes (processors). By keeping the processors highly utilized, the total execution time should be reduced. The expected utilization of each processor under the given task assignment must be calculated. This is achieved by dividing the task completion time of each processor by the makespan value. The utilization of the individual processors ($UM_i$) is given by:

\[ UM_i = \frac{T_i}{\text{makespan}} \]

The overall task assignment being evaluated may have a small makespan and high average processor utilization. However, assigning these tasks to the processors may still overload some of the processors. Therefore, the third objective is to optimize the number of acceptable node queues. Each node queue is checked individually to see whether assigning all the tasks on the node queue will overload or under-load the processor.
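The genetic loop of Figure 3.3 can be sketched roughly as below. This is not the authors' implementation. The sketch assumes a direct encoding in which chromosome[j] is the processor index of task j, fitness-proportional reproduction, one-point crossover, swap mutation, and retention of the best string seen so far. All names and parameter values other than the population size of 6 are hypothetical.

```python
import random
from typing import Callable, List

Chromosome = List[int]  # chromosome[j] = processor index assigned to task j


def swap_two(chromosome: Chromosome, rng: random.Random) -> None:
    """Swap two randomly chosen positions in place."""
    a = rng.randrange(len(chromosome))
    b = rng.randrange(len(chromosome))
    chromosome[a], chromosome[b] = chromosome[b], chromosome[a]


def initial_population(seed_schedule: Chromosome, size: int, swaps: int,
                       rng: random.Random) -> List[Chromosome]:
    """Create each string by applying a fixed number of random swaps."""
    population = []
    for _ in range(size):
        candidate = list(seed_schedule)
        for _ in range(swaps):
            swap_two(candidate, rng)
        population.append(candidate)
    return population


def run_ga(seed_schedule: Chromosome, fitness: Callable[[Chromosome], float],
           size: int = 6, cycles: int = 10, swaps: int = 5,
           mutation_rate: float = 0.1, seed: int = 0) -> Chromosome:
    """Run a fixed number of generation cycles and return the fittest string."""
    rng = random.Random(seed)
    population = initial_population(seed_schedule, size, swaps, rng)
    best = max(population + [list(seed_schedule)], key=fitness)
    for _ in range(cycles):
        # Reproduction: sample parents in proportion to fitness.
        weights = [fitness(c) for c in population]
        parents = rng.choices(population, weights=weights, k=size)
        children = []
        for i in range(0, size, 2):
            p, q = parents[i], parents[(i + 1) % size]
            cut = rng.randrange(1, len(p))  # one-point crossover
            children += [p[:cut] + q[cut:], q[:cut] + p[cut:]]
        population = children[:size]
        for child in population:
            if rng.random() < mutation_rate:
                swap_two(child, rng)        # swap mutation
        best = max(population + [best], key=fitness)
    return best


# Hypothetical example: eight tasks on two processors, fitness = 1 / makespan.
sizes = [8, 7, 6, 5, 4, 3, 2, 1]


def inverse_makespan(chromosome: Chromosome) -> float:
    loads = [0, 0]
    for task, proc in enumerate(chromosome):
        loads[proc] += sizes[task]
    return 1.0 / max(loads)


seed_schedule = [0, 0, 0, 0, 1, 1, 1, 1]
best = run_ga(seed_schedule, inverse_makespan)
```

Because the best string seen so far is always retained, the returned schedule is never worse than the seed schedule under the supplied fitness function. The example fitness uses only the makespan objective for brevity.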
Whether a processor queue is acceptable or not is determined by the light and heavy thresholds used [12]:

Low threshold: average load × 0.8
High threshold: average load × 1.2

To facilitate the design of the genetic algorithm for load balancing, the three objectives discussed above are incorporated into a single fitness function, given by the following equation:

\[ \text{Fitness} = \frac{1}{\text{makespan}} \times \frac{\sum_{i=1}^{m} UM_i}{m} \times \frac{\text{acceptable\_queue\_size}}{m} \]

The fitness function is used to evaluate the quality of the task assignments using String_evaluation(), as shown in Figure 3.3. Instead of waiting for the GA to converge, it is allowed to run for a fixed number of k cycles (k = 10 in this paper). This decision was made because solutions generated in fewer than k generations may not be good enough. On the other hand, running the GA for more than k generations may not be feasible, as too much time would be devoted to genetic operations. When the GA is terminated after k cycles, the fittest string in the pool is decoded and used as the task schedule. We have analyzed the centralized dynamic load-balancing mechanism using a discrete event simulator that we developed in Matlab 6.0.

4 Performance analysis

The following results summarize the overall model performance. We simulate the model using metrics such as throughput and the number of tasks waiting in the queue within an interval. We have used the M/M/m/K/n queuing model for the simulation. From the results in Figure 4.1, we conclude that LERT-MWM with known execution times performs best when compared with LERT-MW, which has no prior information about the execution times of the jobs. So if we know the execution times of all the jobs we can distribute the load effectively, as shown in Figure 4.1.

[Figure 4.1: Comparison of LERT-MWM and LERT-MW with and without knowing the execution times.]

4.1 Changing the Number of Tasks

Default values were used for all the parameters except for the number of tasks to be assigned. The number of tasks was varied from 500 to 2000, and the effects on the total completion time and throughput are given below. Figure 4.2 shows that the total time taken by all three algorithms increased linearly as the number of tasks was increased. It was also noted that the GA performed best among the three algorithms.
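The combined fitness function can be sketched in Python as follows, taking the per-processor completion times $T_i = CL_i + NL_i$ as input. The function names are hypothetical, and treating a queue that sits exactly on a threshold as acceptable is an assumption, since the paper does not state whether the bounds are inclusive.

```python
from typing import List


def completion_times(current_loads: List[float], new_loads: List[float]) -> List[float]:
    """T_i = CL_i + NL_i for every processor."""
    return [cl + nl for cl, nl in zip(current_loads, new_loads)]


def fitness(times: List[float], low_mult: float = 0.8, high_mult: float = 1.2) -> float:
    """Fitness = (1 / makespan) * (average utilization) * (acceptable queues / m)."""
    m = len(times)
    span = max(times)                  # makespan of this assignment
    if span == 0:
        return 0.0                     # no work assigned: nothing to score
    avg_utilization = sum(t / span for t in times) / m   # mean of UM_i
    avg_load = sum(times) / m
    low, high = low_mult * avg_load, high_mult * avg_load
    # Assumption: thresholds are inclusive.
    acceptable = sum(1 for t in times if low <= t <= high)
    return (1.0 / span) * avg_utilization * (acceptable / m)


# Hypothetical example: the same total work, balanced and unbalanced.
balanced = completion_times([5.0, 5.0, 5.0, 5.0], [5.0, 5.0, 5.0, 5.0])
skewed = completion_times([10.0, 5.0, 5.0, 0.0], [10.0, 5.0, 5.0, 0.0])
```

For the balanced case every processor finishes at time 10, so utilization is 1, all four queues are acceptable, and the fitness is 1/10. In the skewed case the makespan doubles, average utilization halves, and only two of the four queues fall between the thresholds, so the fitness drops to 0.0125.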
When comparing the results of the GA and the LERT-MWM algorithm, one can observe that the gap between the two curves widened as the number of tasks was increased. This shows that the GA reduced the total completion time by a considerable amount (greater speedup) in comparison to the LERT-MWM algorithm as the number of tasks increased. This also indicates reliable performance of GA_Loadbalancing as the number of tasks increases. We also compared our GA with another GA technique using normal scheduling, that is, assigning jobs sequentially (First Come First Serve) to the processors one by one. In all cases the proposed GA shows better performance. The next experiment compares LERT-MWM with the genetic algorithm that uses the LERT-MW method in its scheduling phase. These comparisons are shown in Figures 4.2 and 4.3. The test runs were based on a set of default values: number of iterations: 500, number of processors: 50, number of generation cycles: 3, population size: 6, maximum size of each task: 100, high threshold multiplier: 1.2, and low threshold multiplier: 0.8. Two types of performance comparison were carried out.

[Figure 4.2: Comparison of GA, GA with normal scheduling, and LERT-MWM with the number of processors fixed.]

4.2 Changing the Number of Processors

Here we have studied the performance of the load balancing algorithms against the scalability of computing nodes (processors). In the simulation the number of processors was varied from 10 to 160, and the effects on the total completion time and throughput are shown in Figure 4.3.

[Figure 4.3: Comparison of GA, GA with normal scheduling, and LERT-MWM with the number of time units fixed.]

As the total number of processors is varied, the total number of jobs completed within an interval first increases linearly and then stabilizes at some point. In most cases, the GA outperformed the other two algorithms in terms of processor utilization. Hence if we know the execution times of the jobs we can effectively balance the loads among all the nodes.

5 Conclusions

This paper studies the performance of a genetic algorithm based approach to dynamic load balancing in a heterogeneous computing system. Simulation results indicate that which method performs best depends on the system load. We analyzed the system performance and the scalability of computing nodes with load balancing. The simulation results show that the GA based algorithm works better when the number of tasks is large. As distributed systems continue to grow in scale, in heterogeneity, and in the diversity of their networking technology, they present challenges that need to be addressed to meet the increasing demands for better performance and services from various distributed applications.

6 References

[1] Sivarama P. Dandamudi, Sensitivity evaluation of dynamic load sharing in distributed systems, IEEE Concurrency, 6(3), 1998, 62-72.
[2] Jie Li & Hisao Kameda, Load balancing problems for multiclass jobs in distributed/parallel computer systems, IEEE Transactions on Computers, 47(3), 1998, 322-332.
[3] Veeravalli Bharadwaj, Debasish Ghose, Venkataraman Mani, & Thomas G. Robertazzi, Scheduling Divisible Loads in Parallel and Distributed Systems (Wiley-IEEE Computer Society Press, 1996).
[4] Gamal Attiya & Yskandar Hamam, Two phase algorithm for load balancing in heterogeneous distributed systems, Proc. 12th IEEE EUROMICRO Conference on Parallel, Distributed and Network-based Processing, Coruna, Spain, 2004, 434-439.
[5] Jon Kleinberg & Eva Tardos, Algorithm Design (Pearson Education Inc., 2006).
[6] Helen D. Karatza & Ralph C. Hilzer, Load sharing in heterogeneous distributed systems, Proceedings of the Winter Simulation Conference, 1, San Diego, California, 2002, 489-496.
[7] Jie Wu, Distributed System Design (CRC Press, 1999).
[8] Y. Zhang, H. Kameda, & S. L. Hung, Comparison of dynamic and static load-balancing strategies in heterogeneous distributed systems, IEE Proceedings in Computer and Digital Techniques, 144(2), 1997, 100-106.
[9] Bora Ucar, Cevdet Aykanat, Kamer Kaya, & Murat Ikinci, Task assignment in heterogeneous computing system, Journal of Parallel and Distributed Computing, 66, 2006, 32-46.
[10] Marta Beltran, Antonio Guzman, & Jose Luis Bosque, Dealing with heterogeneity in load balancing algorithm, Proc. 5th IEEE International Symposium on Parallel and Distributed Computing, Timisoara, Romania, 2006, 123-132.
[11] A. Y. Zomaya, C. Ward, & B. Macey, Genetic Scheduling for Parallel Processor Systems: Comparative Studies and Performance Issues, IEEE Transactions on Parallel and Distributed Systems, 10(8), 1999, 795-812.
[12] A. Y. Zomaya & Y. H. Teh, Observations on using genetic algorithms for dynamic load-balancing, IEEE Transactions on Parallel and Distributed Systems, 12(9), 2001, 899-911.
[13] B. A. Shirazi, A. R. Hurson, & K. M. Kavi, Scheduling and load balancing in parallel and distributed systems, CS Press, 1995.
[14] K. S. Trivedi, Probability and statistics with reliability, queuing and computer science applications, Prentice Hall of India, 2001.