Parallel Branch and Bound Algorithm - A comparison between serial, OpenMP and MPI implementations

Journal of Physics: Conference Series. To cite this article: Lucio Barreto and Michael Bauer 2010 J. Phys.: Conf. Ser. 256 012018

High Performance Computing Symposium (HPCS2010)
Journal of Physics: Conference Series 256 (2010) 012018, IOP Publishing
doi:10.1088/1742-6596/256/1/012018

Lucio Barreto and Michael Bauer
Department of Computer Science, Middlesex College - The University of Western Ontario, London, ON, Canada
{lbarret6, bauer}@csd.uwo.ca

Abstract. This paper presents a comparison of an extended version of the regular Branch and Bound algorithm, previously implemented in serial, with a new parallel implementation using both MPI (distributed memory parallel model) and OpenMP (shared memory parallel model). The branch-and-bound algorithm is an enumerative optimization technique, where finding a solution to a mixed integer programming (MIP) problem is based on the construction of a tree where nodes represent candidate problems and branches represent the new restrictions to be considered. Through this tree all integer solutions of the feasible region of the problem are listed explicitly or implicitly, ensuring that all the optimal solutions will be found. A common approach to solve such problems is to convert sub-problems of the mixed integer problem to linear programming problems, thereby eliminating some of the integer constraints, and then trying to solve that problem using an existing linear programming approach. The paper describes the general branch and bound algorithm used and provides details on the implementation and the results of the comparison.

1. Introduction

Integer Programming problems (IP) are special cases of optimization problems where the variables can only assume integer values. Mixed Integer Programming problems (MIP) are special cases where only some of the variables are restricted to integer values. Optimization problems with integer variables can also be linear or nonlinear, depending on the terms of their objective function and their constraints. However, in general, the terms IP and MIP are almost always associated with problems that have linear features.
The optimization algorithms considered in this work focus on mixed integer programming, where some variables assume integer values while other variables take on continuous values. Considering only problems with linear constraints and linear objective functions, there are important differences between IP and MIP in relation to Linear Programming (LP):
- LP: there are necessary and sufficient conditions of optimality, proved theoretically, that can be used to efficiently test whether a given feasible solution is an optimal solution. These conditions are used to develop algebraic methods such as the simplex method.
- IP/MIP: there are no known optimality conditions to test whether a given feasible solution is optimal. It is necessary to perform the implicit or explicit comparison of all feasible solutions of the problem.

Approaches to solve IP and MIP problems are numerous, usually specialized in accordance with a particular application. We can divide these approaches into two broad families, with distinct characteristics:
- Heuristic Optimization: may get good results for problems in which classical optimization methods might fail (such as problems with many integer variables and very complex constraints), but there is no guarantee of optimality. Examples of such approaches are genetic algorithms [5][8] and tabu search [3][4].
- Classical Optimization: for convex problems it is guaranteed that the solution is optimal. In this family there are the methods of enumeration, such as the zero-one enumeration algorithm [1] and the branch-and-bound algorithm [2], used in this work.

We focus specifically on using a branch-and-bound approach for finding an optimal solution to MIP problems, including being able to find all optimal solutions when there are multiple ones. The branch-and-bound algorithm is an enumerative optimization technique, where finding a solution to an MIP problem is based on the construction of a tree where nodes represent candidate problems (sub-problems) and branches represent the new restrictions to be considered. Through this tree, all integer solutions of the feasible region of the problem are listed explicitly or implicitly, ensuring that all the optimal solutions will be found. A common approach to solve such problems is to convert the sub-problems that arise in solving integer linear programming problems to linear programming problems, that is, with no integer constraints, and to solve those problems using an existing linear programming solver. The resulting solution can be used to set bounds on the possible solutions or to help select variables to create restrictions that can be used to create sub-problems.
The paper presents a comparison of a branch and bound algorithm implemented with three different approaches: serial, shared memory model (OpenMP) and distributed memory model (MPI). The implemented branch and bound algorithm uses GNU/GLPK [9] as the Linear Programming (LP) solver to find optimal solutions to mixed integer linear problems. The remainder of the paper is organized as follows. Section 2 describes some important theoretical aspects of branch-and-bound. Section 3 describes the branch-and-bound algorithm itself. Section 4 describes the computational implementation. Section 5 shows a comparison of the approaches. Section 6 presents conclusions about this work and finally Section 7 identifies future work.

2. Theoretical Aspects

The branch-and-bound algorithm is an enumerative technique, in which a solution is found based on the construction of a tree in which nodes represent the problem candidates and branches represent the new restrictions to be considered. Through this tree, all integer solutions of the problem feasible region are listed explicitly or implicitly, ensuring that all the optimal solutions will be found. The overall structure of the branch-and-bound algorithm has three key elements: separation, relaxation and pruning [2].

Separation uses the tactic of divide and conquer in order to solve the problem (P). In order to find the solution of P, it is decomposed into two or more descendant sub-problems, generating a list of candidate problems (CP). At a subsequent step in the algorithm, a candidate is selected from the list of candidate problems and the algorithm tries to solve that problem. If a solution cannot be found to that problem, that problem is again decomposed and its descendants are added to the list of candidate problems. If the selected problem can be solved, then a new solution is obtained. The objective function value of this new solution is then compared with the value of the incumbent solution, which is the best feasible solution known so far. If the new solution is better than the incumbent solution, it becomes the new incumbent.
Then, the algorithm returns to the list and selects the next candidate. This procedure is repeated until the list is empty, and the solution of the problem is taken as the final incumbent solution.

The usual way to carry out separation of an integer programming problem is through contradictory constraints on a single integer variable (the separation variable or branching variable). Thus, from the original problem (called node zero), two new descendant sub-problems are created, which are easier to solve than the original one, since a constraint was added on the separation variable. Each generated node has an associated candidate sub-problem and each branch indicates the addition of a constraint related to the variable used in the separation. Therefore, as the algorithm moves down the tree, the feasible region of the generated descendants becomes more restricted.

The most common approach to relaxation is the elimination of the integrality constraints, so that the integer or mixed integer problem is converted to a general LP. Relaxation assumes that the original integer variables may take fractional values and then the resulting LP problem is solved. The obtained optimal solution usually has several variables with non-integer values. Among these variables, one must be selected for separation.

Once the separation and the inclusion of the new descendants in the candidate list are completed, the algorithm must select from among the stored candidates the next one to be evaluated and, if necessary, separated. This continues until the solution of the linear problem becomes integer, infeasible or worse than the incumbent, meaning that the candidate sub-problem can be removed from the list (pruned) without producing any more descendants. This procedure repeats until the list of candidate sub-problems becomes empty.

3. Branch and bound Algorithm

Consider the MIP (P) whose general form is given by:

$$(P)\quad \text{minimize}\quad z = c^T x \qquad (1)$$

$$\text{subject to}\quad Ax \le b,\qquad x_i\ \text{integer}\ \ \forall i \in I,\qquad x_j\ \text{real}\ \ \forall j \in J \qquad (2)$$

where $c$ is the vector of costs, $x$ is the vector of variables, both integer ($x_i$) and real ($x_j$), $A$ is the constraint coefficient matrix, $b$ is the vector of right-hand sides of the inequalities, $I$ is the set of integer variables and $J$ is the set of continuous variables. The branch-and-bound algorithm to solve (P) has the following steps:

1. Start-up: set the node counter $n = 0$ (the original problem is node zero), set the first incumbent and initialize the list of candidate sub-problems with the original problem (P).
2. Convergence test: if the candidate list is empty, the process is over and the current incumbent solution is the optimal solution of the problem. Otherwise, continue.
3. Candidate selection: among the candidate sub-problems not yet pruned, choose the one that will be the next to be evaluated and remove it from the list. Solve the LP problem related to the relaxed selected problem $(CP_R)$ and store the optimal value as a lower bound for all its descendants, $z_{inf} = z^*(CP_R)$.
4. Pruning tests: the sub-problem $(CP)$ may be pruned if it meets one of the following conditions:
   a) $(CP_R)$ has no feasible solution;
   b) $z_{inf} > z^*$, where $z^*$ is the current incumbent value;
   c) the optimal solution of $(CP_R)$ is integer and feasible in $(CP)$. In this case, if the optimal value is lower than the incumbent value, make $z^* = z_{inf}$ and apply the previous test (b) to all candidate sub-problems not yet pruned.
   If the candidate sub-problem $(CP)$ was pruned, return to Step 2.
5. Separation: from the sub-problem $(CP)$, select a variable for separation from those that are integer and still have a continuous value. For the chosen variable $x_i$, whose current value is $x_i^*$, generate two new descendant sub-problems and add them to the candidate list. The new sub-problems are generated by adding to $(CP)$ the following restrictions:

$$(CP^{n+1}):\quad x_i \le \lfloor x_i^* \rfloor \qquad (3)$$

$$(CP^{n+2}):\quad x_i \ge \lfloor x_i^* \rfloor + 1 \qquad (4)$$

   where $\lfloor x_i^* \rfloor$ is the largest integer not greater than $x_i^*$. Set $n = n + 2$ and return to Step 3.

3.1. Performance Enhancement

The efficiency of the algorithm is directly related to the method of selecting the next candidate sub-problem to be evaluated (Step 3) and the separation variable (Step 5). Moreover, the existence of a good initial incumbent increases the effectiveness of pruning test (b), thereby reducing the number of candidate sub-problems that need to be evaluated.

Although there is no systematic technique to determine which one of the candidate sub-problems leads more quickly to the solution, some heuristic rules can be used, such as search by creation order, by depth, by the relaxed solution value, or by the estimated solution value, among others. A very common technique used to select the candidate sub-problem is the LIFO rule (Last In, First Out), which produces a depth-first search. This strategy allows the descendant sub-problem to be solved starting from the previous problem (because they differ only in the bound on a single variable) and minimizes the memory required to store the candidate information. The way in which these selections are made directly influences the number of nodes that need to be evaluated and thus determines the computational effort. On the other hand, there are methods that use estimates of the objective function value to select from among all candidates the most promising sub-problem.
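Steps 1 to 5 of Section 3, combined with the LIFO rule, can be sketched in C++ as follows. This is a minimal illustration and not the implementation evaluated in this paper: the LP solver is replaced by a hypothetical callback (`RelaxFn`), all variables are treated as integer, the first fractional variable is used for separation, and test (b) is applied only after a candidate's own relaxation has been solved. The names `Node`, `Relaxed`, `Result`, `branch_and_bound` and `toy_relax` are assumptions introduced for the example.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <functional>
#include <limits>
#include <vector>

// Variable bounds define one candidate sub-problem (CP).
struct Node {
    std::vector<double> lb, ub;
};

// Outcome of solving the relaxed problem (CP_R).
struct Relaxed {
    bool feasible;
    double z;               // optimal value of the relaxation (z_inf)
    std::vector<double> x;  // optimal point of the relaxation
};

// Hypothetical callback that stands in for the LP solver.
using RelaxFn = std::function<Relaxed(const Node&)>;

struct Result {
    double z;               // incumbent value z*
    std::vector<double> x;  // incumbent solution
    int lps;                // number of relaxations solved
};

Result branch_and_bound(const Node& root, const RelaxFn& relax, double tol = 1e-6) {
    assert(root.lb.size() == root.ub.size());
    Result best{std::numeric_limits<double>::infinity(), {}, 0};
    std::vector<Node> candidates{root};             // Step 1: list holds (P)
    while (!candidates.empty()) {                   // Step 2: convergence test
        Node cp = candidates.back();                // Step 3: LIFO selection
        candidates.pop_back();
        const Relaxed r = relax(cp);
        ++best.lps;
        if (!r.feasible) continue;                  // Step 4a: infeasible
        if (r.z > best.z) continue;                 // Step 4b: worse than incumbent
        int branch = -1;                            // first variable with a fractional value
        for (std::size_t i = 0; i < r.x.size(); ++i) {
            if (std::fabs(r.x[i] - std::round(r.x[i])) > tol) {
                branch = static_cast<int>(i);
                break;
            }
        }
        if (branch < 0) {                           // Step 4c: integer solution
            if (r.z < best.z) { best.z = r.z; best.x = r.x; }
            continue;
        }
        const double fl = std::floor(r.x[branch]);  // Step 5: separation
        Node down = cp, up = cp;
        down.ub[branch] = std::min(down.ub[branch], fl);    // x_i <= floor(x_i*)
        up.lb[branch] = std::max(up.lb[branch], fl + 1.0);  // x_i >= floor(x_i*) + 1
        candidates.push_back(down);
        candidates.push_back(up);
    }
    return best;
}

// Toy relaxation (an assumption for illustration only): minimize c.x subject to
// x_i >= a_i with c >= 0, so the relaxed optimum is x_i = max(lb_i, a_i).
Relaxed toy_relax(const Node& n, const std::vector<double>& a, const std::vector<double>& c) {
    Relaxed r{true, 0.0, std::vector<double>(a.size())};
    for (std::size_t i = 0; i < a.size(); ++i) {
        r.x[i] = std::max(n.lb[i], a[i]);
        if (r.x[i] > n.ub[i]) { r.feasible = false; return r; }
        r.z += c[i] * r.x[i];
    }
    return r;
}
```

Calling `branch_and_bound` with a lambda that forwards to `toy_relax` is enough to exercise every step. In a real solver the callback would load the bounds of the node into the LP solver and return its optimal value and point.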
Selecting by estimated value minimizes the total number of problems to be evaluated, but at the same time can drastically increase the memory requirements, since the successive LP problems may not have the same similarity that exists in the depth search.

3.2. Variable Selection

When the solution of the relaxed candidate $(CP_R)$ has several integer variables with continuous values, one of them must be selected to be separated. An inadequate choice of variable implies the evaluation of several descendant sub-problems that could have been eliminated by pruning their predecessor. There is no systematic technique to identify the optimal separation variable, but there are empirical rules that indicate which variables are most attractive. If simplicity is the goal, the separation variable can be selected from a predetermined sequence, for example by objective function coefficient from highest to lowest cost, or by a more specific order that depends on the features of the problem. A more effective alternative is to search for the variable that has the highest estimated increase of the objective function, which can be obtained from one of the following techniques:

- [MAX;MAX]: always choose the variable which causes the greatest degradation in the objective function, in order to quickly obtain a descendant sub-problem that might be pruned. For the candidate sub-problem node, the separation variable $j$ is selected according to:

  $$\max_j \{ \max( P_j^- f_j,\ P_j^+ (1 - f_j) ) \}$$

- [MAX;MIN]: choose the variable for which the minimum variation causes the maximum impact. This ensures that both descendants contribute to pruning the sub-problem. For the candidate sub-problem node, the separation variable $j$ is selected according to:

  $$\max_j \{ \min( P_j^- f_j,\ P_j^+ (1 - f_j) ) \}$$

- [MIN;MAX]: choose the variable for which the maximum variation causes the minimum impact. Similarly, both descendants contribute to pruning this sub-problem. For the candidate sub-problem node, the separation variable $j$ is selected according to:

  $$\min_j \{ \max( P_j^- f_j,\ P_j^+ (1 - f_j) ) \}$$

- [MIN;MIN]: always choose the variable which causes the lowest degradation in the objective function, in order to quickly obtain a feasible solution. This is the most conservative method. For the candidate sub-problem node, the separation variable $j$ is selected according to:

  $$\min_j \{ \min( P_j^- f_j,\ P_j^+ (1 - f_j) ) \}$$

Another issue involved in variable selection is the possibility of separating a variable which was assigned an integer value. It often occurs in a branch-and-bound execution that a specific variable gets an integer value prematurely, i.e., without any bounds imposing that integer value. In this case, according to Step 5 of the branch-and-bound algorithm, this variable would not be selected for separation (because it is integer), and if all other variables were also integer, the solution would be considered feasible and would not generate descendants. The algorithm would then fail to guarantee that all alternative solutions to the problem are reached. By separating the variables with integer values whose lower bound is still smaller than their upper bound, the algorithm guarantees that all integer variables involved in the problem will be evaluated in an implicit or explicit way, exhausting all possible separation opportunities and allowing all alternative solutions to the problem to be found.
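The four rules of Section 3.2 differ only in whether the inner and the outer operator is a maximum or a minimum, which the following C++ sketch makes explicit. The text does not define the symbols of the expressions, so the sketch assumes that $f_j$ is the fractional part of the relaxed value of variable $j$ and that $P_j^-$ and $P_j^+$ are pseudo-costs of the downward and upward branches. The names `Candidate`, `Rule` and `select_variable` are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// One integer variable whose relaxed value is fractional.
struct Candidate {
    int index;      // position of the variable in x
    double f;       // assumed: fractional part of the relaxed value, 0 < f < 1
    double p_down;  // assumed: pseudo-cost of the "<= floor" branch (P-)
    double p_up;    // assumed: pseudo-cost of the ">= floor + 1" branch (P+)
};

// Rule names read as [OUTER;INNER], matching the expressions in the text.
enum class Rule { MaxMax, MaxMin, MinMax, MinMin };

// Returns the index of the separation variable chosen by the given rule.
int select_variable(const std::vector<Candidate>& cands, Rule rule) {
    assert(!cands.empty());
    const bool outer_max = (rule == Rule::MaxMax || rule == Rule::MaxMin);
    const bool inner_max = (rule == Rule::MaxMax || rule == Rule::MinMax);
    int best_index = -1;
    double best_score = 0.0;
    for (const Candidate& c : cands) {
        const double down = c.p_down * c.f;        // estimated degradation, down branch
        const double up = c.p_up * (1.0 - c.f);    // estimated degradation, up branch
        const double score = inner_max ? std::max(down, up) : std::min(down, up);
        const bool better = outer_max ? (score > best_score) : (score < best_score);
        if (best_index < 0 || better) {
            best_index = c.index;
            best_score = score;
        }
    }
    return best_index;
}
```

Ties are resolved in favour of the first candidate encountered; this is a choice made for the sketch, not something stated in the text.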
3.3. Multiple Optimal Solutions

Some problems may have multiple optimal integer solutions, such as optimization models that describe the power network expansion problem for both transmission [12] and distribution [6][7]. In this case, obtaining all the solutions is very important. Although the objective function value is the same for all solutions, one solution may be preferable to another when taking into account other factors that were not explicitly included in the optimization model. If these other factors have some relevance, one may also want to retain sub-optimal solutions. This can be easily accomplished when the solutions found are stored in an ordered list, as is done in this work.

To ensure that all solutions are found, it is necessary to make changes in the pruning (Step 4) and separation (Step 5) processes. In Step 4 (c), a solution is considered integer when none of the integer variables can assume other values, i.e., when they all have inequality constraints whose upper and lower bounds are both integer and identical. Thus, Step 4 (c) of the conventional algorithm should be replaced by:

   c) the optimal solution of $(CP_R)$ is feasible in $(CP)$ and all integer variables in the optimal solution of $(CP_R)$ have no degree of freedom, i.e. $x_i^{lb} = x_i^{ub},\ \forall i \in I$, where $x^{lb}$ and $x^{ub}$ are the lower and upper bound vectors of the integer variables of the candidate problem $(CP_R)$, respectively. In this case, if the optimal value is lower than the incumbent value, make $z^* = z_{inf}$ and apply the previous test (b) to all candidate sub-problems not yet pruned. Otherwise, if the optimal value is equal to the incumbent value, store the new alternative solution.

Notice that the components of the vectors $x^{lb}$ and $x^{ub}$ are always integer values generated by the separation process, according to equations (3) and (4). In Step 5, in addition to using the variables that have continuous values, it is also necessary to consider separating the integer variables that have some degree of freedom, i.e., with $x_i^{lb} < x_i^{ub}$.

4. Computational Implementation

The branch-and-bound algorithm was implemented in C++ and was adapted to three different versions: one serial and two parallel. The two parallel implementations used OpenMP [11] and MPI [10], respectively. The parallel versions are based on the serial approach, adding only the necessary parallel calls, so all approaches use the same programming logic. In the system, several parameters may be configured, such as:
- maximum size of the branch-and-bound tree;
- different tolerances for integrality and equality;
- selection criteria of the separation variable;
- criteria to select a node;
- minimum depth to open the tree (see footnote 1);
- occupation percentage of the available space in the tree at which the sort order of the active nodes is exchanged in order to preserve available memory;
- possibility of separation on integer variables.

The central data structure used in the branch-and-bound algorithm stores all the information that characterizes each candidate sub-problem (active node). The information associated with each node of the solution tree is divided into two groups:
- active nodes: information related to the candidate sub-problems;
- tree nodes: information about the predecessor nodes that were separated (using a contradictory constraint), generating the active nodes.
In this work, each node stores only the information relating to the last separation; the rest of the information is associated with the tree nodes. This procedure avoids data redundancy. The candidate sub-problems, when generated, are inserted into the lists of active nodes; several lists are maintained, each depending on a particular sort criterion. The available ordering possibilities on active nodes are as follows:
- node number (LIFO, Last In First Out);
- node depth;
- node relaxed solution (LP value);
- node estimated solution (using pseudo-costs);
- node estimated solution weighted with depth (deepest nodes are more important).

Footnote 1: Before adopting the chosen strategy for selecting the candidate node, the program will expand the tree breadth-first, until all active nodes have the same depth as the minimum chosen in the configuration.
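The storage scheme described above, in which each node keeps only its last separation, implies that the full set of bounds of a candidate is recovered by walking from the node back to the root. A minimal C++ sketch of that idea is given below; the layout of `TreeNode` and the function `collect_bounds` are assumptions made for illustration and are not taken from the implementation described in this paper.

```cpp
#include <algorithm>
#include <cassert>
#include <limits>
#include <map>
#include <vector>

// A node keeps only the constraint added by the separation that created it.
struct TreeNode {
    int parent;     // index of the predecessor, -1 for the root (node 0)
    int var;        // identifier of the separated variable (unused for the root)
    bool is_upper;  // true: x_var <= bound, false: x_var >= bound
    double bound;
};

// Bounds added on top of the original problem; infinite means "unchanged".
struct Bounds {
    double lb, ub;
};

// Rebuilds the constraints of a candidate by walking back to the root.
std::map<int, Bounds> collect_bounds(const std::vector<TreeNode>& tree, int node) {
    assert(node >= 0 && node < static_cast<int>(tree.size()));
    const double inf = std::numeric_limits<double>::infinity();
    std::map<int, Bounds> out;
    for (int n = node; tree[n].parent >= 0; n = tree[n].parent) {
        const TreeNode& t = tree[n];
        auto it = out.find(t.var);
        if (it == out.end()) it = out.emplace(t.var, Bounds{-inf, inf}).first;
        if (t.is_upper) {
            it->second.ub = std::min(it->second.ub, t.bound);  // keep the tightest upper bound
        } else {
            it->second.lb = std::max(it->second.lb, t.bound);  // keep the tightest lower bound
        }
    }
    return out;
}
```

The cost of this layout is one walk of at most the depth of the tree per evaluated candidate, in exchange for storing a single constraint per node.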

Since all lists are circularly linked, the algorithm can directly select the node that has any of the extreme values (minimum or maximum, for each of the criteria). Moreover, as all lists are updated simultaneously, it is possible to switch between any of the criteria during the solution process without the need for reordering, which has a high computational cost. Whenever a candidate node is pruned (satisfying any of the criteria presented in Step 4 of the algorithm in Section 3) it is removed from the list of active nodes, together with all its predecessors that do not possess active descendants, recursively, from the node that was pruned towards the root node of the tree.

As an example, consider the problem P:

$$(P)\quad \begin{aligned} \min\ \ & v = 5n_{12} + 2n_{13} + 2n_{23} + \beta \\ \text{s.t.}\ \ & 350n_{12} + 400n_{13} + \beta \ge 400 \\ & 350n_{12} + 210n_{23} + \beta \ge 200 \\ & \beta \ge 0 \\ & n_{ij} \ge 0\ \ \forall ij \\ & n_{12},\ n_{13}\ \text{and}\ n_{23}\ \text{integers} \end{aligned}$$

The branch-and-bound algorithm for this problem can be represented by Figure 1, where each node is represented by a circle with a number that indicates its order of generation (index $n$ of the algorithm). Beside each node is its relaxed solution; if that solution is integer or worse than the incumbent, the node is pruned and is drawn as a coloured circle.

Figure 1. Branch-and-bound intermediate tree.
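The recursive removal of pruned nodes described earlier in this section can be sketched with a per-node count of stored children. The C++ fragment below is only an illustration under that assumption; the `Tree` structure and its member names are hypothetical and the circularly linked ordering lists are left out.

```cpp
#include <cassert>
#include <vector>

struct Tree {
    std::vector<int> parent;    // parent[i] is -1 for the root
    std::vector<int> children;  // number of stored children of node i
    std::vector<bool> stored;   // false once node i has been removed

    // Adds a node below parent_index (-1 creates the root) and returns its index.
    int add(int parent_index) {
        const int id = static_cast<int>(parent.size());
        assert(parent_index < id);
        parent.push_back(parent_index);
        children.push_back(0);
        stored.push_back(true);
        if (parent_index >= 0) ++children[parent_index];
        return id;
    }

    // Removes a pruned leaf and every predecessor left without descendants,
    // moving from the pruned node towards the root.
    // Returns the number of nodes removed.
    int remove_pruned(int node) {
        assert(stored[node] && children[node] == 0);
        int removed = 0;
        while (node >= 0) {
            stored[node] = false;
            ++removed;
            const int p = parent[node];
            if (p >= 0 && --children[p] > 0) break;  // predecessor still has descendants
            node = p;
        }
        return removed;
    }
};
```

With this bookkeeping the memory held by a fully explored branch is released as soon as its last leaf is pruned.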

The constraints that are added at each separation are shown inside the rectangles on the respective branch. Thus, for any node, the associated problem can be determined by going back up the tree towards the root node (node 0) and adding the corresponding constraints along that path. For example, the candidate problem represented by node 7 is the original problem (P) with the addition of the following constraints:

$$n_{12} \le 0, \qquad n_{23} \le 0$$

The tree starts at Node 0 with the solution of the linear problem $(P_R)$, which is obtained by relaxing the integer constraints of problem (P):

$$(P_R)\quad \begin{aligned} \min\ \ & v = 5n_{12} + 2n_{13} + 2n_{23} + \beta \\ \text{s.t.}\ \ & 350n_{12} + 400n_{13} + \beta \ge 400 \\ & 350n_{12} + 210n_{23} + \beta \ge 200 \\ & \beta \ge 0 \\ & n_{ij} \ge 0\ \ \forall ij \end{aligned}$$

The solution of the linear problem has two variables with continuous values: $n_{12} = 4/7$ and $n_{13} = 0.5$. Thus, it does not satisfy the pruning criteria and the process continues with the separation of the problem. The first non-integer variable ($n_{12}$) is chosen as the separation variable. In this case, the two descendant nodes are generated by the constraints $n_{12} \le 0$ (Node 1) and $n_{12} \ge 1$ (Node 2). At this point of the process there are two candidates that may be chosen for evaluation, represented by Nodes 1 and 2. Suppose that the selection criterion is to evaluate the last generated node, so that Node 2 is chosen. The solution of the relaxed problem with the constraint $n_{12} \ge 1$ still has a variable with a continuous value ($n_{13} = 0.125$). The problem of Node 2 must therefore be separated and, in this case, only one variable can be selected ($n_{13}$). After the generation of the descendants of Node 2 (Nodes 3 and 4), Node 4 is selected and evaluated. Thus, the first integer solution for the original problem is obtained and this solution becomes the incumbent ($v^* = v^4 = v^4_{inf} = 7$). As the incumbent value was changed, the remaining candidates (Nodes 1 and 3) are tested against it, but it is not possible to prune either of them.
Again, the last generated candidate is selected (Node 3) and its variable $n_{12}$ is separated. Both descendants of Node 3 (Nodes 5 and 6) have integer solutions that are discarded since they are worse than the existing incumbent ($v^5 = v^5_{inf} = 55$ and $v^6 = v^6_{inf} = 10$). The algorithm now goes back to its candidate list and selects the only available node (Node 1), which is separated on its variable $n_{23}$, generating Nodes 7 and 8. Taking the last one to be evaluated, we obtain an integer solution better than the incumbent, which is updated to $v^* = v^8 = v^8_{inf} = 4$. Next, Node 7 is evaluated and discarded (with its descendants) since its lower bound is greater than the incumbent ($v^7_{inf} = 201 > v^*$). Nodes 9 and 10 are represented only to complete the enumeration. In the way that the process evolved, it was necessary to solve 9 linear problems to guarantee the enumeration of all possible alternatives. The optimal solution for the original problem was obtained after 8 LPs:

$$v^* = 4, \qquad n_{12}^* = 0, \qquad n_{13}^* = 1, \qquad n_{23}^* = 1$$

An important issue regarding performance in enumeration algorithms is how good the initial incumbent solution is. A good initial solution significantly increases the efficiency of the pruning test (4b) and so reduces the number of candidates that need to be evaluated. For the example of Figure 1, an initial incumbent $v^* = 5$ would be enough to immediately prune Node 2 (and all its descendants), thereby reducing the total number of evaluations.

Considering the data structures used and the criteria for node ordering and selection of the separation variable, the serial branch-and-bound algorithm was implemented according to the flowcharts of Figures 2 and 3. A summary of the parallel implementation can be seen in Figure 4.

4.1. Parallel Implementation

4.1.1. MPI

In the MPI implementation, the master node initializes the program, creates a base tree with a list of active nodes and sends those active nodes to the slaves whenever they need a problem to work on. The master also controls the process, including the list of solutions, statistics, etc. While the master is creating the base tree, the slaves also create a copy of the same base tree. Each slave receives an index for a node from the master, then initializes its own tree and starts the B&B. All the calculation is done by the slaves. Once a solution is found, it is sent to the master, which updates all slaves. The master node waits for any message from the slaves and does no other calculations during the B&B process. Whenever a slave finishes its job, it sends a message to the master asking for a new node and the process continues. As the slaves work on different nodes with their own trees, there is no possibility of conflict in accessing the same part of the memory.

4.1.2. OpenMP

In the OpenMP implementation, there is no concept of master and slaves. The cpu0 node is the one responsible for initializing the program and creating a common tree with a list of active nodes. In this implementation, the nodes have a flag indicating whether or not they are being used. After this creation, all processors start the B&B by requesting the first available node from the list of active nodes, so each processor works on a different part of the same tree. Because the memory is shared, any change to the shared variables is immediately visible to all processors. With both implementations there are possibilities for further improvements, and those are discussed in Sections 6 and 7.
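The flag-based sharing of the active list in the OpenMP version can be illustrated with the sketch below. It is an assumption-laden outline rather than the code evaluated in this paper: the LP solve is replaced by a hypothetical `evaluate` function, no new nodes are generated, and the solver call is wrapped in a named critical section because Section 6 reports that the LP solver used was not thread safe. The code relies only on `#pragma omp` directives, so it also compiles and runs serially when OpenMP is not enabled.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

struct ActiveNode {
    int id;
    bool in_use;  // flag: set once a thread has taken the node
};

// Marks and returns the position of the first free node, or -1 if none is left.
int claim_first_available(std::vector<ActiveNode>& nodes) {
    int claimed = -1;
    #pragma omp critical(active_list)
    {
        for (std::size_t i = 0; i < nodes.size() && claimed < 0; ++i) {
            if (!nodes[i].in_use) {
                nodes[i].in_use = true;
                claimed = static_cast<int>(i);
            }
        }
    }
    return claimed;
}

// Hypothetical stand-in for evaluating a node with the LP solver.
double evaluate(const ActiveNode& n) { return 10.0 - n.id; }

// Every thread keeps claiming nodes until the shared list is exhausted.
// Returns the best (lowest) value found; evaluations[i] counts visits to node i.
double process_all(std::vector<ActiveNode>& nodes, std::vector<int>& evaluations) {
    assert(evaluations.size() == nodes.size());
    double incumbent = std::numeric_limits<double>::infinity();
    #pragma omp parallel
    {
        int i;
        while ((i = claim_first_available(nodes)) >= 0) {
            double z;
            #pragma omp critical(lp_solver)  // solver assumed not thread safe
            { z = evaluate(nodes[i]); }
            #pragma omp critical(incumbent_update)
            {
                ++evaluations[i];
                if (z < incumbent) incumbent = z;
            }
        }
    }
    return incumbent;
}
```

The critical section around the solver serializes all LP solves, which shows why explicit control of this kind limits the speedup that a shared memory version can reach.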

Figure 2. Serial branch-and-bound algorithm, Part 1.
Figure 3. Serial branch-and-bound algorithm, Part 2.
Figure 4. Summary of the parallel branch-and-bound algorithm.

In the next section we present results comparing the three implementations for solving the same MIP.

5. Results

The initial comparison of the algorithms to date made use of an optimization problem previously presented [6][7]. The problem is a simplified three-phase electrical distribution network consisting of 18 nodes (2 substations and 16 nodes with loads) and 24 branches operating under 13800 V. The topology of this network is shown in Figure 5, in which rectangles denote the substations and circles are the nodes where loads are concentrated. Branches drawn as continuous lines denote the initial network (those with a single line are part of the fixed network and those with double lines are candidates for replacement), and branches drawn as dashed lines are candidates for addition (and are not part of the initial network). The search space has approximately $2^{118}$ combinations. The results are illustrated in Figure 6. The problem is fully detailed in the above references.

Figure 5: Diagram of the 18-node network. [7]
Figure 6: 18-node network - Solution with multistage expansion. [7]

The tests were carried out on two different systems:
- HP Linux cluster running XC 3.1 with 267 nodes, 4 cores each, connected via Myrinet 2g (narwhal.sharcnet.ca), for the MPI and serial implementations.
- SUSE Enterprise 10 Linux cluster running with 8 nodes, 1 core per socket, NUMA (silky.sharcnet.ca), for the OpenMP implementation.

Even though these are different systems with different architectures, the goal of the experiment was to compare the wall time of the parallel implementations with the serial one, to support the idea that classic optimization methods can still be used in modern distributed systems for many different areas.

Table 2: Results for the Electrical Distribution Network - Serial Approach

#LPs     Wall Time (s)
70240    7324

Table 3: Results for the Electrical Distribution Network - Parallel Approaches

         #LPs              Wall Time (s)     Speedup
#Proc    MPI      OpenMP   MPI      OpenMP   MPI     OpenMP
2        930      65321    6155     6654     1.2x    1.1x
5        86745    47652    4623     6236     1.6x    1.2x
10       84451    39567    2757     57       2.7x    1.3x
50       72344    26541    976      3854     7.5x    1.9x
100      65645    21454    806      3562     9.1x    2.1x

Table 2 summarizes the execution of the solution to the problem for the serial implementation. The results show the required number of linear programming problems (sub-problems) (#LPs) and the total time required in seconds. The time was measured by the algorithm and corresponds to the total wall clock time. Table 3 summarizes the execution for both parallel implementations: MPI and OpenMP. The results show the number of processors used (#Proc), the required number of linear programming problems (sub-problems) (#LPs) for each approach at each #Proc, as well as the total wall time and the calculated speedup. The speedup is the serial wall time divided by the parallel wall time; for example, 7324 / 6155 ≈ 1.2 for MPI with 2 processors. For the tests we used the [MAX;MAX] separation variable selection strategy. The difference between the number of LPs in the parallel executions is not related to a different strategy, but is due to several other factors.
In the MPI implementation, as the memory is distributed and the processors have to exchange messages, each node creates its own tree and because of that it can take more time to find a good incumbent solution. Many unnecessary nodes are evaluated before a good incumbent is found, since the algorithm has to ensure the enumeration of all possible solutions without pruning anything that could be promising. On the other hand, in the OpenMP approach, fewer nodes were evaluated because the algorithm always found good solutions earlier than the MPI version (for this particular problem). Also, the OpenMP implementation is more similar to the serial one. Unfortunately, other issues with the OpenMP implementation were found and are reported in the conclusions.

6. Conclusions

Based on the initial experimental results some conclusions can be drawn:
- The serial implementation works reasonably well.
- The OpenMP implementation always generated fewer sub-problems than the serial one, but the speedup was not as good as that of the MPI implementation. Unfortunately, the chosen LP solver (GLPK) [9] was not thread safe, which made it impossible to run a shared memory model approach without the use of some explicit control mechanisms in the implementation. This drastically decreased the performance. The use of a flag to control the list of active nodes and avoid race conditions also has a cost that decreases performance. We plan to address those problems in future work.
- The MPI implementation seemed to provide a more robust way to split the branch and bound search. Though it generated more sub-problems in some cases, it took the least time in all executions and always ran correctly. It is important to mention here that before the parallel computation starts, the algorithm prepares a base tree for the MPI implementation and the nodes are sent to the slaves whenever they need one. This base tree is initially static, which means that there is a specific number of nodes available to be evaluated. When this number is reached, no more nodes are available to be sent and the allocated processors become idle, waiting for the others to finish the job. This is a problem that will be addressed in future work.

As indicated, these results are based on our particular problem and, while it is complex, additional experiments need to be run to better compare the algorithms and approaches to parallel implementation as well as to help improve the implementation. Our initial impression is that the MPI approach works reasonably well and with much less headache than the OpenMP one.

7. Future Work

Our next steps include finding a better way to evaluate nodes in parallel and finding a better way to keep all processors working full time. We also want to test the algorithm using other LP solvers that are thread-safe and faster.
Finally, we intend to compare the implementations on several other problems, including some from other domains, and to test the different separation variable choices that we have implemented as well.

References

[1] Balas, E. (1965). An additive algorithm for solving linear programs with zero-one variables, Operations Research, Vol. 13, No. 4, pp. 517-546.
[2] Geoffrion, A.M. and R.E. Marsten (1972). Integer Programming Algorithms: A Framework and State-of-the-Art Survey, Management Science, Vol. 18, No. 9, pp. 465-491.
[3] Glover, F. (1989). Tabu Search - Part I, ORSA Journal on Computing, Vol. 1, No. 3, pp. 190-206.
[4] Glover, F. (1990). Tabu Search - Part II, ORSA Journal on Computing, Vol. 2, No. 1, pp. 4-32.
[5] Goldberg, D.E. (1989). Genetic Algorithms in Search, Optimization and Machine Learning, Addison-Wesley Professional, 432 p.
[6] Haffner, S., L.F. Pereira, L.A. Pereira and L. Barreto (2008a). Multistage model for distribution expansion planning with distributed generation - Part I: problem formulation. IEEE Transactions on Power Delivery, Vol. 23, No. 2, pp. 915-923.
[7] Haffner, S., L.F. Pereira, L.A. Pereira and L. Barreto (2008b). Multistage model for distribution expansion planning with distributed generation - Part II: numerical results. IEEE Transactions on Power Delivery, Vol. 23, No. 2, pp. 924-929.
[8] Holland, J.H. (1992). Adaptation in Natural and Artificial Systems: An Introductory Analysis with Applications to Biology, Control, and Artificial Intelligence, The MIT Press, 228 p.
[9] Makhorin, A. (2001). GLPK Linear Programming Kit Manual, GLPK documentation, Moscow Aviation Institute, Moscow, Russia, February 2001, available at http://www.gnu.org/software/glpk/glpk.html.
[10] MPI-forum - http://www.mpi-forum.org/
[11] OpenMP - http://www.openmp.org
[12] Romero, R., Monticelli, A., Garcia, A. and Haffner, S. (2002). Test systems and mathematical models for transmission network expansion planning, IEE Proc.-Gener. Transm. Distrib., Vol. 149, No. 1, pp. 27-36.