A Dual-Hamiltonian-Path-Based Multicasting Strategy for Wormhole-Routed Star Graph Interconnection Networks

Nen-Chung Wang
Department of Information and Communication Engineering
Chaoyang University of Technology, Taichung 413, Taiwan, R.O.C.
Tel: +886-4-333000 E-mail: nwang@mail.yut.edu.tw

Chih-Ping Chu*
Department of Computer Science and Information Engineering
National Cheng Kung University, Tainan 701, Taiwan, R.O.C.
Tel: +886-6-757575 Ext. 657 E-mail: hup@sie.nku.edu.tw

Tzung-Shi Chen
Department of Information Management
Chang Jung University, Tainan 711, Taiwan, R.O.C.
Tel: +886-6-78513 Ext. 053 E-mail: hents@mail.ju.edu.tw

Abstract

Multicast is an important collective communication operation on multicomputer systems, in which the same message is delivered from a source node to an arbitrary number of destination nodes. The star graph interconnection network has been recognized as an attractive alternative to the popular hypercube network. In this paper, we first address a dual-hamiltonian-path-based (DHPB) routing model with two virtual channels based on two hamiltonian paths and a network partitioning strategy for wormhole-routed star graph networks. Then, we propose three efficient multicast routing schemes on the basis of this model. All three proposed schemes are proven deadlock-free. The first scheme, network-selection-based dual-path routing, selects subnetworks that are constructed either by the first hamiltonian path or by the second hamiltonian path for dual-path routing. The second scheme, optimum dual-path routing, selects the subnetworks that give the optimum routing path for dual-path routing. The third scheme, two-phase optimum dual-path routing, includes two phases, source-to-relay and relay-to-destination. Finally, experimental results show that the three proposed routing schemes significantly outperform the unicast-based, the hamiltonian-path, and the single-hamiltonian-path-based (SHPB) dual-path routing schemes.

Keywords: Multicast, path-based routing, star graphs, virtual channels, wormhole routing.

* To whom all correspondence should be addressed.

1 Introduction

Multicast is an important collective communication operation on multicomputer systems, in which the same message is delivered from a source node to an arbitrary number of destination nodes. Much effort has been devoted to designing multicasting algorithms for a variety of interconnection networks such as hypercube, mesh, and torus networks. These algorithms are built on store-and-forward, virtual cut-through, or wormhole switching techniques [5, 10]. In general, the multicasting problem can be modeled by three routing schemes: tree-based, unicast-based, and path-based routing [9]. Tree-based multicasting finds a tree in the underlying network architecture and sends the source message to each destination along the paths of the constructed tree [6, 13]. Unicast-based multicasting sends the message from the source node to the destination nodes recursively via intermediate nodes [4, 8]. In path-based routing, the source node sends the message to all destination nodes along a constructed path [7, 12].

The star graph [1, 2] interconnection network has been recognized as an attractive alternative to the popular hypercube network, because it has a symmetric and hierarchical structure, with lower degree and smaller diameter than the hypercube. In our previous work on routing in wormhole-routed star graph networks [3], we addressed a path-based routing model and proposed four efficient deadlock-free multicast routing schemes. In this paper, we first address a dual-hamiltonian-path-based (DHPB) routing model with two virtual channels based on two hamiltonian paths and a network partitioning strategy for wormhole-routed star graph networks. Then, we propose three efficient multicast routing schemes on the basis of this model. All three proposed schemes are deadlock-free.
The first scheme, network-selection-based dual-path routing, selects subnetworks that are constructed either by the first hamiltonian path or by the second hamiltonian path for dual-path routing. The second scheme, optimum dual-path routing, selects the subnetworks that give the optimum routing path for dual-path routing. The third scheme, two-phase optimum dual-path routing, includes two phases, source-to-relay and relay-to-destination. In the source-to-relay phase, the multicasting uses optimum dual-path routing, whereas in the relay-to-destination phase it uses high-channel routing (which is a special case of optimum dual-path routing). We will show by experimental results that the proposed routing schemes are superior to the unicast-based, the hamiltonian-path, and the single-hamiltonian-path-based (SHPB) dual-path routing schemes.

The rest of this paper is organized as follows. Preliminaries are presented in Section 2. In Section 3, we propose three path-based multicast algorithms with two virtual channels. Simulation results of these algorithms are presented in Section 4. Finally, concluding remarks are drawn in Section 5.

2 Preliminaries

2.1 System Model

In the following, we first introduce some definitions and notations related to star graphs. A permutation of $n$ distinct symbols from the set $\{1, 2, \ldots, n\}$ is represented as $p = s_1 s_2 \cdots s_n$, where $s_i, s_j \in \{1, 2, \ldots, n\}$ and $s_i \ne s_j$ for $i \ne j$, $1 \le i, j \le n$. Given a permutation $p = s_1 s_2 \cdots s_n$, let the generator $g_i$, for $2 \le i \le n$, be the function that interchanges the symbol $s_i$ with the symbol $s_1$ in $p$. Thus, $g_i(p) = s_i s_2 \cdots s_{i-1} s_1 s_{i+1} \cdots s_n$. An undirected star graph of dimension $n$ is denoted as $S_n = (V_n, E_n)$, where the set of vertices is

$V_n = \{ v \mid v = s_1 s_2 \cdots s_n;\ s_i, s_j \in \{1, 2, \ldots, n\},\ s_i \ne s_j \text{ for } i \ne j,\ 1 \le i, j \le n \}$

and the set of edges is

$E_n = \{ (v_p, v_q) \mid v_p, v_q \in V_n,\ v_p \ne v_q,\ v_q = g_i(v_p) \text{ for some } 2 \le i \le n \}$.

In other words, two nodes $v_p$ and $v_q$ are connected by an undirected edge if and only if the permutation of $v_q$ can be obtained from that of $v_p$ by interchanging the symbol $s_i$ of $v_p$ with the symbol $s_1$ of $v_p$ for some $2 \le i \le n$. We also use the notation $S_n$ to represent an $n$-dimensional star graph, called an $n$-star, in this paper. Notice that star graphs are edge and vertex symmetric. Moreover, $S_n$ is a regular graph with degree $n-1$, $n!$ vertices, and $(n-1)\,n!/2$ edges. A 3-star and a 4-star are shown in Figure 1.

Figure 1: The topology of star graphs: (a) 3-star; (b) 4-star.

The interconnection network system is composed of nodes. Each node is a computer with its own processor, local memory, and communication links, and each link connects two neighboring nodes through the network [7]. A common component of nodes in a new-generation multicomputer is a router, which handles messages entering, leaving, and passing through the node. Figure 2 shows the architecture of a generic node. A router is usually connected to the local processor/memory by one or more pairs of internal channels. One channel of each pair is for input, the other for output. Several pairs of external channels connect the router to neighboring routers. The interconnection of external channels among routers defines the network topology.

Figure 2: A generic node architecture.
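The star graph definition of Section 2.1 can be made concrete with a few lines of code. The following is a minimal sketch, assuming nodes are represented as strings of symbols such as "1234"; the helper names `neighbors` and `star_graph` are illustrative and do not appear in the paper.

```python
from itertools import permutations


def neighbors(node):
    """Apply the generators g_2..g_n: swap the first symbol with the i-th one."""
    result = []
    for i in range(1, len(node)):
        symbols = list(node)
        symbols[0], symbols[i] = symbols[i], symbols[0]
        result.append("".join(symbols))
    return result


def star_graph(n):
    """Return the adjacency map of the n-star S_n (valid for n <= 9 with digit symbols)."""
    symbols = "".join(str(k) for k in range(1, n + 1))
    return {"".join(p): neighbors("".join(p)) for p in permutations(symbols)}


s4 = star_graph(4)
# Every undirected edge is seen from both of its endpoints.
edge_count = sum(len(adj) for adj in s4.values()) // 2
print(len(s4), edge_count)
```

For the 4-star this yields 24 nodes of degree 3, consistent with the stated degree $n-1$ and vertex count $n!$.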

2.2 Path-Based Multicast Routing Model

In our previous work on routing in wormhole-routed star graph networks [3], we addressed a path-based routing model, derived a node labeling formula based on a single hamiltonian path (HP), and proposed four efficient deadlock-free multicast routing schemes: dual-path, shortcut-node-based dual-path, multipath, and proximity grouping. Generally, the dual-path scheme is simple and efficient. Multicasting in dual-path routing uses two independent paths (toward high-label nodes and low-label nodes, respectively), and the next traversed node is the neighboring node whose label is nearest to that of the next unvisited target node. The concept of the path-based routing model is described below.

2.2.1 Hamiltonian Paths and Channel Networks

The path-based routing method for meshes developed by Lin et al. [7] is based on a HP. In [3], we used the strategy in [11] to define a HP on the star graph. Because a star graph contains more than one HP, the routing methods proposed in [3] are based on one particular HP among all possible HPs. In an $n$-star, the number of nodes is $N = n!$ and each node $s$ has a label $\ell(s)$, where $0 \le \ell(s) \le N-1$ and $\ell()$ is the node labeling function [3]. The labeling of a 4-star based on a HP is shown in Figure 3. For example, in a 4-star, $\ell(1234) = 0$, $\ell(4213) = 6$, $\ell(4312) = 13$, $\ell(4231) = 23$, and so forth.

Figure 3: The labeling of a 4-star based on a hamiltonian path.

According to the node labels, we can construct a HP, i.e., from the node with label 0, following the nodes with labels $1, 2, \ldots$, to the node with label $N-1$. When node labeling is completed, we can divide the network into two subnetworks, the high-channel network and the low-channel network.
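The labeling of the 4-star can be checked mechanically. The sketch below assumes the node order of the hamiltonian path as read off the labels in Figure 3 (transcribed by hand, so the list `HP` is an assumption rather than the output of the labeling formula in [3]). It verifies that consecutive labels are adjacent and classifies each directed channel by comparing the labels of its two endpoints.

```python
# Hamiltonian path of the 4-star; list position = node label.
# Assumption: transcribed from the labels shown in Figure 3.
HP = ["1234", "2134", "3124", "1324", "2314", "3214",
      "4213", "1243", "2143", "4123", "1423", "2413",
      "3412", "4312", "1342", "3142", "4132", "1432",
      "2431", "3421", "4321", "2341", "3241", "4231"]

label = {node: i for i, node in enumerate(HP)}


def neighbors(node):
    """Swap the first symbol with each of the other symbols."""
    out = []
    for i in range(1, len(node)):
        s = list(node)
        s[0], s[i] = s[i], s[0]
        out.append("".join(s))
    return out


def is_hamiltonian_path(path):
    """All 24 nodes appear once and consecutive nodes are adjacent."""
    return (len(set(path)) == len(path) == 24
            and all(b in neighbors(a) for a, b in zip(path, path[1:])))


# Directed channels from lower to higher label, and from higher to lower label.
high_channels = {(u, v) for u in HP for v in neighbors(u) if label[u] < label[v]}
low_channels = {(u, v) for u in HP for v in neighbors(u) if label[u] > label[v]}

print(is_hamiltonian_path(HP), len(high_channels), len(low_channels))
```

Each undirected link contributes exactly one channel to each of the two sets, so the two channel sets are disjoint and of equal size.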
The high-channel network contains all directional channels from lower-labeled nodes to higher-labeled nodes, and the low-channel network contains all directional channels from higher-labeled nodes to lower-labeled nodes. A message can then be routed along two legal paths, one in the high-channel network and the other in the low-channel network.

2.2.2 Hamiltonian-Path and Single-Hamiltonian-Path-Based (SHPB) Dual-Path Multicast Routing

The unicast-based, the hamiltonian-path, and the dual-path routing strategies can be adopted in many wormhole-routed interconnection networks. The unicast-based routing scheme uses one-to-one communication to achieve multicast, which incurs startup latency in each intermediate node [8]. The disadvantage of this approach is that the number of communication startup steps required for a multicast results in significant transmission latency. In hamiltonian-path routing, the source node sends the message to all destination nodes along the constructed hamiltonian path. In this scheme, the multicast is divided into two submulticasts that can be completed in parallel by two independent routing paths (one for high-channel routing and the other for low-channel routing). The disadvantage of this approach is that it always traverses nodes along the fixed hamiltonian path, which requires more links to be traversed for a multicast [7]. In dual-path routing, the multicasting is similar to hamiltonian-path routing except that each router tries to find a shortcut node (the node whose label is closest to that of the next unvisited target node) in order to reduce the average length of multicast paths [7].

A sample multicast is denoted as the multicasting set $R = \{\underline{1324}^{3}, 2134^{1}, 2143^{8}, 3412^{12}, 1432^{17}, 3421^{19}, 2341^{21}\}$, where the first element of $R$ is the source node and the others are the destination nodes in arbitrary order. Notice that the source node is underlined, and the label $\ell(u)$ of each node $u$ in $R$ is shown as a superscript to the node representation. In the hamiltonian-path and the SHPB dual-path routing, the multicasting set $R$ can be completed by two submulticasting sets, $R_h$ for high-channel routing and $R_l$ for low-channel routing, i.e., $R_h = \{\underline{1324}^{3}, 2143^{8}, 3412^{12}, 1432^{17}, 3421^{19}, 2341^{21}\}$ and $R_l = \{\underline{1324}^{3}, 2134^{1}\}$. In $R_h$ and $R_l$ the first elements are source nodes and the others are destination nodes with label values higher and lower than that of the source node, in ascending and descending order, respectively. If we use hamiltonian-path routing, the total number of channels traversed is 18+2=20, and the maximum distance is max(18,2)=18.
If we use SHPB dual-path routing, the total number of channels traversed is 14+2=16, and the maximum distance is max(14,2)=14.

The hold-and-wait property of wormhole routing makes it particularly susceptible to deadlock, and thus most wormhole-routed systems prevent message routes from forming cycles of channel dependency. Deadlock can be eliminated by the routing algorithm. By ordering network resources, such as nodes, and accessing them in a strictly monotonic order, circular wait for resources cannot occur and deadlock is avoided [7].

3 Dual-Hamiltonian-Path-Based (DHPB) Multicast Routing

For SHPB dual-path routing, the performance is unstable, especially for large multicast sizes. That is, if the number of nodes traversed by high-channel routing is nearly identical to that of low-channel routing, then dual-path routing performs very well; otherwise its performance is determined by the longer of the two paths. Therefore, we use the property that a star graph always contains a hamiltonian cycle (HC) to find better routing paths for both high-channel and low-channel routing.

3.1 Dual-Hamiltonian-Path-Based (DHPB) Multicast Routing Model

We use the strategy in [11] to find a HC on the star graph. The HC is constructed as follows. We first find a HP in an $n$-star; suppose $HP = (y_0, y_1, \ldots, y_k)$, where $y_j$ is a node, $0 \le j \le k$, and $k = n! - 1$. In such a HP of a star graph, the end node $y_k$ is always adjacent to the start node $y_0$. Then, we can obtain $HC = HP \cup (y_k, y_0)$, that is, $HC = (y_0, y_1, \ldots, y_k, y_0)$. A star graph contains more than one HC; the method described in this paper uses one particular HC among all possible HCs. According to the HC, we can obtain two HPs, $HP_1 = HP = (y_0, y_1, \ldots, y_k)$ and $HP_2 = (y_t, y_{t+1}, \ldots, y_k, y_0, \ldots, y_{t-1})$, where $t = n!/2$. For each HP in an $n$-star the number of nodes is $N = n!$, and the label (value) of a node, which ranges from 0 to $N-1$, depends on its position in $HP_1$ or $HP_2$.

Before we introduce the routing algorithms, two node labeling strategies are needed. The first node labeling strategy of a 4-star is based on $HP_1$, which starts at node 1234 and ends at node 4231 and is constructed by the same method as in Section 2.2. The second node labeling strategy of a 4-star is based on $HP_2$, which starts at node 3412 and ends at node 2413. These two labeling strategies are based on two node labeling functions $\ell_1$ and $\ell_2$. Each node $p$ in a star graph is assigned two labels, $\ell_1(p)$ and $\ell_2(p)$. The value of $\ell_1()$ can be obtained according to the order in the HP described in Section 2.2 (or by using the formula derived in [3]), so $\ell_1(p) = \ell(p)$ for every node $p$. The value of $\ell_2()$ can be obtained by $\ell_2(p) = (\ell_1(p) + N/2) \bmod N$. The two different labeling results of a 4-star are shown in Figure 4. For example, in a 4-star, $\ell_1(1234) = 0$, $\ell_2(1234) = 12$, $\ell_1(4312) = 13$, $\ell_2(4312) = 1$, and so forth.

Figure 4: The two different node labelings of a 4-star based on two hamiltonian paths: (a) the first node labeling strategy; (b) the second node labeling strategy.

According to the two node labeling strategies, we can construct two HPs respectively, i.e., from the node with label 0, following the nodes with labels $1, 2, \ldots$, to the node with label $N-1$. When node labeling is completed, we can divide the network into four subnetworks: the first-high-channel network $N_h^1$, the first-low-channel network $N_l^1$, the second-high-channel network $N_h^2$, and the second-low-channel network $N_l^2$.
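The relation between the two labelings can be illustrated on the 4-star. The sketch below assumes the node order of $HP_1$ as read off Figure 4(a) (a hand transcription, hence an assumption). It builds $HP_2$ by rotating $HP_1$ by $t = N/2$ positions and checks that the resulting labels agree with $\ell_2(p) = (\ell_1(p) + N/2) \bmod N$.

```python
# HP_1 of the 4-star; list position = label l1.
# Assumption: transcribed from the labels shown in Figure 4(a).
HP1 = ["1234", "2134", "3124", "1324", "2314", "3214",
       "4213", "1243", "2143", "4123", "1423", "2413",
       "3412", "4312", "1342", "3142", "4132", "1432",
       "2431", "3421", "4321", "2341", "3241", "4231"]

N = len(HP1)
t = N // 2

# HP_2 = (y_t, ..., y_k, y_0, ..., y_{t-1}): the same cycle, started half-way round.
HP2 = HP1[t:] + HP1[:t]

label1 = {node: i for i, node in enumerate(HP1)}
label2 = {node: i for i, node in enumerate(HP2)}


def adjacent(a, b):
    """True if b is obtained from a by swapping the first symbol with another one."""
    diff = [i for i in range(len(a)) if a[i] != b[i]]
    return (len(diff) == 2 and diff[0] == 0
            and a[0] == b[diff[1]] and a[diff[1]] == b[0])


# HP_2 is again a hamiltonian path because y_k is adjacent to y_0.
hp2_ok = all(adjacent(a, b) for a, b in zip(HP2, HP2[1:]))
print(HP2[0], HP2[-1], label2["1234"], label2["4312"], hp2_ok)
```

The rotation and the modular formula are two views of the same relabeling; the formula is the cheaper one to apply at a router.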
$N_h^1$ contains all directional channels from lower-labeled to higher-labeled nodes under the first node labeling strategy; $N_l^1$ contains all directional channels from higher-labeled to lower-labeled nodes under the first node labeling strategy; $N_h^2$ contains all directional channels from lower-labeled to higher-labeled nodes under the second node labeling strategy; $N_l^2$ contains all directional channels from higher-labeled to lower-labeled nodes under the second node labeling strategy. A message can then be routed along two legal paths, one along subnetwork $N_h^1$ (or subnetwork $N_h^2$) and the other along subnetwork $N_l^1$ (or subnetwork $N_l^2$). The routing is based on a network partitioning strategy with two virtual channels. The two virtual channels are used to avoid deadlock. Before we introduce the proposed routing algorithms, let us first define two routing functions $RF_1$ and $RF_2$.

Definition 1 (The routing functions $RF_1$ and $RF_2$). Let $V$, $p$, $q$, and $\ell_i()$ be the node set, the source node, the destination node of a star graph, and the node labeling function [3], respectively. The routing function $RF_i$, where $1 \le i \le 2$, is defined to be $RF_i : V \times V \to V$ with $RF_i(p, q) = x$, where: if $\ell_i(p) < \ell_i(q)$, then $\ell_i(x) = \max\{\ell_i(u) \mid \ell_i(p) < \ell_i(u) \le \ell_i(q)$ and $u$ is adjacent to $p\}$; if $\ell_i(p) > \ell_i(q)$, then $\ell_i(x) = \min\{\ell_i(u) \mid \ell_i(p) > \ell_i(u) \ge \ell_i(q)$ and $u$ is adjacent to $p\}$.

3.2 Dual-Hamiltonian-Path-Based (DHPB) Dual-Path Multicast Routing

In this subsection, we propose three efficient DHPB dual-path multicast routing schemes: the network-selection-based, optimum, and two-phase optimum schemes.

3.2.1 Network-Selection-Based (NSB) Dual-Path Routing

The NSB routing scheme selects subnetworks that are constructed either from the first hamiltonian path or from the second hamiltonian path for dual-path routing. The NSB dual-path scheme includes three steps. First, the destination node set is divided into two subsets according to the following two cases, and then the destination nodes in each subset are sorted.

Case 1 ($\frac{1}{4}N \le \ell_1(s) < \frac{3}{4}N$): The destination node set is divided into two subsets, $D_1$ and $D_2$, where every node $k$ in $D_1$ has a higher $\ell_1(k)$ value than that of the source node $s$, and every node $k'$ in $D_2$ has a lower $\ell_1(k')$ value than that of $s$ according to the node labeling function. Then, the destination nodes in $D_1$ are sorted according to their $\ell_1()$ values in ascending order and the destination nodes in $D_2$ are sorted according to their $\ell_1()$ values in descending order.

Case 2 ($\ell_1(s) < \frac{1}{4}N$ or $\ell_1(s) \ge \frac{3}{4}N$): The destination node set is divided into two subsets, $D_1$ and $D_2$, where every node $k$ in $D_1$ has a higher $\ell_2(k)$ value than that of the source node $s$, and every node $k'$ in $D_2$ has a lower $\ell_2(k')$ value than that of $s$ according to the node labeling function. Then, the destination nodes in $D_1$ are sorted according to their $\ell_2()$ values in ascending order and the destination nodes in $D_2$ are sorted according to their $\ell_2()$ values in descending order.

Second, we construct two messages, $M_1$ and $M_2$, where $M_1$ contains $D_1$ as part of the header and $M_2$ contains $D_2$ as part of the header. Finally, multicast messages from $s$ are sent to the destination nodes according to the following two cases.
Case 1 ($\frac{1}{4}N \le \ell_1(s) < \frac{3}{4}N$): The message is sent to the destination nodes in $D_1$ using the high-channel network based on subnetwork $N_h^1$, and to the destination nodes in $D_2$ using the low-channel network based on subnetwork $N_l^1$.

Case 2 ($\ell_1(s) < \frac{1}{4}N$ or $\ell_1(s) \ge \frac{3}{4}N$): The message is sent to the destination nodes in $D_1$ using the high-channel network based on subnetwork $N_h^2$, and to the destination nodes in $D_2$ using the low-channel network based on subnetwork $N_l^2$.

For both messages $M_1$ and $M_2$, the next node traversed from the current node is the neighboring node whose label is nearest to that of the next unvisited target node.

In the following, we use an example to demonstrate the better multicast performance of NSB dual-path routing compared with the hamiltonian-path and the SHPB dual-path routing. The sample multicast is denoted as the multicasting set $R = \{\underline{1324}^{(3,15)}, 2134^{(1,13)}, 2143^{(8,20)}, 3412^{(12,0)}, 1432^{(17,5)}, 3421^{(19,7)}, 2341^{(21,9)}\}$, where the first element of $R$ is the source node and the others are the destination nodes in arbitrary order. Notice that the source node is underlined; for each node $u$ in $R$, the label $\ell_1(u)$ is shown as the first component of the superscript and the label $\ell_2(u)$ as the second component. In the sample multicast, because $\ell_1(1324) = 3 < \frac{1}{4}N = 6$, the multicasting selects the subnetworks $N_h^2$ and $N_l^2$ for routing. Thus, the multicasting set $R$ can be completed by two submulticasting sets, $R_h$ and $R_l$, where $R_h = \{\underline{1324}^{(3,15)}, 2143^{(8,20)}\}$ and $R_l = \{\underline{1324}^{(3,15)}, 2134^{(1,13)}, 2341^{(21,9)}, 3421^{(19,7)}, 1432^{(17,5)}, 3412^{(12,0)}\}$. In $R_h$ and $R_l$, the first elements are source nodes and the others are destination nodes with higher and lower label values than the source node, in ascending and descending order of $\ell_2()$, respectively. $R_h$ routes the message using high-channel routing based on subnetwork $N_h^2$.
$R_l$ routes the message using low-channel routing based on subnetwork $N_l^2$. Figure 5 shows the sample multicast using NSB dual-path routing; the total number of channels traversed is 5+11=16, and the maximum distance from the source to a destination is max(5,11)=11.

Figure 5: The sample multicast using NSB dual-path routing.

So, the total number of channels traversed by NSB dual-path routing is smaller than that of hamiltonian-path routing and equal to that of SHPB dual-path routing. The maximum distance of NSB dual-path routing is smaller than that of hamiltonian-path and SHPB dual-path routing.

Now, let us discuss the time complexity of the NSB dual-path algorithm. Suppose $n$ is the dimension of the star graph, $d$ is the number of destination nodes, and $N = n!$ is the number of nodes of the star graph. In the destination-nodes partition and sorting step, the time complexity is $O(d) + O(d \log d) = O(d \log d)$. In the message preparation step, the time complexity is $O(1)$. In the routing step, the time complexity is $O(\frac{3}{4}N - 1) = O(\frac{3}{4}N)$ in the worst case. So, the total time complexity of the NSB dual-path algorithm in the worst case is $O(d \log d) + O(1) + O(\frac{3}{4}N) = O(\frac{3}{4}n! + d \log d)$. For comparison, the total time complexity of the hamiltonian-path and the SHPB dual-path algorithms is $O(d \log d) + O(1) + O(N) = O(n! + d \log d)$ in the worst case.

To verify the correctness of the NSB dual-path routing algorithm, we derive the following lemmas and theorems.

Lemma 1. For two arbitrary distinct nodes $p$ and $q$ in a star graph with two HPs ($HP_1$ and $HP_2$), the path from $p$ to $q$ selected according to the routing function $RF_i$, where $1 \le i \le 2$, always exists.

Proof. Suppose $p$ and $q$ are two arbitrary nodes in a star graph. Without loss of generality, it can be assumed that $\ell_i(p) < \ell_i(q)$, where $1 \le i \le 2$. Let the node $c$ represent the source node or an intermediate node located between source node $p$ and destination node $q$ on $HP_i$.
Assume the next traversed node is $x$. According to the routing function $RF_i$, $x = RF_i(c, q)$, where $\ell_i(x) = \max\{\ell_i(u) \mid \ell_i(c) < \ell_i(u) \le \ell_i(q)$ and $u$ is adjacent to $c\}$. So, $x$ is on $HP_i$ between $c$ and $q$ (including $q$) and is adjacent (connected) to $c$. Then, the path from $p$ to $q$ selected according to the routing function $RF_i$ is $(y_0, y_1, \ldots, y_k)$, where $y_0 = p$, $y_j = RF_i(y_{j-1}, q)$ for $0 < j \le k$, and $y_k = q$. So, the path from $p$ to $q$ selected according to the routing function $RF_i$ always exists.

Lemma 2. The message routing using the DHPB dual-path algorithm, based on subnetworks $N_h^1$ (using $RF_1$ for high-channel routing) and $N_l^1$ (using $RF_1$ for low-channel routing), in a star graph with two HPs ($HP_1$ and $HP_2$) can always be completed.

Proof. Based on Lemma 1, it is obvious.

Lemma 3. The message routing using the DHPB dual-path algorithm, based on subnetworks $N_h^2$ (using $RF_2$ for high-channel routing) and $N_l^2$ (using $RF_2$ for low-channel routing), in a star graph with two HPs ($HP_1$ and $HP_2$) can always be completed.

Proof. Based on Lemma 1, it is obvious.

Theorem 1. The message routing using the NSB dual-path algorithm in a star graph with two HPs ($HP_1$ and $HP_2$) can always be completed.

Proof. The message routing using the NSB dual-path algorithm proceeds via either (i) subnetworks $N_h^1$ and $N_l^1$ or (ii) subnetworks $N_h^2$ and $N_l^2$. According to Lemmas 2 and 3, message routing on both pairs of subnetworks can be completed. So, the message routing using the NSB dual-path algorithm can always be completed.

Theorem 2. The NSB dual-path multicast routing is deadlock-free.

Proof. Because a cyclic dependency among resources is a necessary condition for deadlock [5], the multicasting algorithm can be proven deadlock-free by showing that no such dependency can exist among the channels. In NSB dual-path routing, the multicasting at the source node proceeds via one of the following two cases: (1) the NSB dual-path algorithm divides the network into two disjoint subnetworks $N_h^1$ and $N_l^1$; (2) the NSB dual-path algorithm divides the network into two disjoint subnetworks $N_h^2$ and $N_l^2$. Because $N_h^1 \cap N_l^1 \cap N_h^2 \cap N_l^2 = \emptyset$, it suffices to show that the NSB dual-path multicast routing is deadlock-free within each of the four subnetworks. Let us prove that messages delivered in subnetwork $N_h^1$ are deadlock-free. Messages delivered in $N_h^1$ can only take high-channels in $N_h^1$. Since each copy of the message is routed entirely within a single subnetwork and a monotonic order of requested channels is guaranteed, no dependency cycle can exist within subnetwork $N_h^1$, and thus no cyclic dependency can be created among the channels. Similar proofs apply to the subnetworks $N_l^1$, $N_h^2$, and $N_l^2$. This proves the theorem.

3.2.2 Optimum Dual-Path Routing

The optimum routing scheme selects the subnetworks that give the optimum routing path for dual-path routing. This routing scheme includes three steps. First, the destination node set is divided into two subsets according to the following two cases, and then the destination nodes in each subset are sorted.
Case 1 ($\ell_1(s) < \frac{1}{2}N$): The destination node set $D$ is divided into two subsets $D_1$ and $D_2$. $D_1$ contains the destinations $d_i$ where $\ell_1(s) \le \ell_1(d_i) < \ell_1(s) + \frac{1}{2}N$, $d_i \in D$, $1 \le i \le num_d$, where $num_d$ is the number of destinations. $D_2$ contains the destinations $d_j$ where $d_j \in D - D_1$. Then, the destination nodes in $D_1$ are sorted according to their $\ell_1()$ values in ascending order and the destination nodes in $D_2$ are sorted according to their $\ell_2()$ values in descending order.

Case 2 ($\ell_1(s) \ge \frac{1}{2}N$): The destination node set $D$ is divided into two subsets $D_1$ and $D_2$. $D_1$ contains the destinations $d_i$ where $\ell_1(s) - \frac{1}{2}N \le \ell_1(d_i) \le \ell_1(s)$, $d_i \in D$, $1 \le i \le num_d$, where $num_d$ is the number of destinations. $D_2$ contains the destinations $d_j$ where $d_j \in D - D_1$. Then, the destination nodes in $D_1$ are sorted according to their $\ell_1()$ values in ascending order and the destination nodes in $D_2$ are sorted according to their $\ell_2()$ values in descending order.

Second, we construct two messages, $M_1$ and $M_2$, where $M_1$ contains $D_1$ as part of the header and $M_2$ contains $D_2$ as part of the header. Finally, multicast messages from $s$ are sent to the destination nodes according to the following two cases.

Case 1 ($\ell_1(s) < \frac{1}{2}N$): The message is sent to the destination nodes in $D_1$ using the high-channel network based on subnetwork $N_h^1$, and to the destination nodes in $D_2$ using the low-channel network based on subnetwork $N_l^2$.

Case 2 ($\ell_1(s) \ge \frac{1}{2}N$): The message is sent to the destination nodes in $D_1$ using the low-channel network based on subnetwork $N_l^1$, and to the destination nodes in $D_2$ using the high-channel network based on subnetwork $N_h^2$.

In the sample multicast, because $\ell_1(1324) = 3 < \frac{1}{2}N = 12$, the multicasting selects the subnetworks $N_h^1$ and $N_l^2$ for routing. Thus, the sample multicasting set $R$ can be completed by two submulticasting sets, $R_h$ and $R_l$, where $R_h = \{\underline{1324}^{(3,15)}, 2143^{(8,20)}, 3412^{(12,0)}\}$ and $R_l = \{\underline{1324}^{(3,15)}, 2134^{(1,13)}, 2341^{(21,9)}, 3421^{(19,7)}, 1432^{(17,5)}\}$. Figure 6 shows the same multicast example as Figure 5 using optimum dual-path routing; the total number of channels traversed is 9+10=19, and the maximum distance is max(9,10)=10.

Figure 6: The sample multicast using optimum dual-path routing: (a) the multicasting of $D_1$ using high-channel routing based on subnetwork $N_h^1$; (b) the multicasting of $D_2$ using low-channel routing based on subnetwork $N_l^2$.

So, the total number of channels traversed by optimum dual-path routing is smaller than that of hamiltonian-path routing but larger than that of SHPB dual-path routing. The maximum distance of optimum dual-path routing is smaller than that of hamiltonian-path and SHPB dual-path routing.

Now, let us analyze the time complexity of the optimum dual-path algorithm. Suppose $n$ is the dimension of the star graph, $d$ is the number of destination nodes, and $N = n!$ is the number of nodes of the star graph. The time complexity of the first two steps of the optimum dual-path algorithm is the same as that of the NSB dual-path algorithm. In the routing step, the time complexity is $O(\frac{1}{2}N)$ in the worst case. The total time complexity of the optimum dual-path algorithm in the worst case is $O(d \log d) + O(1) + O(\frac{1}{2}N) = O(\frac{1}{2}n! + d \log d)$.

In the following, we first describe two related lemmas and then verify that optimum dual-path routing can always be completed and that the routing is deadlock-free.

Lemma 4. The message routing using the DHPB dual-path algorithm, based on subnetworks $N_h^1$ (using $RF_1$ for high-channel routing) and $N_l^2$ (using $RF_2$ for low-channel routing), in a star graph with two HPs ($HP_1$ and $HP_2$) can always be completed.

Proof. Similar to Lemma 2.

Lemma 5. The message routing using the DHPB dual-path algorithm, based on subnetworks $N_l^1$ (using $RF_1$ for low-channel routing) and $N_h^2$ (using $RF_2$ for high-channel routing), in a star graph with two HPs ($HP_1$ and $HP_2$) can always be completed.

Proof. Similar to Lemma 2.

Theorem 3. The message routing using the optimum dual-path algorithm in a star graph with two HPs ($HP_1$ and $HP_2$) can always be completed.

Proof. Similar to Theorem 1.
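The routing function of Definition 1 and the channel counts quoted for the sample multicast can be reproduced with a short sketch. It assumes the 4-star node order of $HP_1$ as read off Figure 4(a) (a hand transcription). The function names `rf`, `path_length`, `shpb`, `nsb`, and `optimum_case1` are illustrative; `optimum_case1` covers only Case 1 ($\ell_1(s) < N/2$) of optimum dual-path routing.

```python
# Assumption: HP_1 transcribed from Figure 4(a); list position = label l1.
HP1 = ["1234", "2134", "3124", "1324", "2314", "3214",
       "4213", "1243", "2143", "4123", "1423", "2413",
       "3412", "4312", "1342", "3142", "4132", "1432",
       "2431", "3421", "4321", "2341", "3241", "4231"]
N = len(HP1)
LABEL = {1: {p: i for i, p in enumerate(HP1)},
         2: {p: (i + N // 2) % N for i, p in enumerate(HP1)}}


def neighbors(node):
    out = []
    for i in range(1, len(node)):
        s = list(node)
        s[0], s[i] = s[i], s[0]
        out.append("".join(s))
    return out


def rf(i, p, q):
    """RF_i(p, q): the neighbor of p whose label is nearest to that of q without passing it."""
    lab = LABEL[i]
    if lab[p] < lab[q]:
        return max((u for u in neighbors(p) if lab[p] < lab[u] <= lab[q]), key=lab.get)
    return min((u for u in neighbors(p) if lab[q] <= lab[u] < lab[p]), key=lab.get)


def path_length(i, source, dests):
    """Channels traversed when visiting dests in the given order using RF_i."""
    hops, cur = 0, source
    for d in dests:
        while cur != d:
            cur = rf(i, cur, d)
            hops += 1
    return hops


def split(i, s, dests):
    """Higher-labeled destinations ascending, lower-labeled ones descending."""
    lab = LABEL[i]
    high = sorted((d for d in dests if lab[d] > lab[s]), key=lab.get)
    low = sorted((d for d in dests if lab[d] < lab[s]), key=lab.get, reverse=True)
    return high, low


def shpb(s, dests):
    high, low = split(1, s, dests)
    return path_length(1, s, high), path_length(1, s, low)


def nsb(s, dests):
    i = 1 if N // 4 <= LABEL[1][s] < 3 * N // 4 else 2
    high, low = split(i, s, dests)
    return path_length(i, s, high), path_length(i, s, low)


def optimum_case1(s, dests):
    l1, l2 = LABEL[1], LABEL[2]
    d1 = sorted((d for d in dests if l1[s] <= l1[d] < l1[s] + N // 2), key=l1.get)
    d2 = sorted((d for d in dests if d not in d1), key=l2.get, reverse=True)
    return path_length(1, s, d1), path_length(2, s, d2)


source = "1324"
dests = ["2134", "2143", "3412", "1432", "3421", "2341"]
print(shpb(source, dests), nsb(source, dests), optimum_case1(source, dests))
```

Each function returns the pair (high-channel hops, low-channel hops); their sum is the total number of channels traversed and their maximum is the maximum distance.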

Theorem 4. The optimum dual-path multicast routing is deadlock-free.

Proof. Similar to Theorem 2.

3.2.3 Two-Phase Optimum Dual-Path Routing

The two-phase optimum routing scheme includes two phases, source-to-relay and relay-to-destination. In the source-to-relay phase the optimum dual-path routing is used, and in the relay-to-destination phase the high-channel routing is used. This routing scheme includes four steps. First, an $n$-star $S_n$ can be partitioned into $n$ disjoint $(n-1)$-stars $S_{n-1}(1), S_{n-1}(2), \ldots, S_{n-1}(n)$ according to the $n$th symbol (the last dimension) of the nodes in $S_n$. The nodes in the destination node set $D$ are collected into a set $\Delta$, and $\Delta$ is partitioned into $n$ subsets $\Delta_1, \Delta_2, \ldots, \Delta_n$ according to the $n$th symbol (the last dimension) of those destination nodes. In this way, the nodes of the same subset are located in the same $(n-1)$-star. Second, for each subset $\Delta_i$, we find a relay node $r_i$ selected from $S_{n-1}(i)$, namely the node with the smallest label value in the $(n-1)$-star $S_{n-1}(i)$. Then, the message routing proceeds in two phases: source-to-relay and relay-to-destination. In the source-to-relay phase, the message of the source node $s$ is sent to the relay nodes $r_i$ of $\Delta_i$ according to the optimum dual-path routing. In the relay-to-destination phase, for each subset $\Delta_i$, the message received by relay node $r_i$ is sent to all destination nodes in $\Delta_i$ via high-channel routing based on subnetwork $N_h^1$ (which is a special case of the optimum dual-path routing).

For the sample multicast, the two-phase optimum dual-path routing proceeds as follows. In the destination-nodes partition step, we first define the destination node set as $\Delta = \{2134^{(1,13)}, 2143^{(8,20)}, 3412^{(12,0)}, 1432^{(17,5)}, 3421^{(19,7)}, 2341^{(21,9)}\}$. Then, $\Delta$ is partitioned by the 4th symbol (the last dimension) of each node into four subsets: $\Delta_1 = \{2134^{(1,13)}\}$, $\Delta_2 = \{2143^{(8,20)}\}$, $\Delta_3 = \{3412^{(12,0)}, 1432^{(17,5)}\}$, and $\Delta_4 = \{3421^{(19,7)}, 2341^{(21,9)}\}$.
In the relay-nodes finding step, for each subset $i$ we find the relay node $r_i$ that has the smallest label in the 3-star $S_3(i)$. Thus, we obtain the relay nodes $r_1$ = 134 (0;1), $r_2$ = 413 (6;18), $r_3$ = 341 (1;0), and $r_4$ = 431 (18;6), respectively. Then, the multicasting proceeds in the following two phases.

In the source-to-relay phase, the source node $s$ routes a message to each of the relay nodes $r_i$. That is, the source node 134 (3;15) sends a multidestination message to the relay nodes 134 (0;1), 413 (6;18), 341 (1;0), and 431 (18;6). In this phase, the multicasting set is $R^{\prime}$ = {134 (3;15), 134 (0;1), 413 (6;18), 341 (1;0), 431 (18;6)}. In the multicasting set $R^{\prime}$, because $\ell_1(134) = 3 \le \frac{1}{2}N = 12$, the multicasting selects the subnetworks $N_1^h$ and $N_2^l$ for routing. Thus, $R^{\prime}$ is completed by two submulticasting sets, $R^{\prime h}$ and $R^{\prime l}$, where $R^{\prime h}$ = {134 (3;15), 413 (6;18), 341 (1;0)} and $R^{\prime l}$ = {134 (3;15), 134 (0;1), 431 (18;6)}. In $R^{\prime h}$ the message is transmitted via high-channel routing based on subnetwork $N_1^h$, and in $R^{\prime l}$ the message is sent through low-channel routing based on subnetwork $N_2^l$.

In the relay-to-destination phase, each relay node $r_i$ routes a message to the destination nodes in subset $i$. That is, the relay nodes 134 (0;1), 413 (6;18), 341 (1;0), and 431 (18;6) send a multidestination message to the destination nodes in each respective subset. In this phase, the multicasting set $R^{\prime\prime}$ is divided into four multicasting subsets: $R^{\prime\prime h}_1$ = {134 (0;1), 134 (1;13)}, $R^{\prime\prime h}_2$ = {413 (6;18), 143 (8;0)}, $R^{\prime\prime h}_3$ = {341 (1;0), 143 (17;5)}, and $R^{\prime\prime h}_4$ = {431 (18;6), 341 (19;7), 341 (1;9)}.

For this multicast example, if we use two-phase optimum dual-path routing, the total number of channels traversed is $(5+6)+(1+2+1+3) = 18$, and the maximum distance is $\max(5,6)+\max(1,2,1,3) = 9$. So, the total number of channels traversed by two-phase optimum dual-path routing is smaller than that of hamiltonian-path routing but larger than that of SHPB dual-path routing.
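The two cost figures quoted for the sample multicast can be reproduced with a short Python sketch. It assumes, as the arithmetic in the text suggests, that the phases run one after the other while the paths within a phase proceed in parallel, so the total is a sum over all paths and the maximum distance is the sum of the per-phase maxima. The function name is hypothetical, and the second relay-to-destination path length is taken to be 2, which is the value the stated total of 18 requires.

```python
def two_phase_cost(source_to_relay, relay_to_destination):
    """Return (total channels traversed, maximum distance) for a two-phase multicast.

    Each argument is a list of path lengths (in channels) for one phase.
    """
    total_channels = sum(source_to_relay) + sum(relay_to_destination)
    max_distance = max(source_to_relay) + max(relay_to_destination)
    return total_channels, max_distance


# Path lengths of the sample multicast: two paths in the source-to-relay phase,
# four paths (one per relay node) in the relay-to-destination phase.
total, longest = two_phase_cost([5, 6], [1, 2, 1, 3])
print(total, longest)  # 18 9
```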
The maximum distance of two-phase optimum dual-path routing is smaller than that of hamiltonian-path and SHPB dual-path routing.

We now consider the time complexity of the two-phase optimum dual-path algorithm. Let $n$ be the dimension of the star graph, $d$ be the number of destination nodes, and $N = n!$ be the number of nodes of the star graph. In the destination-nodes partition step, the time complexity is $O(d)$. In the relay-nodes finding step, the time complexity is $O(n)$ in the worst case. In the source-to-relay phase, the number of relay nodes is $n$ in the worst case, so the time complexity is $O(n \log n) + O(1) + O(n) = O(n \log n)$. In the relay-to-destination phase, the message is routed in the $(n-1)$-star, so the time complexity is $O((n-1)! + d \log d)$ in the worst case. The total worst-case time complexity of the two-phase optimum dual-path algorithm is therefore $O(d) + O(n) + O(n \log n) + O((n-1)! + d \log d) = O((n-1)! + d \log d)$.

In Theorem 5 and Theorem 6, we prove that multicasting based on the two-phase optimum dual-path algorithm can always be completed and that the routing is deadlock-free.

Theorem 5. The message routing using the two-phase optimum dual-path algorithm in a star graph with two HPs ($HP_1$ and $HP_2$) can always be completed.

Proof. This follows directly from Theorem 3.

Theorem 6. The two-phase optimum dual-path multicast routing is deadlock-free.

Proof. This follows directly from Theorem 4.

In all our proposed DHPB multicasting algorithms with two virtual channels, we use the channel subnetworks described in the previous subsection. Because the subnetworks are disjoint and acyclic, no cyclic resource dependency can occur [7]. Thus, the proposed routing algorithms developed on those subnetworks are deadlock-free.

4 Simulation Results

In this section, the experimental results are presented and discussed. We compare our proposed schemes with the unicast-based, the hamiltonian-path, and the SHPB dual-path schemes. To evaluate the performance of multicast schemes in an interconnection network, several parameters must be considered: the multicast size, the message length, the startup latency, the link latency, and the router latency. The multicast size $d$ is the number of destination nodes, and the message length $f$ is the number of flits in a message. The message startup latency $t_s$ includes the software overhead for buffer allocation, message copying, router initialization, etc.
The link latency $t_l$ is the propagation delay of a message through one link of the network. The router latency $t_r$ is the delay inside the router for handling multidestination messages.

We first state our assumptions about the system architecture parameters used in the simulations. All simulations were performed for a 720-node (6-dimension) star graph network. We examined the routing performance of our proposed schemes under various multicast sizes, startup latencies, and message lengths. The source node and the destination nodes for each multicast were randomly generated. For all simulation experiments, we assumed system parameters representing the current trend in technology [7, 12]. The large message startup latency $t_s$ is set to 10.0 microseconds (5.5 microseconds for message sending latency, 4.5 microseconds for message receiving latency), and the small message startup latency $t_s$ is 1.0 microsecond (550 nanoseconds for message sending latency, 450 nanoseconds for message receiving latency). Small message startup latencies are typical of advanced network interfaces designed to reduce latency. The link propagation latency $t_l$ is 5.0 nanoseconds. The router latency for handling multidestination messages $t_r$ is 40.0 nanoseconds; however, it is set to 0.0 nanoseconds in unicast-based routing. For all of the multicasts, message sizes of 6, 10, and 400 flits were simulated.
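To make the roles of $t_s$, $t_l$, and $t_r$ concrete, the following Python sketch estimates the latency of a single wormhole-routed path. The formula is an assumed first-order pipeline model (one startup, a header that pays router plus link delay per hop, and the remaining flits following one link delay apart); it is not the simulator used for the results reported here, and the class and function names are hypothetical. The parameter values in the usage line are only illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LatencyParams:
    startup_ns: float  # t_s: message startup latency (send + receive overhead)
    link_ns: float     # t_l: propagation delay through one link
    router_ns: float   # t_r: delay inside one router


def path_latency_ns(params, hops, flits):
    """Assumed first-order estimate of the latency of one wormhole-routed path."""
    header = hops * (params.router_ns + params.link_ns)  # header flit crosses every hop
    body = (flits - 1) * params.link_ns                   # remaining flits are pipelined
    return params.startup_ns + header + body


# Illustrative values: small startup latency, a 6-hop path, a 6-flit message.
small_startup = LatencyParams(startup_ns=1000.0, link_ns=5.0, router_ns=40.0)
print(path_latency_ns(small_startup, hops=6, flits=6))  # 1295.0
```

Under this assumed model the startup term dominates for short messages, which is consistent with the emphasis placed on $t_s$ in the experiments.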

4.1 Performance under Different Multicast Sizes

Figure 7 and Figure 8 present the performance of the various multicast schemes on a 6-star network with small and large message startup latencies, respectively. Results are shown for message lengths of 6, 10, and 400 flits. All path-based algorithms outperform the unicast-based algorithm. This is because the unicast-based algorithm is a multiple-phase multicast that incurs more startup latency.

Figure 7: Multicast latency in a 6-star network with small message startup latency. (a) Message length = 6 flits. (b) Message length = 10 flits. (c) Message length = 400 flits. Each panel plots multicast latency (in microseconds) against the number of destinations (0 to 720) for the unicast-based, hampath, SHPB dualpath, DHPB nsb_dualpath, DHPB opt_dualpath, and DHPB tpo_dualpath schemes.

Figure 8: Multicast latency in a 6-star network with large message startup latency. (a) Message length = 6 flits. (b) Message length = 10 flits. (c) Message length = 400 flits. Each panel plots multicast latency (in microseconds) against the number of destinations (0 to 720) for the same six schemes.

In Figure 7, with small message startup latencies, the performance of our proposed DHPB algorithms is superior to that of the unicast-based, hamiltonian-path, and SHPB dual-path algorithms except for very long messages. The two-phase optimum dual-path algorithm performs best with short and medium message lengths. For long messages, the optimum dual-path algorithm performs best. This is because in the two-phase optimum dual-path algorithm the message length plays a determining role in the performance of message transmission: its impact on transmission latency is larger for long messages and smaller for short and medium messages.

Figure 8 shows the performance with large message startup latencies. The NSB dual-path and optimum dual-path algorithms perform better than the hamiltonian-path and SHPB dual-path algorithms. With short and medium messages, the two-phase optimum dual-path algorithm performs better than the SHPB dual-path algorithm for a large number of destinations, but worse for a small number of destinations. For long messages, the two-phase optimum dual-path algorithm performs worse than the hamiltonian-path and SHPB dual-path algorithms. The reason is the same as that described above. The optimum dual-path routing scheme performs very well for short, medium, and long messages.

4.2 Utilization of Network Traffic

We then consider the traffic (in links) of interconnection networks. The network traffic may affect other communication in the network. We measured the network traffic in the simulations as the total number of links visited. Each link visited represents the use of one communication link by one message. Figure 9 presents the link usage for a 6-star network over various multicast sizes. In Figure 9, the DHPB algorithms require fewer communication links than the unicast-based algorithm. The network traffic of our proposed algorithms is lower than that of the hamiltonian-path algorithm and almost equal to that of the SHPB dual-path algorithm.

Figure 9: Network traffic in a 6-star network. The plot shows network traffic (in links) against the number of destinations (0 to 720) for the unicast-based, hampath, SHPB dualpath, DHPB nsb_dualpath, DHPB opt_dualpath, and DHPB tpo_dualpath schemes.

5 Conclusions

In this paper, we first addressed a dual-hamiltonian-path-based (DHPB) routing model with two virtual channels based on two hamiltonian paths and a network partitioning strategy for wormhole-routed star graph networks. Then, we proposed three efficient multicast routing schemes on the basis of this model. All three proposed schemes are proved deadlock-free. The first two schemes, the NSB dual-path and optimum dual-path routing schemes, have the advantage of reducing the number of traversed links to improve the communication performance.
The third scheme, the two-phase optimum dual-path routing scheme, has the advantages of reducing the number of traversed links and of transmitting in parallel in the second phase. Finally, experimental results show that our three proposed routing schemes significantly outperform the unicast-based, the hamiltonian-path, and the single-hamiltonian-path-based (SHPB) dual-path routing schemes. In general, the optimum dual-path algorithm is the best for a large message startup latency, while the two-phase optimum dual-path algorithm is the best for a small message startup latency except for very long messages.

Acknowledgements

This work was supported by the National Science Council of the Republic of China under grants NSC-90-13-E-309-006 and NSC-90-745-P-309-003. In addition, the authors would like to thank the anonymous referees for their critical review and valuable comments, which greatly improved the overall presentation of this paper.

References

[1] S.B. Akers, D. Harel, and B. Krishnamurthy, The Star Graph: An Attractive Alternative to the n-Cube, Proceedings of the 1987 International Conference on Parallel Processing, pp. 393-400, August 1987.
[2] S.B. Akers and B. Krishnamurthy, A Group-Theoretic Model for Symmetric Interconnection Networks, IEEE Trans. on Computers, Vol. 38, No. 4, pp. 555-565, April 1989.
[3] T.-S. Chen, N.-C. Wang, and C.-P. Chu, Multicast Communication in Wormhole-Routed Star Graph Interconnection Networks, Parallel Computing, Vol. 26, No. 11, pp. 1459-1490, October 2000.
[4] L.D. Coster, N. Dewulf, and C.T. Ho, Efficient Multi-Packet Multicast Algorithms on Meshes with Wormhole and Dimension-Ordered Routing, Proceedings of International Conference on Parallel Processing, Vol. III, pp. 137-141, August 1995.
[5] W.J. Dally and C.L. Seitz, Deadlock-Free Message Routing in Multiprocessor Interconnection Networks, IEEE Trans. on Computers, Vol. C-36, No. 5, pp. 547-553, May 1987.
[6] Y. Lan, A.H. Esfahanian, and L.M. Ni, Multicast in Hypercube Multiprocessors, Journal of Parallel and Distributed Computing, pp. 30-41, 1990.
[7] X. Lin, P.K. McKinley, and L.M. Ni, Deadlock-Free Multicast Wormhole Routing in 2D Mesh Multicomputers, IEEE Trans. on Parallel and Distributed Systems, Vol. 5, No. 8, pp. 793-804, October 1994.
[8] P.K. McKinley, H. Xu, A.H. Esfahanian, and L.M. Ni, Unicast-Based Multicast Communication in Wormhole-Routed Networks, IEEE Trans. on Parallel and Distributed Systems, Vol. 5, No. 12, pp. 1252-1265, December 1994.
[9] P.K. McKinley, Y.J. Tsai, and D.F. Robinson, Collective Communication in Wormhole-Routed Massively Parallel Computers, Computer, Vol. 28, No. 12, pp. 39-50, December 1995.
[10] L.M. Ni and P.K. McKinley, A Survey of Wormhole Routing Techniques in Direct Networks, Computer, Vol. 26, No. 2, pp. 62-76, February 1993.
[11] M. Nigam, S. Sahni, and B. Krishnamurthy, Embedding Hamiltonians and Hypercubes in Star Interconnection Graphs, Proceedings of International Conference on Parallel Processing, Vol. 3, pp. 340-343, August 1990.
[12] D.K. Panda, S. Singal, and R. Kesavan, Multidestination Message Passing in Wormhole k-ary n-cube Networks with Base Routing Conformed Path, IEEE Trans. on Parallel and Distributed Systems, Vol. 10, No. 1, pp. 76-96, January 1999.
[13] Y.-C. Tseng and J.-P. Sheu, Toward Optimal Broadcast in a Star Graph Using Multiple Spanning Trees, IEEE Trans. on Computers, Vol. 46, No. 5, pp. 593-599, May 1997.