320 IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, VOL. 18, NO. 3, MARCH 2007

An Effective Design of Deadlock-Free Routing Algorithms Based on 2D Turn Model for Irregular Networks

Akiya Jouraku, Michihiro Koibuchi, Member, IEEE, and Hideharu Amano, Member, IEEE

Abstract
System area networks (SANs), which usually accept arbitrary topologies, have been used to connect hosts in PC clusters. Although deadlock-free routing is often employed for low-latency communications using wormhole or virtual cut-through switching, the interconnection adaptivity introduces difficulties in establishing deadlock-free paths. An up*/down* routing algorithm, which has been widely used to avoid deadlocks in irregular networks, tends to make unbalanced paths as it employs a one-dimensional directed graph. The current study introduces a two-dimensional directed graph on which adaptive routings called left-up first turn (L-turn) routings and right-down last turn (R-turn) routings are proposed to make the paths as uniformly distributed as possible. This scheme guarantees deadlock-freedom because it uses the turn model approach, and the extra degree of freedom in the two-dimensional graph helps to ensure that the prohibited turns are well-distributed. Simulation results show that better throughput and latency result from uniformly distributing the prohibited turns, by which the traffic is distributed more toward the leaf nodes. The L-turn routings, which meet this condition, improve throughput by up to 100 percent compared with two up*/down*-based routings, and also reduce latency.

Index Terms
Adaptive routing, deadlock avoidance, turn model, irregular topologies, system area networks, interconnection networks, PC clusters.

1 INTRODUCTION

NETWORK-BASED parallel processing using system area networks (SANs) has been researched as a potentially cost-effective parallel-computing environment [1], [2], [3], [4].
SANs, which consist of switches connected with point-to-point links, are likely to provide low-latency high-bandwidth communications like those of interconnection networks in massively parallel computers. SAN architectures use wormhole routing [5] or virtual cut-through [6] as their switching technique, and they achieve reliable communications at the hardware level with deadlock-free routing. Such communication simplifies the design of a system software stack including a lightweight communication library [7] which provides zero-copy or one-copy communication.

Unlike the interconnection networks used in massively parallel computers, SANs usually accept arbitrary topologies so as to provide extensibility and dependability to cope with low-reliability commodity hosts. The interconnection adaptivity, however, makes it difficult to establish paths that are free of deadlocks. A deadlock-free routing algorithm is thus crucial for making efficient use of network resources, yet the current deadlock-free routing algorithms in massively parallel computers with regular topologies [5], [8], [9], [10] cannot be directly employed in most cases.

A. Jouraku and H. Amano are with the Department of Information and Computer Science, Keio University, Hiyoshi, Kouhoku-ku, Yokohama , Japan. {jouraku, hunga}@am.ics.keio.ac.jp.
M. Koibuchi is with the Infrastructure Systems Research Division, National Institute of Informatics, National Center of Sciences, Hitotsubashi, Chiyoda-ku, Tokyo , Japan. koibuchi@nii.ac.jp.
Manuscript received 12 June 2005; revised 7 Dec. 2005; accepted 20 Jan. 2006; published online 25 Jan
Recommended for acceptance by S. Olariu.
For information on obtaining reprints of this article, please send to: tpds@computer.org, and reference IEEECS Log Number TPDS
The following two strategies can be taken when a routing algorithm is designed. Deterministic routing takes a single path between hosts, and it guarantees in-order packet delivery between the same pair of hosts [5]. On the other hand, adaptive routing [8], [11], [9], [10], [12] dynamically selects the route of a packet in order to make the best use of bandwidth in interconnection networks. In adaptive routing, when a packet encounters a faulty or congested path, another bypassing path can be selected. Since this allows for a better balance of network traffic, adaptive routing improves throughput and latency.

In spite of the advantages of adaptive routing, most current SANs [1], [2], [13] do not employ it. This is because it does not guarantee in-order packet delivery, which is required for some message-passing libraries, and because the logic to dynamically select a single channel from among a set of alternatives might substantially add to the switch's complexity. However, simple sorting mechanisms for out-of-order packets in network interfaces have been researched [14], [15], and real parallel machines, such as the Cray T3E [16] or the Reliable Router [17], have shown the feasibility of adaptive routing. A simple method to support adaptivity in InfiniBand switches has also been proposed [18]. We thus consider that these switches will employ adaptive routing, as interconnection networks in massively parallel computers do.

Adaptive-routing strategies to avoid deadlocks are classified into two approaches. The simpler approach removes cyclic channel dependencies in the channel dependency graph (CDG) [8], [11], [9]. The more complex one deals with cyclic channel dependencies by introducing escape paths [10], [12]. The latter strategy is difficult to apply to arbitrary topologies with neither virtual channels nor buffers. On the other hand, the former strategy is usually based on a spanning tree for irregular networks [19], [20], [21], and it exploits the connectivity and acyclicity of the tree structure. In particular, up*/down* routing has been widely used to avoid deadlocks in SANs [19], [12], [22], [23], [21]. Up*/down* routing allocates a direction (up or down) to each channel, and it prohibits packet transfer from the down direction to the up direction in order to guarantee deadlock-freedom. However, the algorithm tends to make unbalanced paths because it employs a one-dimensional directed graph.

In this study, we propose two adaptive deadlock-free routing algorithms based on the turn model [8], called left-up first turn (L-turn) routing and right-down last turn (R-turn) routing, that work by extending the dimension of the directed graph used in up*/down* routing [24], [25]. Like up*/down* routing, the proposed routing algorithms do not require virtual channels. By taking advantage of the extra degree of freedom of a two-dimensional graph, the proposed routing algorithms set well-distributed routing restrictions which ensure deadlock-freedom. Since they avoid deadlocks by removing all cyclic channel dependencies, they can be deterministically implemented by selecting a single path for each source-destination pair [26]. These deterministic routings still provide better balanced paths because they use sophisticated path selection algorithms [26]. One of the algorithms has already been used for deterministic routing in the RHiNET-2 cluster [4].

The rest of this paper is organized as follows: Section 2 describes the L-turn and R-turn routings based on an extended dimensional graph and Section 3 describes an evaluation using a flit-level simulation.
Section 4 discusses related work and Section 5 presents our conclusions.

2 L/R-TURN ROUTING ALGORITHMS

2.1 Motivation

Up*/down* routing has been widely used to avoid deadlocks in SANs with arbitrary topologies using neither virtual channels nor buffers. Up*/down* routing is based on the assignment of direction to network channels [19]. As the basis of the assignment, a spanning tree whose nodes correspond to switches in the network is built. The up end of each channel is then defined as follows: 1) the end whose node is closer to the root in the spanning tree, or 2) the end whose node has the lower unique identifier (UID), if both ends are on nodes at the same tree level. A legal path must traverse zero or more channels in the up direction followed by zero or more channels in the down direction, and this rule guarantees deadlock-freedom while still allowing all hosts to be reached. However, an up*/down* routing algorithm tends to make unbalanced paths because it employs a one-dimensional directed graph.

We logically demonstrate the unbalanced paths of up*/down* routing from the turn-model point of view [8]. In the turn model, all directions of packet turns and their cycles in the target network are analyzed. Accordingly, a set of turns that are just sufficient to break all of the cycles is prohibited (the details are in Section 2.3). Since only a one-dimensional direction, up or down, is included in the graph for up*/down* routing, there are only two turns and one cycle. All cyclic dependencies between channels are thus broken by prohibiting a turn from the down direction to the up direction, and a pair of prohibited turns between two links is always formed, leading to the unbalanced paths. In Fig. 1, a pair of prohibited turns is formed at node B, and three pairs of prohibited turns are formed at node A.

Fig. 1. Pairs of prohibited packet turns in up*/down* routing.
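The up*/down* rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper names `up_end` and `is_legal_updown`, the `depth` dictionary (node UID to tree level, root at level 0), and the representation of a path as a list of node UIDs are all hypothetical and do not come from the paper.

```python
def up_end(a, b, depth):
    """Return the node at the 'up' end of the link between a and b.

    The end closer to the root (smaller depth) is the up end; if both
    ends are at the same tree level, the lower UID is the up end.
    """
    return min(a, b, key=lambda n: (depth[n], n))


def is_legal_updown(path, depth):
    """True if `path` uses zero or more up channels followed by zero or
    more down channels (no down -> up turn)."""
    gone_down = False
    for src, dst in zip(path, path[1:]):
        goes_up = up_end(src, dst, depth) == dst
        if goes_up and gone_down:
            return False  # a down -> up turn is prohibited
        if not goes_up:
            gone_down = True
    return True
```

For example, with `depth = {0: 0, 1: 1, 2: 1, 3: 2}`, the path `[3, 1, 0, 2]` climbs twice and then descends, so it is legal, whereas `[1, 3, 2]` descends and then climbs, so it is rejected.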
This concentration of prohibited turns would lead packets in the root direction, and the resulting heavy traffic around the root would cause congestion that would degrade the total throughput.

Such a concentration phenomenon essentially stems from the simple classification of turns in the graph. That is, only one prohibited turn is sufficient to guarantee deadlock-freedom because only two directions and two turns are defined in the graph. To resolve this problem, we introduce a two-dimensional directed graph called an H/V graph. Since the H/V graph provides four directions and 12 turns, a prohibited turn set for deadlock-free routing can be selected adaptively and systematically, thereby achieving better traffic balance.

Below, we describe our methodology to construct an H/V graph and new routing algorithms based on a two-dimensional turn model on the H/V graph. Although the two-dimensional graph for irregular networks makes deadlock-avoidance complicated, we will investigate all cycle-free turn sets.

2.2 Constructing an H/V Graph

Building a BFS Spanning Tree

The BFS spanning tree is built using the same method as in up*/down* routing. As in the case of a one-dimensional graph for up*/down* routing, a depth is assigned to each node and it is used to determine the vertical direction, i.e., up or down, of each channel.

Definition 1 (depth). The depth of a node is its minimal distance from the root node.

For example, Fig. 2a shows the assignment of depths to an irregular network with nine switches. Each node corresponds to a switch, and each link is a bidirectional physical channel. We call a link that belongs to the spanning tree a tree link and a link that does not belong to the spanning tree an outer link. As shown in the figure, the same depth can be assigned to different nodes.
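A minimal sketch of the depth assignment and the tree/outer link classification follows. The function names `bfs_tree` and `classify_links`, the adjacency-dictionary input, and the use of sorted neighbor order (chosen only to make the result deterministic) are assumptions made for illustration, not details taken from the paper.

```python
from collections import deque


def bfs_tree(adj, root):
    """Breadth-first search from `root`.

    Returns (depth, parent): depth[n] is the minimal distance of n from
    the root (Definition 1) and parent[n] is n's parent in the BFS tree.
    """
    depth = {root: 0}
    parent = {root: None}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for nbr in sorted(adj[node]):  # sorted only for determinism
            if nbr not in depth:
                depth[nbr] = depth[node] + 1
                parent[nbr] = node
                queue.append(nbr)
    return depth, parent


def classify_links(adj, parent):
    """Split the undirected links into tree links and outer links."""
    tree, outer = set(), set()
    for a in adj:
        for b in adj[a]:
            if a < b:  # visit each undirected link once
                if parent.get(a) == b or parent.get(b) == a:
                    tree.add((a, b))
                else:
                    outer.add((a, b))
    return tree, outer
```

On a four-switch example `adj = {0: [1, 2], 1: [0, 2, 3], 2: [0, 1], 3: [1]}` rooted at switch 0, switches 1 and 2 both receive depth 1, and the link between them is the only outer link.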

Fig. 2. Constructing an H/V graph: (a) Depth, (b) depth and horizontal spread, and (c) H/V graph.

Assignment of Horizontal Spread to Each Node

To construct a two-dimensional directed graph, we assign a horizontal spread to each node in addition to a depth and introduce a horizontal direction, i.e., left or right.

Definition 2 (horizontal spread). A horizontal spread is assigned to each node by preorder traversal of the spanning tree. An ascending integer which is incremented in the visiting order is assigned.

In the next section, the horizontal spreads are used to determine the horizontal direction of each channel and the vertical direction of the channels between nodes having the same depth. After the horizontal spreads have been assigned, the two-dimensional coordinates of each node are determined. The coordinates of node N are represented as $(h, d)$, where $h$ and $d$ are the horizontal spread and the depth of node N, respectively.

For example, Fig. 2b shows the assignment of horizontal spreads to the network shown in Fig. 2a. In Fig. 2b, each node has unique coordinates because the horizontal spread of each node is an ascending integer in the visiting order of the preorder traversal, and each child node has a larger horizontal spread than that of its parent node in the spanning tree. The latter feature is used to guarantee the path connectivity of the routing algorithms described in Section 2.3.

Since two or more child nodes can be selected as the next-visit node in the preorder traversal, several selection policies can be applied. Various H/V graphs can thus be built from the same target network. Section 3 evaluates the effect of different selection policies.

Assignment of Directions to Channels

The vertical and horizontal directions are assigned to each channel according to the two-dimensional coordinates of each node, and H/V directions are then introduced by combining them.
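The horizontal-spread assignment of Definition 2 can be sketched as an iterative preorder traversal. The function name `horizontal_spread`, the `parent` dictionary input, the numbering starting at 0, and the pluggable `child_order` argument (standing in for a next-visit node selection policy) are assumptions for illustration only.

```python
def horizontal_spread(parent, root, child_order=sorted):
    """Assign ascending integers to nodes in preorder visiting order.

    parent maps each node to its parent in the spanning tree (None for
    the root). child_order decides which child is visited first; the
    default (sorted by UID) is a placeholder policy, not the paper's.
    """
    children = {}
    for node, par in parent.items():
        if par is not None:
            children.setdefault(par, []).append(node)

    spread = {}
    stack = [root]
    while stack:
        node = stack.pop()
        spread[node] = len(spread)  # next unused integer
        # Push in reverse so the first child in child_order is popped first.
        stack.extend(reversed(list(child_order(children.get(node, [])))))
    return spread
```

With `parent = {0: None, 1: 0, 2: 0, 3: 1}`, the visiting order is 0, 1, 3, 2, so every child receives a larger horizontal spread than its parent, which is the property the text relies on for connectivity.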
To begin with, the horizontal direction, left or right, is assigned to each channel according to the following definition:

Definition 3 (horizontal direction). The horizontal direction of the channel from $(x_s, y_s)$ to $(x_d, y_d)$ is determined as follows: 1) left is assigned if $x_s > x_d$; 2) right is assigned if $x_s < x_d$.

Next, the vertical direction, up or down, is assigned to each channel based on the following definition.

Definition 4 (vertical direction). The vertical direction of the channel from $(x_s, y_s)$ to $(x_d, y_d)$ is determined as follows: 1) up is assigned if $(y_s > y_d) \lor ((y_s = y_d) \land (x_s < x_d))$; and 2) down is assigned if $(y_s < y_d) \lor ((y_s = y_d) \land (x_s > x_d))$.

The H/V direction is assigned to each channel according to the following definition.

Definition 5 (H/V direction). The H/V direction of each channel $HV(h, v)$ is defined using the pair of horizontal $(h)$ and vertical $(v)$ directions as follows: 1) the left-up (LU) direction is assigned to $HV(\text{left}, \text{up})$; 2) the left-down (LD) direction is assigned to $HV(\text{left}, \text{down})$; 3) the right-up (RU) direction is assigned to $HV(\text{right}, \text{up})$; and 4) the right-down (RD) direction is assigned to $HV(\text{right}, \text{down})$.

The coordinates of nodes introduce a two-dimensional directed graph (an H/V graph) by virtue of assigning the H/V direction to each channel. In particular, the subgraph of the H/V graph that consists of channels in the spanning tree is called the H/V tree. Fig. 2c shows the H/V graph for the network in Fig. 2b.

2.3 Turn-Model-Based Routing Algorithms

We devise the deadlock-free routing algorithms by breaking all possible cycles in the H/V graph. To do so, the algorithms are designed according to the turn model [8]. The original turn model methodology is based on four steps [8]:

1. Identify all possible turns from one direction to another.
2. Identify all possible cycles that the turns can form.
3. Prohibit the minimum number of turns so that at least one turn is prohibited in each cycle.
4. Incorporate as many turns as possible without reintroducing cycles.
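Definitions 3 to 5 translate directly into a small function. The sketch below is illustrative only: the name `hv_direction` and the encoding of coordinates as `(horizontal_spread, depth)` tuples are assumptions, not notation from the paper.

```python
def hv_direction(src, dst):
    """Return the H/V direction ('LU', 'LD', 'RU' or 'RD') of the channel
    from src to dst, where each endpoint is (horizontal_spread, depth)."""
    (xs, ys), (xd, yd) = src, dst
    if xs == xd:
        # Horizontal spreads are unique, so this cannot be a real channel.
        raise ValueError("endpoints must have different horizontal spreads")
    horizontal = "L" if xs > xd else "R"                # Definition 3
    goes_up = ys > yd or (ys == yd and xs < xd)         # Definition 4
    return horizontal + ("U" if goes_up else "D")       # Definition 5
```

Two consequences are easy to read off from the code: a tree channel from a parent to a child is always RD (and LU in the reverse direction), and a channel between two nodes of the same depth is either RU or LD.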

Fig. 3. All possible turns in the H/V graph.

However, an H/V graph can include complicated cycles that are difficult to identify, which makes Steps 2 and 3 difficult to carry out. Thus, to find all possible cycles in the H/V graph, we combine Steps 2 and 3 as follows: As soon as a cycle is identified, a prohibited turn is chosen to break this cycle; the next cycle is then searched for under the condition that this turn is prohibited. Notice that to make the paths as uniformly distributed as possible, we add the following policy to select prohibited turns: The prohibited turns are distributed as uniformly as possible.

Furthermore, corresponding to Step 4, we introduce an algorithm that identifies cycles including redundant prohibited turns in a target topology. In the following sections, we introduce the methodology for designing deadlock-free routing algorithms on the H/V graph by using the turn model with the above-described policies.

Preliminaries

The following notation is introduced to represent routing algorithms:

Definition 6 (Turn). $T_{p\_dir,\,n\_dir}$ represents the turn from a direction $p\_dir$ to another direction $n\_dir$.

Definition 7 (Turn dependency). $TD(T_i, T_j)$ represents the direct turn dependency between $T_i$ and $T_j$ in which $T_j$ is formed immediately after $T_i$.

Definition 8 (Cycle). $C(T_0, T_1, \ldots, T_{n-1})$ represents the cycle formed by the turn dependencies $\{TD(T_i, T_j) \mid j = (i + 1) \bmod n,\ i = 0, 1, \ldots, n - 1\}$.

For example, there is a turn dependency $TD(T_{up,down}, T_{down,up})$ and a cycle $C(T_{up,down}, T_{down,up})$ in the one-dimensional directed graph for up*/down* routing.

Identifying All Possible Turns

Fig. 3 shows all possible turns from an H/V direction to another H/V direction in the H/V graph.
Since there are four H/V directions in the graph, there are 12 possible turns.

Identifying All Possible Cycles and Choosing the Prohibited Turns

We identify all possible cycles formed by the turns shown in Fig. 3, and prohibit a minimum turn set in order to break all cycles. For better traffic balance, the prohibited turns are chosen so as not to concentrate traffic at specific nodes.

To begin with, we identify simple cycles consisting of tree links and an outer link. Since a tree consists of $n$ nodes connected by $n - 1$ links, an outer link introduces a cycle. When an outer link is added to the H/V tree, two of the four cycles shown in Fig. 4 are always generated. In Figs. 4a and 4b, nodes B and C are directly connected with a single outer link and have the same ancestor node A. Each channel between B and C has a different direction in the two subgraphs. The cycles in Fig. 4a are $C_1(T_{LU,RD}, T_{RD,RU}, T_{RU,LU})$ and $C_2(T_{LU,RD}, T_{RD,LD}, T_{LD,LU})$. Those in Fig. 4b are $C_3(T_{LU,RD}, T_{RD,LU})$ and $C_3'(T_{LU,RD}, T_{RD,LU})$. Since $C_3$ and $C_3'$ are logically equivalent, we will consider only $C_3$.

Fig. 4. Four possible cycles in the H/V graph.

To break the three cycles, one of the turns in each cycle must be prohibited according to the following two policies: 1) do not prohibit $T_{LU,RD}$ and 2) select a set of prohibited turns so as to avoid the traffic concentration shown in Fig. 1. To guarantee connectivity, $T_{LU,RD}$ must not be prohibited because $T_{LU,RD}$ can be formed between spanning tree channels. Accordingly, $\{T_{LD,LU}, T_{RU,LU}\}$ or $\{T_{RD,RU}, T_{RD,LD}\}$ is chosen as the prohibited turns to break $C_1$ and $C_2$. As shown in Figs. 5a and 5b, the prohibited turns are well distributed. However, to break $C_3$, it is necessary to prohibit $T_{RD,LU}$, which concentrates the prohibited turns as shown in Fig. 5c, because $T_{LU,RD}$ cannot be prohibited. Thus, one of the following two turn sets should be chosen for breaking the three cycles:

$P_1 = \{T_{LD,LU}, T_{RU,LU}, T_{RD,LU}\}$,
$P_2 = \{T_{RD,RU}, T_{RD,LD}, T_{RD,LU}\}$.

Fig. 5. Prohibited turns in the H/V graph (dotted lines are prohibited turns).

Second, we identify the other cycles that do not include the above prohibited turn set ($P_1$ or $P_2$). Although three turns in $P_1$ or $P_2$ are prohibited, the other nine turns in Fig. 3 can still form cycles. The nine turns can be classified into two turn sets under the condition that $P_1$ is prohibited: $Q_1 = \{T_{LU,\,n\_dir} \mid n\_dir \in \{LD, RU, RD\}\}$ and $Q_1' = \{T_{p\_dir,\,n\_dir} \mid p\_dir, n\_dir \in \{LD, RU, RD\},\ p\_dir \neq n\_dir\}$. Similarly, to identify cycles that do not include a turn in $P_2$, we classify the other nine turns into $Q_2 = \{T_{p\_dir,\,RD} \mid p\_dir \in \{LU, LD, RU\}\}$ and $Q_2' = \{T_{p\_dir,\,n\_dir} \mid p\_dir, n\_dir \in \{LU, LD, RU\},\ p\_dir \neq n\_dir\}$.

Theorem 1. A cycle including a turn in $Q_1$ includes a turn in $P_1$ and a cycle including a turn in $Q_2$ includes a turn in $P_2$.

Proof. Assume that there is a cycle such that a turn $T_x$ in $Q_1$ is included, but no turn in $P_1$ is included. Since $T_x$ is formed by a packet turn from the LU direction to another direction, a turn formed immediately before $T_x$ must be one of the turns in $\{T_{p\_dir,\,LU} \mid p\_dir \in \{LD, RU, RD\}\}$. However, this turn set is equivalent to $P_1$, which contradicts the above assumption. Cycles including one of the turns in $Q_1$ thus include one of the turns in $P_1$. In the same way, there are no cycles that include a turn in $Q_2$ but none in $P_2$. □

Theorem 1 demonstrates that $Q_1$ does not form cycles when $P_1$ is prohibited. As a result, all possible cycles including a turn with the LU direction are broken, and the remaining possible cycles consist of the six turns in $Q_1'$, none of which involves the LU direction. Theorem 1 also demonstrates that $Q_2$ does not form cycles when $P_2$ is prohibited. The remaining possible cycles thus only consist of turns in $Q_2'$ when $P_2$ is prohibited.

To show such cycles, we introduce a turn dependency graph (TDG) for $Q_1'$ and $Q_2'$, as shown in Fig. 6. In the TDG, each node represents a turn and each arrow represents a direct turn dependency between two turns. All possible cycles formed by $Q_1'$ are based on one of the four cycles shown in Fig. 6a as dotted cycles, namely,

$C_4(T_{RU,RD}, T_{RD,LD}, T_{LD,RU})$;
$C_5(T_{RD,RU}, T_{RU,LD}, T_{LD,RD})$;
$C_6(T_{LD,RU}, T_{RU,LD})$; and
$C_7(T_{RD,RU}, T_{RU,RD}, T_{RD,LD}, T_{LD,RD})$.

Similarly, there are also four cycles formed by $Q_2'$ in Fig. 6b:

$C_8(T_{LD,RU}, T_{RU,LU}, T_{LU,LD})$;
$C_9(T_{LD,LU}, T_{LU,RU}, T_{RU,LD})$;
$C_{10}(T_{LD,RU}, T_{RU,LD})$; and
$C_{11}(T_{LD,LU}, T_{LU,RU}, T_{RU,LU}, T_{LU,LD})$.

Fig. 7 shows these cycles for each turn set.

Notice that the cyclic turn dependencies $TD(T_{RD,RU}, T_{RU,RD})$ and $TD(T_{LU,LD}, T_{LD,LU})$ in Fig. 6 cannot form cycles in the H/V graph, because the turns in the dependencies never turn in the horizontal direction. The cyclic turn dependencies $TD(T_{RD,LD}, T_{LD,RD})$ and $TD(T_{LU,RU}, T_{RU,LU})$ also cannot form cycles for a similar reason.

To break the four cycles formed by $Q_1'$, $P_1' = \{T_{LD,RU}, T_{LD,RD}\}$ or $P_1'' = \{T_{RU,LD}, T_{RU,RD}\}$ can be chosen as the set of well-distributed prohibited turns as shown in Fig. 7a, when $P_1$ is prohibited.
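The turn sets used in this argument are small enough to enumerate. The following sketch builds them from the four H/V directions and checks the key step in the proof of Theorem 1 at the level of directions only; it says nothing about a concrete topology. The tuple encoding `(previous direction, next direction)` and all Python names are assumptions for illustration.

```python
from itertools import permutations

DIRECTIONS = ("LU", "LD", "RU", "RD")
# A turn is (previous direction, next direction); 4 * 3 = 12 turns.
ALL_TURNS = set(permutations(DIRECTIONS, 2))

P1 = {t for t in ALL_TURNS if t[1] == "LU"}   # all turns into LU
Q1 = {t for t in ALL_TURNS if t[0] == "LU"}   # all turns out of LU
Q1_PRIME = ALL_TURNS - P1 - Q1                # turns among LD, RU, RD

P2 = {t for t in ALL_TURNS if t[0] == "RD"}   # all turns out of RD
Q2 = {t for t in ALL_TURNS if t[1] == "RD"}   # all turns into RD
Q2_PRIME = ALL_TURNS - P2 - Q2                # turns among LU, LD, RU


def predecessors(turn, turns):
    """Turns in `turns` that can be formed immediately before `turn`."""
    return {t for t in turns if t[1] == turn[0]}


def successors(turn, turns):
    """Turns in `turns` that can be formed immediately after `turn`."""
    return {t for t in turns if t[0] == turn[1]}
```

Once `P1` is removed, `predecessors(t, ALL_TURNS - P1)` is empty for every `t` in `Q1`, and once `P2` is removed, `successors(t, ALL_TURNS - P2)` is empty for every `t` in `Q2`, which mirrors the two halves of the proof.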
Although prohibiting $T_{LD,RU}$ or $T_{RU,LD}$ concentrates the prohibited turns, they are needed to break cycle $C_6$. Consequently, one of the following two turn sets should be chosen as the prohibited turns.

$P_1 + P_1' = \{T_{LD,LU}, T_{RU,LU}, T_{RD,LU}, T_{LD,RU}, T_{LD,RD}\}$,
$P_1 + P_1'' = \{T_{LD,LU}, T_{RU,LU}, T_{RD,LU}, T_{RU,LD}, T_{RU,RD}\}$.

Fig. 6. Turn Dependency Graph (TDG): (a) The TDG for $Q_1'$ and (b) the TDG for $Q_2'$.

Fig. 7. Four possible cycles: (a) Formed by $Q_1'$ and (b) formed by $Q_2'$.

All possible cycles with a turn including the LU direction are broken by $P_1$ and the other possible cycles are broken by $P_1'$ or $P_1''$. All possible cycles are thus broken by $(P_1 + P_1')$ or $(P_1 + P_1'')$.

In almost the same way, the turn set $P_2' = \{T_{LD,RU}, T_{LU,RU}\}$ or $P_2'' = \{T_{RU,LD}, T_{LU,LD}\}$ is prohibited as shown in Fig. 7b if $P_2$ is chosen instead of $P_1$. As a result, one of the following turn sets should be chosen:

$P_2 + P_2' = \{T_{RD,LD}, T_{RD,RU}, T_{RD,LU}, T_{LD,RU}, T_{LU,RU}\}$,
$P_2 + P_2'' = \{T_{RD,LD}, T_{RD,RU}, T_{RD,LU}, T_{RU,LD}, T_{LU,LD}\}$.

Reducing Number of Prohibited Turns

Four alternative sets of prohibited turns for breaking all possible cycles in the H/V graph were stated in the previous section. However, turns in $P_1'$, $P_1''$, $P_2'$, or $P_2''$ do not always form cycles owing to another prohibited turn set, i.e., $P_1$ for $P_1'$, $P_1''$, and $P_2$ for $P_2'$, $P_2''$. For example, in Fig. 8, $T_{LD,RU}$ and $T_{LD,RD}$ in $P_1'$ are prohibited. However, even if the two turns are permitted in the figure, there are still no cycles, since each cycle is broken by a prohibited turn in $P_1$. That is, depending on the target topology, some of the turns may be prohibited unnecessarily.

To reduce such redundant prohibited turns, we introduce a traversal algorithm on the constructed H/V graph to detect the four cycles in Fig. 7 which do not include a turn in the other prohibited turn set ($P_1$ or $P_2$) of $P_1'$, $P_1''$, $P_2'$, or $P_2''$. It judges whether each turn in one of the four target turn sets forms a cycle or not.
That is, only the turns in the target turn set that form detected cycles are prohibited at the detected node in the target topology.

The following describes the traversal algorithm for identifying the cycles that include a turn in $P_1'$ (i.e., $P_1$ is the other prohibited turn set). The traversal algorithms for $P_1''$, $P_2'$, and $P_2''$ share the same procedure except that the other prohibited turn set ($P_1$ or $P_2$) and the target turn set ($P_1''$, $P_2'$, or $P_2''$) are interchanged, respectively.

The traversal procedure consists of two searches. The first search is as follows:

1. Select a starting node from those connected to one or more output RU channels and one or more output RD channels (forming $T_{LD,RD}$ in $P_1'$).
2. Visit an adjacent node that can be reached via an output RD channel from the starting node.
3. The graph is then traversed in the order of depth-first search under the following conditions:
   • the next channel is not an LU channel (i.e., does not form a turn in $P_1$) and
   • the next channel does not form a turn that has already been prohibited during the previous search.
   If the traversal process returns to the starting node via an output LD channel from an adjacent node, a cycle that includes $T_{LD,RD}$ is detected. As a result, the turn $T_{LD,RD}$ from the last used LD channel to the first used RD channel is prohibited.
4. Repeat from Step 2 for all RD channels in the starting node. Then, repeat from Step 1 for all starting nodes.

Fig. 8. Redundant prohibited turns.

Fig. 9. Detected cycle by the traversal algorithm for $P_1'$.

The second search is performed in the same way, except that the following conditions apply:

• a starting node connects to two or more output RU channels (forming $T_{LD,RU}$ in $P_1'$),
• a channel used for the first visit is an output RU channel, and

• when a cycle is detected, the turn $T_{LD,RU}$ is prohibited.

Fig. 9 shows an example of a cycle detected by the traversal algorithm under the condition that the prohibited turn set is $P_1$ and the target turn set is $P_1'$. Although there are four starting nodes for the traversal algorithm, only one turn in $P_1'$ must be prohibited because the other turns do not lead to cycles. The computation cost of the algorithm is $O(n^2 l)$, where $n$ is the number of nodes and $l$ is the number of links per node.

Definition of Routing Algorithms

Four routing algorithms based on the above prohibited turn sets are defined. A notation is introduced to express sets of prohibited turns for deadlock-free routing algorithms.

Definition 9 (Prohibited turn set). A prohibited turn set is represented as $DP = DA(H, P, P_{cond})$, where $H$, $P$, and $P_{cond}$ are a target H/V graph, a prohibited turn set, and a turn set conditionally prohibited by the cycle detection algorithm, respectively.

Based on Definition 9, the four routing algorithms are defined as follows:

• L-turn (Left-up first turn)/ routing prohibits $DP_{la} = DA(H, P_1, P_1')$, where $P_1 = \{T_{LD,LU}, T_{RU,LU}, T_{RD,LU}\}$ is prohibited, and $P_1' = \{T_{LD,RU}, T_{LD,RD}\}$ is conditionally prohibited.
• L-turn/ routing prohibits $DP_{lb} = DA(H, P_1, P_1'')$, where $P_1$ is prohibited, and $P_1'' = \{T_{RU,LD}, T_{RU,RD}\}$ is conditionally prohibited.
• R-turn (Right-down last turn)/ routing prohibits $DP_{ra} = DA(H, P_2, P_2')$, where $P_2 = \{T_{RD,RU}, T_{RD,LD}, T_{RD,LU}\}$ is prohibited, and $P_2' = \{T_{LD,RU}, T_{LU,RU}\}$ is conditionally prohibited.
• R-turn/ routing prohibits $DP_{rb} = DA(H, P_2, P_2'')$, where $P_2$ is prohibited, and $P_2'' = \{T_{RU,LD}, T_{LU,LD}\}$ is conditionally prohibited.

Since all turns to the LU direction are prohibited in both L-turn routings, a packet must start out in that direction in order to reach a destination node in the LU direction. On the other hand, since all turns from the RD direction are prohibited in both R-turn routings, a packet must travel in the RD direction last in order to reach a destination node in the RD direction.

Fig. 10 demonstrates the restrictions of the L-turn and R-turn routings. As shown in the figure, the L-turn and R-turn routings distribute prohibited turns more uniformly than up*/down* routing. Furthermore, Fig. 11 demonstrates the prohibited turns of the L-turn routings and the west-first turn model [24] in a 2D mesh. This figure shows that the L-turn routings are the same as the west-first turn model.

Fig. 10. The prohibited turn sets of the L-turn and the R-turn routings (dashed lines are prohibited turns and dotted lines are conditionally prohibited turns by the cycle detection algorithm). (a) The L-turn/. (b) The L-turn/. (c) The R-turn/. (d) The R-turn/.

Fig. 11. Prohibited turns of the west-first turn model and the L-turn routings on 2D mesh (dotted lines are prohibited turns). (a) The west-first turn model. (b) The L-turn routings on 2D mesh.

Theorem 2. The L-turn routings and the R-turn routings are deadlock-free.

Proof. In the L-turn routings, all possible cycles that include the LU direction are broken by the prohibited turn set $P_1$ and the other possible cycles are broken by the conditionally prohibited turn set $P_1'$ or $P_1''$. All possible cycles are thus broken. In the same way, we can show that there are no cycles in the R-turn routings. Therefore, the L-turn routings and the R-turn routings are deadlock-free. □

Theorem 3. The L-turn routings and the R-turn routings guarantee connectivity between every pair of nodes.

Proof. A turn between tree links is always turn $T_{LU,RD}$ on the H/V graph. Since the turn $T_{LU,RD}$ is allowed in the L-turn routings and the R-turn routings, there is a path along the spanning tree between any pair of nodes. The connectivity between every pair of nodes is therefore guaranteed. □

As long as a packet travels so as not to form a prohibited turn, all possible paths (shortest or nonshortest) between every pair of nodes are available. However, a hot-spot is more likely to be formed when nonshortest paths are allowed in irregular networks. Accordingly, only the shortest paths should be taken. As in up*/down* routing, the proposed routing algorithms can implement such a path search simply. The following describes a path search algorithm for L-turn/ routing. This algorithm can be applied to the other proposed algorithms by changing its prohibited turn sets.

1. Prohibit all turns in $P_1$ on the H/V graph.
2. Prohibit those turns in $P_1'$ that form any cycle detected by the cycle detection algorithm.
3. Search for the shortest paths between every pair of nodes by using the Dijkstra algorithm [27] under the conditions that 1) each channel has the same constant cost and 2) all channel transitions on prohibited turns are forbidden.

Although the proposed algorithms are intended for adaptive routing, they can be implemented as deterministic routing by determining a single path between a source-destination pair as in up*/down* routing [28], [26].
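Step 3 of the path search can be illustrated with a short sketch. Because every channel has the same constant cost, a breadth-first search over channels finds the same hop-minimal paths that Dijkstra's algorithm would, so BFS is used here in its place; that substitution, the function names, the adjacency-dictionary input, and the simplification of applying the prohibited turn set globally (rather than per node, as the cycle detection algorithm would for conditionally prohibited turns) are all assumptions of this example.

```python
from collections import deque


def hv_direction(src, dst):
    """H/V direction of the channel between two (spread, depth) points."""
    (xs, ys), (xd, yd) = src, dst
    horizontal = "L" if xs > xd else "R"
    goes_up = ys > yd or (ys == yd and xs < xd)
    return horizontal + ("U" if goes_up else "D")


def shortest_legal_path(adj, coord, prohibited, src, dst):
    """Hop-minimal node path from src to dst that never forms a turn in
    `prohibited`, a set of (incoming direction, outgoing direction).
    Returns None if no such path exists."""
    if src == dst:
        return [src]
    # Search state = the channel (prev, cur) that was just traversed.
    start = [(src, nbr) for nbr in adj[src]]
    came_from = {ch: None for ch in start}
    queue = deque(start)
    while queue:
        prev, cur = queue.popleft()
        if cur == dst:
            path, ch = [cur], (prev, cur)
            while ch is not None:      # walk the channel chain backwards
                path.append(ch[0])
                ch = came_from[ch]
            return path[::-1]
        in_dir = hv_direction(coord[prev], coord[cur])
        for nxt in adj[cur]:
            out_dir = hv_direction(coord[cur], coord[nxt])
            nxt_ch = (cur, nxt)
            if (in_dir, out_dir) in prohibited or nxt_ch in came_from:
                continue
            came_from[nxt_ch] = (prev, cur)
            queue.append(nxt_ch)
    return None
```

Tracking the last traversed channel rather than the current node matters: whether a turn is allowed depends on the incoming direction, so the same node may need to be reached through different channels.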
3 PERFORMANCE EVALUATION

The performance of the proposed routings and the up*/down* routings was evaluated in a flit-level network simulation.

3.1 Simulation Environment

We developed a flit-level network simulator written in C++, on which we modeled switch-based networks using point-to-point links. Every switching fabric was assumed to provide the same number of ports (eight ports, as in RHiNET-2/SW [29]) and the same number of hosts connected to every switch (four ports). The remaining four ports were connected to other switches.

Two classes of network topologies, irregular and regular, were used. Twenty different irregular topologies were randomly generated under the condition that a single link connected two different switches. The regular topology was a two-dimensional torus.

The destination of a packet was determined by the traffic pattern, i.e., uniform or bit-reversal. In the case of uniform traffic, a destination host is randomly selected. On the other hand, in the case of bit-reversal traffic, a host with the identifier $(a_0, a_1, \ldots, a_{n-1})$ sends a packet to the host whose identifier is the bit-reversal $(a_{n-1}, \ldots, a_1, a_0)$ of the source host. Each host injected a packet synchronized to the same interval, leading to burst traffic like that in most scientific applications [4].

The switching fabric was a simple model consisting of channel buffers, a crossbar, link controllers, and control circuits. The delay of recent SAN switches is several hundred nanoseconds, and the optical link delay is quite small (dozens of nanoseconds) [4]. Thus, the simulation assumed that the header flit transfer required at least 23 clock cycles, that is, 21 clock cycles for the switch delay and the remaining two clock cycles for the link delay. Every switch used virtual cut-through as the switching technique. As in Myrinet, there were no virtual channels, because the proposed routing algorithms are designed for networks without virtual channels.
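The bit-reversal traffic pattern described above can be computed with a few bit operations. This is a minimal sketch; the function name and the assumption that host identifiers are non-negative integers of a fixed width `num_bits` are illustrative and are not details of the authors' simulator.

```python
def bit_reversal_destination(host_id, num_bits):
    """Return the identifier whose num_bits-wide binary representation is
    the reverse of host_id's."""
    dest = 0
    for _ in range(num_bits):
        dest = (dest << 1) | (host_id & 1)  # move the lowest bit to the end
        host_id >>= 1
    return dest
```

For instance, with four-bit identifiers, host `0b0011` sends to host `0b1100`, while a host whose identifier is a palindrome such as `0b1001` sends to itself.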
The performance of tree-based routing algorithms is influenced by the algorithm that builds the spanning tree. A commonly used algorithm is the breadth-first search (BFS), which is used in the up*/down* routing of Autonet [19]. The L-turn and R-turn routings use this algorithm for constructing the H/V graph. Sancho et al. proposed a more sophisticated tree algorithm for up*/down* routing that is based on a depth-first search (DFS) with a heuristic rule [21]. Thus, in the simulation, we compared the proposed L-turn and R-turn routings with up*/down* routings built using either BFS or DFS with the heuristic rule.

To select the root node of each spanning tree, each routing algorithm employed the crossing-path-based heuristic rule proposed by Sancho et al. [21]. The heuristic rule usually improves the throughput compared with the simple root selection policy of the Autonet, in which the switch with the unique identifier zero is chosen as the root. Sancho et al.'s root selection rule should also be efficient for the L-turn and R-turn routings, since the rule is based on common performance measures, i.e., average hops and crossing paths. To evaluate the impact of the root selection policy on the performance of the proposed routings, we also evaluated the simple root selection policy of the Autonet in Section 3.2.

In the L-turn and R-turn routings, different H/V graphs can be constructed from a BFS tree because the coordinates of each node are determined by the next-visit node selection policy of the preorder traversal that assigns horizontal spreads. To investigate the impact of the selection policy, we evaluated three next-visit node selection policies for the preorder traversal:

• More upper-channel first visits first the neighboring node with the largest number of up-direction outer channels.
• Less child-node first visits first the neighboring node with the smallest subtree rooted at that neighboring node.
• More child-node first visits first the neighboring node with the largest subtree rooted at that neighboring node.

Unless mentioned otherwise, the following assumes that the simulation used more upper-channel first, whose throughput is usually better.
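The three next-visit policies listed above can be read as different sort keys applied to a node's children during the preorder traversal. The sketch below is one possible reading under stated assumptions: the tree is given as a child-list dictionary, the per-node count of up-direction outer channels is supplied by the caller as `up_outer_channels`, and ties keep the input order. All names are hypothetical and the paper's actual coordinate assignment is not reproduced.

```python
def subtree_size(tree, node):
    """Number of nodes in the subtree rooted at node (the node included)."""
    return 1 + sum(subtree_size(tree, child) for child in tree.get(node, []))

def preorder(tree, root, policy, up_outer_channels=None):
    """Return the preorder visit sequence under a next-visit selection policy."""
    if policy == "more_upper_channel_first":
        key = lambda n: -up_outer_channels[n]   # most up-direction outer channels first
    elif policy == "less_child_node_first":
        key = lambda n: subtree_size(tree, n)   # smallest subtree first
    elif policy == "more_child_node_first":
        key = lambda n: -subtree_size(tree, n)  # largest subtree first
    else:
        raise ValueError(f"unknown policy: {policy}")
    order = []
    stack = [root]
    while stack:
        node = stack.pop()
        order.append(node)
        # Push children in reverse so the preferred child is visited first.
        stack.extend(reversed(sorted(tree.get(node, []), key=key)))
    return order

# Assumed toy spanning tree and channel counts.
tree = {0: [1, 2, 3], 1: [4, 5], 3: [6]}
up_outer = {1: 1, 2: 0, 3: 2, 4: 0, 5: 0, 6: 0}
print(preorder(tree, 0, "less_child_node_first"))               # [0, 2, 3, 6, 1, 4, 5]
print(preorder(tree, 0, "more_child_node_first"))               # [0, 1, 4, 5, 3, 6, 2]
print(preorder(tree, 0, "more_upper_channel_first", up_outer))  # [0, 3, 6, 1, 4, 5, 2]
```

The same tree yields three different visit sequences, which is why the policy changes the coordinates assigned on the H/V graph.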

Each adaptive routing employed a simple output selection function that randomly selects an available output port. On the other hand, because current SANs, such as Myrinet, support only deterministic routing, we also evaluated the proposed routing algorithms with source routing (deterministic routing). We used Sancho et al.'s algorithm [21], which is based on a static analysis of routing paths, to determine a single path between each source-destination pair. The simulation time was 500,000 clock cycles and the packet length was 128 flits.

3.2 Simulation Results for Adaptive Routing

Table 1 lists the average throughputs of six adaptive routings on 20 different irregular topologies and their standard deviations (SD). We consider throughput to be the most important performance metric. Accepted traffic is the flit reception rate, measured in flits per clock cycle per host [30], and the throughput is the maximum accepted traffic. L-turn/, (Root #0) represents the L-turn/ routing with the root selection policy of the Autonet.

TABLE 1. Average Throughputs [Flits/Cycle/Host] and Their Standard Deviations of Adaptive Routings on Irregular Topologies

The table shows that the L-turn routings (except for L-turn/, (Root #0)) achieve the highest throughputs under each condition. In particular, under bit-reversal traffic on the 64-switch networks, they achieve approximately 22 percent higher throughput than the up*/down* routings. In contrast, the R-turn routings have the worst throughputs. In particular, under bit-reversal traffic on the 64-switch networks, their throughput is approximately 20 percent lower than that of the up*/down* routings. Fig.
12 shows the relation between the accepted traffic and the average latency of six routing algorithms on 16-switch and 64-switch irregular topologies (under bit-reversal traffic and uniform traffic, respectively) that exhibit nearly average relative performance. The L-turn routings achieve the lowest latency on both networks. Since the throughput and latency of L-turn/ (using Sancho et al.'s heuristic rule) are better than those of L-turn/, (Root #0) (not using the rule), the selection policy of the root node significantly affects the performance of the proposed routings.

Fig. 12. Accepted traffic versus latency of adaptive routings on irregular topologies. (a) 16-switch, bit-reversal traffic. (b) 64-switch, uniform traffic.

Table 2 shows the performance factors of each routing algorithm on 20 different irregular topologies in terms of the average path hops (DIS), the average number of prohibited turns per switch (PT), the standard deviation of the PT (SDPT), the average number of crossing routing paths on up channels (CPUP), and that on down channels (CPDW). The SDPT shows how uniformly the prohibited turns are distributed, and a smaller value is better. The number of crossing routing paths of a channel is the number of source-destination pairs whose paths pass through that channel, and it indicates the potential channel load. Thus, the CPUP and CPDW show the approximate direction in which the traffic is more likely to be distributed, i.e., upward (toward the root node) or downward (toward the leaf nodes). The traffic would be distributed toward the root node when CPUP is larger than CPDW, and toward the leaf nodes when CPDW is larger than CPUP.

TABLE 2. The Average Distance (DIS), the Average Number of Prohibited Turns per Switch (PT), Its Standard Deviation (SD), the Average Number of Crossing Routing Paths on Up Channels (CPUP), and That on Down Channels (CPDW) of Adaptive Routings on Irregular Topologies

Since the table shows that all the proposed routings have smaller SDPTs than either of the up*/down* routings, they all distribute the prohibited turns more uniformly than the up*/down* routings do. However, the balance between CPUP and CPDW shows that the L-turn routings favor distributing the traffic toward the leaf nodes, whereas the R-turn routings favor the root node. Thus, the results in Table 1, Table 2, and Fig. 12 make it clear that better throughput and latency result from distributing the prohibited turns uniformly in a way that pushes the traffic toward the leaf nodes. Only the L-turn routings, which also achieve the highest throughput, meet this condition. Although the R-turn routings distribute prohibited turns as uniformly as the L-turn routings, their traffic is more likely to be distributed toward the root node, leading to poorer throughput.
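The performance factors defined above (PT, SDPT, CPUP, CPDW) can be computed from a set of routing paths as in the following sketch. Several details are assumptions, since the text does not specify them: the averages are taken over all listed channels (including unused ones), the population standard deviation is used, and the data structures and names are hypothetical. The sketch also assumes at least one up channel and one down channel exist.

```python
from collections import Counter
from statistics import mean, pstdev

def crossing_paths(paths):
    """Count, for each directed channel (u, v), how many paths traverse it."""
    counts = Counter()
    for path in paths:
        for u, v in zip(path, path[1:]):
            counts[(u, v)] += 1
    return counts

def cpup_cpdw(paths, is_up):
    """Average crossing paths on up channels and on down channels.

    is_up maps each directed channel (u, v) to True (up) or False (down).
    """
    counts = crossing_paths(paths)
    up = [counts[ch] for ch, flag in is_up.items() if flag]
    down = [counts[ch] for ch, flag in is_up.items() if not flag]
    return mean(up), mean(down)

def pt_stats(prohibited_per_switch):
    """Return (PT, SDPT): mean and population SD of prohibited turns per switch."""
    values = list(prohibited_per_switch.values())
    return mean(values), pstdev(values)

# Assumed toy example: root R with two children A and B.
is_up = {("A", "R"): True, ("B", "R"): True, ("R", "A"): False, ("R", "B"): False}
paths = [["A", "R", "B"], ["B", "R", "A"], ["A", "R"], ["R", "B"], ["B", "R"]]
print(cpup_cpdw(paths, is_up))          # (2, 1.5): traffic leans toward the root
print(pt_stats({"s0": 2, "s1": 4}))
```

In this toy case CPUP exceeds CPDW, which under the interpretation in the text indicates traffic concentrating toward the root node.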
The reason why the traffic of the R-turn routings is distributed toward the root node is as follows. The R-turn routings allow all turns toward the LU direction except for $T_{RD,LU}$, which leads packets toward the root node. On the other hand, they also prohibit all turns from the RD direction. This restriction restrains packet transfer toward the leaf nodes. For example, Fig. 13 shows the difference in the distributions of CPUP between the L-turn/ and the R-turn/ on a 4 × 4 2D mesh. The CPUP of R-turn/, especially along tree channels to the root node, is much larger than that of L-turn/. In such a situation, traffic tends to concentrate around the root node, and this degrades the throughput of the R-turn routings.

Furthermore, Table 1 and Table 2 indicate that the L-turn/ and L-turn/ achieve almost the same performance. The reason is as follows. The only difference between the two L-turn routings is the set of conditionally prohibited turns. However, the number of prohibited turns belonging to the conditionally prohibited turn set and the uniformity of their distribution are, on average, almost the same on irregular networks. Thus, this difference does not cause a significant performance gap between these routings on irregular networks. The same considerations apply to the R-turn routings.

Table 1 also shows that each routing algorithm achieves higher throughput under bit-reversal traffic than under uniform traffic. This is because, under uniform traffic, packets from different source hosts may collide at the consumption channel of the destination host. Such collisions drastically degrade the performance, especially when the network is small. Under bit-reversal traffic, on the other hand, such collisions at the consumption channel occur only between packets from the same source host.
For example, in the throughput evaluation of the L-turn/ routing on the 16-switch irregular networks, the frequency of such collisions for uniform traffic is approximately 90 times higher on average than that for bit-reversal traffic.

Fig. 13. CPUP of the L-turn/ and R-turn/ on a 4 × 4 2D mesh.

Now, we focus on the impact of the next-visit node selection policy of the preorder traversal, which determines the coordinates of the H/V graph. Table 3 lists the average throughputs and their standard deviations (SD) of L-turn/ routing with the three next-visit node selection policies under uniform traffic on 20 different 64-switch irregular topologies. As shown in Table 3, the more upper-channel first policy improves throughput by up to 7 percent compared with the less child-node first policy because it tends to assign more outer links (channels) to the RU/LD direction, which would avoid concentrating the prohibited turns $T_{RD,LU}$.

TABLE 3. Average Throughputs [Flits/Cycle/Host] and Their Standard Deviations of the L-Turn/ Routing with Three Next-Visit Node Selection Policies of Preorder Traversal on 64-Switch Irregular Topologies with Uniform Traffic

Fig. 14 shows the relation between the accepted traffic and the average latency of six routing algorithms on an 8 × 8 2D torus. Such an evaluation on a regular topology is important because a SAN's topology may have iterative or hierarchical structures rather than being completely irregular. As shown in the figure, the L-turn routings achieve higher throughput and lower latency than the others. In particular, for bit-reversal traffic, L-turn/ has approximately 100 percent higher throughput than the up*/down* routings. Thus, the L-turn routings are also advantageous on topologies with some regularity. However, there is a large performance gap between the two L-turn routings on the 2D torus. This is because the two L-turn routings differ in the number of prohibited turns belonging to the conditionally prohibited turn set and in the uniformity of their distribution, and this difference is likely to be larger on regular networks than on irregular networks. As shown in Fig. 14, which of the L-turn routings performs better on regular networks depends on conditions such as the traffic pattern.
The same considerations also apply to the R-turn routings. Table 1 and Fig. 14 show an approximate tendency: the relative performance improvement of the L-turn routings grows as the network size increases or when bit-reversal traffic is used instead of uniform traffic, whereas the relative performance of the R-turn routings becomes worse as the network size increases.

3.3 Simulation Results for Source Routing

Table 4 and Fig. 15 show the simulation results when the proposed routing algorithms are implemented as source routing on 16-switch or 64-switch irregular networks and on an 8 × 8 2D torus, respectively. As with the adaptive routing results, the L-turn routings achieve the highest throughputs, whereas the R-turn routings have the worst throughputs. As shown in Table 1 and Table 4, the throughputs of most adaptive routings are only slightly higher than those of the source routings on irregular topologies. We consider that, in irregular networks, the advantage of source routing, which is the efficient traffic balancing by Sancho et al.'s algorithm, is almost equal to that of adaptive routing. On the other hand, on a 2D torus, the throughputs of most adaptive routings are higher than those of the source routings, as shown in Fig. 14 and Fig. 15. In particular, the throughput of the L-turn/ with adaptive routing is approximately 60 percent higher than that with source routing. We consider that, on a 2D torus, the advantage of adaptive routing, which is the efficient channel utilization through multiple available paths, is more effective because the 2D torus provides a larger number of paths between pairs of nodes than the irregular networks provide. Table 4 and Fig. 15 show an approximate tendency for the relative performance improvement of the L-turn routings to become larger, and the relative degradation of the R-turn routings to become smaller, when the network size becomes smaller or when uniform traffic is used instead of bit-reversal traffic.
4 RELATED WORK

There are two strategies for deadlock-avoidance-based routing on irregular networks: acyclic channel dependencies, and cyclic channel dependencies with escape paths.

Fig. 14. Accepted traffic versus latency of adaptive routings on an 8 × 8 2D torus. (a) Uniform traffic. (b) Bit-reversal traffic.

TABLE 4. Average Throughputs [Flits/Cycle/Host] and Their Standard Deviations of Source Routings on Irregular Topologies

Fig. 15. Accepted traffic versus latency of source routings on an 8 × 8 2D torus. (a) Uniform traffic. (b) Bit-reversal traffic.

The simple approach used for SANs that requires no virtual channels is up*/down* routing, as described in Section 2.1 and Section 3.1. The adaptive-trail routing proposed by Qiao and Ni [31] is another method without virtual channels. The method computes Eulerian trails and establishes two unidirectional adaptive trails, which achieve deadlock freedom and increased routing adaptivity. However, the method cannot be applied to some topologies, because the necessary and sufficient condition for an Eulerian trail to exist on a connected graph is that all vertices have even degrees or exactly two vertices have odd degrees.

Recently, Sun et al. have proposed DOWN/UP routing [32], which is a turn-model-based routing algorithm (i.e., it requires no virtual channels) for irregular networks. DOWN/UP routing is based on our previously proposed strategy [24], [25], namely the turn model methodology using a two-dimensional directed graph (the H/V graph) and the cycle detection algorithm for irregular networks. In DOWN/UP routing, the number of directions and turns in the two-dimensional directed graph is increased by treating the directions of tree links and outer links as different directions, and a packet must go downward (upward) and then upward (downward) when turning between outer (tree) links. Like the L-turn routings, the basic policy of DOWN/UP routing is to push the traffic toward the leaf nodes as much as possible.

The other approaches are based on using virtual channels or hardware support to improve throughput. When intermediate hosts provide buffers for routing, a true minimal path can be implemented [33].
This approach breaks all cycles by storing packets at some intermediate hosts and reinjecting them later. When a SAN provides virtual channels, multiple up*/down* trees [22], layered shortest-path (LASH) routing [34], layered InfiniBand routing [35], and descending layers (DL) routing [23] can be applied. Flich et al. have proposed an InfiniBand routing based on multiple up*/down* trees with different roots, in which each packet is forwarded on a single tree-based graph [22] to avoid all cycles. The LASH routing [34] guarantees minimal paths by dividing the physical network into a set of virtual layers, with each path assigned to a single virtual layer. The minimal InfiniBand routing proposed by Sancho et al. [35] adopts up*/down* routing to make acyclic virtual networks (layers). On the other hand, the adaptive escape-path routing proposed by Silla and Duato allows cyclic dependencies [12]. Since each packet in channels outside the escape path is forwarded along a minimal path, most packets take minimal paths.

Another approach to coping with deadlocks is deadlock-recovery-based routing, which usually employs minimal fully adaptive routing. It is useful only when deadlocks are infrequent, and techniques for efficient deadlock-recovery-based routing on irregular networks have recently been proposed. FC3D by Rubio et al. [36] is a flow-control-based distributed deadlock detection mechanism that uses only local router information. The mechanism relies on the flow control information available at each router and tries to detect as few deadlocked messages as possible when a deadlock configuration is reached. FC3D detects all possible deadlocks while reducing the recovery overhead and the probability of false deadlock detection. Song and Pinkston have proposed a reservation-based distributed detection and resolution mechanism for network congestion and potential deadlock [37]. It precisely identifies the congestion configuration or potential deadlock by propagating pinging control packets over congested resources, and it effectively disperses the detected congestion by providing available resources to the blocked packets that form the congestion. These techniques make it possible to effectively exploit the flexibility provided by minimal fully adaptive routing. However, they need additional hardware at each router.

Although a large number of deadlock-avoidance-based or recovery-based routings have improved network latency and throughput compared with up*/down* routing by introducing virtual channels or hardware support, they limit the network architectures or technologies to which they can be applied [21]. For example, virtual channels are not always employed in current SANs [38]. Owing to its high portability, up*/down* routing is thus still used to avoid deadlocks [21]. In this study, we have focused on a simple routing strategy that, like up*/down* routing, requires no virtual channels.

5 CONCLUSIONS

System area networks (SANs), which usually accept arbitrary topologies, have been used to connect hosts in PC clusters.
Although up*/down* routing has been widely used to avoid deadlocks in SANs, it tends to make unbalanced paths because it employs a one-dimensional directed graph. In this study, a two-dimensional directed graph is introduced, and adaptive routings called left-up first-turn (L-turn) routings and right-down last-turn (R-turn) routings are proposed to make the paths as uniformly distributed as possible. These routings guarantee deadlock freedom because they use the turn model approach, and the prohibited turns are well distributed by taking advantage of the extra degree of freedom afforded by a two-dimensional graph. They can also be implemented deterministically by using path selection algorithms. Simulation results show that better throughput and latency result from distributing the prohibited turns uniformly in a way that pushes the traffic toward the leaf nodes. The L-turn routings, which meet this condition, improve throughput by up to 100 percent compared with two up*/down*-based routings, and also reduce latency.

The turn-model-based routings for irregular networks can be extended to n-dimensional graphs by following the strategy explained in Section 3. When each switch has a large number of channels with different directions to neighboring switches, turn-model-based routing on an n-dimensional graph could be a good fit for the target network. Since the n-dimensional graph makes turns and cycle detection extremely complicated, the routing design requires further study, which is our future work.

ACKNOWLEDGMENTS

The authors would like to thank the anonymous reviewers for their valuable comments that improved this paper and Dr. Akira Funahashi at ERATO-SORST Kitano Symbiotic Systems Project, Japan Science and Technology Agency, for his helpful comments.

REFERENCES

[1] N.J. Boden et al., Myrinet: A Gigabit-per-Second Local Area Network, IEEE Micro, vol. 15, no. 1, pp ,
[2] I.T. Association, Infiniband Architecture. Specification Volume 1, Release 1.0.a. Available at the InfiniBand Trade Assoc., June
[3] F. Petrini, W.C. Feng, A. Hoisie, S. Coll, and E. Frachtenberg, The Quadrics Network: High-Performance Clustering Technology, IEEE Micro, vol. 22, no. 1, pp ,
[4] M. Koibuchi, K. Watanabe, T. Otsuka, and H. Amano, Performance Evaluation of Deterministic Routings, Multicasts, and Topologies on RHiNET-2 Cluster, IEEE Trans. Parallel and Distributed Systems, vol. 16, no. 8, pp , Aug
[5] W.J. Dally and C.L. Seitz, Deadlock-Free Message Routing in Multiprocessor Interconnection Networks, IEEE Trans. Computers, vol. 36, no. 5, pp , May
[6] P. Kermani and L. Kleinrock, Virtual Cut-Through: A New Computer Communication Switching Technique, Computer Networks, vol. 3, no. 4, pp ,
[7] T. Takahashi, S. Sumimoto, A. Hori, H. Harada, and Y. Ishikawa, PM2: High Performance Communication Middleware for Heterogeneous Network Environment, Proc. Supercomputing Conf. (SC '00), pp , Nov
[8] C.J. Glass and L.M. Ni, The Turn Model for Adaptive Routing, Proc. Int'l Symp. Computer Architecture, pp ,
[9] A.A. Chien and J.H. Kim, Planar-Adaptive Routing: Low-Cost Adaptive Networks for Multiprocessors, J. ACM, vol. 42, no. 1, pp , Jan
[10] J. Duato, A Necessary and Sufficient Condition for Deadlock-Free Adaptive Routing in Wormhole Networks, IEEE Trans. Parallel and Distributed Systems, vol. 6, no. 10, pp , Oct
[11] W.J. Dally and H. Aoki, Deadlock-Free Adaptive Routing in Multicomputer Networks Using Virtual Channels, IEEE Trans. Parallel and Distributed Systems, vol. 4, no. 4, pp , Apr
[12] F. Silla and J. Duato, High-Performance Routing in Networks of Workstations with Irregular Topology, IEEE Trans. Parallel and Distributed Systems, vol. 11, no. 7, pp , July
[13] T. Kudoh, S. Nishimura, J. Yamamoto, H. Nishi, O. Tatebe, and H. Amano, RHiNET: A Network for High Performance Parallel Computing Using Locally Distributed Computing, Proc. Int'l Workshop Innovative Architecture (IWIA), pp , Nov
[14] J.C. Martinez, J. Flich, A. Robles, P. Lopez, J. Duato, and M. Koibuchi, In-Order Packet Delivery in Interconnection Networks Using Adaptive Routing, Proc. IEEE Int'l Parallel and Distributed Processing Symp., p. 101a, Apr
[15] M. Koibuchi, J.C. Martinez, J. Flich, A. Robles, P. Lopez, and J. Duato, Enforcing In-Order Packet Delivery in System Area Networks with Adaptive Routing, J. Parallel and Distributed Computing, vol. 65, pp , Oct
[16] S.L. Scott and G. Thorson, The Cray T3E Network: Adaptive Routing in a High Performance 3D Torus, Proc. Hot Interconnects Symp. IV, pp , Aug
[17] W.J. Dally et al., Architecture and Implementation of the Reliable Router, Proc. Hot Interconnects Symp. II, Aug
[18] J.C. Martinez, J. Flich, A. Robles, P. Lopez, and J. Duato, Supporting Adaptive Routing in IBA Switches, J. Systems Architecture, vol. 49, pp ,
[19] M.D. Schroeder et al., Autonet: A High-Speed, Self-Configuring Local Area Network Using Point-to-Point Links, IEEE J. Selected Areas in Comm., vol. 9, pp , 1991.


[20] J. Wu and L. Sheng, Deadlock-Free Routing in Irregular Networks Using Prefix Routing, Proc. Parallel and Distributed Computing Systems Conf., pp , Aug
[21] J.C. Sancho, A. Robles, and J. Duato, An Effective Methodology to Improve the Performance of the up*/down* Routing Algorithm, IEEE Trans. Parallel and Distributed Systems, vol. 15, no. 8, pp. 740-754, Aug. 2004.
[22] J. Flich, P. Lopez, J.C. Sancho, A. Robles, and J. Duato, Improving InfiniBand Routing through Multiple Virtual Networks, Proc. Int'l Symp. High Performance Computing, pp , May
[23] M. Koibuchi, A. Jouraku, and H. Amano, Descending Layers Routing: A Deadlock-Free Deterministic Routing Using Virtual Channels in System Area Networks with Irregular Topologies, Proc. Int'l Conf. Parallel Processing, pp , Oct
[24] M. Koibuchi, A. Funahashi, A. Jouraku, and H. Amano, L-Turn Routing: An Adaptive Routing in Irregular Networks, Proc. Int'l Conf. Parallel Processing, pp , Sept
[25] A. Jouraku, M. Koibuchi, A. Funahashi, and H. Amano, Routing Algorithms Based on 2D Turn Model for Irregular Networks, Proc. Int'l Symp. Parallel Architectures, Algorithms, and Networks, pp , June
[26] M. Koibuchi, A. Jouraku, and H. Amano, Path Selection Algorithm: The Strategy for Designing Deterministic Routing from Alternative Paths, Parallel Computing, vol. 31, no. 1, pp , Jan
[27] E.W. Dijkstra, A Note on Two Problems in Connexion with Graphs, Numerische Math., vol. 1, pp , Oct
[28] J.C. Sancho and A. Robles, Improving the up*/down* Routing Scheme for Networks of Workstations, Proc. European Conf. Parallel Computing, pp , Aug
[29] S. Nishimura, T. Kudoh, H. Nishi, J. Yamamoto, K. Harasawa, N. Matsudaira, S. Akutsu, K. Tasho, and H. Amano, High-Speed Network Switch RHiNET-2/SW and Its Implementation with Optical Interconnections, Proc. Hot Interconnects Conf., pp , Aug
[30] J. Duato, S. Yalamanchili, and L. Ni, Interconnection Networks: An Engineering Approach. Morgan Kaufmann,
[31] W. Qiao, L.M. Ni, and T. Rokicki, Adaptive-Trail Routing and Performance Evaluation in Irregular Networks Using Cut-Through Switches, IEEE Trans. Parallel and Distributed Systems, vol. 10, no. 11, pp , Nov
[32] Y.M. Sun, Y.C. Chung, and T.Y. Huang, An Efficient Deadlock-Free Tree-Based Routing Algorithm for Irregular Wormhole-Routed Networks Based on the Turn Model, Proc. Int'l Conf. Parallel Processing, pp ,
[33] J. Flich, P. Lopez, M.P. Malumbres, and J. Duato, Boosting the Performance of Myrinet Networks, IEEE Trans. Parallel and Distributed Systems, vol. 13, no. 7, pp , July
[34] T. Skeie, O. Lysne, and I. Theiss, Layered Shortest Path (LASH) Routing in Irregular System Area Networks, Proc. Int'l Parallel and Distributed Processing Symp., pp , Apr
[35] J.C. Sancho, A. Robles, J. Flich, P. Lopez, and J. Duato, Effective Methodology for Deadlock-Free Minimal Routing in Infiniband, Proc. Int'l Conf. Parallel Processing, pp , Aug
[36] J.M. Rubio, P. Lopez, and J. Duato, FC3D: Flow Control-Based Distributed Deadlock Detection Mechanism for True Fully Adaptive Routing in Wormhole Networks, IEEE Trans. Parallel and Distributed Systems, vol. 14, no. 8, pp , Aug
[37] Y.H. Song and T.M. Pinkston, Distributed Resolution of Network Congestion and Potential Deadlock Using Reservation-Based Scheduling, IEEE Trans. Parallel and Distributed Systems, vol. 16, no. 8, pp , Aug
[38]

Akiya Jouraku received the BE and ME degrees from Keio University, Japan, in 1998 and He is currently a PhD candidate at Keio University. His research interests include the area of interconnection networks and parallel processing.

Michihiro Koibuchi received the BE, ME, and PhD degrees from Keio University, Japan, in 2000, 2002, and He was a visiting researcher at the Technical University of Valencia, Spain, and a research fellow of the Japan Society for the Promotion of Science in He is currently an assistant professor at the National Institute of Informatics (NII), Japan. His research interests include the area of networks-on-chips, interconnection networks, and parallel processing. He is a member of the IEEE.

Hideharu Amano received the PhD degree from Keio University He is currently a professor in the Department of Information and Computer Science, Keio University. His research interests include the area of parallel processing and reconfigurable systems. He is a member of the IEEE.
