
IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, VOL. 18, NO. 3, MARCH 2007

An Effective Design of Deadlock-Free Routing Algorithms Based on 2D Turn Model for Irregular Networks

Akiya Jouraku, Michihiro Koibuchi, Member, IEEE, and Hideharu Amano, Member, IEEE

Abstract

System area networks (SANs), which usually accept arbitrary topologies, have been used to connect hosts in PC clusters. Although deadlock-free routing is often employed for low-latency communications using wormhole or virtual cut-through switching, the interconnection adaptivity introduces difficulties in establishing deadlock-free paths. An up*/down* routing algorithm, which has been widely used to avoid deadlocks in irregular networks, tends to make unbalanced paths because it employs a one-dimensional directed graph. The current study introduces a two-dimensional directed graph on which adaptive routings called left-up first turn (L-turn) routings and right-down last turn (R-turn) routings are proposed to distribute the paths as uniformly as possible. This scheme guarantees deadlock-freedom because it uses the turn model approach, and the extra degree of freedom in the two-dimensional graph helps to ensure that the prohibited turns are well distributed. Simulation results show that distributing the prohibited turns uniformly, which shifts more of the traffic toward the leaf nodes, improves throughput and latency. The L-turn routings, which meet this condition, improve throughput by up to 100 percent compared with two up*/down*-based routings, and also reduce latency.

Index Terms: Adaptive routing, deadlock avoidance, turn model, irregular topologies, system area networks, interconnection networks, PC clusters.

1 INTRODUCTION

NETWORK-BASED parallel processing using system area networks (SANs) has been researched as a potentially cost-effective parallel-computing environment [1], [2], [3], [4].
SANs, which consist of switches connected with point-to-point links, are likely to provide low-latency, high-bandwidth communications like those of interconnection networks in massively parallel computers. SAN architectures use wormhole routing [5] or virtual cut-through [6] as their switching technique, and they achieve reliable communications at the hardware level with deadlock-free routing. Such communication simplifies the design of the system software stack, including a lightweight communication library [7] that provides zero-copy or one-copy communication. Unlike the interconnection networks used in massively parallel computers, SANs usually accept arbitrary topologies so as to provide extensibility and the dependability needed to cope with low-reliability commodity hosts. This interconnection adaptivity, however, makes it difficult to establish paths that are free of deadlocks. A deadlock-free routing algorithm is thus crucial for making efficient use of network resources, yet the current deadlock-free routing algorithms in massively parallel computers with regular topologies [5], [8], [9], [10] cannot be directly employed in most cases.

A. Jouraku and H. Amano are with the Department of Information and Computer Science, Keio University, Hiyoshi, Kouhoku-ku, Yokohama, Japan. {jouraku, hunga}@am.ics.keio.ac.jp.
M. Koibuchi is with the Infrastructure Systems Research Division, National Institute of Informatics, National Center of Sciences, Hitotsubashi, Chiyoda-ku, Tokyo, Japan. koibuchi@nii.ac.jp.
Manuscript received 12 June 2005; revised 7 Dec. 2005; accepted 20 Jan. 2006; published online 25 Jan
Recommended for acceptance by S. Olariu. For information on obtaining reprints of this article, please send e-mail to: tpds@computer.org, and reference IEEECS Log Number TPDS

The following two strategies can be taken when a routing algorithm is designed. Deterministic routing takes a single path between hosts, and it guarantees in-order packet delivery between the same pair of hosts [5]. On the other hand, adaptive routing [8], [11], [9], [10], [12] dynamically selects the route of a packet in order to make the best use of bandwidth in interconnection networks. In adaptive routing, when a packet encounters a faulty or congested path, another bypassing path can be selected. Since this allows for a better balance of network traffic, adaptive routing improves throughput and latency.

In spite of adaptive routing's advantages, most current SANs [1], [2], [13] do not employ it. This is because it does not guarantee in-order packet delivery, which is required by some message-passing libraries, and because the logic to dynamically select a single channel from a set of alternatives might substantially add to the switch's complexity. However, simple sorting mechanisms for out-of-order packets in network interfaces have been researched [14], [15], and real parallel machines, such as the Cray T3E [16] and the Reliable Router [17], have shown the feasibility of adaptive routing. A simple method to support adaptivity in InfiniBand switches has also been proposed [18]. We therefore expect that SAN switches, like the interconnection networks of massively parallel computers, will employ adaptive routing.

Adaptive-routing strategies to avoid deadlocks are classified into two approaches. The simpler approach removes cyclic channel dependencies in the channel dependency graph (CDG) [8], [11], [9]. The more complex one deals with cyclic channel dependencies by introducing escape paths [10], [12]. The latter strategy is difficult to apply to arbitrary topologies with neither virtual channels nor buffers. The former strategy, on the other hand, is usually based on a spanning tree for irregular networks [19], [20], [21], and it exploits the connectivity and acyclicity of the tree structure. In particular, up*/down* routing has been widely used to avoid deadlocks in SANs [19], [12], [22], [23], [21]. Up*/down* routing allocates a direction (up or down) to each channel, and it prohibits packet transfer from the down direction to the up direction in order to guarantee deadlock-freedom. However, the algorithm tends to make unbalanced paths because it employs a one-dimensional directed graph.

/07/$25.00 © 2007 IEEE Published by the IEEE Computer Society

In this study, we propose two adaptive deadlock-free routing algorithms based on the turn model [8], called left-up first turn (L-turn) routing and right-down last turn (R-turn) routing, which work by extending the dimension of the directed graph used in up*/down* routing [24], [25]. Like up*/down* routing, the proposed routing algorithms do not require virtual channels. By taking advantage of the extra degree of freedom of a two-dimensional graph, the proposed routing algorithms set well-distributed routing restrictions that ensure deadlock-freedom. Since they avoid deadlocks by removing all cyclic channel dependencies, they can also be implemented deterministically by selecting a single path for each source-destination pair [26]. These deterministic routings still provide better-balanced paths because they use sophisticated path selection algorithms [26]. One of the algorithms has already been used for deterministic routing in the RHiNET-2 cluster [4].

The rest of this paper is organized as follows: Section 2 describes the L-turn and R-turn routings based on an extended dimensional graph, and Section 3 describes an evaluation using a flit-level simulation.
Section 4 discusses related work, and Section 5 presents our conclusions.

2 L/R-TURN ROUTING ALGORITHMS

2.1 Motivation

Up*/down* routing has been widely used to avoid deadlocks in SANs with arbitrary topologies using neither virtual channels nor buffers. Up*/down* routing is based on the assignment of directions to network channels [19]. As the basis of the assignment, a spanning tree whose nodes correspond to the switches in the network is built. The up end of each channel is then defined as 1) the end whose node is closer to the root in the spanning tree or 2) the end whose node has the lower unique identifier (UID), if both ends are on nodes at the same tree level. A legal path must traverse zero or more channels in the up direction followed by zero or more channels in the down direction; this rule guarantees deadlock-freedom while still allowing all hosts to be reached. However, an up*/down* routing algorithm tends to make unbalanced paths because it employs a one-dimensional directed graph.

Fig. 1. Pairs of prohibited packet turns in up*/down* routing.

We now explain the unbalanced paths of up*/down* routing from the turn-model point of view [8]. In the turn model, all directions of packet turns and the cycles they can form in the target network are analyzed, and a set of turns that is just sufficient to break all of the cycles is prohibited (the details are in Section 2.3). Since the graph for up*/down* routing contains only one dimension, with the directions up and down, there are only two turns and one cycle. All cyclic dependencies between channels are thus broken by prohibiting the turn from the down direction to the up direction. As a result, prohibited turns always form in pairs between two links, which leads to unbalanced paths. In Fig. 1, a pair of prohibited turns is formed at node B, and three pairs of prohibited turns are formed at node A.
This concentration of prohibited turns steers packets toward the root, and the resulting heavy traffic around the root causes congestion that degrades the total throughput. Such concentration essentially stems from the simple classification of turns in the graph: because only two directions and two turns are defined, a single prohibited turn is sufficient to guarantee deadlock-freedom.

To resolve this problem, we introduce a two-dimensional directed graph called an H/V graph. Since the H/V graph provides four directions and 12 turns, a set of prohibited turns for deadlock-free routing can be selected adaptively and systematically, thereby achieving better traffic balance. Below, we describe our methodology to construct an H/V graph and new routing algorithms based on a two-dimensional turn model on the H/V graph. Although the two-dimensional graph for irregular networks makes deadlock avoidance complicated, we investigate all cycle-free turn sets.

2.2 Constructing an H/V Graph

Building a BFS Spanning Tree

The BFS spanning tree is built using the same method as in up*/down* routing. As in the one-dimensional graph for up*/down* routing, a depth is assigned to each node, and it is used to determine the vertical direction, i.e., up or down, of each channel.

Definition 1 (depth). The depth of a node is its minimal distance from the root node.

For example, Fig. 2a shows the assignment of depths to an irregular network with nine switches. Each node corresponds to a switch, and each link is a bidirectional physical channel. We call a link that belongs to the spanning tree a tree link, and a link that does not belong to the spanning tree an outer link. As shown in the figure, the same depth can be assigned to different nodes.
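
The up*/down* rule of Section 2.1 and the depth of Definition 1 can be illustrated in a few lines. The following Python sketch is a hypothetical illustration, not the authors' code: it assumes the network is given as an adjacency dictionary keyed by integer switch identifiers, which also serve as the UIDs, and the helper names (`bfs_depths`, `updown_direction`, `is_legal_updown`) are invented for this example. It computes depths by breadth-first search, assigns each channel its up or down direction, and checks that a path never takes the single prohibited down-to-up turn.

```python
from collections import deque

def bfs_depths(adj, root):
    """Definition 1: depth = minimal hop distance from the root (BFS order)."""
    depth = {root: 0}
    queue = deque([root])
    while queue:
        u = queue.popleft()
        for v in sorted(adj[u]):
            if v not in depth:
                depth[v] = depth[u] + 1
                queue.append(v)
    return depth

def updown_direction(u, v, depth):
    """Direction of channel u -> v: 'up' if v is the channel's up end.

    The up end is the node closer to the root; ties between nodes at the same
    depth go to the lower UID (here, the integer node id).
    """
    v_is_up_end = (depth[v], v) < (depth[u], u)
    return "up" if v_is_up_end else "down"

def is_legal_updown(path, depth):
    """Legal up*/down* path: zero or more 'up' hops, then zero or more 'down' hops."""
    seen_down = False
    for u, v in zip(path, path[1:]):
        if updown_direction(u, v, depth) == "down":
            seen_down = True
        elif seen_down:  # a down -> up turn is the single prohibited turn
            return False
    return True

# Hypothetical four-switch network with links 0-1, 0-2, 1-2, 1-3.
adj = {0: [1, 2], 1: [0, 2, 3], 2: [0, 1], 3: [1]}
depth = bfs_depths(adj, root=0)
print(depth)                                 # {0: 0, 1: 1, 2: 1, 3: 2}
print(is_legal_updown([3, 1, 0, 2], depth))  # True
print(is_legal_updown([3, 1, 2, 0], depth))  # False
```

The path 3 → 1 → 0 → 2 (up, up, down) is legal. The path 3 → 1 → 2 → 0 is rejected because the channel 1 → 2 is down (node 1 has the lower UID at depth 1) and the channel 2 → 0 then turns back up.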

Fig. 2. Constructing an H/V graph: (a) depth, (b) depth and horizontal spread, and (c) H/V graph.

Assignment of Horizontal Spread to Each Node

To construct a two-dimensional directed graph, we assign a horizontal spread to each node in addition to a depth, and introduce a horizontal direction, i.e., left or right.

Definition 2 (horizontal spread). A horizontal spread is assigned to each node by a preorder traversal of the spanning tree: an ascending integer, incremented in visiting order, is assigned to each node.

In the next section, the horizontal spreads are used to determine the horizontal direction of each channel and the vertical direction of channels between nodes having the same depth. After the horizontal spreads have been assigned, the two-dimensional coordinates of each node are determined. The coordinates of node N are represented as $(h, d)$, where $h$ and $d$ are the horizontal spread and the depth of node N, respectively.

For example, Fig. 2b shows the assignment of horizontal spreads to the network shown in Fig. 2a. In Fig. 2b, each node has unique coordinates because the horizontal spread of each node is an ascending integer in the visiting order of the preorder traversal. In addition, each child node has a larger horizontal spread than its parent node in the spanning tree; this property is used to guarantee the path connectivity of the routing algorithms described in Section 2.3. Since two or more child nodes can be candidates for the next node to visit in the preorder traversal, several selection policies can be applied, and various H/V graphs can thus be built from the same target network. Section 3 evaluates the effect of different selection policies.

Assignment of Directions to Channels

The vertical and horizontal directions are assigned to each channel according to the two-dimensional coordinates of its end nodes, and H/V directions are then introduced by combining them.
To begin with, the horizontal direction, left or right, is assigned to each channel according to the following definition.

Definition 3 (horizontal direction). The horizontal direction of the channel from $(x_s, y_s)$ to $(x_d, y_d)$ is determined as follows: 1) left is assigned if $x_s > x_d$; 2) right is assigned if $x_s < x_d$.

Next, the vertical direction, up or down, is assigned to each channel based on the following definition.

Definition 4 (vertical direction). The vertical direction of the channel from $(x_s, y_s)$ to $(x_d, y_d)$ is determined as follows: 1) up is assigned if $(y_s > y_d) \lor ((y_s = y_d) \land (x_s < x_d))$; and 2) down is assigned if $(y_s < y_d) \lor ((y_s = y_d) \land (x_s > x_d))$.

The H/V direction is assigned to each channel according to the following definition.

Definition 5 (H/V direction). The H/V direction of each channel, $HV(h, v)$, is defined using the pair of its horizontal ($h$) and vertical ($v$) directions as follows: 1) the left-up (LU) direction is assigned to $HV(left, up)$; 2) the left-down (LD) direction is assigned to $HV(left, down)$; 3) the right-up (RU) direction is assigned to $HV(right, up)$; and 4) the right-down (RD) direction is assigned to $HV(right, down)$.

By assigning an H/V direction to each channel, the node coordinates induce a two-dimensional directed graph (an H/V graph). In particular, the subgraph of the H/V graph that consists of the channels in the spanning tree is called the H/V tree. Fig. 2c shows the H/V graph for the network in Fig. 2b.

2.3 Turn-Model-Based Routing Algorithms

We devise the deadlock-free routing algorithms by breaking all possible cycles in the H/V graph. To do so, the algorithms are designed according to the turn model [8]. The original turn model methodology consists of four steps [8]:

1. Identify all possible turns from one direction to another.
2. Identify all possible cycles that the turns can form.
3. Prohibit the minimum number of turns so that at least one turn is prohibited in each cycle.
4. Incorporate as many turns as possible without reintroducing cycles.
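
Before turning to the turn model itself, the graph construction of Section 2.2 (Definitions 1 through 5) can be summarized in code. The Python sketch below is a hypothetical illustration, not the authors' implementation. It assumes an adjacency dictionary with integer switch identifiers, visits siblings in ascending identifier order during the preorder traversal (only one of several possible next-visit policies), and starts horizontal spreads at 0. It produces each node's $(h, d)$ coordinates and the H/V direction of any channel.

```python
from collections import deque

def bfs_tree(adj, root):
    """BFS spanning tree: depth of every node and the children of every node."""
    depth, children = {root: 0}, {u: [] for u in adj}
    queue = deque([root])
    while queue:
        u = queue.popleft()
        for v in sorted(adj[u]):
            if v not in depth:          # first discovery makes (u, v) a tree link
                depth[v] = depth[u] + 1
                children[u].append(v)
                queue.append(v)
    return depth, children

def horizontal_spread(children, root):
    """Definition 2: ascending integers in preorder (siblings visited by id)."""
    spread, stack = {}, [root]
    while stack:
        u = stack.pop()
        spread[u] = len(spread)              # next integer in visiting order
        stack.extend(reversed(children[u]))  # first child is popped first
    return spread

def hv_direction(src, dst, coord):
    """Definitions 3-5 for the channel src -> dst, with coord[n] = (h, d)."""
    (xs, ys), (xd, yd) = coord[src], coord[dst]
    horiz = "L" if xs > xd else "R"     # spreads are unique, so xs != xd
    up = ys > yd or (ys == yd and xs < xd)
    return horiz + ("U" if up else "D")

# Hypothetical four-switch network with links 0-1, 0-2, 1-2, 1-3.
adj = {0: [1, 2], 1: [0, 2, 3], 2: [0, 1], 3: [1]}
depth, children = bfs_tree(adj, root=0)
spread = horizontal_spread(children, root=0)
coord = {n: (spread[n], depth[n]) for n in adj}
print(coord)  # {0: (0, 0), 1: (1, 1), 2: (3, 1), 3: (2, 2)}
print(hv_direction(1, 2, coord), hv_direction(2, 1, coord))  # RU LD
```

Because a child always has both a larger depth and a larger horizontal spread than its parent, every tree link is LU from child to parent and RD from parent to child. The outer link 1-2 joins two nodes of equal depth, so its direction is decided by the horizontal spreads alone.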

Fig. 3. All possible turns in the H/V graph.

However, an H/V graph can include complicated cycles that are difficult to identify, which in turn makes Steps 2 and 3 difficult. Thus, to find all possible cycles in the H/V graph, we combine Steps 2 and 3 as follows: as soon as a cycle is identified, a prohibited turn is chosen to break it, and the next cycle is then searched for under the condition that this turn is prohibited. To make the paths as uniformly distributed as possible, we also add the following policy for selecting prohibited turns: the prohibited turns are distributed as uniformly as possible. Furthermore, corresponding to Step 4, we introduce an algorithm that identifies cycles including redundant prohibited turns in a target topology. In the following sections, we introduce the methodology for designing deadlock-free routing algorithms on the H/V graph by using the turn model with the above-described policies.

Preliminaries

The following notation is introduced to represent routing algorithms.

Definition 6 (Turn). $T_{p\_dir,n\_dir}$ represents the turn from a direction $p\_dir$ to another direction $n\_dir$.

Definition 7 (Turn dependency). $TD(T_i, T_j)$ represents the direct turn dependency between $T_i$ and $T_j$, in which $T_j$ is formed immediately after $T_i$.

Definition 8 (Cycle). $C(T_0, T_1, \ldots, T_{n-1})$ represents the cycle formed by the turn dependencies $\{TD(T_i, T_j) \mid j = (i + 1) \bmod n,\ i = 0, 1, \ldots, n-1\}$.

For example, there is a turn dependency $TD(T_{up,down}, T_{down,up})$ and a cycle $C(T_{up,down}, T_{down,up})$ in the one-dimensional directed graph for up*/down* routing.

Identifying All Possible Turns

Fig. 3 shows all possible turns from an H/V direction to another H/V direction in the H/V graph.
Since there are four H/V directions in the graph, there are $4 \times 3 = 12$ possible turns.

Identifying All Possible Cycles and Choosing the Prohibited Turns

We identify all possible cycles formed by the turns shown in Fig. 3, and prohibit a minimal turn set that breaks all of them. For better traffic balance, the prohibited turns are chosen so as not to concentrate traffic at specific nodes.

To begin with, we identify simple cycles consisting of tree links and one outer link. Since a tree with $n$ nodes has exactly $n - 1$ links, adding any outer link introduces a cycle. When an outer link is added to the H/V tree, two of the four cycles shown in Fig. 4 are always generated. In Figs. 4a and 4b, nodes B and C are directly connected by a single outer link and have the same ancestor node A. Each channel between B and C has a different direction in the two subgraphs. The cycles in Fig. 4a are $C_1(T_{LU,RD}, T_{RD,RU}, T_{RU,LU})$ and $C_2(T_{LU,RD}, T_{RD,LD}, T_{LD,LU})$. Those in Fig. 4b are $C_3(T_{LU,RD}, T_{RD,LU})$ and $C_3'(T_{LU,RD}, T_{RD,LU})$. Since $C_3$ and $C_3'$ are logically equivalent, we consider only $C_3$.

Fig. 4. Four possible cycles in the H/V graph.

To break the three cycles, one of the turns in each cycle must be prohibited according to the following two policies: 1) do not prohibit $T_{LU,RD}$ and 2) select a set of prohibited turns so as not to concentrate traffic, as shown in Fig. 1. To guarantee connectivity, $T_{LU,RD}$ must not be prohibited, because $T_{LU,RD}$ can be formed between spanning tree channels. Accordingly, $\{T_{LD,LU}, T_{RU,LU}\}$ or $\{T_{RD,RU}, T_{RD,LD}\}$ is chosen as the prohibited turns to break $C_1$ and $C_2$. As shown in Figs. 5a and 5b, these prohibited turns are well distributed. However, because $T_{LU,RD}$ cannot be prohibited, breaking $C_3$ requires prohibiting $T_{RD,LU}$, which concentrates the prohibited turns as shown in Fig. 5c. Thus, one of the following two turn sets should be chosen to break the three cycles:

$P_1 = \{T_{LD,LU}, T_{RU,LU}, T_{RD,LU}\}$, $P_2 = \{T_{RD,RU}, T_{RD,LD}, T_{RD,LU}\}$.

Fig. 5. Prohibited turns in the H/V graph (dotted lines are prohibited turns).

Second, we identify the other cycles, which do not include the above prohibited turn set ($P_1$ or $P_2$). Although three turns in $P_1$ or $P_2$ are prohibited, the other nine turns in Fig. 3 can still form cycles. Under the condition that $P_1$ is prohibited, the nine turns can be classified into two turn sets:
$Q_1 = \{T_{LU,n\_dir} \mid n\_dir \in \{LD, RU, RD\}\}$ and $Q_1' = \{T_{p\_dir,n\_dir} \mid p\_dir, n\_dir \in \{LD, RU, RD\},\ p\_dir \neq n\_dir\}$.
Similarly, to identify cycles that do not include a turn in $P_2$, we classify the other nine turns into
$Q_2 = \{T_{p\_dir,RD} \mid p\_dir \in \{LU, LD, RU\}\}$ and $Q_2' = \{T_{p\_dir,n\_dir} \mid p\_dir, n\_dir \in \{LU, LD, RU\},\ p\_dir \neq n\_dir\}$.

Theorem 1. A cycle including a turn in $Q_1$ includes a turn in $P_1$, and a cycle including a turn in $Q_2$ includes a turn in $P_2$.

Proof. Assume that there is a cycle that includes a turn $T_x$ in $Q_1$ but no turn in $P_1$. Since $T_x$ is formed by a packet turning from the LU direction to another direction, the turn formed immediately before $T_x$ in the cycle must be one of the turns in $\{T_{p\_dir,LU} \mid p\_dir \in \{LD, RU, RD\}\}$. However, this turn set is equal to $P_1$, which contradicts the assumption. Cycles including one of the turns in $Q_1$ thus include one of the turns in $P_1$. In the same way, there are no cycles that include a turn in $Q_2$ but none in $P_2$. □

Theorem 1 demonstrates that $Q_1$ does not form cycles when $P_1$ is prohibited. As a result, all possible cycles including a turn with the LU direction are broken, and the remaining possible cycles consist of the six turns in $Q_1'$, none of which involves the LU direction. Theorem 1 also demonstrates that $Q_2$ does not form cycles when $P_2$ is prohibited; the remaining possible cycles then consist only of turns in $Q_2'$. To show such cycles, we introduce a turn dependency graph (TDG) for $Q_1'$ and $Q_2'$, as shown in Fig. 6. In the TDG, each node represents a turn, and each arrow represents a direct turn dependency between two turns. All possible cycles formed by $Q_1'$ are based on one of the four cycles shown in Fig. 6a as dotted cycles, namely,
$C_4(T_{RU,RD}, T_{RD,LD}, T_{LD,RU})$, $C_5(T_{RD,RU}, T_{RU,LD}, T_{LD,RD})$, $C_6(T_{LD,RU}, T_{RU,LD})$, and $C_7(T_{RD,RU}, T_{RU,RD}, T_{RD,LD}, T_{LD,RD})$.
Similarly, there are four cycles formed by $Q_2'$ in Fig. 6b:
$C_8(T_{LD,RU}, T_{RU,LU}, T_{LU,LD})$, $C_9(T_{LD,LU}, T_{LU,RU}, T_{RU,LD})$, $C_{10}(T_{LD,RU}, T_{RU,LD})$, and $C_{11}(T_{LD,LU}, T_{LU,RU}, T_{RU,LU}, T_{LU,LD})$.
Fig. 7 shows these cycles for each turn set. Notice that the cyclic turn dependencies $TD(T_{RD,RU}, T_{RU,RD})$ and $TD(T_{LU,LD}, T_{LD,LU})$ in Fig. 6 cannot form cycles in the H/V graph, because the turns in these dependencies never change the horizontal direction (a packet following them moves only to the right or only to the left). For a similar reason, the cyclic turn dependencies $TD(T_{RD,LD}, T_{LD,RD})$ and $TD(T_{LU,RU}, T_{RU,LU})$ cannot form cycles either, because they never change the vertical direction.

To break the four cycles formed by $Q_1'$ when $P_1$ is prohibited, $P_1' = \{T_{LD,RU}, T_{LD,RD}\}$ or $P_1'' = \{T_{RU,LD}, T_{RU,RD}\}$ can be chosen as the set of well-distributed prohibited turns, as shown in Fig. 7a.
Although prohibiting $T_{LD,RU}$ or $T_{RU,LD}$ concentrates the prohibited turns, one of them is needed to break cycle $C_6$.

Fig. 6. Turn Dependency Graph (TDG): (a) the TDG for $Q_1'$ and (b) the TDG for $Q_2'$.

Fig. 7. Four possible cycles: (a) formed by $Q_1'$ and (b) formed by $Q_2'$.

Consequently, one of the following two turn sets should be chosen as the prohibited turns:
$P_1 + P_1' = \{T_{LD,LU}, T_{RU,LU}, T_{RD,LU}, T_{LD,RU}, T_{LD,RD}\}$,
$P_1 + P_1'' = \{T_{LD,LU}, T_{RU,LU}, T_{RD,LU}, T_{RU,LD}, T_{RU,RD}\}$.
All possible cycles with a turn involving the LU direction are broken by $P_1$, and the other possible cycles are broken by $P_1'$ or $P_1''$. All possible cycles are thus broken by $(P_1 + P_1')$ or $(P_1 + P_1'')$.

In almost the same way, if $P_2$ is chosen instead of $P_1$, the turn set $P_2' = \{T_{LD,RU}, T_{LU,RU}\}$ or $P_2'' = \{T_{RU,LD}, T_{LU,LD}\}$ is prohibited, as shown in Fig. 7b. As a result, one of the following turn sets should be chosen:
$P_2 + P_2' = \{T_{RD,LD}, T_{RD,RU}, T_{RD,LU}, T_{LD,RU}, T_{LU,RU}\}$,
$P_2 + P_2'' = \{T_{RD,LD}, T_{RD,RU}, T_{RD,LU}, T_{RU,LD}, T_{LU,LD}\}$.

Reducing the Number of Prohibited Turns

Four alternative sets of prohibited turns for breaking all possible cycles in the H/V graph were stated in the previous section. However, turns in $P_1'$, $P_1''$, $P_2'$, or $P_2''$ do not always form cycles, because another prohibited turn set ($P_1$ for $P_1'$ and $P_1''$, or $P_2$ for $P_2'$ and $P_2''$) may already break them. For example, $T_{LD,RU}$ and $T_{LD,RD}$ in $P_1'$ are prohibited in Fig. 8. However, even if these two turns were permitted in the figure, there would still be no cycles, since each cycle is broken by a prohibited turn in $P_1$. That is, depending on the target topology, some of the turns may be prohibited unnecessarily. To reduce such redundant prohibited turns, we introduce a traversal algorithm on the constructed H/V graph that detects those cycles in Fig. 7 that contain no turn of the other prohibited turn set ($P_1$ for $P_1'$ and $P_1''$, $P_2$ for $P_2'$ and $P_2''$). It judges whether each turn in the target turn set ($P_1'$, $P_1''$, $P_2'$, or $P_2''$) forms a cycle or not.
That is, only the turns in the target turn set that form detected cycles are prohibited, and only at the node where the cycle is detected in the target topology. The following describes the traversal algorithm for identifying the cycles that include a turn in $P_1'$ (i.e., the other prohibited turn set is $P_1$). The traversal algorithms for $P_1''$, $P_2'$, and $P_2''$ share the same procedure, except that the other prohibited turn set ($P_1$ or $P_2$) and the target turn set ($P_1''$, $P_2'$, or $P_2''$) are substituted accordingly.

The traversal procedure consists of two searches. The first search is as follows:

1. Select a starting node from those connected to one or more output RU channels and one or more output RD channels (forming $T_{LD,RD}$ in $P_1'$).
2. Visit an adjacent node that can be reached via an output RD channel from the starting node.
3. Traverse the graph in depth-first order under the following conditions:
   • the next channel is not an LU channel (i.e., it does not form a turn in $P_1$), and
   • it does not form a turn that has already been prohibited during the previous search.
   If the traversal returns to the starting node via an LD channel from an adjacent node, a cycle that includes $T_{LD,RD}$ is detected. As a result, the turn $T_{LD,RD}$ from the last used LD channel to the first used RD channel is prohibited.
4. Repeat from Step 2 for all RD channels of the starting node. Then, repeat from Step 1 for all starting nodes.

The second search is performed in the same way, except that the following conditions apply:
• a starting node connects to two or more output RU channels (forming $T_{LD,RU}$ in $P_1'$),
• the channel used for the first visit is an output RU channel, and
• when a cycle is detected, the turn $T_{LD,RU}$ is prohibited.

Fig. 8. Redundant prohibited turns.

Fig. 9. Cycle detected by the traversal algorithm for $P_1'$.

Fig. 10. The prohibited turn sets of the L-turn and the R-turn routings (dashed lines are prohibited turns and dotted lines are turns conditionally prohibited by the cycle detection algorithm). (a) The L-turn/. (b) The L-turn/. (c) The R-turn/. (d) The R-turn/.

Fig. 9 shows an example of a cycle detected by the traversal algorithm under the condition that the prohibited turn set is $P_1$ and the target turn set is $P_1'$. Although there are four starting nodes for the traversal algorithm, only one turn in $P_1'$ must be prohibited, because the other turns do not lead to cycles. The computation cost of the algorithm is $O(n^2 l)$, where $n$ is the number of nodes and $l$ is the number of links per node.

Definition of Routing Algorithms

Four routing algorithms based on the above prohibited turn sets are defined. The following notation expresses the sets of prohibited turns for the deadlock-free routing algorithms.

Definition 9 (Prohibited turn set). A prohibited turn set is represented as $DP = DA(H, P, P_{cond})$, where $H$, $P$, and $P_{cond}$ are the target H/V graph, the prohibited turn set, and the turn set conditionally prohibited by the cycle detection algorithm, respectively.

Based on Definition 9, the four routing algorithms are defined as follows:
• L-turn (Left-up first turn)/ routing prohibits $DP_{la} = DA(H, P_1, P_1')$, where $P_1 = \{T_{LD,LU}, T_{RU,LU}, T_{RD,LU}\}$ is prohibited and $P_1' = \{T_{LD,RU}, T_{LD,RD}\}$ is conditionally prohibited.
• L-turn/ routing prohibits $DP_{lb} = DA(H, P_1, P_1'')$, where $P_1$ is prohibited and $P_1'' = \{T_{RU,LD}, T_{RU,RD}\}$ is conditionally prohibited.
• R-turn (Right-down last turn)/ routing prohibits $DP_{ra} = DA(H, P_2, P_2')$, where $P_2 = \{T_{RD,RU}, T_{RD,LD}, T_{RD,LU}\}$ is prohibited and $P_2' = \{T_{LD,RU}, T_{LU,RU}\}$ is conditionally prohibited.
• R-turn/ routing prohibits $DP_{rb} = DA(H, P_2, P_2'')$, where $P_2$ is prohibited and $P_2'' = \{T_{RU,LD}, T_{LU,LD}\}$ is conditionally prohibited.

Since all turns into the LU direction are prohibited in both L-turn routings, a packet must start out in the LU direction in order to reach a destination node in the LU direction. Conversely, since all turns out of the RD direction are prohibited in both R-turn routings, a packet must travel in the RD direction last in order to reach a destination node in the RD direction.

Fig. 10 shows the restrictions of the L-turn and R-turn routings. As shown in the figure, the L-turn and R-turn routings distribute the prohibited turns more uniformly than up*/down* routing does. Furthermore, Fig. 11 shows the prohibited turns of the L-turn routings and of the west-first turn model [24] in a 2D mesh: on a 2D mesh, the L-turn routings prohibit the same turns as the west-first turn model.

Fig. 11. Prohibited turns of the west-first turn model and the L-turn routings on a 2D mesh (dotted lines are prohibited turns). (a) The west-first turn model. (b) The L-turn routings on a 2D mesh.

Theorem 2. The L-turn routings and the R-turn routings are deadlock-free.

Proof. In the L-turn routings, all possible cycles that include the LU direction are broken by the prohibited turn set $P_1$, and the other possible cycles are broken by the conditionally prohibited turn set $P_1'$ or $P_1''$. All possible cycles are thus broken. In the same way, we can show that there are no cycles in the R-turn routings. Therefore, the L-turn routings and the R-turn routings are deadlock-free. □

Theorem 3. The L-turn routings and the R-turn routings guarantee connectivity between every pair of nodes.

Proof. A turn between tree links on the H/V graph is always $T_{LU,RD}$. Since $T_{LU,RD}$ is allowed in both the L-turn and the R-turn routings, there is a path along the spanning tree between any pair of nodes. The connectivity between every pair of nodes is therefore guaranteed. □

As long as a packet travels without forming a prohibited turn, all possible paths (shortest or nonshortest) between every pair of nodes are available. However, a hot spot is more likely to form when nonshortest paths are allowed in irregular networks. Accordingly, only the shortest paths should be taken. As in up*/down* routing, the proposed routing algorithms can implement such a path search simply. The following describes a path search algorithm for L-turn/ routing ($DP_{la}$); this algorithm can be applied to the other proposed algorithms by changing its prohibited turn sets.

1. Prohibit all turns in $P_1$ on the H/V graph.
2. Prohibit those turns in $P_1'$ that form any cycle detected by the cycle detection algorithm.
3. Search for the shortest paths between every pair of nodes by using Dijkstra's algorithm [27] under the conditions that 1) each channel has the same constant cost and 2) all channel transitions on prohibited turns are forbidden.

Although the proposed algorithms are intended for adaptive routing, they can be implemented as deterministic routing by determining a single path for each source-destination pair, as in up*/down* routing [28], [26].
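
A minimal Python sketch of Steps 1 to 3 follows. It rests on several stated assumptions. The coordinates are given directly for a hypothetical four-switch network. The conditional set $P_1'$ is prohibited everywhere rather than only where the cycle detection algorithm finds a cycle, which is a conservative simplification. Because every channel has the same cost, a breadth-first search over (node, incoming H/V direction) states replaces Dijkstra's algorithm; under unit costs the two find paths of the same length. The helper `shortest_legal_path` is hypothetical and returns one shortest legal path, whereas an adaptive implementation would keep all of them.

```python
from collections import deque

# Turns are (incoming H/V direction, outgoing H/V direction) pairs.
P1 = {("LD", "LU"), ("RU", "LU"), ("RD", "LU")}   # every turn into LU
P1_PRIME = {("LD", "RU"), ("LD", "RD")}           # conditionally prohibited set

def hv_direction(src, dst, coord):
    """H/V direction of channel src -> dst (Definitions 3-5), coord[n] = (h, d)."""
    (xs, ys), (xd, yd) = coord[src], coord[dst]
    up = ys > yd or (ys == yd and xs < xd)
    return ("L" if xs > xd else "R") + ("U" if up else "D")

def shortest_legal_path(adj, coord, src, dst, prohibited):
    """Minimum-hop path src -> dst that never takes a prohibited turn.

    A state is (node, direction of the channel used to reach it); with unit
    channel costs, BFS over states finds minimum-hop paths, as Dijkstra would.
    """
    start = (src, None)
    prev = {start: None}
    queue = deque([start])
    while queue:
        state = queue.popleft()
        node, in_dir = state
        if node == dst:
            path = []
            while state is not None:     # walk predecessor links back to src
                path.append(state[0])
                state = prev[state]
            return path[::-1]
        for nxt in sorted(adj[node]):
            out_dir = hv_direction(node, nxt, coord)
            if in_dir is not None and (in_dir, out_dir) in prohibited:
                continue                 # this channel transition is a prohibited turn
            nxt_state = (nxt, out_dir)
            if nxt_state not in prev:
                prev[nxt_state] = state
                queue.append(nxt_state)
    return None                          # unreachable under these restrictions

adj = {0: [1, 2], 1: [0, 2, 3], 2: [0, 1], 3: [1]}
coord = {0: (0, 0), 1: (1, 1), 2: (3, 1), 3: (2, 2)}
print(shortest_legal_path(adj, coord, 2, 3, P1))             # [2, 1, 3]
print(shortest_legal_path(adj, coord, 2, 3, P1 | P1_PRIME))  # [2, 0, 1, 3]
```

In this example, allowing $T_{LD,RD}$ yields the two-hop path 2 → 1 → 3. Prohibiting all of $P_1'$ instead forces the three-hop detour 2 → 0 → 1 → 3 through the root. Because node 3 is a leaf, $T_{LD,RD}$ at node 1 closes no cycle here. It is therefore the kind of redundant prohibition that the cycle detection algorithm is designed to avoid.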

3 PERFORMANCE EVALUATION

The performance of the proposed routings and of the up*/down* routings was evaluated with a flit-level network simulation.

3.1 Simulation Environment

We developed a flit-level network simulator written in C++, in which we modeled switch-based networks using point-to-point links. Every switching fabric was assumed to provide the same number of ports (eight ports, as in RHiNET-2/SW [29]) and to have the same number of hosts connected to it (four ports); the remaining four ports were connected to other switches. Two classes of network topologies, irregular and regular, were used. Twenty different irregular topologies were randomly generated under the condition that a single link connected two different switches. The regular topology was a two-dimensional torus. The destination of a packet was determined by the traffic pattern, i.e., uniform or bit-reversal. In the case of uniform traffic, a destination host was selected at random. In the case of bit-reversal traffic, a host with the identifier $(a_0, a_1, \ldots, a_{n-1})$ sends a packet to the host whose identifier is the bit reversal $(a_{n-1}, \ldots, a_1, a_0)$ of the source identifier. All hosts injected packets at the same synchronized interval, producing bursty traffic like that of most scientific applications [4].

The switching fabric was a simple model consisting of channel buffers, a crossbar, link controllers, and control circuits. The delay of recent SAN switches is several hundred nanoseconds, and the optical link delay is quite small (dozens of nanoseconds) [4]. Thus, the simulation assumed that a header flit transfer required at least 23 clock cycles: 21 clock cycles for the switch delay and the remaining two clock cycles for the link delay. Every switch used virtual cut-through as the switching technique. As in Myrinet, there were no virtual channels, because the proposed routing algorithms are designed for networks without virtual channels.
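
The bit-reversal traffic pattern can be made concrete with a short Python helper. It is hypothetical and not part of the authors' simulator. It assumes that host identifiers are integers in the range $[0, 2^n)$, i.e., that the number of hosts is a power of two.

```python
def bit_reversal_destination(src, n_bits):
    """Host (a_0, a_1, ..., a_{n-1}) sends to host (a_{n-1}, ..., a_1, a_0)."""
    dst = 0
    for i in range(n_bits):
        if (src >> i) & 1:                  # bit i of the source ...
            dst |= 1 << (n_bits - 1 - i)    # ... becomes bit n-1-i of the destination
    return dst

# With 8 hosts (3-bit identifiers):
print([bit_reversal_destination(h, 3) for h in range(8)])  # [0, 4, 2, 6, 1, 5, 3, 7]
```

The mapping is its own inverse. Hosts whose identifiers are bit palindromes (0, 2, 5, and 7 with three bits) map to themselves.
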
The performance of tree-based routing algorithms is influenced by the algorithm that builds the spanning tree. A commonly used algorithm for building the tree is breadth-first search (BFS), which is used in the up*/down* routing of Autonet [19]. The L-turn and R-turn routings use this algorithm to construct the H/V graph. Sancho et al. proposed a more sophisticated tree-construction algorithm for up*/down* routing that is based on a depth-first search (DFS) with a heuristic rule [21]. Thus, in the simulation, we compared the proposed L-turn and R-turn routings with up*/down* routings using either BFS or DFS with the heuristic rule. To select the root node of each spanning tree, every routing algorithm employed the crossing-path-based heuristic rule proposed by Sancho et al. [21]. This heuristic usually improves throughput compared with the simple root selection policy of Autonet, in which the switch with the unique identifier zero is chosen as the root. Sancho et al.'s root selection rule should also be effective for the L-turn and R-turn routings, since the rule is based on common performance measures, i.e., average hops and crossing paths. To evaluate the impact of the root selection policy on the performance of the proposed routings, we also evaluated the simple Autonet policy in Section 3.2.

In the L-turn and R-turn routings, different H/V graphs can be constructed from the same BFS tree, because the coordinates of each node are determined by the next-visit node selection policy of the preorder traversal that assigns horizontal spreads. To investigate the impact of this selection policy, we evaluated three next-visit node selection policies for the preorder traversal:

• More upper-channel first visits first the neighboring node with the largest number of up-direction outer channels.
• Less child-node first visits first the neighboring node with the smallest subtree rooted at that neighbor.
• More child-node first visits first the neighboring node with the largest subtree rooted at that neighbor.

Unless mentioned otherwise, the simulations below used more upper-channel first, whose throughput is usually better.
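A minimal sketch of building a BFS spanning tree and assigning H/V coordinates with a preorder traversal follows. For illustration only, it assumes that a node's vertical coordinate is its BFS depth and its horizontal coordinate is its preorder rank; the paper's exact coordinate rules may differ. It implements only the two subtree-size policies (less child-node first and more child-node first); more upper-channel first is omitted because it needs the up/down classification of outer channels. Breaking ties by switch identifier is also an assumption.

```python
from collections import deque


def bfs_tree(adj, root):
    """BFS spanning tree: parent, depth, and ordered children of every switch."""
    parent, depth, children = {root: None}, {root: 0}, {root: []}
    queue = deque([root])
    while queue:
        u = queue.popleft()
        for v in sorted(adj[u]):  # tie-break by switch ID (assumption)
            if v not in parent:
                parent[v], depth[v], children[v] = u, depth[u] + 1, []
                children[u].append(v)
                queue.append(v)
    return parent, depth, children


def subtree_sizes(children, root):
    """Number of switches in the subtree rooted at each switch."""
    order, stack = [], [root]
    while stack:  # every child is appended after its parent
        u = stack.pop()
        order.append(u)
        stack.extend(children[u])
    size = {}
    for u in reversed(order):  # children are sized before their parent
        size[u] = 1 + sum(size[c] for c in children[u])
    return size


def hv_coordinates(adj, root, policy="less-child"):
    """(x, y) per switch: x = preorder rank, y = BFS depth (assumed mapping).

    "less-child": visit the child with the smallest subtree first.
    "more-child": visit the child with the largest subtree first.
    """
    if policy not in ("less-child", "more-child"):
        raise ValueError(f"unsupported policy: {policy}")
    _, depth, children = bfs_tree(adj, root)
    size = subtree_sizes(children, root)
    largest_first = policy == "more-child"
    x, stack = {}, [root]
    while stack:
        u = stack.pop()
        x[u] = len(x)  # preorder rank
        order = sorted(children[u], key=lambda c: (size[c], c), reverse=largest_first)
        stack.extend(reversed(order))  # first child in `order` is popped next
    return {u: (x[u], depth[u]) for u in x}
```

Changing only the policy changes the horizontal coordinates while the depths stay the same, which is why one BFS tree can yield several H/V graphs.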

TABLE 1
Average Throughputs [Flits/Cycle/Host] and Their Standard Deviations of Adaptive Routings on Irregular Topologies

Each adaptive routing employed a simple output selection function that randomly selects an available output port. Because current SANs such as Myrinet support only deterministic routing, we also evaluated the proposed routing algorithms with source routing (deterministic routing). To determine a single path between each source-destination pair, we used Sancho et al.'s algorithm [21], which is based on a static analysis of routing paths. The simulation time was 500,000 clock cycles, and the packet length was 128 flits.

3.2 Simulation Results for Adaptive Routing

Table 1 lists the average throughputs of six adaptive routings on 20 different irregular topologies and their standard deviations (SD). We consider throughput the most important performance metric. Accepted traffic is the flit reception rate, measured in flits per clock cycle per host [30], and throughput is the maximum accepted traffic. L-turn/, (Root #0) denotes the L-turn/ routing with the root selection policy of Autonet. The table shows that the L-turn routings (except for L-turn/, (Root #0)) achieve the highest throughputs under every condition. In particular, for bit-reversal traffic on 64-switch networks, they improve throughput by approximately 22 percent compared with the up*/down* routings. In contrast, the R-turn routings have the worst throughputs; for bit-reversal traffic on 64-switch networks, their throughput is approximately 20 percent lower than that of the up*/down* routings.

Fig. 12 shows the relation between accepted traffic and average latency for the six routing algorithms on 16- and 64-switch irregular topologies (under bit-reversal and uniform traffic, respectively), which show nearly average relative performance. The L-turn routings achieve the lowest latency on both networks. Since the throughput and latency of L-turn/ (using Sancho et al.'s heuristic rule) are better than those of L-turn/, (Root #0) (not using the rule), the selection of the root node significantly affects the performance of the proposed routings.

Fig. 12. Accepted traffic versus latency of adaptive routings on irregular topologies. (a) 16-switch, bit-reversal traffic. (b) 64-switch, uniform traffic.

TABLE 2
The Average Distance (DIS), the Average Number of Prohibited Turns per Switch (PT), Its Standard Deviation (SD), the Average Number of Crossing Routing Paths on Up Channels (CPUP), and Those on Down Channels (CPDW) of Adaptive Routings on Irregular Topologies

Table 2 shows the performance factors of each routing algorithm on 20 different irregular topologies in terms of the average path hops (DIS), the average number of prohibited turns per switch (PT), the standard deviation of PT (SDPT), the average number of crossing routing paths on up channels (CPUP), and the same quantity on down channels (CPDW). The SDPT shows how uniformly the prohibited turns are distributed; a smaller value is better. The crossing routing path count of a channel is the number of source-destination pairs whose paths pass through it, and it indicates the potential channel load. Thus, CPUP and CPDW show whether the traffic is more likely to be distributed upward (toward the root node) or downward (toward the leaf nodes): traffic is distributed toward the root node when CPUP is larger than CPDW, and toward the leaf nodes when CPDW is larger than CPUP. Since all the proposed routings have smaller SDPTs than either up*/down* routing, they all distribute the prohibited turns more uniformly than the up*/down* routings. However, the balance of CPUP and CPDW shows that the L-turn routings favor distributing the traffic toward the leaf nodes, whereas the R-turn routings favor the root node. Thus, the results in Table 1, Table 2, and Fig. 12 make it clear that better throughput and latency result from distributing the prohibited turns uniformly in such a way that the traffic is pushed toward the leaf nodes. Only the L-turn routings, which also achieve the highest throughput, meet this condition. Although the R-turn routings distribute prohibited turns as uniformly as the L-turn routings, their traffic is more likely to be distributed toward the root node, which leads to poorer throughput.
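The Table 2 metrics can be computed from routing tables roughly as follows. This sketch assumes that a turn is the triple (u, v, w) charged to its middle switch v, uses the population standard deviation for SDPT, and takes a caller-supplied predicate `is_up(u, v)` that classifies a channel as up (toward the root) or down; all of these are assumptions for illustration, and the function names are hypothetical. For adaptive routing, a source-destination pair is counted once per channel even if several of its alternative paths use that channel.

```python
from collections import Counter
from statistics import mean, pstdev


def prohibited_turn_stats(switches, prohibited):
    """PT and SDPT: mean and population std-dev of prohibited turns per switch.

    A turn (u, v, w) is charged to its middle switch v (assumption).
    """
    per_switch = Counter(v for _, v, _ in prohibited)
    counts = [per_switch[s] for s in switches]
    return mean(counts), pstdev(counts)


def crossing_paths(paths_by_pair):
    """Crossing routing paths: number of (src, dst) pairs that may use each channel."""
    cp = Counter()
    for paths in paths_by_pair.values():
        used = {(p[i], p[i + 1]) for p in paths for i in range(len(p) - 1)}
        cp.update(used)  # a pair counts once per channel, even with several paths
    return cp


def cpup_cpdw(all_channels, cp, is_up):
    """Average crossing paths over up channels (CPUP) and down channels (CPDW).

    Both channel classes must be non-empty, or statistics.mean raises an error.
    """
    up = [cp[c] for c in all_channels if is_up(*c)]
    down = [cp[c] for c in all_channels if not is_up(*c)]
    return mean(up), mean(down)
```

Comparing the two averages returned by `cpup_cpdw` gives the upward-versus-downward tendency discussed above.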
The reason their traffic is distributed toward the root node is as follows. The R-turn routings allow all turns toward the LU direction except for $T_{RD,LU}$, and these turns lead packets toward the root node. At the same time, they prohibit all turns from the RD direction, which restrains packet transfer toward the leaf nodes. For example, Fig. 13 shows the difference in the distribution of CPUP between the L-turn/ and the R-turn/ on a 4 × 4 2D mesh. The CPUP of R-turn/, especially along tree channels to the root node, is much larger than that of L-turn/. In such a situation, traffic tends to concentrate around the root node, which degrades the throughput of the R-turn routings.

Furthermore, Table 1 and Table 2 indicate that the two L-turn routings achieve almost the same performance. The reason is as follows. The two L-turn routings differ only in their sets of conditionally prohibited turns. However, on irregular networks, the number of prohibited turns in the conditionally prohibited turn set and the uniformity of their distribution are, on average, almost the same for both. Thus, this difference does not cause a significant performance gap between these routings on irregular networks. The same considerations apply to the R-turn routings.

Table 1 also shows that each routing algorithm achieves higher throughput under bit-reversal traffic than under uniform traffic. This is because, under uniform traffic, packets from different source hosts may collide at the consumption channel of the destination host. Such collisions drastically degrade performance, especially in small networks. Under bit-reversal traffic, by contrast, such collisions at the consumption channel occur only between packets from the same source host.
For example, in the throughput evaluation of the L-turn/ routing on the 16-switch irregular networks, such collisions occur approximately 90 times more often on average under uniform traffic than under bit-reversal traffic.

Fig. 13. CPUP of the L-turn/ and R-turn/ on a 4 × 4 2D mesh.

TABLE 3
Average Throughputs [Flits/Cycle/Host] and Their Standard Deviations of the L-Turn/ Routing with Three Next-Visit Node Selection Policies of Preorder Traversal on 64-Switch Irregular Topologies with Uniform Traffic

Now we focus on the impact of the next-visit node selection policy of the preorder traversal, which determines the coordinates of the H/V graph. Table 3 lists the average throughputs and their standard deviations (SD) of the L-turn/ routing with the three next-visit node selection policies under uniform traffic on 20 different 64-switch irregular topologies. As shown in Table 3, the more upper-channel first policy improves throughput by up to 7 percent compared with the less child-node first policy, because it tends to assign more outer links (channels) to the RU/LD direction, which helps avoid the concentration of prohibited turns caused by $T_{RD,LU}$.

Fig. 14 shows the relation between accepted traffic and average latency for the six routing algorithms on an 8 × 8 2D torus. Such an evaluation on a regular topology is important because SAN topologies may have iterative or hierarchical structures rather than being completely irregular. As shown in the figure, the L-turn routings achieve higher throughput and lower latency than the others. In particular, for bit-reversal traffic, L-turn/ achieves approximately 100 percent higher throughput than the up*/down* routings. Thus, the L-turn routings are also advantageous on topologies with some regularity. However, there is a large performance gap between the two L-turn routings on the 2D torus. This is because the difference between them, namely the number of prohibited turns in the conditionally prohibited turn set and the uniformity of their distribution, is more likely to be large on regular networks than on irregular networks. As Fig. 14 shows, which L-turn routing performs better on regular networks depends on conditions such as the traffic pattern.
The same considerations also apply to the R-turn routings. Table 1 and Fig. 14 show an approximate tendency: the relative performance improvement of the L-turn routings grows as the network size increases or when bit-reversal traffic is used instead of uniform traffic, whereas the relative performance of the R-turn routings becomes worse as the network size increases.

3.3 Simulation Results for Source Routing

Table 4 and Fig. 15 show simulation results when the proposed routing algorithms are implemented with source routing on 16- or 64-switch irregular networks and on an 8 × 8 2D torus, respectively. As with adaptive routing, the L-turn routings achieve the highest throughputs, whereas the R-turn routings have the worst. As shown in Table 1 and Table 4, the throughputs of most adaptive routings are only slightly higher than those of the source routings on irregular topologies. We consider that, in irregular networks, the advantage of source routing, namely efficient traffic balancing by Sancho et al.'s algorithm, is almost equal to that of adaptive routing. On a 2D torus, on the other hand, the throughputs of most adaptive routings are higher than those of the source routings, as shown in Fig. 14 and Fig. 15. In particular, the throughput of L-turn/ with adaptive routing is approximately 60 percent higher than that with source routing. We consider that, on a 2D torus, the advantage of adaptive routing, namely efficient channel utilization through multiple available paths, is more pronounced because the 2D torus provides more paths between pairs of nodes than the irregular networks do. Table 4 and Fig. 15 show an approximate tendency for the relative performance improvement (degradation) of the L-turn routings (R-turn routings) to become larger (smaller) when the network size becomes smaller or when uniform traffic is used instead of bit-reversal traffic.
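Source routing fixes one path per source-destination pair. The sketch below is a deliberately simple greedy stand-in, not Sancho et al.'s algorithm [21] or the path selection algorithm of [26]: pairs with fewer alternatives are fixed first, and each pair takes the candidate path whose most-loaded channel (then total load) is smallest, where the load of a channel is the number of already-fixed pairs crossing it. Candidate paths are assumed to come from a shortest-path search such as step 3 of the path search algorithm, with distinct source and destination; the function names are hypothetical.

```python
from collections import Counter


def _path_cost(path, load):
    """(max load, total load) over the channels of `path`."""
    links = list(zip(path, path[1:]))
    return (max(load[ch] for ch in links), sum(load[ch] for ch in links))


def select_single_paths(candidates):
    """Pick one path per (src, dst) pair from its alternatives (greedy sketch).

    `candidates` maps (src, dst) -> list of node paths with src != dst.
    Returns the chosen path per pair and the resulting per-channel load.
    """
    load = Counter()  # channel -> number of pairs already routed through it
    chosen = {}
    # Pairs with the fewest alternatives are fixed first (heuristic assumption).
    for pair in sorted(candidates, key=lambda p: (len(candidates[p]), p)):
        best = min(candidates[pair], key=lambda path: _path_cost(path, load))
        load.update(zip(best, best[1:]))  # ties above: first candidate wins
        chosen[pair] = best
    return chosen, load
```

On a four-switch ring, for instance, once the path 0-1-2 has loaded channel (0, 1), the pair (1, 3) is steered onto 1-0-3, whose channels are still unused.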
Fig. 14. Accepted traffic versus latency of adaptive routings on an 8 × 8 2D torus. (a) Uniform traffic. (b) Bit-reversal traffic.

TABLE 4
Average Throughputs [Flits/Cycle/Host] and Their Standard Deviations of Source Routings on Irregular Topologies

Fig. 15. Accepted traffic versus latency of source routings on an 8 × 8 2D torus. (a) Uniform traffic. (b) Bit-reversal traffic.

4 RELATED WORK

There are two strategies for deadlock avoidance-based routing on irregular networks: acyclic channel dependencies, and cyclic channel dependencies with escape paths. The simple approach used for SANs that requires no virtual channels is up*/down* routing, as described in Section 2.1 and Section 3.1. The adaptive-trail routing proposed by Qiao and Ni [31] is another method without virtual channels. It computes Eulerian trails and establishes two unidirectional adaptive trails, which achieve deadlock freedom and increased routing adaptivity. However, the method cannot be applied to some topologies, because an Eulerian trail exists in a connected graph only if all vertices have even degree or exactly two vertices have odd degree. Recently, Sun et al. proposed DOWN/UP routing [32], a turn model-based routing algorithm for irregular networks that requires no virtual channels. DOWN/UP routing is based on the strategy we proposed previously [24], [25], namely the turn model methodology using a two-dimensional directed graph (the H/V graph) and a cycle detection algorithm for irregular networks. In DOWN/UP routing, the number of directions and turns in the two-dimensional directed graph is increased by treating the directions of tree links and outer links as different directions, and a packet must go downward (upward) and then upward (downward) when turning between outer (tree) links. Like the L-turn routings, the basic policy of DOWN/UP routing is to push the traffic toward the leaf nodes as much as possible.

The other approaches rely on virtual channels or hardware support to improve throughput. When intermediate hosts provide buffers for routing, true minimal paths can be implemented [33].
This approach breaks all cycles by storing packets at some intermediate hosts and reinjecting them later. When an SAN provides virtual channels, multiple up*/down* trees [22], layered shortest-path (LASH) routing [34], layered InfiniBand routing [35], and descending layers (DL) routing [23] can be applied. Flich et al. proposed InfiniBand routing based on multiple up*/down* trees with different roots, in which each packet is forwarded on a single tree-based graph so that all cycles are avoided [22]. LASH routing [34] guarantees minimal paths by dividing the physical network into a set of virtual layers and assigning each path to a single virtual layer. The minimal InfiniBand routing proposed by Sancho et al. [35] uses up*/down* routing to build acyclic virtual networks (layers). On the other hand, the adaptive escape-path routing proposed by Silla and Duato allows cyclic dependencies [12]. Since packets on channels outside the escape paths are forwarded along minimal paths, most packets take minimal paths.

Another approach to coping with deadlocks is deadlock recovery-based routing, which usually employs minimal fully adaptive routing. It is useful only when deadlocks are infrequent, and techniques for efficient deadlock recovery-based routing on irregular networks have recently been proposed. FC3D by Rubio et al. [36] is a flow control-based distributed deadlock detection mechanism that uses only local router information. The mechanism relies on the flow control information available at each router and tries to detect as few deadlocked messages as possible when a deadlock configuration is reached. FC3D detects all possible deadlocks while reducing the recovery overhead and the probability of false deadlock detection. Song and Pinkston proposed a reservation-based distributed detection and resolution mechanism for network congestion and potential deadlock [37]. It precisely identifies congestion configurations or potential deadlocks by propagating a pinging control packet over congested resources, and it effectively disperses the detected congestion by providing available resources to the blocked packets that form it. These techniques make it possible to exploit the flexibility of minimal fully adaptive routing effectively, but they require additional hardware at each router. Although many deadlock avoidance-based or recovery-based routings have improved network latency and throughput compared with up*/down* routing by introducing virtual channels or hardware support, they may limit the applicability of network architectures or technologies [21]. For example, virtual channels are not always available in current SANs [38]. Owing to its high portability, up*/down* routing is thus still used to avoid deadlocks [21]. In this study, we have focused on a simple routing strategy that, like up*/down* routing, requires no virtual channels.

5 CONCLUSIONS

System area networks (SANs), which usually accept arbitrary topologies, have been used to connect hosts in PC clusters.
Although up*/down* routing has been widely used to avoid deadlocks in SANs, it tends to make unbalanced paths because it employs a one-dimensional directed graph. In this study, a two-dimensional directed graph was introduced, and adaptive routings called left-up first turn (L-turn) routings and right-down last turn (R-turn) routings were proposed to make the paths as uniformly distributed as possible. These routings guarantee deadlock freedom because they use the turn model approach, and their prohibited turns are well distributed by taking advantage of the extra degree of freedom afforded by a two-dimensional graph. They can also be implemented deterministically by using path selection algorithms. Simulation results show that better throughput and latency result from distributing the prohibited turns uniformly in such a way that the traffic is pushed toward the leaf nodes. The L-turn routings, which meet this condition, improve throughput by up to 100 percent compared with two up*/down*-based routings, and also reduce latency. The turn model-based routings for irregular networks can be extended to n-dimensional graphs by following the strategy explained in Section 3. When each switch has a large number of channels in different directions to neighboring switches, turn model-based routing on an n-dimensional graph could be a good fit for the target network. Since an n-dimensional graph makes turns and cycle detection extremely complicated, the routing design requires further study, which is our future work.

ACKNOWLEDGMENTS

The authors would like to thank the anonymous reviewers for their valuable comments that improved this paper and Dr. Akira Funahashi at ERATO-SORST Kitano Symbiotic Systems Project, Japan Science and Technology Agency, for his helpful comments.

REFERENCES

[1] N.J. Boden et al., "Myrinet: A Gigabit-per-Second Local Area Network," IEEE Micro, vol. 15, no. 1.
[2] InfiniBand Trade Association, InfiniBand Architecture Specification Volume 1, Release 1.0.a. Available at the InfiniBand Trade Assoc., June.
[3] F. Petrini, W.C. Feng, A. Hoisie, S. Coll, and E. Frachtenberg, "The Quadrics Network: High-Performance Clustering Technology," IEEE Micro, vol. 22, no. 1.
[4] M. Koibuchi, K. Watanabe, T. Otsuka, and H. Amano, "Performance Evaluation of Deterministic Routings, Multicasts, and Topologies on RHiNET-2 Cluster," IEEE Trans. Parallel and Distributed Systems, vol. 16, no. 8, Aug.
[5] W.J. Dally and C.L. Seitz, "Deadlock-Free Message Routing in Multiprocessor Interconnection Networks," IEEE Trans. Computers, vol. 36, no. 5, May.
[6] P. Kermani and L. Kleinrock, "Virtual Cut-Through: A New Computer Communication Switching Technique," Computer Networks, vol. 3, no. 4.
[7] T. Takahashi, S. Sumimoto, A. Hori, H. Harada, and Y. Ishikawa, "PM2: High Performance Communication Middleware for Heterogeneous Network Environment," Proc. Supercomputing Conf. (SC '00), Nov.
[8] C.J. Glass and L.M. Ni, "The Turn Model for Adaptive Routing," Proc. Int'l Symp. Computer Architecture.
[9] A.A. Chien and J.H. Kim, "Planar-Adaptive Routing: Low-Cost Adaptive Networks for Multiprocessors," J. ACM, vol. 42, no. 1, Jan.
[10] J. Duato, "A Necessary and Sufficient Condition for Deadlock-Free Adaptive Routing in Wormhole Networks," IEEE Trans. Parallel and Distributed Systems, vol. 6, no. 10, Oct.
[11] W.J. Dally and H. Aoki, "Deadlock-Free Adaptive Routing in Multicomputer Networks Using Virtual Channels," IEEE Trans. Parallel and Distributed Systems, vol. 4, no. 4, Apr.
[12] F. Silla and J. Duato, "High-Performance Routing in Networks of Workstations with Irregular Topology," IEEE Trans. Parallel and Distributed Systems, vol. 11, no. 7, July.
[13] T. Kudoh, S. Nishimura, J. Yamamoto, H. Nishi, O. Tatebe, and H. Amano, "RHiNET: A Network for High Performance Parallel Computing Using Locally Distributed Computing," Proc. Int'l Workshop Innovative Architecture (IWIA), Nov.
[14] J.C. Martinez, J. Flich, A. Robles, P. Lopez, J. Duato, and M. Koibuchi, "In-Order Packet Delivery in Interconnection Networks Using Adaptive Routing," Proc. IEEE Int'l Parallel and Distributed Processing Symp., p. 101a, Apr.
[15] M. Koibuchi, J.C. Martinez, J. Flich, A. Robles, P. Lopez, and J. Duato, "Enforcing In-Order Packet Delivery in System Area Networks with Adaptive Routing," J. Parallel and Distributed Computing, vol. 65, Oct.
[16] S.L. Scott and G.T. Horson, "The Cray T3E Network: Adaptive Routing in a High Performance 3D Torus," Proc. Hot Interconnects Symp. IV, Aug.
[17] W.J. Dally et al., "Architecture and Implementation of the Reliable Router," Proc. Hot Interconnects Symp. II, Aug.
[18] J.C. Martinez, J. Flich, A. Robles, P. Lopez, and J. Duato, "Supporting Adaptive Routing in IBA Switches," J. Systems Architecture, vol. 49.
[19] M.D. Schroeder et al., "Autonet: A High-Speed, Self-Configuring Local Area Network Using Point-to-Point Links," IEEE J. Selected Areas in Comm., vol. 9, 1991.

[20] J. Wu and L. Sheng, "Deadlock-Free Routing in Irregular Networks Using Prefix Routing," Proc. Parallel and Distributed Computing Systems Conf., Aug.
[21] J.C. Sancho, A. Robles, and J. Duato, "An Effective Methodology to Improve the Performance of the up*/down* Routing Algorithm," IEEE Trans. Parallel and Distributed Systems, vol. 15, no. 8, Aug.
[22] J. Flich, P. Lopez, J.C. Sancho, A. Robles, and J. Duato, "Improving InfiniBand Routing through Multiple Virtual Networks," Proc. Int'l Symp. High Performance Computing, May.
[23] M. Koibuchi, A. Jouraku, and H. Amano, "Descending Layers Routing: A Deadlock-Free Deterministic Routing Using Virtual Channels in System Area Networks with Irregular Topologies," Proc. Int'l Conf. Parallel Processing, Oct.
[24] M. Koibuchi, A. Funahashi, A. Jouraku, and H. Amano, "L-Turn Routing: An Adaptive Routing in Irregular Networks," Proc. Int'l Conf. Parallel Processing, Sept.
[25] A. Jouraku, M. Koibuchi, A. Funahashi, and H. Amano, "Routing Algorithms Based on 2D Turn Model for Irregular Networks," Proc. Int'l Symp. Parallel Architectures, Algorithms, and Networks, June.
[26] M. Koibuchi, A. Jouraku, and H. Amano, "Path Selection Algorithm: The Strategy for Designing Deterministic Routing from Alternative Paths," Parallel Computing, vol. 31, no. 1, Jan.
[27] E.W. Dijkstra, "A Note on Two Problems in Connexion with Graphs," Numerische Math., vol. 1, Oct.
[28] J.C. Sancho and A. Robles, "Improving the up*/down* Routing Scheme for Networks of Workstations," Proc. European Conf. Parallel Computing, Aug.
[29] S. Nishimura, T. Kudoh, H. Nishi, J. Yamamoto, K. Harasawa, N. Matsudaira, S. Akutsu, K. Tasho, and H. Amano, "High-Speed Network Switch RHiNET-2/SW and Its Implementation with Optical Interconnections," Proc. Hot Interconnects Conf., Aug.
[30] J. Duato, S. Yalamanchili, and L. Ni, Interconnection Networks: An Engineering Approach. Morgan Kaufmann.
[31] W. Qiao, L.M. Ni, and T. Rokicki, "Adaptive-Trail Routing and Performance Evaluation in Irregular Networks Using Cut-Through Switches," IEEE Trans. Parallel and Distributed Systems, vol. 10, no. 11, Nov.
[32] Y.M. Sun, Y.C. Chung, and T.Y. Huang, "An Efficient Deadlock-Free Tree-Based Routing Algorithm for Irregular Wormhole-Routed Networks Based on the Turn Model," Proc. Int'l Conf. Parallel Processing.
[33] J. Flich, P. Lopez, M.P. Malumbres, and J. Duato, "Boosting the Performance of Myrinet Networks," IEEE Trans. Parallel and Distributed Systems, vol. 13, no. 7, July.
[34] T. Skeie, O. Lysne, and I. Theiss, "Layered Shortest Path (LASH) Routing in Irregular System Area Networks," Proc. Int'l Parallel and Distributed Processing Symp., Apr.
[35] J.C. Sancho, A. Robles, J. Flich, P. Lopez, and J. Duato, "Effective Methodology for Deadlock-Free Minimal Routing in InfiniBand," Proc. Int'l Conf. Parallel Processing, Aug.
[36] J.M. Rubio, P. Lopez, and J. Duato, "FC3D: Flow Control-Based Distributed Deadlock Detection Mechanism for True Fully Adaptive Routing in Wormhole Networks," IEEE Trans. Parallel and Distributed Systems, vol. 14, no. 8, Aug.
[37] Y.H. Song and T.M. Pinkston, "Distributed Resolution of Network Congestion and Potential Deadlock Using Reservation-Based Scheduling," IEEE Trans. Parallel and Distributed Systems, vol. 16, no. 8, Aug.
[38]

Akiya Jouraku received the BE and ME degrees from Keio University, Japan, in 1998 and He is currently a PhD candidate at Keio University. His research interests include interconnection networks and parallel processing.
Michihiro Koibuchi received the BE, ME, and PhD degrees from Keio University, Japan, in 2000, 2002, and He was a visiting researcher at the Technical University of Valencia, Spain, and a research fellow of the Japan Society for the Promotion of Science in He is currently an assistant professor at the National Institute of Informatics (NII), Japan. His research interests include networks-on-chips, interconnection networks, and parallel processing. He is a member of the IEEE.

Hideharu Amano received the PhD degree from Keio University He is currently a professor in the Department of Information and Computer Science, Keio University. His research interests include parallel processing and reconfigurable systems. He is a member of the IEEE.


Interconnection Networks: Topology. Prof. Natalie Enright Jerger Interconnection Networks: Topology Prof. Natalie Enright Jerger Topology Overview Definition: determines arrangement of channels and nodes in network Analogous to road map Often first step in network design

More information

NOC Deadlock and Livelock

NOC Deadlock and Livelock NOC Deadlock and Livelock 1 Deadlock (When?) Deadlock can occur in an interconnection network, when a group of packets cannot make progress, because they are waiting on each other to release resource (buffers,

More information

Efficient Multicast on Irregular Switch-Based Cut-Through Networks with Up-Down Routing

Efficient Multicast on Irregular Switch-Based Cut-Through Networks with Up-Down Routing 808 IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, VOL. 12, NO. 8, AUGUST 2001 Efficient Multicast on Irregular Switch-Based Cut-Through Networks with Up-Down Routing Ram Kesavan and Dhabaleswar

More information

A Hybrid Interconnection Network for Integrated Communication Services

A Hybrid Interconnection Network for Integrated Communication Services A Hybrid Interconnection Network for Integrated Communication Services Yi-long Chen Northern Telecom, Inc. Richardson, TX 7583 kchen@nortel.com Jyh-Charn Liu Department of Computer Science, Texas A&M Univ.

More information

Bandwidth Aware Routing Algorithms for Networks-on-Chip

Bandwidth Aware Routing Algorithms for Networks-on-Chip 1 Bandwidth Aware Routing Algorithms for Networks-on-Chip G. Longo a, S. Signorino a, M. Palesi a,, R. Holsmark b, S. Kumar b, and V. Catania a a Department of Computer Science and Telecommunications Engineering

More information

The final publication is available at

The final publication is available at Document downloaded from: http://hdl.handle.net/10251/82062 This paper must be cited as: Peñaranda Cebrián, R.; Gómez Requena, C.; Gómez Requena, ME.; López Rodríguez, PJ.; Duato Marín, JF. (2016). The

More information

Escape Path based Irregular Network-on-chip Simulation Framework

Escape Path based Irregular Network-on-chip Simulation Framework Escape Path based Irregular Network-on-chip Simulation Framework Naveen Choudhary College of technology and Engineering MPUAT Udaipur, India M. S. Gaur Malaviya National Institute of Technology Jaipur,

More information

Deadlock and Livelock. Maurizio Palesi

Deadlock and Livelock. Maurizio Palesi Deadlock and Livelock 1 Deadlock (When?) Deadlock can occur in an interconnection network, when a group of packets cannot make progress, because they are waiting on each other to release resource (buffers,

More information

Lecture 24: Interconnection Networks. Topics: topologies, routing, deadlocks, flow control

Lecture 24: Interconnection Networks. Topics: topologies, routing, deadlocks, flow control Lecture 24: Interconnection Networks Topics: topologies, routing, deadlocks, flow control 1 Topology Examples Grid Torus Hypercube Criteria Bus Ring 2Dtorus 6-cube Fully connected Performance Bisection

More information

A First Implementation of In-Transit Buffers on Myrinet GM Software Λ

A First Implementation of In-Transit Buffers on Myrinet GM Software Λ A First Implementation of In-Transit Buffers on Myrinet GM Software Λ S. Coll, J. Flich, M. P. Malumbres, P. López, J. Duato and F.J. Mora Universidad Politécnica de Valencia Camino de Vera, 14, 46071

More information

EE 382C Interconnection Networks

EE 382C Interconnection Networks EE 8C Interconnection Networks Deadlock and Livelock Stanford University - EE8C - Spring 6 Deadlock and Livelock: Terminology Deadlock: A condition in which an agent waits indefinitely trying to acquire

More information

Removing the Latency Overhead of the ITB Mechanism in COWs with Source Routing Λ

Removing the Latency Overhead of the ITB Mechanism in COWs with Source Routing Λ Removing the Latency Overhead of the ITB Mechanism in COWs with Source Routing Λ J. Flich, M. P. Malumbres, P. López and J. Duato Dpto. of Computer Engineering (DISCA) Universidad Politécnica de Valencia

More information

A Case for Random Shortcut Topologies for HPC Interconnects

A Case for Random Shortcut Topologies for HPC Interconnects A Case for Random Shortcut Topologies for HPC Interconnects Michihiro Koibuchi National Institute of Informatics / SOKENDAI 2-1-2, Hitotsubashi, Chiyoda-ku, Tokyo, JAPAN 11-843 koibuchi@nii.ac.jp Hiroki

More information

Fault Tolerant and Secure Architectures for On Chip Networks With Emerging Interconnect Technologies. Mohsin Y Ahmed Conlan Wesson

Fault Tolerant and Secure Architectures for On Chip Networks With Emerging Interconnect Technologies. Mohsin Y Ahmed Conlan Wesson Fault Tolerant and Secure Architectures for On Chip Networks With Emerging Interconnect Technologies Mohsin Y Ahmed Conlan Wesson Overview NoC: Future generation of many core processor on a single chip

More information

IN a mobile ad hoc network, nodes move arbitrarily.

IN a mobile ad hoc network, nodes move arbitrarily. IEEE TRANSACTIONS ON MOBILE COMPUTING, VOL. 5, NO. 6, JUNE 2006 609 Distributed Cache Updating for the Dynamic Source Routing Protocol Xin Yu Abstract On-demand routing protocols use route caches to make

More information

Resource Deadlocks and Performance of Wormhole Multicast Routing Algorithms

Resource Deadlocks and Performance of Wormhole Multicast Routing Algorithms IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, VOL. 9, NO. 6, JUNE 1998 535 Resource Deadlocks and Performance of Wormhole Multicast Routing Algorithms Rajendra V. Boppana, Member, IEEE, Suresh

More information

Design and Evaluation of a Fault-Tolerant Adaptive Router for Parallel Computers

Design and Evaluation of a Fault-Tolerant Adaptive Router for Parallel Computers Design and Evaluation of a Fault-Tolerant Adaptive Router for Parallel Computers Tsutomu YOSHINAGA, Hiroyuki HOSOGOSHI, Masahiro SOWA Graduate School of Information Systems, University of Electro-Communications,

More information

Performance of Multihop Communications Using Logical Topologies on Optical Torus Networks

Performance of Multihop Communications Using Logical Topologies on Optical Torus Networks Performance of Multihop Communications Using Logical Topologies on Optical Torus Networks X. Yuan, R. Melhem and R. Gupta Department of Computer Science University of Pittsburgh Pittsburgh, PA 156 fxyuan,

More information

Lecture 12: Interconnection Networks. Topics: communication latency, centralized and decentralized switches, routing, deadlocks (Appendix E)

Lecture 12: Interconnection Networks. Topics: communication latency, centralized and decentralized switches, routing, deadlocks (Appendix E) Lecture 12: Interconnection Networks Topics: communication latency, centralized and decentralized switches, routing, deadlocks (Appendix E) 1 Topologies Internet topologies are not very regular they grew

More information

Recall: The Routing problem: Local decisions. Recall: Multidimensional Meshes and Tori. Properties of Routing Algorithms

Recall: The Routing problem: Local decisions. Recall: Multidimensional Meshes and Tori. Properties of Routing Algorithms CS252 Graduate Computer Architecture Lecture 16 Multiprocessor Networks (con t) March 14 th, 212 John Kubiatowicz Electrical Engineering and Computer Sciences University of California, Berkeley http://www.eecs.berkeley.edu/~kubitron/cs252

More information

Lecture 12: Interconnection Networks. Topics: dimension/arity, routing, deadlock, flow control

Lecture 12: Interconnection Networks. Topics: dimension/arity, routing, deadlock, flow control Lecture 12: Interconnection Networks Topics: dimension/arity, routing, deadlock, flow control 1 Interconnection Networks Recall: fully connected network, arrays/rings, meshes/tori, trees, butterflies,

More information

Network-on-chip (NOC) Topologies

Network-on-chip (NOC) Topologies Network-on-chip (NOC) Topologies 1 Network Topology Static arrangement of channels and nodes in an interconnection network The roads over which packets travel Topology chosen based on cost and performance

More information

A Multiple LID Routing Scheme for Fat-Tree-Based InfiniBand Networks

A Multiple LID Routing Scheme for Fat-Tree-Based InfiniBand Networks A Multiple LID Routing Scheme for Fat-Tree-Based InfiniBand Networks Xuan-Yi Lin, Yeh-Ching Chung, and Tai-Yi Huang Department of Computer Science National Tsing-Hua University, Hsinchu, Taiwan 00, ROC

More information

In-Order Packet Delivery in Interconnection Networks using Adaptive Routing

In-Order Packet Delivery in Interconnection Networks using Adaptive Routing In-Order Packet Delivery in Interconnection Networks using Adaptive Routing J.C. Martínez, J. Flich, A. Robles, P. López, and J. Duato Dept. of Computer Engineering Universidad Politécnica de Valencia

More information

Interconnection Networks: Routing. Prof. Natalie Enright Jerger

Interconnection Networks: Routing. Prof. Natalie Enright Jerger Interconnection Networks: Routing Prof. Natalie Enright Jerger Routing Overview Discussion of topologies assumed ideal routing In practice Routing algorithms are not ideal Goal: distribute traffic evenly

More information

A Simple and Efficient Mechanism to Prevent Saturation in Wormhole Networks Λ

A Simple and Efficient Mechanism to Prevent Saturation in Wormhole Networks Λ A Simple and Efficient Mechanism to Prevent Saturation in Wormhole Networks Λ E. Baydal, P. López and J. Duato Depto. Informática de Sistemas y Computadores Universidad Politécnica de Valencia, Camino

More information

Lecture 13: Interconnection Networks. Topics: lots of background, recent innovations for power and performance

Lecture 13: Interconnection Networks. Topics: lots of background, recent innovations for power and performance Lecture 13: Interconnection Networks Topics: lots of background, recent innovations for power and performance 1 Interconnection Networks Recall: fully connected network, arrays/rings, meshes/tori, trees,

More information

A Dynamic NOC Arbitration Technique using Combination of VCT and XY Routing

A Dynamic NOC Arbitration Technique using Combination of VCT and XY Routing 727 A Dynamic NOC Arbitration Technique using Combination of VCT and XY Routing 1 Bharati B. Sayankar, 2 Pankaj Agrawal 1 Electronics Department, Rashtrasant Tukdoji Maharaj Nagpur University, G.H. Raisoni

More information

Routing and Deadlock

Routing and Deadlock 3.5-1 3.5-1 Routing and Deadlock Routing would be easy...... were it not for possible deadlock. Topics For This Set: Routing definitions. Deadlock definitions. Resource dependencies. Acyclic deadlock free

More information

Interconnection Network

Interconnection Network Interconnection Network Recap: Generic Parallel Architecture A generic modern multiprocessor Network Mem Communication assist (CA) $ P Node: processor(s), memory system, plus communication assist Network

More information

Basic Low Level Concepts

Basic Low Level Concepts Course Outline Basic Low Level Concepts Case Studies Operation through multiple switches: Topologies & Routing v Direct, indirect, regular, irregular Formal models and analysis for deadlock and livelock

More information

Tightly-Coupled Multi-Layer Topologies for 3-D NoCs

Tightly-Coupled Multi-Layer Topologies for 3-D NoCs Tightly-Coupled Multi-Layer Topologies for -D NoCs Hiroki Matsutani, Michihiro Koibuchi, and Hideharu Amano Keio University National Institute of Informatics -4-, Hiyoshi, Kohoku-ku, Yokohama, --, Hitotsubashi,

More information

Basic Switch Organization

Basic Switch Organization NOC Routing 1 Basic Switch Organization 2 Basic Switch Organization Link Controller Used for coordinating the flow of messages across the physical link of two adjacent switches 3 Basic Switch Organization

More information

Graphs. Part I: Basic algorithms. Laura Toma Algorithms (csci2200), Bowdoin College

Graphs. Part I: Basic algorithms. Laura Toma Algorithms (csci2200), Bowdoin College Laura Toma Algorithms (csci2200), Bowdoin College Undirected graphs Concepts: connectivity, connected components paths (undirected) cycles Basic problems, given undirected graph G: is G connected how many

More information

EECS 578 Interconnect Mini-project

EECS 578 Interconnect Mini-project EECS578 Bertacco Fall 2015 EECS 578 Interconnect Mini-project Assigned 09/17/15 (Thu) Due 10/02/15 (Fri) Introduction In this mini-project, you are asked to answer questions about issues relating to interconnect

More information

Deadlock: Part II. Reading Assignment. Deadlock: A Closer Look. Types of Deadlock

Deadlock: Part II. Reading Assignment. Deadlock: A Closer Look. Types of Deadlock Reading Assignment T. M. Pinkston, Deadlock Characterization and Resolution in Interconnection Networks, Chapter 13 in Deadlock Resolution in Computer Integrated Systems, CRC Press 2004 Deadlock: Part

More information

Topologies. Maurizio Palesi. Maurizio Palesi 1

Topologies. Maurizio Palesi. Maurizio Palesi 1 Topologies Maurizio Palesi Maurizio Palesi 1 Network Topology Static arrangement of channels and nodes in an interconnection network The roads over which packets travel Topology chosen based on cost and

More information

Deterministic versus Adaptive Routing in Fat-Trees

Deterministic versus Adaptive Routing in Fat-Trees Deterministic versus Adaptive Routing in Fat-Trees C. Gómez, F. Gilabert, M.E. Gómez, P. López and J. Duato Dept. of Computer Engineering Universidad Politécnica de Valencia Camino de Vera,, 07 Valencia,

More information

in Oblivious Routing

in Oblivious Routing Static Virtual Channel Allocation in Oblivious Routing Keun Sup Shim, Myong Hyon Cho, Michel Kinsy, Tina Wen, Mieszko Lis G. Edward Suh (Cornell) Srinivas Devadas MIT Computer Science and Artificial Intelligence

More information

Generic Methodologies for Deadlock-Free Routing

Generic Methodologies for Deadlock-Free Routing Generic Methodologies for Deadlock-Free Routing Hyunmin Park Dharma P. Agrawal Department of Computer Engineering Electrical & Computer Engineering, Box 7911 Myongji University North Carolina State University

More information

Data Structures Question Bank Multiple Choice

Data Structures Question Bank Multiple Choice Section 1. Fundamentals: Complexity, Algorthm Analysis 1. An algorithm solves A single problem or function Multiple problems or functions Has a single programming language implementation 2. A solution

More information

Global Adaptive Routing Algorithm Without Additional Congestion Propagation Network

Global Adaptive Routing Algorithm Without Additional Congestion Propagation Network 1 Global Adaptive Routing Algorithm Without Additional Congestion ropagation Network Shaoli Liu, Yunji Chen, Tianshi Chen, Ling Li, Chao Lu Institute of Computing Technology, Chinese Academy of Sciences

More information

Lecture: Interconnection Networks. Topics: TM wrap-up, routing, deadlock, flow control, virtual channels

Lecture: Interconnection Networks. Topics: TM wrap-up, routing, deadlock, flow control, virtual channels Lecture: Interconnection Networks Topics: TM wrap-up, routing, deadlock, flow control, virtual channels 1 TM wrap-up Eager versioning: create a log of old values Handling problematic situations with a

More information

TDT Appendix E Interconnection Networks

TDT Appendix E Interconnection Networks TDT 4260 Appendix E Interconnection Networks Review Advantages of a snooping coherency protocol? Disadvantages of a snooping coherency protocol? Advantages of a directory coherency protocol? Disadvantages

More information

Deadlock-Free Connection-Based Adaptive Routing with Dynamic Virtual Circuits

Deadlock-Free Connection-Based Adaptive Routing with Dynamic Virtual Circuits Computer Science Department Technical Report #TR050021 University of California, Los Angeles, June 2005 Deadlock-Free Connection-Based Adaptive Routing with Dynamic Virtual Circuits Yoshio Turner and Yuval

More information

Fault-Tolerant Routing in Fault Blocks. Planarly Constructed. Dong Xiang, Jia-Guang Sun, Jie. and Krishnaiyan Thulasiraman. Abstract.

Fault-Tolerant Routing in Fault Blocks. Planarly Constructed. Dong Xiang, Jia-Guang Sun, Jie. and Krishnaiyan Thulasiraman. Abstract. Fault-Tolerant Routing in Fault Blocks Planarly Constructed Dong Xiang, Jia-Guang Sun, Jie and Krishnaiyan Thulasiraman Abstract A few faulty nodes can an n-dimensional mesh or torus network unsafe for

More information

EE 6900: Interconnection Networks for HPC Systems Fall 2016

EE 6900: Interconnection Networks for HPC Systems Fall 2016 EE 6900: Interconnection Networks for HPC Systems Fall 2016 Avinash Karanth Kodi School of Electrical Engineering and Computer Science Ohio University Athens, OH 45701 Email: kodi@ohio.edu 1 Acknowledgement:

More information

A Comparison of Allocation Policies in Wavelength Routing Networks*

A Comparison of Allocation Policies in Wavelength Routing Networks* Photonic Network Communications, 2:3, 267±295, 2000 # 2000 Kluwer Academic Publishers. Manufactured in The Netherlands. A Comparison of Allocation Policies in Wavelength Routing Networks* Yuhong Zhu, George

More information

A Literature Review of on-chip Network Design using an Agent-based Management Method

A Literature Review of on-chip Network Design using an Agent-based Management Method A Literature Review of on-chip Network Design using an Agent-based Management Method Mr. Kendaganna Swamy S Dr. Anand Jatti Dr. Uma B V Instrumentation Instrumentation Communication Bangalore, India Bangalore,

More information

Efficient Bufferless Packet Switching on Trees and Leveled Networks

Efficient Bufferless Packet Switching on Trees and Leveled Networks Efficient Bufferless Packet Switching on Trees and Leveled Networks Costas Busch Malik Magdon-Ismail Marios Mavronicolas Abstract In bufferless networks the packets cannot be buffered while they are in

More information

A Path Decomposition Approach for Computing Blocking Probabilities in Wavelength-Routing Networks

A Path Decomposition Approach for Computing Blocking Probabilities in Wavelength-Routing Networks IEEE/ACM TRANSACTIONS ON NETWORKING, VOL. 8, NO. 6, DECEMBER 2000 747 A Path Decomposition Approach for Computing Blocking Probabilities in Wavelength-Routing Networks Yuhong Zhu, George N. Rouskas, Member,

More information

Packet Switch Architecture

Packet Switch Architecture Packet Switch Architecture 3. Output Queueing Architectures 4. Input Queueing Architectures 5. Switching Fabrics 6. Flow and Congestion Control in Sw. Fabrics 7. Output Scheduling for QoS Guarantees 8.

More information

Packet Switch Architecture

Packet Switch Architecture Packet Switch Architecture 3. Output Queueing Architectures 4. Input Queueing Architectures 5. Switching Fabrics 6. Flow and Congestion Control in Sw. Fabrics 7. Output Scheduling for QoS Guarantees 8.

More information

Lecture: Interconnection Networks

Lecture: Interconnection Networks Lecture: Interconnection Networks Topics: Router microarchitecture, topologies Final exam next Tuesday: same rules as the first midterm 1 Packets/Flits A message is broken into multiple packets (each packet

More information

Dynamic Network Reconfiguration for Switch-based Networks

Dynamic Network Reconfiguration for Switch-based Networks Dynamic Network Reconfiguration for Switch-based Networks Ms. Deepti Metri 1, Prof. A. V. Mophare 2 1Student, Computer Science and Engineering, N. B. N. Sinhgad College of Engineering, Maharashtra, India

More information

Lecture 3: Graphs and flows

Lecture 3: Graphs and flows Chapter 3 Lecture 3: Graphs and flows Graphs: a useful combinatorial structure. Definitions: graph, directed and undirected graph, edge as ordered pair, path, cycle, connected graph, strongly connected

More information

Deadlock-free XY-YX router for on-chip interconnection network

Deadlock-free XY-YX router for on-chip interconnection network LETTER IEICE Electronics Express, Vol.10, No.20, 1 5 Deadlock-free XY-YX router for on-chip interconnection network Yeong Seob Jeong and Seung Eun Lee a) Dept of Electronic Engineering Seoul National Univ

More information

Question Bank Subject: Advanced Data Structures Class: SE Computer

Question Bank Subject: Advanced Data Structures Class: SE Computer Question Bank Subject: Advanced Data Structures Class: SE Computer Question1: Write a non recursive pseudo code for post order traversal of binary tree Answer: Pseudo Code: 1. Push root into Stack_One.

More information

IN the recent years, overlay networks have been increasingly

IN the recent years, overlay networks have been increasingly IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, VOL. 19, NO. 6, JUNE 2008 837 Scalable and Efficient End-to-End Network Topology Inference Xing Jin, Student Member, IEEE Computer Society, Wanqing

More information

OFAR-CM: Efficient Dragonfly Networks with Simple Congestion Management

OFAR-CM: Efficient Dragonfly Networks with Simple Congestion Management Marina Garcia 22 August 2013 OFAR-CM: Efficient Dragonfly Networks with Simple Congestion Management M. Garcia, E. Vallejo, R. Beivide, M. Valero and G. Rodríguez Document number OFAR-CM: Efficient Dragonfly

More information

The Effect of Adaptivity on the Performance of the OTIS-Hypercube under Different Traffic Patterns

The Effect of Adaptivity on the Performance of the OTIS-Hypercube under Different Traffic Patterns The Effect of Adaptivity on the Performance of the OTIS-Hypercube under Different Traffic Patterns H. H. Najaf-abadi 1, H. Sarbazi-Azad 2,1 1 School of Computer Science, IPM, Tehran, Iran. 2 Computer Engineering

More information

CS521 \ Notes for the Final Exam

CS521 \ Notes for the Final Exam CS521 \ Notes for final exam 1 Ariel Stolerman Asymptotic Notations: CS521 \ Notes for the Final Exam Notation Definition Limit Big-O ( ) Small-o ( ) Big- ( ) Small- ( ) Big- ( ) Notes: ( ) ( ) ( ) ( )

More information

Interconnect Technology and Computational Speed

Interconnect Technology and Computational Speed Interconnect Technology and Computational Speed From Chapter 1 of B. Wilkinson et al., PARAL- LEL PROGRAMMING. Techniques and Applications Using Networked Workstations and Parallel Computers, augmented

More information

Network on Chip Architecture: An Overview

Network on Chip Architecture: An Overview Network on Chip Architecture: An Overview Md Shahriar Shamim & Naseef Mansoor 12/5/2014 1 Overview Introduction Multi core chip Challenges Network on Chip Architecture Regular Topology Irregular Topology

More information

Multicomputer distributed system LECTURE 8

Multicomputer distributed system LECTURE 8 Multicomputer distributed system LECTURE 8 DR. SAMMAN H. AMEEN 1 Wide area network (WAN); A WAN connects a large number of computers that are spread over large geographic distances. It can span sites in

More information

A Fully Adaptive Fault-Tolerant Routing Methodology Based on Intermediate Nodes

A Fully Adaptive Fault-Tolerant Routing Methodology Based on Intermediate Nodes A Fully Adaptive Fault-Tolerant Routing Methodology Based on Intermediate Nodes N.A. Nordbotten 1, M.E. Gómez 2, J. Flich 2, P.López 2, A. Robles 2, T. Skeie 1, O. Lysne 1, and J. Duato 2 1 Simula Research

More information

Seminar on. A Coarse-Grain Parallel Formulation of Multilevel k-way Graph Partitioning Algorithm

Seminar on. A Coarse-Grain Parallel Formulation of Multilevel k-way Graph Partitioning Algorithm Seminar on A Coarse-Grain Parallel Formulation of Multilevel k-way Graph Partitioning Algorithm Mohammad Iftakher Uddin & Mohammad Mahfuzur Rahman Matrikel Nr: 9003357 Matrikel Nr : 9003358 Masters of

More information

Communication Performance in Network-on-Chips

Communication Performance in Network-on-Chips Communication Performance in Network-on-Chips Axel Jantsch Royal Institute of Technology, Stockholm November 24, 2004 Network on Chip Seminar, Linköping, November 25, 2004 Communication Performance In

More information

Networks: Routing, Deadlock, Flow Control, Switch Design, Case Studies. Admin

Networks: Routing, Deadlock, Flow Control, Switch Design, Case Studies. Admin Networks: Routing, Deadlock, Flow Control, Switch Design, Case Studies Alvin R. Lebeck CPS 220 Admin Homework #5 Due Dec 3 Projects Final (yes it will be cumulative) CPS 220 2 1 Review: Terms Network characterized

More information

Routing Algorithms, Process Model for Quality of Services (QoS) and Architectures for Two-Dimensional 4 4 Mesh Topology Network-on-Chip

Routing Algorithms, Process Model for Quality of Services (QoS) and Architectures for Two-Dimensional 4 4 Mesh Topology Network-on-Chip Routing Algorithms, Process Model for Quality of Services (QoS) and Architectures for Two-Dimensional 4 4 Mesh Topology Network-on-Chip Nauman Jalil, Adnan Qureshi, Furqan Khan, and Sohaib Ayyaz Qazi Abstract

More information

NOW Handout Page 1. Outline. Networks: Routing and Design. Routing. Routing Mechanism. Routing Mechanism (cont) Properties of Routing Algorithms

NOW Handout Page 1. Outline. Networks: Routing and Design. Routing. Routing Mechanism. Routing Mechanism (cont) Properties of Routing Algorithms Outline Networks: Routing and Design Routing Switch Design Case Studies CS 5, Spring 99 David E. Culler Computer Science Division U.C. Berkeley 3/3/99 CS5 S99 Routing Recall: routing algorithm determines

More information

Fault-Tolerant Wormhole Routing Algorithms in Meshes in the Presence of Concave Faults

Fault-Tolerant Wormhole Routing Algorithms in Meshes in the Presence of Concave Faults Fault-Tolerant Wormhole Routing Algorithms in Meshes in the Presence of Concave Faults Seungjin Park Jong-Hoon Youn Bella Bose Dept. of Computer Science Dept. of Computer Science Dept. of Computer Science

More information

4. Networks. in parallel computers. Advances in Computer Architecture

4. Networks. in parallel computers. Advances in Computer Architecture 4. Networks in parallel computers Advances in Computer Architecture System architectures for parallel computers Control organization Single Instruction stream Multiple Data stream (SIMD) All processors

More information

Lecture 2 Parallel Programming Platforms

Lecture 2 Parallel Programming Platforms Lecture 2 Parallel Programming Platforms Flynn s Taxonomy In 1966, Michael Flynn classified systems according to numbers of instruction streams and the number of data stream. Data stream Single Multiple

More information

Generalized Theory for Deadlock-Free Adaptive Wormhole Routing and its Application to Disha Concurrent

Generalized Theory for Deadlock-Free Adaptive Wormhole Routing and its Application to Disha Concurrent Generalized Theory for Deadlock-Free Adaptive Wormhole Routing and its Application to Disha Concurrent Anjan K. V. Timothy Mark Pinkston José Duato Pyramid Technology Corp. Electrical Engg. - Systems Dept.

More information

Crossbar Analysis for Optimal Deadlock Recovery Router Architecture

Crossbar Analysis for Optimal Deadlock Recovery Router Architecture rossbar Analysis for Optimal Deadlock Recovery Router Architecture Yungho hoi Timothy Mark Pinkston SMART Interconnects Group EE-Systems Dept, University of Southern alifornia, Los Angeles, A 90089-2562

More information

CHAPTER-III WAVELENGTH ROUTING ALGORITHMS

CHAPTER-III WAVELENGTH ROUTING ALGORITHMS CHAPTER-III WAVELENGTH ROUTING ALGORITHMS Introduction A wavelength routing (WR) algorithm selects a good route and a wavelength to satisfy a connection request so as to improve the network performance.

More information

STG-NoC: A Tool for Generating Energy Optimized Custom Built NoC Topology

STG-NoC: A Tool for Generating Energy Optimized Custom Built NoC Topology STG-NoC: A Tool for Generating Energy Optimized Custom Built NoC Topology Surbhi Jain Naveen Choudhary Dharm Singh ABSTRACT Network on Chip (NoC) has emerged as a viable solution to the complex communication

More information

Networks, Routers and Transputers:

Networks, Routers and Transputers: This is Chapter 1 from the second edition of : Networks, Routers and Transputers: Function, Performance and applications Edited M.D. by: May, P.W. Thompson, and P.H. Welch INMOS Limited 1993 This edition

More information

Characterization of Deadlocks in Interconnection Networks

Characterization of Deadlocks in Interconnection Networks Characterization of Deadlocks in Interconnection Networks Sugath Warnakulasuriya Timothy Mark Pinkston SMART Interconnects Group EE-System Dept., University of Southern California, Los Angeles, CA 90089-56

More information

9/24/ Hash functions

9/24/ Hash functions 11.3 Hash functions A good hash function satis es (approximately) the assumption of SUH: each key is equally likely to hash to any of the slots, independently of the other keys We typically have no way

More information

EE482, Spring 1999 Research Paper Report. Deadlock Recovery Schemes

EE482, Spring 1999 Research Paper Report. Deadlock Recovery Schemes EE482, Spring 1999 Research Paper Report Deadlock Recovery Schemes Jinyung Namkoong Mohammed Haque Nuwan Jayasena Manman Ren May 18, 1999 Introduction The selected papers address the problems of deadlock,

More information

A New Theory of Deadlock-Free Adaptive. Routing in Wormhole Networks. Jose Duato. Abstract

A New Theory of Deadlock-Free Adaptive. Routing in Wormhole Networks. Jose Duato. Abstract A New Theory of Deadlock-Free Adaptive Routing in Wormhole Networks Jose Duato Abstract Second generation multicomputers use wormhole routing, allowing a very low channel set-up time and drastically reducing

More information

Graph Algorithms. Definition

Graph Algorithms. Definition Graph Algorithms Many problems in CS can be modeled as graph problems. Algorithms for solving graph problems are fundamental to the field of algorithm design. Definition A graph G = (V, E) consists of

More information

Traffic Control in Wormhole Routing Meshes under Non-Uniform Traffic Patterns

Traffic Control in Wormhole Routing Meshes under Non-Uniform Traffic Patterns roceedings of the IASTED International Conference on arallel and Distributed Computing and Systems (DCS) November 3-6, 1999, Boston (MA), USA Traffic Control in Wormhole outing Meshes under Non-Uniform

More information