Oblivious Routing Design for Mesh Networks to Achieve a New Worst-Case Throughput Bound

Size: px
Start display at page:

Download "Oblivious Routing Design for Mesh Networks to Achieve a New Worst-Case Throughput Bound"

Transcription

1 Oblivious Routing Design for Mesh Networs to Achieve a New Worst-Case Throughput Bound Guang Sun,, Chia-Wei Chang, Bill Lin, Lieguang Zeng Tsinghua University, University of California, San Diego State Key Laboratory on Microwave and Digital Communications, Tsinghua National Laboratory for Information Science and Technology, Department of Electronic Engineering, Tsinghua University, Beijing 00084, China Abstract / networ capacity is often believed to be the limit of worst-case throughput for mesh networs. However, this paper provides a new worst-case throughput bound, which is higher than / networ capacity, for odd radix two-dimensional mesh networs. In addition, we propose a routing algorithm called UTURN that can achieve this worstcase throughput bound for odd radix meshes. For even radix meshes, we prove that UTURN achieves the optimal worst-case throughput, namely, half of networ capacity. UTURN considers all routing paths with at most turns and distributes the traffic loads uniformly in both X and Y dimensions. Theoretical analysis and simulation results show that UTURN outperforms existing routing algorithms in worstcase throughput. Moreover, UTURN achieves good average-throughput at the expense of approximately.5x minimal average hop count. For asymmetric meshes, we further propose an algorithm called UTURN- A and provide theoretical analysis for different algorithms. Both theoretical analysis and simulation show that UTURN and UTURN-A outperform existing algorithms VAL, DOR and OTURN in both worstcase and average throughput for asymmetric meshes. I. INTRODUCTION Networs-on-chip NoC) has been proposed as a new intrachip communication infrastructure. Regular topologies, especially the mesh topology, have been widely deployed [][]. The choice of oblivious routing algorithm, which is oblivious of the networ state when determining a route, has a significant impact on the networ performance, such as the throughput and the latency. Average-case throughput provides an average performance metric, whereas worstcase throughput provides a performance guarantee. It is well-nown and has been shown that the optimal worst-case throughput for oblivious routing in two-dimensional mesh networs is / of the networ capacity when the radix is even. For example, such an analysis for the even radix case is provided in []. For the odd radix case, it is also well-nown that a worst-case throughput of / networ capacity can be achieved. This / of networ capacity is often believed to be the limit of the worstcase throughput. For example, in the near optimality analysis of the OTURN routing algorithm [4], the near optimality of OTURN in the odd radix case is relative to a presumed / networ capacity limit. In this paper, we provide a new worst-case throughput bound of +)/ +) of networ capacity for the odd radix twodimensional mesh with radix, which is higher than / networ capacity. We propose a non-minimal oblivious routing algorithm called UTURN uniform -turn) that can achieve this worst-case throughput bound. Improved analytical results for oblivious routing are important because they often form the theoretical underpinnings for a variety of networ analysis problems. The main contributions of this paper are as follows: a) We provide a new worst-case throughput bound, which is higher than / networ capacity, for the odd radix two-dimensional mesh. b) UTURN achieves this worst-case throughput. It outperforms VAL [0], DOR [5] and OTURN [4] in worst-case throughput for all odd radix mesh networs. For even radix two-dimensional mesh, we proved UTURN achieves optimal worst-case throughput, namely, half of networ capacity. c) For average-case throughput, UTURN also outperforms VAL, DOR, and OTURN, on average over the six evaluated networ sizes both odd and even radix), by 7.%, 39.7% and 8.8%, respectively. d) For asymmetric meshes, we describe a variant called UTURN- A and analyze the theoretical performances for different routing algorithms. Both theoretical analysis and simulation show that UTURN and UTURN-A achieve better worst-case and average case throughput than other existing algorithms VAL, DOR and OTURN. The rest of this paper is organized as follows. Section II provides a brief bacground on routing algorithms and performance metrics. Section III describes our UTURN routing algorithm and provides theoretical performance analysis. Section IV discusses the performance in asymmetric meshes for different algorithms. Section V provides simulation results. Finally, Section VI draws conclusions. II. BACKGROUND A. Routing Algorithms In this section, we first enumerate some classic routing algorithms in NoC, and then we discuss the pros and cons of each routing scheme by comparing it with our routing design. Dimension-order routing DOR)[5] algorithm is one of the most popular routing algorithms. However, it suffers from poor worstcase and average-case throughput due to the lac of routing diversity [4]. On the other hand, Valiant proposed a routing algorithm called VAL [0] that offers excellent worst-case throughput half of networ capacity), but it suffers from poor average-case throughput and long routing path [6]. ROMM [3] overcomes some of the drawbacs in DOR and VAL. It achieves minimal-length routing and good averagecase throughput. However, its worst-case throughput is very poor. Thus, these existing routing algorithms suffer from either poor worstcase throughput e.g. DOR, ROMM) or poor latency e.g. VAL). OTURN [4] achieves both minimal-length routing and good throughput. For D odd radix mesh networs, OTURN has worstcase throughput within a factor of / of half of networ capacity. However, instead of using at most one turn in OTURN routing, our UTURN routing considers all routing paths with at most turns and distributes the traffic uniformly in both X and Y dimensions, which results in higher worst-case throughput than half of networ capacity for all odd radix mesh networs. Simulations show that UTURN outperforms the above routing algorithms in both worst-case and average-case throughput. Although the routing algorithms described in [9] and [7] also use routes with at most turns, they are developed for torus networs rather than meshes. Besides, TURN [9] does not have a closedform description. It requires solving a linear programming for each networ radix. But the linear programming grows rapidly, maing it difficult to scale to large networs [7]. IVAL [9] has only been shown to achieve the same worst-case throughput as VAL for torus networs, whereas UTURN has a closed-form description and //$ IEEE 47

2 achieves a higher worst-case throughput than VAL in the case of odd radix mesh networs. Lie ITURN and WTURN described in [7] for torus networs, UTURN also provides closed-form analytical expressions for evaluating the average path length. These closedform analytical expressions helped us to derive a routing algorithm that achieves a worst-case throughput better than the / networ capacity previously believed to be the limit of worst-case throughput for mesh networs. B. Performance Metrics In this section, we give the performance metrics, such as worstcase and average-case throughput, for oblivious routing algorithms. In order to simplify the discussion on these performance metrics, lie [4] [6], we ignore flow control issues, and assume single flit pacets that route from node to node in one cycle. In particular, we elaborate on the concept of networ capacity, and provide a brief overview of analytical methods to evaluate worst-case and averagecase throughput for oblivious routing algorithms. Networ Capacity: Networ capacity is defined by the maximal sustainable channel load Υ when a networ is loaded with uniformly distributed traffic [4]. For any -ary n-mesh [][6], independent of n, is even Υ 4 = ) is odd 4 where Υ is the inverse of the networ capacity. Worst-Case Throughput: For a given routing algorithm R and traffic matrix Λ, the maximal channel load ΥR, Λ) is the expected traffic loads crossing the heaviest loaded channel under R and Λ [6]. For a given routing algorithm R, the worst-case channel load Υ wcr) is the heaviest channel load that can be caused by any admissible traffic [6]. A double sub-stochastic traffic matrix Λ is admissible when it has row and column sums of at most. For a given routing algorithm, [8] provided a method to find worst-case traffic that can cause worst-case channel load and worst-case throughput. Note that the worst-case channel load is the inverse of worst-case throughput. Lie [6][4], we use the normalized worst-case throughput, which is normalized to the networ capacity, as worst-case performance metric: «ΥwcR) Θ wcr) = ) Average-Case Throughput: As for average performance metric, we use normalized average-case throughput. Using the same method in [6][4][8], the average-case throughput for a given routing algorithm R is found by averaging the throughput over a large set of random traffic matrix T e.g. T = ) III. UTURN ROUTING In this section, we first describe our UTURN routing algorithm for mesh networs, and then we provide the performance analysis, including throughput analysis and latency analysis. A. Routing Description As the name implies, UTURN considers any routing path between the source and destination with at most turns, where each turn changes dimension from X to Y or Y to X. Note that U-turns, turns that reverse direction along the same dimension, are not allowed in UTURN. In comparison to OTURN routing, UTURN is more flexible in that OTURN routes are a subset of UTURN routes. OTURN only uses routes with at most one turn. However, our UTURN routing considers all routing paths with at most turns and balances Υ the traffic in both X and Y dimensions. The greater flexibility and more uniform traffic distribution enable UTURN to achieve better throughput performance. For implementation, UTURN routing consists of two parts: XYX- Routing with 0.5 probability and YXY-Routing with also 0.5 probability. First, we elaborate on XYX-Routing. Suppose x,y ) is the source node and x,y ) is the destination node. XYX-Routing is a three-segment-routing generated as follows: a) X-segment: choose at uniform random an X position x [0, ], and route in X-dimension from source node x,y ) to x,y ). b) Y-segment: route from x,y ) to x,y ) in Y- dimension. c) X-segment: route from x,y ) to destination node x,y ) in X-dimension. Each of these three segments uses minimal-length routing, although the combination of segments can result in non-minimal routing along the entire path. There is a degenerate case. When y = y, it just needs one-segment routing: route directly from x,y ) to x,y ) using minimal-length routing in the X-dimension. Fig. a) depicts five possible XYX-Routing paths from source node to destination node for 5 5 mesh, where each path has a probability of /5. YXY-Routing is similar to XYX-Routing for symmetry, which is shown in Fig. b). There is also a degenerate case for YXY-Routing. When x = x, it just needs one-segment routing: route directly from x,y ) to x,y ) using minimal-length routing in the Y-dimension. Fig.. S D a) XYX-Routing S D b) YXY-Routing Routing paths for XYX-Routing and YXY-Routing in 5 5 mesh. From the aforementioned routing description, we now that XYX- Routing and YXY-Routing cover all routing paths between the source and destination with at most turns. In addition, XYX- Routing distributes the traffic uniformly along the X-Dimension, and YXY-Routing distributes the traffic uniformly along the Y- dimension. UTURN uses XYX-Routing and YXY-Routing with equal probability, so that it balances traffic loads in both X and Y dimensions, which results in a high throughput. As for the deadloc problem, two virtual channels per physical channel is sufficient to achieve deadloc-free routing by incrementing a pacet s virtual channel set after each turn from the Y to the X dimension [6]. B. Performance Analysis ) Throughput Analysis: In this section, we prove that our UTURN achieves a new worst-case throughput bound of + )/ +) of networ capacity for an odd radix two-dimensional mesh with radix, which is higher than / of networ capacity. For even case, UTURN achieves optimal worst-case throughput, namely, half of networ capacity. We prove these by three claims as follows. Claim. For one-dimensional mesh, the worst-case channel load of minimal-length routing is when the radix is odd and when is even. Proof: [8] proposed a method, which find the worst-case traffic for any given oblivious routing algorithm in a given topology. So, we use this method, and find that the worst-case traffic for 48

3 minimal-length routing in one dimension mesh is Tornado: from x to x + ) mod. The heaviest channel load of the minimal-length routing under Tornado traffic matrix is when is odd and when is even. Therefore, this is the worst-case channel load for minimal-length routing in one-dimensional mesh. As the worst-case throughput is the inverse of worst-case channel load, the worst-case throughput of minimal-length routing in odd radix one-dimensional mesh is, which is higher than half of networ capacity. For even radix one-dimensional mesh, the worst-case throughput of minimal-length routing is, which is optimal, namely, half of networ capacity. Claim. For the odd radix two-dimensional mesh, the worst-case channel load of XYX-Routing is: in Y-dimension and in X- dimension, respectively. For the even radix two-dimensional mesh, the worst-case channel load of XYX-Routing is in X and Y dimensions. Proof: As discussed in Section II-B, the maximal channel load under uniformly distributed traffic is when is odd and 4 4 when is even. XYX-Routing consists of three segments routing, and the two X-segment routings get twice uniform. The maximal channel load under twice uniform traffic matrix is simply ) 4 = when is odd and )= when is even. Thus, for 4 two-dimensional mesh, the worst-case channel load of XYX-Routing in X-dimension is when the radix is odd and when the radix is even. As for the Y-dimension case in XYX-Routing, it uses minimallength routing from x,y ) to x,y ). Since traffic is uniformly distributed along the X-dimension, the Y-dimension channels with different X positions but the same Y position have the same traffic loads. So the traffic of Y-dimension of XYX-Routing is degenerated into the case in Claim, namely, minimal-length routing in onedimension mesh. Thus, we get that the maximal channel load in Y dimension for XYX-Routing is when the radix is odd and when is even. Therefore, the worst-case channel load for XYX-Routing is when the radix is odd and when the radix is even. And the worst-case throughput for XYX-Routing is when the radix is odd and when the radix is even. As shown in Section II-B, the networ capacity for mesh is the inverse of Υ 4, namely, when the radix is odd and 4 when the radix is even. Thus, the worst-case throughput for XYX-Routing equals to half of networ capacity. Claim 3. For odd radix two-dimensional mesh, UTURN achieves a worst-case throughput bound of +)/+) of networ capacity, which is higher than half of networ capacity. For even radix twodimensional mesh, UTURN achieves optimal worst-case throughput, namely, half of networ capacity. Proof: The worst-case channel load analysis for YXY-Routing is the same with XYX-Routing because of symmetry. Similarly, it follows that the worst-case channel load of YXY-Routing in odd radix two-dimensional mesh is in X-dimension and in the Y-dimension, respectively. For the even radix two-dimensional mesh, the worst-case channel load of YXY-Routing is in X and Y dimensions. UTURN consists of two parts: XYX-Routing with 0.5 probability and YXY-Routing with also 0.5 probability. Thus, for odd radix twodimensional mesh, the worst-case channel load of UTURN in X- dimension is 0.5 )+0.5 )=. Similarly, the worstcase channel load of UTURN in Y-dimension is also 4. 4 So, the the worst-case channel load of UTURN is and 4 the worst-case throughput is in odd radix two-dimensional 4 mesh. For even radix two-dimensional mesh, the worst-case channel load of UTURN in X-dimension and Y-dimension are all 0.5 )+ 0.5 )=. So, the the worst-case channel load of UTURN is and the worst-case throughput is in even radix two-dimensional mesh. 4 As shown in Section II-B, the networ capacity for mesh is when the radix is odd and 4 when the radix is even. Therefore, for an odd radix two-dimensional mesh, UTURN achieves a worst-case 4 4 throughput bound of / = +)/ +) of networ capacity, which is higher than half of networ capacity for all radix. For an even radix two-dimensional mesh, UTURN achieves a worst-case throughput of ` ` / 4 =0.5, which is optimal. Fig. shows that UTURN outperforms VAL, DOR and OTURN in worst-case throughput for different odd radix networs. The worstcase throughput in even radix case is optimal and we do not show the even results in Fig.. Different with UTURN, the optimal routing including TURN) proposed in [9] does not have a closed-form description, which requires solving a separate linear programming for each networ radix. The linear programming grow so fast that it is difficult to scale to large networs [7]. Note that UTURN achieves the same optimal worst-case throughput as the optimal routing that can be obtained with the method proposed in [9]. Normalized Throughput Fig VAL DOR 0. Our algorithm OTURN Optimal Routing Networ Radix ) Worst-case throughput for odd radix meshes. ) Latency Analysis: For latency analysis, we use the average hop count H aver) for a given routing algorithm R to measure the average latency. We assume that the traffic between all sourcedestination pairs are all the same when computing H aver) [6]. The average hop count for DOR in -ary -mesh networ is, 3 where each component represents the average hop count in 3 each dimension using minimal-length routing [][6]. In UTURN, minimal-length routing is used in three segments. However, as mentioned in Section III-A, there is a degenerated case when y = y, which has a probability of /. In this degenerated case, there is only one minimal-length routing segment. Therefore, the average hop count of UTURN is given as follows: H aveuturn) = 3 3 )+ 3 = 3 3 ) 3) In order to quantify the penalty factor in average latency for given routing algorithm R over DOR, we normalize the average hop count 49

4 of R to DOR. So the penalty factor for UTURN is 3, which is less than.5 for all radix. Compared with VAL, which has a penalty factor of, UTURN achieves better performance in both throughput and latency. UTURN is a non-minimal routing, which has.5x average hop count when compared with OTURN and DOR. In comparison with DOR and OTURN, UTurn offers a latency-bandwidth tradeoff for mesh networs where it offers better throughput at the expense of a higher latency cost. Fig. 3. routing. a) DOR-XY routing b) DOR-YX routing Traffic patterns which achieve maximal load lower bound in DOR IV. ASYMMETRIC MESHES Networ Capacity: SectionII-B discusses the networ capacity for -ary n-mesh networs, namely symmetric meshes. We can get the networ capacity in asymmetric meshes using the same analysis method with symmetric meshes [][6]. Suppose max = max x, y) in the asymmetric meshes, the maximal sustainable channel loads under uniform traffic are max Υ 4 max is even asym = 4) max is odd max 4 max where Υ asym is the inverse of the networ capacity. A. Throughput Analysis for Different Algorithms In the following, we will analyze the worst-case throughput for different routing algorithms in asymmetric meshes with the assumption of max = x. For the case of max = y, these algorithms just need to exchange the operations between X and Y dimensions. VAL: For VAL routing, the two-stage minimal routing gets twice uniform loads. Thus, the worst-case channel loads of VAL is Υ asym, which results in normalized worst-case throughput T WCVAL)=. 5) DOR: For DOR routing, the worst-case throughput can be different for various y in the meshes where max = x > y >. However, we can get a maximal load bound for different y: max max is even L maxdor) + max 6) max is odd The result for the even max case is easy to find under the traffic: the left half part nodes are sources and the right half part nodes are destinations. For the odd max case, the result can be found under the traffic patterns showed in Fig. 3, where the red nodes represent sources and the green nodes represent destinations. For both DOR- XY and DOR-YX routings, the channel load represented by blac arrows achieve the maximal load bound of +max. Eqn. 6 just shows the lower bounds. Absolutely, there are worse cases for different y. For example, when x y =, the maximal loads are max for both even and odd max, which are showed in Fig. 4. Therefore, according to the Eqn. 6, the upper bound of normalized worst-case throughput for DOR routing is T WCDOR) max is even max max < max is odd OTURN: For OTURN, we can also get a maximal load bound as following: L maxoturn) max, max is even or odd. 8) Fig.5 shows the traffic patterns for this load lower bound in both even and odd max cases. For 7 6 mesh Fig. 5a)) and 6 5 mesh Fig. 5b)), the channel loads represented by blac arrows are 3.5 and 7) Fig. 4. a) DOR-XY routing 7 x 6) b) DOR-YX routing6 x5) Examples where the maximal loads are max in DOR routing. 3, respectively. Therefore, according to the Eqn. 8, the upper bound of normalized worst-case throughput for OTURN routing is T WCOTURN) max is even max max < max is odd a) OTURN routing 7 x 6) b) OTURN routing6 x5) Fig. 5. Traffic patterns which achieve maximal load lower bound in OTURN routing. UTURN: Section III-A describes the UTURN algorithm for the symmetric case. According to the analysis of Claim in the Section III-B, we can get the maximal channel loads in X and Y dimensions L x and L y) in the mesh x > y for YXY and XYX routing, which are showed in Table I. For meshes x > y, the maximal channel load is decided by L x because L x >L y for all the cases in Table I. YXY-routing achieves excellent worst-case channel loads when x > y: max max is even L maxyxy) = max 0) max is odd Thus, the normalized worst-case throughput for YXY-routing in the mesh x > y is T WCYXY) = max is even max+ max > max is odd 9) ) For meshes y > x, XYX-routing achieves the same worst-case throughput with YXY-routing in meshes x > y, namely, Eqn.. Therefore, UTURN only implements YXY-routing in x > y meshes. In x < y meshes, UTURN only implements XYXrouting. Thus, UTURN achieves the worst-case throughput of Eqn.. UTURN-A: UTURN achieves the optimal worst-case throughput for both odd and even radix networs and for both symmetric and asymmetric networs. In the case of asymmetric meshes, we can employ a variant of UTURN that we refer to as UTURN-A, UTURN-A is a variant of UTURN that optimizes for average case performance in asymmetric meshes. 430

5 TABLE I MAXIMAL CHANNEL LOADS IN X AND Y DIMENSIONS FOR XYX-ROUTING AND YXY-ROUTING. x y mesh XYX-routing YXY-routing even x L x = x L x = x even y L y = y L y = y even x L x = x L x = x odd y L y = y L y = y y odd x L x = x x L x = x even y L y = y L y = y odd x L x = x L x = x x odd y L y = y L y = y y better average-case throughput by 4.6%, 0.8% and 4.5%, respectively. Compared with DOR, OTURN and VAL, our UTURN achieves better average-case throughput by.9%, 8.5% and 8.8%, respectively. Compared with UTURN-A, UTURN achieves better worst-case throughput but worse average throughput. which can achieve better average case throughput than UTURN at the expense of worst-case throughput optimality. UTURN-A is implemented by using XYX-routing and YXY-routing with the same probability. According to Table I, UTURN-A achieves worst-case channel loads when x > y: Fig. 6. Worst-case throughput for different routing algorithms in different asymmetric mesh sizes. L maxuturn-a) =L xuturn-a) = LxXYX)+ LxYXY) max max is even = max ) max+) 4 max max is odd ) For meshes x < y, UTURN-A achieves the same maximal channel load with the x > y case, namely, Eqn., excepting that max = y. So, the normalized worst-case throughput is T WCUTURN-A) = max is even max is odd max+ max+ > 3) As can be seen in Eqn. 3, the worst-case throughput of UTURN- A is the same as that of UTURN for the even radix case, which is optimal. However, in the odd radix case, although the worst-case throughput of UTURN-A is better than / networ capacity, it is only near-optimal in that UTURN-A cannot achieve the worst-case throughput optimal result of UTURN as shown in Eqn. ). B. Throughput Comparison for Different Algorithms According to Equations 5, 7, 9, and 3, our UTURN and UTURN-A algorithms achieve better worst-case throughput than VAL, DOR and OTURN in asymmetric meshes. The aforementioned theoretical analysis shows that UTURN and UTURN-A achieve best worst-case throughput of / networ capacity in the asymmetric meshes with even max. For the asymmetric meshes with odd max, our algorithms UTURN and UTURN-A achieve a worstcase throughput better than / networ capacity. Fig. 6 shows the normalized worst-case throughput of different routing algorithms for different asymmetric mesh sizes. It shows that our UTURN and UTURN-A algorithms outperform existing algorithms lie VAL, DOR and OTURN in worst-case throughput, which also validates the aforementioned theoretical worst-case throughput analysis for asymmetric meshes. As for average-case throughput, our algorithms also achieves excellent performance. Fig. 7 shows the average-case throughput of different routing algorithms for different asymmetric mesh sizes. Compared with DOR, OTURN and VAL, UTURN-A achieves Fig. 7. Average-case throughput for different routing algorithms in different asymmetric mesh sizes. V. EVALUATION FOR SYMMETRIC MESHES In this section, we compare the throughput of UTURN with VAL, DOR and OTURN with respect to each traffic matrix shown in Table II. Table III and Table IV provide the evaluation results for different odd and even radix symmetric networs respectively. As shown in Table III, UTURN outperforms all other routing algorithms in worst-case throughput for odd radix mesh networs, and achieves a new optimal worst-case throughput bound of +)/+ ), which is higher than / of networ capacity. For the performance in the even radix mesh networs showed in Table IV, UTURN achieves the best worst-case throughput, namely, half of networ capacity. This also validates the throughput analysis of UTURN in Section III-B. For average-case throughput, UTURN also outperforms VAL, DOR, and OTURN, on average over the six evaluated networ sizes both odd and even radix), by 7.%, 39.7% and 8.8%, respectively. UTURN also performs very well in both odd and even radix networs under adversarial traffic patterns, namely, Transpose and DOR-WC. For Complement and Nearest-Neighbor traffic, UTURN outperforms VAL, but does worse than DOR and OTURN. For ideal uniformly Random traffic, although DOR achieves better throughput than UTURN, DOR is significantly worse in the average and worse case throughput. In summary, our evaluation shows that UTURN performs very well in different networ sizes and traffic matrices. VI. CONCLUSION UTURN is a closed-form non-minimal oblivious routing algorithm for mesh networs. We prove it achieves a new worst-case throughput bound of +)/ +) of networ capacity with odd radix, which is higher than / networ capacity. For even radix 43

6 Worst-Case Average- Case Transpose Random DOR-WC Complement Nearest- Neighbor TABLE II TRAFFIC MATRICES EVALUATED. Worst-case traffic matrix that results in lowest throughput Average throughput over a million random traffic matrix Pacets route from x,y) to y,x) Destination node chosen at uniform random Worst-case traffic for DOR. Pacets route from x,y) to -y-,-x-) Pacet at x,y) sent to -x-,-y-). Pacets sent to nodes one hop away with equal probability TABLE III NORMALIZED WORST-CASE &AVERAGE-CASE THROUGHPUT FOR ODD CASE 3x3 mesh VAL DOR OTURN UTURN Worst-case Average-case Transpose Random DOR-WC Complement Nearest-Neighbor x5 mesh VAL DOR OTURN UTURN Worst-case Average-case Transpose Random DOR-WC Complement Nearest-Neighbor x7 mesh VAL DOR OTURN UTURN Worst-case Average-case Transpose Random DOR-WC Complement Nearest-Neighbor meshes, UTURN achieves optimal worst-case throughput. Thus, it outperforms existing algorithms such as VAL, DOR and OTURN in worst-case throughput. In addition, when compared with the optimal routings that can be obtained with the method proposed in [9] for odd radix meshes, UTURN can achieve the same optimal worst-case throughput. For average-case throughput, UTURN also outperforms VAL, DOR and OTURN on average of six simulation cases by 7.%, 39.7% and 8.8%, respectively. Moreover, we provide a closed-form expression for the average hop count of UTURN in latency analysis. In comparison with DOR and OTURN, UTURN achieves a higher throughput at the expense of an average hop count that is approximately.5x minimal. For asymmetric meshes, both theoretical analysis and simulation show that our UTURN and UTURN-A algorithms achieve better worst-case and average case throughput than other existing algorithms VAL, DOR and OTURN. VII. ACKNOWLEDGMENTS This wor was partially supported by the Creative Research Groups of National Natural Science Foundation NNSF) under grant No and 63305, the National S&T Major Project under No. 0ZX and 009ZX , Research Fund of Tsinghua University under No , and ZTE Corporation. TABLE IV NORMALIZED WORST-CASE &AVERAGE-CASE THROUGHPUT FOR EVEN CASE 4x4 mesh VAL DOR OTURN UTURN Worst-case Average-case Transpose Random DOR-WC Complement Nearest-Neighbor x6 mesh VAL DOR OTURN UTURN Worst-case Average-case Transpose Random DOR-WC Complement Nearest-Neighbor x8 mesh VAL DOR OTURN UTURN Worst-case Average-case Transpose Random DOR-WC Complement Nearest-Neighbor The authors would also lie to acnowledge the support of NSF CCF REFERENCES [] W. Dally and B. Towles. Principles and practices of interconnection networs. Morgan Kaufmann, 004. [] S. Guang, L. Shijun, J. Depeng, L. Yong, S. Li, Y. Zhang, and Z. Lieguang. Performance-aware hybrid algorithm for mapping ips onto mesh-based networ on chip. IEICE TRANSACTIONS on Information and Systems, 945): , 0. [3] T. Nesson and S. Johnsson. Romm routing on mesh and torus networs. In Proceedings of the seventh annual ACM symposium on Parallel algorithms and architectures, pages ACM, 995. [4] D. Seo, A. Ali, W. Lim, N. Rafique, and M. Thottethodi. Near-optimal worst-case throughput routing for two-dimensional mesh networs. In ACM SIGARCH Computer Architecture News, volume 33, pages IEEE Computer Society, 005. [5] H. Sullivan and T. Bashow. A large scale, homogeneous, fully distributed parallel machine, i. In ACM SIGARCH Computer Architecture News, volume 5, pages ACM, 977. [6] R. Sunam Ramanujam and B. Lin. Randomized partially-minimal routing on three-dimensional mesh networs. Computer Architecture Letters, 7):37 40, 008. [7] R. Sunam Ramanujam and B. Lin. Weighted random oblivious routing on torus networs. In Proceedings of the 5th ACM/IEEE Symposium on Architectures for Networing and Communications Systems, pages 04. ACM, 009. [8] B. Towles and W. Dally. Worst-case traffic for oblivious routing functions. In Proceedings of the fourteenth annual ACM symposium on Parallel algorithms and architectures, pages 8. ACM, 00. [9] B. Towles, W. Dally, and S. Boyd. Throughput-centric routing algorithm design. In Proceedings of the fifteenth annual ACM symposium on Parallel algorithms and architectures, pages ACM, 003. [0] L. Valiant and G. Brebner. Universal schemes for parallel communication. In Proceedings of the thirteenth annual ACM symposium on Theory of computing, pages ACM,

INTERCONNECTION networks are used in a variety of applications,

INTERCONNECTION networks are used in a variety of applications, 1 Randomized Throughput-Optimal Oblivious Routing for Torus Networs Rohit Sunam Ramanujam, Student Member, IEEE, and Bill Lin, Member, IEEE Abstract In this paper, we study the problem of optimal oblivious

More information

Randomized Partially-Minimal Routing: Near-Optimal Oblivious Routing for 3-D Mesh Networks

Randomized Partially-Minimal Routing: Near-Optimal Oblivious Routing for 3-D Mesh Networks 2080 IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, VOL. 20, NO. 11, NOVEMBER 2012 Randomized Partially-Minimal Routing: Near-Optimal Oblivious Routing for 3-D Mesh Networks Rohit Sunkam

More information

A Layer-Multiplexed 3D On-Chip Network Architecture Rohit Sunkam Ramanujam and Bill Lin

A Layer-Multiplexed 3D On-Chip Network Architecture Rohit Sunkam Ramanujam and Bill Lin 50 IEEE EMBEDDED SYSTEMS LETTERS, VOL. 1, NO. 2, AUGUST 2009 A Layer-Multiplexed 3D On-Chip Network Architecture Rohit Sunkam Ramanujam and Bill Lin Abstract Programmable many-core processors are poised

More information

A Novel 3D Layer-Multiplexed On-Chip Network

A Novel 3D Layer-Multiplexed On-Chip Network A Novel D Layer-Multiplexed On-Chip Network Rohit Sunkam Ramanujam University of California, San Diego rsunkamr@ucsdedu Bill Lin University of California, San Diego billlin@eceucsdedu ABSTRACT Recently,

More information

Finding Worst-case Permutations for Oblivious Routing Algorithms

Finding Worst-case Permutations for Oblivious Routing Algorithms Stanford University Concurrent VLSI Architecture Memo 2 Stanford University Computer Systems Laboratory Finding Worst-case Permutations for Oblivious Routing Algorithms Brian Towles Abstract We present

More information

WITH the development of the semiconductor technology,

WITH the development of the semiconductor technology, Dual-Link Hierarchical Cluster-Based Interconnect Architecture for 3D Network on Chip Guang Sun, Yong Li, Yuanyuan Zhang, Shijun Lin, Li Su, Depeng Jin and Lieguang zeng Abstract Network on Chip (NoC)

More information

Interconnection Networks: Routing. Prof. Natalie Enright Jerger

Interconnection Networks: Routing. Prof. Natalie Enright Jerger Interconnection Networks: Routing Prof. Natalie Enright Jerger Routing Overview Discussion of topologies assumed ideal routing In practice Routing algorithms are not ideal Goal: distribute traffic evenly

More information

Near-Optimal Worst-case Throughput Routing for Two-Dimensional Mesh Networks

Near-Optimal Worst-case Throughput Routing for Two-Dimensional Mesh Networks Near-Optimal Worst-case Throughput Routing for Two-Dimensional Mesh Networks Daeho Seo Akif Ali Won-Taek Lim Nauman Rafique Mithuna Thottethodi School of Electrical and Computer Engineering Purdue University

More information

Global Adaptive Routing Algorithm Without Additional Congestion Propagation Network

Global Adaptive Routing Algorithm Without Additional Congestion Propagation Network 1 Global Adaptive Routing Algorithm Without Additional Congestion ropagation Network Shaoli Liu, Yunji Chen, Tianshi Chen, Ling Li, Chao Lu Institute of Computing Technology, Chinese Academy of Sciences

More information

Near-Optimal Worst-case Throughput Routing for Two-Dimensional Mesh Networks

Near-Optimal Worst-case Throughput Routing for Two-Dimensional Mesh Networks Near-Optimal Worst-case Throughput Routing for Two-Dimensional Mesh Networks Daeho Seo Akif Ali Won-Taek Lim Nauman Rafique Mithuna Thottethodi School of Electrical and Computer Engineering Purdue University

More information

Interconnection Networks: Topology. Prof. Natalie Enright Jerger

Interconnection Networks: Topology. Prof. Natalie Enright Jerger Interconnection Networks: Topology Prof. Natalie Enright Jerger Topology Overview Definition: determines arrangement of channels and nodes in network Analogous to road map Often first step in network design

More information

Topologies. Maurizio Palesi. Maurizio Palesi 1

Topologies. Maurizio Palesi. Maurizio Palesi 1 Topologies Maurizio Palesi Maurizio Palesi 1 Network Topology Static arrangement of channels and nodes in an interconnection network The roads over which packets travel Topology chosen based on cost and

More information

Network-on-chip (NOC) Topologies

Network-on-chip (NOC) Topologies Network-on-chip (NOC) Topologies 1 Network Topology Static arrangement of channels and nodes in an interconnection network The roads over which packets travel Topology chosen based on cost and performance

More information

Throughput-Centric Routing Algorithm Design

Throughput-Centric Routing Algorithm Design Throughput-Centric Routing Algorithm Design Brian Towles, William J. Dally, and Stephen Boyd Department of Electrical Engineering Stanford University {btowles,billd}@cva.stanford.edu ABSTRACT The increasing

More information

Static Virtual Channel Allocation in Oblivious Routing

Static Virtual Channel Allocation in Oblivious Routing Static Virtual Channel Allocation in Oblivious Routing Keun Sup Shim Myong Hyon Cho Michel Kinsy Tina Wen Mieszko Lis G. Edward Suh Srinivas Devadas Computer Science and Artificial Intelligence Laboratory

More information

Chapter 7 Slicing and Dicing

Chapter 7 Slicing and Dicing 1/ 22 Chapter 7 Slicing and Dicing Lasse Harju Tampere University of Technology lasse.harju@tut.fi 2/ 22 Concentrators and Distributors Concentrators Used for combining traffic from several network nodes

More information

Early Transition for Fully Adaptive Routing Algorithms in On-Chip Interconnection Networks

Early Transition for Fully Adaptive Routing Algorithms in On-Chip Interconnection Networks Technical Report #2012-2-1, Department of Computer Science and Engineering, Texas A&M University Early Transition for Fully Adaptive Routing Algorithms in On-Chip Interconnection Networks Minseon Ahn,

More information

Topology basics. Constraints and measures. Butterfly networks.

Topology basics. Constraints and measures. Butterfly networks. EE48: Advanced Computer Organization Lecture # Interconnection Networks Architecture and Design Stanford University Topology basics. Constraints and measures. Butterfly networks. Lecture #: Monday, 7 April

More information

Basic Switch Organization

Basic Switch Organization NOC Routing 1 Basic Switch Organization 2 Basic Switch Organization Link Controller Used for coordinating the flow of messages across the physical link of two adjacent switches 3 Basic Switch Organization

More information

Design of a router for network-on-chip. Jun Ho Bahn,* Seung Eun Lee and Nader Bagherzadeh

Design of a router for network-on-chip. Jun Ho Bahn,* Seung Eun Lee and Nader Bagherzadeh 98 Int. J. High Performance Systems Architecture, Vol. 1, No. 2, 27 Design of a router for network-on-chip Jun Ho Bahn,* Seung Eun Lee and Nader Bagherzadeh Department of Electrical Engineering and Computer

More information

BARP-A Dynamic Routing Protocol for Balanced Distribution of Traffic in NoCs

BARP-A Dynamic Routing Protocol for Balanced Distribution of Traffic in NoCs -A Dynamic Routing Protocol for Balanced Distribution of Traffic in NoCs Pejman Lotfi-Kamran, Masoud Daneshtalab *, Caro Lucas, and Zainalabedin Navabi School of Electrical and Computer Engineering, The

More information

Oblivious Routing in On-Chip Bandwidth-Adaptive Networks Myong Hyon Cho, Mieszko Lis, Keun Sup Shim, Michel Kinsy, Tina Wen, and Srinivas Devadas

Oblivious Routing in On-Chip Bandwidth-Adaptive Networks Myong Hyon Cho, Mieszko Lis, Keun Sup Shim, Michel Kinsy, Tina Wen, and Srinivas Devadas Computer Science and Artificial Intelligence Laboratory Technical Report MIT-CSAIL-TR-009-0 March 7, 009 Oblivious Routing in On-Chip Bandwidth-Adaptive Networks Myong Hyon Cho, Mieszko Lis, Keun Sup Shim,

More information

Lecture 3: Topology - II

Lecture 3: Topology - II ECE 8823 A / CS 8803 - ICN Interconnection Networks Spring 2017 http://tusharkrishna.ece.gatech.edu/teaching/icn_s17/ Lecture 3: Topology - II Tushar Krishna Assistant Professor School of Electrical and

More information

CS 498 Hot Topics in High Performance Computing. Networks and Fault Tolerance. 9. Routing and Flow Control

CS 498 Hot Topics in High Performance Computing. Networks and Fault Tolerance. 9. Routing and Flow Control CS 498 Hot Topics in High Performance Computing Networks and Fault Tolerance 9. Routing and Flow Control Intro What did we learn in the last lecture Topology metrics Including minimum diameter of directed

More information

Center Concentrated X-Torus Topology

Center Concentrated X-Torus Topology Center Concentrated X-Torus Topolog Dinesh Kumar 1, Vive Kumar Sehgal, and Nitin 3 1 Department of Comp. Sci. & Eng., Japee Universit of Information Technolog, Wanaghat, Solan, Himachal Pradesh, INDIA-17334.

More information

Path-Diverse Inorder Routing

Path-Diverse Inorder Routing Path-Diverse Inorder Routing Mieszko Lis, Myong Hyon Cho, Keun Sup Shim, and Srinivas Devadas Computer Science and Artificial Intelligence Laboratory Massachusetts Institute of Technology {mieszko, mhcho,

More information

A Novel Energy Efficient Source Routing for Mesh NoCs

A Novel Energy Efficient Source Routing for Mesh NoCs 2014 Fourth International Conference on Advances in Computing and Communications A ovel Energy Efficient Source Routing for Mesh ocs Meril Rani John, Reenu James, John Jose, Elizabeth Isaac, Jobin K. Antony

More information

IN this letter we focus on OpenFlow-based network state

IN this letter we focus on OpenFlow-based network state 1 On the Impact of Networ State Collection on the Performance of SDN Applications Mohamed Aslan, and Ashraf Matrawy Carleton University, ON, Canada Abstract Intelligent and autonomous SDN applications

More information

Scheduling Unsplittable Flows Using Parallel Switches

Scheduling Unsplittable Flows Using Parallel Switches Scheduling Unsplittable Flows Using Parallel Switches Saad Mneimneh, Kai-Yeung Siu Massachusetts Institute of Technology 77 Massachusetts Avenue Room -07, Cambridge, MA 039 Abstract We address the problem

More information

Lecture 2: Topology - I

Lecture 2: Topology - I ECE 8823 A / CS 8803 - ICN Interconnection Networks Spring 2017 http://tusharkrishna.ece.gatech.edu/teaching/icn_s17/ Lecture 2: Topology - I Tushar Krishna Assistant Professor School of Electrical and

More information

ECE 1749H: Interconnec1on Networks for Parallel Computer Architectures: Rou1ng. Prof. Natalie Enright Jerger

ECE 1749H: Interconnec1on Networks for Parallel Computer Architectures: Rou1ng. Prof. Natalie Enright Jerger ECE 1749H: Interconnec1on Networks for Parallel Computer Architectures: Rou1ng Prof. Natalie Enright Jerger Announcements Feedback on your project proposals This week Scheduled extended 1 week Next week:

More information

Slim Fly: A Cost Effective Low-Diameter Network Topology

Slim Fly: A Cost Effective Low-Diameter Network Topology TORSTEN HOEFLER, MACIEJ BESTA Slim Fly: A Cost Effective Low-Diameter Network Topology Images belong to their creator! NETWORKS, LIMITS, AND DESIGN SPACE Networks cost 25-30% of a large supercomputer Hard

More information

Performance of Multihop Communications Using Logical Topologies on Optical Torus Networks

Performance of Multihop Communications Using Logical Topologies on Optical Torus Networks Performance of Multihop Communications Using Logical Topologies on Optical Torus Networks X. Yuan, R. Melhem and R. Gupta Department of Computer Science University of Pittsburgh Pittsburgh, PA 156 fxyuan,

More information

Routing Algorithms for 2-D Mesh Network-On-Chip Architectures

Routing Algorithms for 2-D Mesh Network-On-Chip Architectures Routing Algorithms for 2-D Mesh Network-On-hip Architectures Edward hron Gene Kishinevsky Brandon Nefcy Nishant Patil Department of Electrical Engineering, Stanford University {echron, genek, nefcy, nppatil}@stanford.edu

More information

Adaptive Channel Queue Routing on k-ary n-cubes

Adaptive Channel Queue Routing on k-ary n-cubes Adaptive Channel Queue Routing on k-ary n-cubes Arjun Singh, William J Dally, Amit K Gupta, Brian Towles Computer Systems Laboratory, Stanford University {arjuns,billd,btowles,agupta}@cva.stanford.edu

More information

Topologies. Maurizio Palesi. Maurizio Palesi 1

Topologies. Maurizio Palesi. Maurizio Palesi 1 Topologies Maurizio Palesi Maurizio Palesi 1 Network Topology Static arrangement of channels and nodes in an interconnection network The roads over which packets travel Topology chosen based on cost and

More information

ECE 4750 Computer Architecture, Fall 2017 T06 Fundamental Network Concepts

ECE 4750 Computer Architecture, Fall 2017 T06 Fundamental Network Concepts ECE 4750 Computer Architecture, Fall 2017 T06 Fundamental Network Concepts School of Electrical and Computer Engineering Cornell University revision: 2017-10-17-12-26 1 Network/Roadway Analogy 3 1.1. Running

More information

Deadlock-free XY-YX router for on-chip interconnection network

Deadlock-free XY-YX router for on-chip interconnection network LETTER IEICE Electronics Express, Vol.10, No.20, 1 5 Deadlock-free XY-YX router for on-chip interconnection network Yeong Seob Jeong and Seung Eun Lee a) Dept of Electronic Engineering Seoul National Univ

More information

Dynamic Stress Wormhole Routing for Spidergon NoC with effective fault tolerance and load distribution

Dynamic Stress Wormhole Routing for Spidergon NoC with effective fault tolerance and load distribution Dynamic Stress Wormhole Routing for Spidergon NoC with effective fault tolerance and load distribution Nishant Satya Lakshmikanth sailtosatya@gmail.com Krishna Kumaar N.I. nikrishnaa@gmail.com Sudha S

More information

Chapter 3 : Topology basics

Chapter 3 : Topology basics 1 Chapter 3 : Topology basics What is the network topology Nomenclature Traffic pattern Performance Packaging cost Case study: the SGI Origin 2000 2 Network topology (1) It corresponds to the static arrangement

More information

Interconnection networks

Interconnection networks Interconnection networks When more than one processor needs to access a memory structure, interconnection networks are needed to route data from processors to memories (concurrent access to a shared memory

More information

Oblivious Routing in On-Chip Bandwidth-Adaptive Networks

Oblivious Routing in On-Chip Bandwidth-Adaptive Networks 009 8th International Conference on Parallel Architectures and Compilation Techniques Oblivious Routing in On-Chip Bandwidth-Adaptive Networks Myong Hyon Cho, Mieszko Lis, Keun Sup Shim, Michel Kinsy,

More information

Asynchronous Bypass Channel Routers

Asynchronous Bypass Channel Routers 1 Asynchronous Bypass Channel Routers Tushar N. K. Jain, Paul V. Gratz, Alex Sprintson, Gwan Choi Department of Electrical and Computer Engineering, Texas A&M University {tnj07,pgratz,spalex,gchoi}@tamu.edu

More information

in Oblivious Routing

in Oblivious Routing Static Virtual Channel Allocation in Oblivious Routing Keun Sup Shim, Myong Hyon Cho, Michel Kinsy, Tina Wen, Mieszko Lis G. Edward Suh (Cornell) Srinivas Devadas MIT Computer Science and Artificial Intelligence

More information

Lecture 26: Interconnects. James C. Hoe Department of ECE Carnegie Mellon University

Lecture 26: Interconnects. James C. Hoe Department of ECE Carnegie Mellon University 18 447 Lecture 26: Interconnects James C. Hoe Department of ECE Carnegie Mellon University 18 447 S18 L26 S1, James C. Hoe, CMU/ECE/CALCM, 2018 Housekeeping Your goal today get an overview of parallel

More information

Performance Analysis of a Minimal Adaptive Router

Performance Analysis of a Minimal Adaptive Router Performance Analysis of a Minimal Adaptive Router Thu Duc Nguyen and Lawrence Snyder Department of Computer Science and Engineering University of Washington, Seattle, WA 98195 In Proceedings of the 1994

More information

DISTRIBUTED ROUTER FABRICS

DISTRIBUTED ROUTER FABRICS DISTRIBUTED ROUTER FABRICS A DISSERTATION SUBMITTED TO THE DEPARTMENT OF ELECTRICAL ENGINEERING AND THE COMMITTEE ON GRADUATE STUDIES OF STANFORD UNIVERSITY IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR

More information

Centralized Adaptive Routing for NoCs

Centralized Adaptive Routing for NoCs Centralized Adaptive Routing for NoCs Authors Name/s per 1st Affiliation (Author) line 1 (of Affiliation): dept. name of organization line 2: name of organization, acronyms acceptable line 3: City, Country

More information

A Survey of Techniques for Power Aware On-Chip Networks.

A Survey of Techniques for Power Aware On-Chip Networks. A Survey of Techniques for Power Aware On-Chip Networks. Samir Chopra Ji Young Park May 2, 2005 1. Introduction On-chip networks have been proposed as a solution for challenges from process technology

More information

Routing Algorithms. Review

Routing Algorithms. Review Routing Algorithms Today s topics: Deterministic, Oblivious Adaptive, & Adaptive models Problems: efficiency livelock deadlock 1 CS6810 Review Network properties are a combination topology topology dependent

More information

Surveying Formal and Practical Approaches for Optimal Placement of Replicas on the Web

Surveying Formal and Practical Approaches for Optimal Placement of Replicas on the Web Surveying Formal and Practical Approaches for Optimal Placement of Replicas on the Web TR020701 April 2002 Erbil Yilmaz Department of Computer Science The Florida State University Tallahassee, FL 32306

More information

QUANTIZER DESIGN FOR EXPLOITING COMMON INFORMATION IN LAYERED CODING. Mehdi Salehifar, Tejaswi Nanjundaswamy, and Kenneth Rose

QUANTIZER DESIGN FOR EXPLOITING COMMON INFORMATION IN LAYERED CODING. Mehdi Salehifar, Tejaswi Nanjundaswamy, and Kenneth Rose QUANTIZER DESIGN FOR EXPLOITING COMMON INFORMATION IN LAYERED CODING Mehdi Salehifar, Tejaswi Nanjundaswamy, and Kenneth Rose Department of Electrical and Computer Engineering University of California,

More information

Demand Based Routing in Network-on-Chip(NoC)

Demand Based Routing in Network-on-Chip(NoC) Demand Based Routing in Network-on-Chip(NoC) Kullai Reddy Meka and Jatindra Kumar Deka Department of Computer Science and Engineering, Indian Institute of Technology Guwahati, Guwahati, India Abstract

More information

Adaptive Routing in Hexagonal Torus Interconnection Networks

Adaptive Routing in Hexagonal Torus Interconnection Networks Adaptive Routing in Hexagonal Torus Interconnection Networks Arash Shamaei and Bella Bose School of Electrical Engineering and Computer Science Oregon State University Corvallis, OR 97331 5501 Email: {shamaei,bose}@eecs.oregonstate.edu

More information

Spider-Web Topology: A Novel Topology for Parallel and Distributed Computing

Spider-Web Topology: A Novel Topology for Parallel and Distributed Computing Spider-Web Topology: A Novel Topology for Parallel and Distributed Computing 1 Selvarajah Thuseethan, 2 Shanmuganathan Vasanthapriyan 1,2 Department of Computing and Information Systems, Sabaragamuwa University

More information

EC 513 Computer Architecture

EC 513 Computer Architecture EC 513 Computer Architecture On-chip Networking Prof. Michel A. Kinsy Virtual Channel Router VC 0 Routing Computation Virtual Channel Allocator Switch Allocator Input Ports VC x VC 0 VC x It s a system

More information

A Multicast Routing Algorithm for 3D Network-on-Chip in Chip Multi-Processors

A Multicast Routing Algorithm for 3D Network-on-Chip in Chip Multi-Processors Proceedings of the World Congress on Engineering 2018 ol I A Routing Algorithm for 3 Network-on-Chip in Chip Multi-Processors Rui Ben, Fen Ge, intian Tong, Ning Wu, ing hang, and Fang hou Abstract communication

More information

Chapter 4 : Butterfly Networks

Chapter 4 : Butterfly Networks 1 Chapter 4 : Butterfly Networks Structure of a butterfly network Isomorphism Channel load and throughput Optimization Path diversity Case study: BBN network 2 Structure of a butterfly network A K-ary

More information

Throughout this course, we use the terms vertex and node interchangeably.

Throughout this course, we use the terms vertex and node interchangeably. Chapter Vertex Coloring. Introduction Vertex coloring is an infamous graph theory problem. It is also a useful toy example to see the style of this course already in the first lecture. Vertex coloring

More information

Efficient Second-Order Iterative Methods for IR Drop Analysis in Power Grid

Efficient Second-Order Iterative Methods for IR Drop Analysis in Power Grid Efficient Second-Order Iterative Methods for IR Drop Analysis in Power Grid Yu Zhong Martin D. F. Wong Dept. of Electrical and Computer Engineering Dept. of Electrical and Computer Engineering Univ. of

More information

ECE 1749H: Interconnec1on Networks for Parallel Computer Architectures: Rou1ng. Prof. Natalie Enright Jerger

ECE 1749H: Interconnec1on Networks for Parallel Computer Architectures: Rou1ng. Prof. Natalie Enright Jerger ECE 1749H: Interconnec1on Networks for Parallel Computer Architectures: Rou1ng Prof. Natalie Enright Jerger Rou1ng Overview Discussion of topologies assumed ideal rou1ng In prac1ce Rou1ng algorithms are

More information

Design of a high-throughput distributed shared-buffer NoC router

Design of a high-throughput distributed shared-buffer NoC router Design of a high-throughput distributed shared-buffer NoC router The MIT Faculty has made this article openly available Please share how this access benefits you Your story matters Citation As Published

More information

Fast-Response Multipath Routing Policy for High-Speed Interconnection Networks

Fast-Response Multipath Routing Policy for High-Speed Interconnection Networks HPI-DC 09 Fast-Response Multipath Routing Policy for High-Speed Interconnection Networks Diego Lugones, Daniel Franco, and Emilio Luque Leonardo Fialho Cluster 09 August 31 New Orleans, USA Outline Scope

More information

Dynamic Wavelength Assignment for WDM All-Optical Tree Networks

Dynamic Wavelength Assignment for WDM All-Optical Tree Networks Dynamic Wavelength Assignment for WDM All-Optical Tree Networks Poompat Saengudomlert, Eytan H. Modiano, and Robert G. Gallager Laboratory for Information and Decision Systems Massachusetts Institute of

More information

Achieve Significant Throughput Gains in Wireless Networks with Large Delay-Bandwidth Product

Achieve Significant Throughput Gains in Wireless Networks with Large Delay-Bandwidth Product Available online at www.sciencedirect.com ScienceDirect IERI Procedia 10 (2014 ) 153 159 2014 International Conference on Future Information Engineering Achieve Significant Throughput Gains in Wireless

More information

Routing Algorithms, Process Model for Quality of Services (QoS) and Architectures for Two-Dimensional 4 4 Mesh Topology Network-on-Chip

Routing Algorithms, Process Model for Quality of Services (QoS) and Architectures for Two-Dimensional 4 4 Mesh Topology Network-on-Chip Routing Algorithms, Process Model for Quality of Services (QoS) and Architectures for Two-Dimensional 4 4 Mesh Topology Network-on-Chip Nauman Jalil, Adnan Qureshi, Furqan Khan, and Sohaib Ayyaz Qazi Abstract

More information

Improving the Data Scheduling Efficiency of the IEEE (d) Mesh Network

Improving the Data Scheduling Efficiency of the IEEE (d) Mesh Network Improving the Data Scheduling Efficiency of the IEEE 802.16(d) Mesh Network Shie-Yuan Wang Email: shieyuan@csie.nctu.edu.tw Chih-Che Lin Email: jclin@csie.nctu.edu.tw Ku-Han Fang Email: khfang@csie.nctu.edu.tw

More information

The final publication is available at

The final publication is available at Document downloaded from: http://hdl.handle.net/10251/82062 This paper must be cited as: Peñaranda Cebrián, R.; Gómez Requena, C.; Gómez Requena, ME.; López Rodríguez, PJ.; Duato Marín, JF. (2016). The

More information

A Low-Overhead Hybrid Routing Algorithm for ZigBee Networks. Zhi Ren, Lihua Tian, Jianling Cao, Jibi Li, Zilong Zhang

A Low-Overhead Hybrid Routing Algorithm for ZigBee Networks. Zhi Ren, Lihua Tian, Jianling Cao, Jibi Li, Zilong Zhang A Low-Overhead Hybrid Routing Algorithm for ZigBee Networks Zhi Ren, Lihua Tian, Jianling Cao, Jibi Li, Zilong Zhang Chongqing Key Lab of Mobile Communications Technology, Chongqing University of Posts

More information

OFAR-CM: Efficient Dragonfly Networks with Simple Congestion Management

OFAR-CM: Efficient Dragonfly Networks with Simple Congestion Management Marina Garcia 22 August 2013 OFAR-CM: Efficient Dragonfly Networks with Simple Congestion Management M. Garcia, E. Vallejo, R. Beivide, M. Valero and G. Rodríguez Document number OFAR-CM: Efficient Dragonfly

More information

Ecient Implementation of Sorting Algorithms on Asynchronous Distributed-Memory Machines

Ecient Implementation of Sorting Algorithms on Asynchronous Distributed-Memory Machines Ecient Implementation of Sorting Algorithms on Asynchronous Distributed-Memory Machines B. B. Zhou, R. P. Brent and A. Tridgell Computer Sciences Laboratory The Australian National University Canberra,

More information

Two Multicasting Schemes for Irregular 3D Mesh-based Bufferless NoCs

Two Multicasting Schemes for Irregular 3D Mesh-based Bufferless NoCs MATEC Web of Conferences 22, 01028 ( 2015) DOI: 10.1051/ matecconf/ 20152201028 C Owned by the authors, published by EDP Sciences, 2015 Two Multicasting Schemes for Irregular 3D Mesh-based Bufferless NoCs

More information

A COMPARISON OF MESHES WITH STATIC BUSES AND HALF-DUPLEX WRAP-AROUNDS. and. and

A COMPARISON OF MESHES WITH STATIC BUSES AND HALF-DUPLEX WRAP-AROUNDS. and. and Parallel Processing Letters c World Scientific Publishing Company A COMPARISON OF MESHES WITH STATIC BUSES AND HALF-DUPLEX WRAP-AROUNDS DANNY KRIZANC Department of Computer Science, University of Rochester

More information

ERA: An Efficient Routing Algorithm for Power, Throughput and Latency in Network-on-Chips

ERA: An Efficient Routing Algorithm for Power, Throughput and Latency in Network-on-Chips : An Efficient Routing Algorithm for Power, Throughput and Latency in Network-on-Chips Varsha Sharma, Rekha Agarwal Manoj S. Gaur, Vijay Laxmi, and Vineetha V. Computer Engineering Department, Malaviya

More information

ANALYSIS AND IMPROVEMENT OF VALIANT ROUTING IN LOW- DIAMETER NETWORKS

ANALYSIS AND IMPROVEMENT OF VALIANT ROUTING IN LOW- DIAMETER NETWORKS ANALYSIS AND IMPROVEMENT OF VALIANT ROUTING IN LOW- DIAMETER NETWORKS Mariano Benito Pablo Fuentes Enrique Vallejo Ramón Beivide With support from: 4th IEEE International Workshop of High-Perfomance Interconnection

More information

Improving Range Query Performance on Historic Web Page Data

Improving Range Query Performance on Historic Web Page Data Improving Range Query Performance on Historic Web Page Data Geng LI Lab of Computer Networks and Distributed Systems, Peking University Beijing, China ligeng@net.pku.edu.cn Bo Peng Lab of Computer Networks

More information

Diagonal Principal Component Analysis for Face Recognition

Diagonal Principal Component Analysis for Face Recognition Diagonal Principal Component nalysis for Face Recognition Daoqiang Zhang,2, Zhi-Hua Zhou * and Songcan Chen 2 National Laboratory for Novel Software echnology Nanjing University, Nanjing 20093, China 2

More information

Partitioning Methods for Multicast in Bufferless 3D Network on Chip

Partitioning Methods for Multicast in Bufferless 3D Network on Chip Partitioning Methods for Multicast in Bufferless 3D Network on Chip Chaoyun Yao 1(B), Chaochao Feng 1, Minxuan Zhang 1, Wei Guo 1, Shouzhong Zhu 1, and Shaojun Wei 2 1 College of Computer, National University

More information

Design and Implementation of Multistage Interconnection Networks for SoC Networks

Design and Implementation of Multistage Interconnection Networks for SoC Networks International Journal of Computer Science, Engineering and Information Technology (IJCSEIT), Vol.2, No.5, October 212 Design and Implementation of Multistage Interconnection Networks for SoC Networks Mahsa

More information

PDA-HyPAR: Path-Diversity-Aware Hybrid Planar Adaptive Routing Algorithm for 3D NoCs

PDA-HyPAR: Path-Diversity-Aware Hybrid Planar Adaptive Routing Algorithm for 3D NoCs PDA-HyPAR: Path-Diversity-Aware Hybrid Planar Adaptive Routing Algorithm for 3D NoCs Jindun Dai *1,2, Renjie Li 2, Xin Jiang 3, Takahiro Watanabe 2 1 Department of Electrical Engineering, Shanghai Jiao

More information

Network Dilation: A Strategy for Building Families of Parallel Processing Architectures Behrooz Parhami

Network Dilation: A Strategy for Building Families of Parallel Processing Architectures Behrooz Parhami Network Dilation: A Strategy for Building Families of Parallel Processing Architectures Behrooz Parhami Dept. Electrical & Computer Eng. Univ. of California, Santa Barbara Parallel Computer Architecture

More information

Recall: The Routing problem: Local decisions. Recall: Multidimensional Meshes and Tori. Properties of Routing Algorithms

Recall: The Routing problem: Local decisions. Recall: Multidimensional Meshes and Tori. Properties of Routing Algorithms CS252 Graduate Computer Architecture Lecture 16 Multiprocessor Networks (con t) March 14 th, 212 John Kubiatowicz Electrical Engineering and Computer Sciences University of California, Berkeley http://www.eecs.berkeley.edu/~kubitron/cs252

More information

Sanaz Azampanah Ahmad Khademzadeh Nader Bagherzadeh Majid Janidarmian Reza Shojaee

Sanaz Azampanah Ahmad Khademzadeh Nader Bagherzadeh Majid Janidarmian Reza Shojaee Sanaz Azampanah Ahmad Khademzadeh Nader Bagherzadeh Majid Janidarmian Reza Shojaee Application-Specific Routing Algorithm Selection Function Look-Ahead Traffic-aware Execution (LATEX) Algorithm Experimental

More information

Geometric transformations assign a point to a point, so it is a point valued function of points. Geometric transformation may destroy the equation

Geometric transformations assign a point to a point, so it is a point valued function of points. Geometric transformation may destroy the equation Geometric transformations assign a point to a point, so it is a point valued function of points. Geometric transformation may destroy the equation and the type of an object. Even simple scaling turns a

More information

Lecture 9: Group Communication Operations. Shantanu Dutt ECE Dept. UIC

Lecture 9: Group Communication Operations. Shantanu Dutt ECE Dept. UIC Lecture 9: Group Communication Operations Shantanu Dutt ECE Dept. UIC Acknowledgement Adapted from Chapter 4 slides of the text, by A. Grama w/ a few changes, augmentations and corrections Topic Overview

More information

548 IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, VOL. 30, NO. 4, APRIL 2011

548 IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, VOL. 30, NO. 4, APRIL 2011 548 IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, VOL. 30, NO. 4, APRIL 2011 Extending the Effective Throughput of NoCs with Distributed Shared-Buffer Routers Rohit Sunkam

More information

Capacity of Grid-Oriented Wireless Mesh Networks

Capacity of Grid-Oriented Wireless Mesh Networks Capacity of Grid-Oriented Wireless Mesh Networks Nadeem Akhtar and Klaus Moessner Centre for Communication Systems Research University of Surrey Guildford, GU2 7H, UK Email: n.akhtar@surrey.ac.uk, k.moessner@surrey.ac.uk

More information

Testing Isomorphism of Strongly Regular Graphs

Testing Isomorphism of Strongly Regular Graphs Spectral Graph Theory Lecture 9 Testing Isomorphism of Strongly Regular Graphs Daniel A. Spielman September 26, 2018 9.1 Introduction In the last lecture we saw how to test isomorphism of graphs in which

More information

An Empirical Study of Performance Benefits of Network Coding in Multihop Wireless Networks

An Empirical Study of Performance Benefits of Network Coding in Multihop Wireless Networks An Empirical Study of Performance Benefits of Network Coding in Multihop Wireless Networks Dimitrios Koutsonikolas, Y. Charlie Hu, Chih-Chun Wang School of Electrical and Computer Engineering, Purdue University,

More information

A novel look-ahead routing algorithm based on graph theory for triplet-based network-on-chip router

A novel look-ahead routing algorithm based on graph theory for triplet-based network-on-chip router LETTER IEICE Electronics Express, Vol.15, No.8, 1 12 A novel look-ahead routing algorithm based on graph theory for triplet-based network-on-chip router Xu Chen, Feng Shi a), Fei Yin, and Xiaojun Wang

More information

Complementary Graph Coloring

Complementary Graph Coloring International Journal of Computer (IJC) ISSN 2307-4523 (Print & Online) Global Society of Scientific Research and Researchers http://ijcjournal.org/ Complementary Graph Coloring Mohamed Al-Ibrahim a*,

More information

Design of a System-on-Chip Switched Network and its Design Support Λ

Design of a System-on-Chip Switched Network and its Design Support Λ Design of a System-on-Chip Switched Network and its Design Support Λ Daniel Wiklund y, Dake Liu Dept. of Electrical Engineering Linköping University S-581 83 Linköping, Sweden Abstract As the degree of

More information

4. Networks. in parallel computers. Advances in Computer Architecture

4. Networks. in parallel computers. Advances in Computer Architecture 4. Networks in parallel computers Advances in Computer Architecture System architectures for parallel computers Control organization Single Instruction stream Multiple Data stream (SIMD) All processors

More information

Quantification of Capacity and Transmission Delay for Mobile Ad Hoc Networks (MANET)

Quantification of Capacity and Transmission Delay for Mobile Ad Hoc Networks (MANET) Quantification of Capacity and Transmission Delay for Mobile Ad Hoc Networks (MANET) 1 Syed S. Rizvi, 2 Aasia Riasat, and 3 Khaled M. Elleithy 1, 3 Computer Science and Engineering Department, University

More information

DLABS: a Dual-Lane Buffer-Sharing Router Architecture for Networks on Chip

DLABS: a Dual-Lane Buffer-Sharing Router Architecture for Networks on Chip DLABS: a Dual-Lane Buffer-Sharing Router Architecture for Networks on Chip Anh T. Tran and Bevan M. Baas Department of Electrical and Computer Engineering University of California - Davis, USA {anhtr,

More information

Constant Queue Routing on a Mesh

Constant Queue Routing on a Mesh Constant Queue Routing on a Mesh Sanguthevar Rajasekaran Richard Overholt Dept. of Computer and Information Science Univ. of Pennsylvania, Philadelphia, PA 19104 ABSTRACT Packet routing is an important

More information

6 Randomized rounding of semidefinite programs

6 Randomized rounding of semidefinite programs 6 Randomized rounding of semidefinite programs We now turn to a new tool which gives substantially improved performance guarantees for some problems We now show how nonlinear programming relaxations can

More information

FT-Z-OE: A Fault Tolerant and Low Overhead Routing Algorithm on TSV-based 3D Network on Chip Links

FT-Z-OE: A Fault Tolerant and Low Overhead Routing Algorithm on TSV-based 3D Network on Chip Links FT-Z-OE: A Fault Tolerant and Low Overhead Routing Algorithm on TSV-based 3D Network on Chip Links Hoda Naghibi Jouybari College of Electrical Engineering, Iran University of Science and Technology, Tehran,

More information

A Combined BIT and TIMESTAMP Algorithm for. the List Update Problem. Susanne Albers, Bernhard von Stengel, Ralph Werchner

A Combined BIT and TIMESTAMP Algorithm for. the List Update Problem. Susanne Albers, Bernhard von Stengel, Ralph Werchner A Combined BIT and TIMESTAMP Algorithm for the List Update Problem Susanne Albers, Bernhard von Stengel, Ralph Werchner International Computer Science Institute, 1947 Center Street, Berkeley, CA 94704,

More information

Communication Networks I December 4, 2001 Agenda Graph theory notation Trees Shortest path algorithms Distributed, asynchronous algorithms Page 1

Communication Networks I December 4, 2001 Agenda Graph theory notation Trees Shortest path algorithms Distributed, asynchronous algorithms Page 1 Communication Networks I December, Agenda Graph theory notation Trees Shortest path algorithms Distributed, asynchronous algorithms Page Communication Networks I December, Notation G = (V,E) denotes a

More information