Oblivious Routing Design for Mesh Networks to Achieve a New Worst-Case Throughput Bound


Guang Sun, Chia-Wei Chang, Bill Lin, Lieguang Zeng
Tsinghua University; University of California, San Diego
State Key Laboratory on Microwave and Digital Communications, Tsinghua National Laboratory for Information Science and Technology, Department of Electronic Engineering, Tsinghua University, Beijing 100084, China

Abstract

Half of network capacity is often believed to be the limit of worst-case throughput for mesh networks. However, this paper provides a new worst-case throughput bound, which is higher than 1/2 of network capacity, for odd radix two-dimensional mesh networks. In addition, we propose a routing algorithm called UTURN that can achieve this worst-case throughput bound for odd radix meshes. For even radix meshes, we prove that UTURN achieves the optimal worst-case throughput, namely, half of network capacity. UTURN considers all routing paths with at most 2 turns and distributes the traffic loads uniformly in both X and Y dimensions. Theoretical analysis and simulation results show that UTURN outperforms existing routing algorithms in worst-case throughput. Moreover, UTURN achieves good average-case throughput at the expense of an average hop count of approximately 1.5x minimal. For asymmetric meshes, we further propose an algorithm called UTURN-A and provide theoretical analysis for different algorithms. Both theoretical analysis and simulation show that UTURN and UTURN-A outperform the existing algorithms VAL, DOR and OTURN in both worst-case and average-case throughput for asymmetric meshes.

I. INTRODUCTION

Networks-on-chip (NoC) have been proposed as a new intra-chip communication infrastructure. Regular topologies, especially the mesh topology, have been widely deployed [][]. The choice of oblivious routing algorithm, which is oblivious of the network state when determining a route, has a significant impact on network performance, such as throughput and latency.
Average-case throughput provides an average performance metric, whereas worst-case throughput provides a performance guarantee. It is well known that the optimal worst-case throughput for oblivious routing in two-dimensional mesh networks is 1/2 of the network capacity when the radix is even. For example, such an analysis for the even radix case is provided in []. For the odd radix case, it is also well known that a worst-case throughput of 1/2 of network capacity can be achieved. This 1/2 of network capacity is often believed to be the limit of the worst-case throughput. For example, in the near-optimality analysis of the OTURN routing algorithm [4], the near optimality of OTURN in the odd radix case is relative to a presumed limit of 1/2 of network capacity.

In this paper, we provide a new worst-case throughput bound of $(k+1)/(2k+1)$ of network capacity for the odd radix two-dimensional mesh with radix $k$, which is higher than 1/2 of network capacity. We propose a non-minimal oblivious routing algorithm called UTURN (uniform 2-turn) that can achieve this worst-case throughput bound. Improved analytical results for oblivious routing are important because they often form the theoretical underpinnings for a variety of network analysis problems. The main contributions of this paper are as follows:

a) We provide a new worst-case throughput bound, which is higher than 1/2 of network capacity, for the odd radix two-dimensional mesh.
b) UTURN achieves this worst-case throughput. It outperforms VAL [0], DOR [5] and OTURN [4] in worst-case throughput for all odd radix mesh networks. For the even radix two-dimensional mesh, we prove that UTURN achieves the optimal worst-case throughput, namely, half of network capacity.
c) For average-case throughput, UTURN also outperforms VAL, DOR, and OTURN, on average over the six evaluated network sizes (both odd and even radix), by 7.%, 39.7% and 8.8%, respectively.
d) For asymmetric meshes, we describe a variant called UTURN-A and analyze the theoretical performance of different routing algorithms. Both theoretical analysis and simulation show that UTURN and UTURN-A achieve better worst-case and average-case throughput than the existing algorithms VAL, DOR and OTURN.

The rest of this paper is organized as follows. Section II provides a brief background on routing algorithms and performance metrics. Section III describes our UTURN routing algorithm and provides theoretical performance analysis. Section IV discusses the performance of different algorithms in asymmetric meshes. Section V provides simulation results. Finally, Section VI draws conclusions.

II. BACKGROUND

A. Routing Algorithms

In this section, we first enumerate some classic routing algorithms in NoC, and then we discuss the pros and cons of each routing scheme by comparing it with our routing design. The dimension-order routing (DOR) [5] algorithm is one of the most popular routing algorithms. However, it suffers from poor worst-case and average-case throughput due to the lack of routing diversity [4]. On the other hand, Valiant proposed a routing algorithm called VAL [0] that offers excellent worst-case throughput (half of network capacity), but it suffers from poor average-case throughput and long routing paths [6]. ROMM [3] overcomes some of the drawbacks of DOR and VAL. It achieves minimal-length routing and good average-case throughput. However, its worst-case throughput is very poor. Thus, these existing routing algorithms suffer from either poor worst-case throughput (e.g. DOR, ROMM) or poor latency (e.g. VAL). OTURN [4] achieves both minimal-length routing and good throughput. For 2D odd radix mesh networks, OTURN has worst-case throughput within a factor of $1/k^2$ of half of network capacity.
However, instead of using at most one turn as in OTURN routing, our UTURN routing considers all routing paths with at most 2 turns and distributes the traffic uniformly in both X and Y dimensions, which results in a worst-case throughput higher than half of network capacity for all odd radix mesh networks. Simulations show that UTURN outperforms the above routing algorithms in both worst-case and average-case throughput.

Although the routing algorithms described in [9] and [7] also use routes with at most 2 turns, they are developed for torus networks rather than meshes. Besides, TURN [9] does not have a closed-form description. It requires solving a linear program for each network radix, and the linear program grows rapidly, making it difficult to scale to large networks [7]. IVAL [9] has only been shown to achieve the same worst-case throughput as VAL for torus networks, whereas UTURN has a closed-form description and achieves a higher worst-case throughput than VAL in the case of odd radix mesh networks. Like ITURN and WTURN described in [7] for torus networks, UTURN also provides closed-form analytical expressions for evaluating the average path length. These closed-form analytical expressions helped us to derive a routing algorithm that achieves a worst-case throughput better than the 1/2 of network capacity previously believed to be the limit of worst-case throughput for mesh networks.

B. Performance Metrics

In this section, we give the performance metrics, such as worst-case and average-case throughput, for oblivious routing algorithms. In order to simplify the discussion of these performance metrics, like [4][6], we ignore flow control issues and assume single-flit packets that route from node to node in one cycle. In particular, we elaborate on the concept of network capacity and provide a brief overview of analytical methods to evaluate worst-case and average-case throughput for oblivious routing algorithms.

Network Capacity: Network capacity is defined by the maximal sustainable channel load $\Upsilon$ when a network is loaded with uniformly distributed traffic [4]. For any $k$-ary $n$-mesh [][6], independent of $n$,

$$\Upsilon = \begin{cases} \dfrac{k}{4}, & k \text{ is even} \\[4pt] \dfrac{k^2-1}{4k}, & k \text{ is odd} \end{cases} \qquad (1)$$

where $\Upsilon$ is the inverse of the network capacity.

Worst-Case Throughput: For a given routing algorithm $R$ and traffic matrix $\Lambda$, the maximal channel load $\Upsilon(R, \Lambda)$ is the expected traffic load crossing the most heavily loaded channel under $R$ and $\Lambda$ [6]. For a given routing algorithm $R$, the worst-case channel load $\Upsilon_{wc}(R)$ is the heaviest channel load that can be caused by any admissible traffic [6]. A doubly sub-stochastic traffic matrix $\Lambda$ is admissible when it has row and column sums of at most 1. For a given routing algorithm, [8] provided a method to find the worst-case traffic, which causes the worst-case channel load and the worst-case throughput. Note that the worst-case channel load is the inverse of the worst-case throughput.
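As a small illustration of the two definitions above, the following sketch evaluates the inverse network capacity for a radix-$k$ mesh and checks admissibility of a traffic matrix. It assumes the capacity expression $k/4$ for even $k$ and $(k^2-1)/(4k)$ for odd $k$; the function names `inverse_capacity` and `is_admissible` are hypothetical helpers introduced here, not part of the paper.

```python
from fractions import Fraction


def inverse_capacity(k: int) -> Fraction:
    """Maximal channel load under uniform traffic for a radix-k mesh.

    Assumed form: k/4 for even k, (k^2 - 1)/(4k) for odd k.
    Network capacity is the reciprocal of this value.
    """
    if k < 2:
        raise ValueError("radix must be at least 2")
    if k % 2 == 0:
        return Fraction(k, 4)
    return Fraction(k * k - 1, 4 * k)


def is_admissible(traffic) -> bool:
    """Check that a square traffic matrix is doubly sub-stochastic.

    Every entry must be non-negative and every row sum and column sum
    must be at most 1.
    """
    n = len(traffic)
    if any(len(row) != n for row in traffic):
        return False
    if any(entry < 0 for row in traffic for entry in row):
        return False
    rows_ok = all(sum(row) <= 1 for row in traffic)
    cols_ok = all(sum(traffic[i][j] for i in range(n)) <= 1 for j in range(n))
    return rows_ok and cols_ok


if __name__ == "__main__":
    for k in (4, 5, 7):
        load = inverse_capacity(k)
        print(f"k={k}: inverse capacity {load}, capacity {1 / load}")
```

Exact rational arithmetic is used so that results can be compared against the closed-form expressions without rounding.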
Like [6][4], we use the normalized worst-case throughput, which is normalized to the network capacity, as the worst-case performance metric:

$$\Theta_{wc}(R) = \left( \frac{\Upsilon_{wc}(R)}{\Upsilon} \right)^{-1} \qquad (2)$$

Average-Case Throughput: As the average performance metric, we use the normalized average-case throughput. Using the same method as in [6][4][8], the average-case throughput for a given routing algorithm $R$ is found by averaging the throughput over a large set $T$ of random traffic matrices (e.g. $|T| = 1000000$).

III. UTURN ROUTING

In this section, we first describe our UTURN routing algorithm for mesh networks, and then we provide the performance analysis, including throughput analysis and latency analysis.

A. Routing Description

As the name implies, UTURN considers any routing path between the source and destination with at most 2 turns, where each turn changes dimension from X to Y or from Y to X. Note that U-turns, that is, turns that reverse direction along the same dimension, are not allowed in UTURN. In comparison to OTURN routing, UTURN is more flexible in that OTURN routes are a subset of UTURN routes. OTURN only uses routes with at most one turn, whereas UTURN considers all routing paths with at most 2 turns and balances the traffic in both X and Y dimensions. The greater flexibility and more uniform traffic distribution enable UTURN to achieve better throughput performance.

For implementation, UTURN routing consists of two parts: XYX-Routing with probability 0.5 and YXY-Routing with probability 0.5. First, we elaborate on XYX-Routing. Suppose $(x_1, y_1)$ is the source node and $(x_2, y_2)$ is the destination node. XYX-Routing is a three-segment routing generated as follows:

a) X-segment: choose uniformly at random an X position $x^* \in [0, k-1]$, and route in the X-dimension from the source node $(x_1, y_1)$ to $(x^*, y_1)$.
b) Y-segment: route from $(x^*, y_1)$ to $(x^*, y_2)$ in the Y-dimension.
c) X-segment: route from $(x^*, y_2)$ to the destination node $(x_2, y_2)$ in the X-dimension.
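The three segments above can be sketched as a path generator. This is a minimal illustration under stated assumptions, not the authors' implementation: nodes are `(x, y)` tuples with coordinates in `0..k-1`, the helper names `_segment`, `xyx_route`, `yxy_route` and `uturn_route` are hypothetical, and when the source and destination share a row (or column, for YXY) the sketch routes directly along that dimension.

```python
import random


def _segment(start: int, end: int) -> list:
    """Coordinates visited by a minimal-length walk from start to end,
    excluding the starting coordinate itself."""
    if start == end:
        return []
    step = 1 if end > start else -1
    return list(range(start + step, end + step, step))


def xyx_route(src, dst, k, rng=random):
    """Return the node sequence of one XYX route in a k-by-k mesh."""
    (x1, y1), (x2, y2) = src, dst
    if y1 == y2:
        # Single-segment case: route directly in the X dimension.
        return [src] + [(x, y1) for x in _segment(x1, x2)]
    x_star = rng.randrange(k)  # intermediate column, uniform over 0..k-1
    path = [src]
    path += [(x, y1) for x in _segment(x1, x_star)]      # first X-segment
    path += [(x_star, y) for y in _segment(y1, y2)]      # Y-segment
    path += [(x, y2) for x in _segment(x_star, x2)]      # second X-segment
    return path


def yxy_route(src, dst, k, rng=random):
    """YXY route obtained from xyx_route by swapping the two coordinates."""
    swapped = xyx_route(src[::-1], dst[::-1], k, rng)
    return [(x, y) for (y, x) in swapped]


def uturn_route(src, dst, k, rng=random):
    """Pick XYX or YXY routing with probability 0.5 each."""
    if rng.random() < 0.5:
        return xyx_route(src, dst, k, rng)
    return yxy_route(src, dst, k, rng)


if __name__ == "__main__":
    rng = random.Random(1)
    print(uturn_route((0, 0), (3, 4), 5, rng))
```

Passing a seeded `random.Random` instance as `rng` makes individual routes reproducible, which is convenient when counting channel loads over many sampled paths.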
Each of these three segments uses minimal-length routing, although the combination of segments can result in non-minimal routing along the entire path. There is a degenerate case: when $y_1 = y_2$, only one segment is needed, and the packet is routed directly from $(x_1, y_1)$ to $(x_2, y_2)$ using minimal-length routing in the X-dimension. Fig. 1(a) depicts five possible XYX-Routing paths from the source node to the destination node in a 5×5 mesh, where each path has a probability of 1/5. YXY-Routing is the symmetric counterpart of XYX-Routing and is shown in Fig. 1(b). There is also a degenerate case for YXY-Routing: when $x_1 = x_2$, only one segment is needed, and the packet is routed directly from $(x_1, y_1)$ to $(x_2, y_2)$ using minimal-length routing in the Y-dimension.

Fig. 1. Routing paths for (a) XYX-Routing and (b) YXY-Routing in a 5×5 mesh.

From the aforementioned routing description, we know that XYX-Routing and YXY-Routing cover all routing paths between the source and destination with at most 2 turns. In addition, XYX-Routing distributes the traffic uniformly along the X-dimension, and YXY-Routing distributes the traffic uniformly along the Y-dimension. UTURN uses XYX-Routing and YXY-Routing with equal probability, so that it balances traffic loads in both X and Y dimensions, which results in a high throughput. As for the deadlock problem, two virtual channels per physical channel are sufficient to achieve deadlock-free routing by incrementing a packet's virtual channel set after each turn from the Y to the X dimension [6].

B. Performance Analysis

1) Throughput Analysis: In this section, we prove that UTURN achieves a new worst-case throughput bound of $(k+1)/(2k+1)$ of network capacity for an odd radix two-dimensional mesh with radix $k$, which is higher than 1/2 of network capacity. For the even radix case, UTURN achieves the optimal worst-case throughput, namely, half of network capacity. We prove these results by three claims as follows.

Claim 1.
For a one-dimensional mesh, the worst-case channel load of minimal-length routing is $(k-1)/2$ when the radix $k$ is odd and $k/2$ when $k$ is even.

Proof: [8] proposed a method that finds the worst-case traffic for any given oblivious routing algorithm in a given topology. Using this method, we find that the worst-case traffic for minimal-length routing in a one-dimensional mesh is Tornado: from $x$ to $(x + \lfloor k/2 \rfloor) \bmod k$. The heaviest channel load of minimal-length routing under the Tornado traffic matrix is $(k-1)/2$ when $k$ is odd and $k/2$ when $k$ is even. Therefore, this is the worst-case channel load for minimal-length routing in a one-dimensional mesh.

As the worst-case throughput is the inverse of the worst-case channel load, the worst-case throughput of minimal-length routing in an odd radix one-dimensional mesh is $2/(k-1)$, which is higher than half of network capacity. For an even radix one-dimensional mesh, the worst-case throughput of minimal-length routing is $2/k$, which is optimal, namely, half of network capacity.

Claim 2. For the odd radix two-dimensional mesh, the worst-case channel load of XYX-Routing is $(k-1)/2$ in the Y-dimension and $(k^2-1)/(2k)$ in the X-dimension. For the even radix two-dimensional mesh, the worst-case channel load of XYX-Routing is $k/2$ in both the X and Y dimensions.

Proof: As discussed in Section II-B, the maximal channel load under uniformly distributed traffic is $(k^2-1)/(4k)$ when $k$ is odd and $k/4$ when $k$ is even. XYX-Routing consists of three routing segments, and the two X-segments together carry twice the uniform traffic. The maximal channel load under twice-uniform traffic is simply $2 \cdot (k^2-1)/(4k) = (k^2-1)/(2k)$ when $k$ is odd and $2 \cdot (k/4) = k/2$ when $k$ is even. Thus, for the two-dimensional mesh, the worst-case channel load of XYX-Routing in the X-dimension is $(k^2-1)/(2k)$ when the radix is odd and $k/2$ when the radix is even.

As for the Y-dimension in XYX-Routing, it uses minimal-length routing from $(x^*, y_1)$ to $(x^*, y_2)$. Since traffic is uniformly distributed along the X-dimension, the Y-dimension channels with different X positions but the same Y position carry the same traffic load. So the Y-dimension traffic of XYX-Routing reduces to the case in Claim 1, namely, minimal-length routing in a one-dimensional mesh. Thus, the maximal channel load in the Y-dimension for XYX-Routing is $(k-1)/2$ when the radix is odd and $k/2$ when $k$ is even.
Therefore, the worst-case channel load for XYX-Routing is $(k^2-1)/(2k)$ when the radix is odd and $k/2$ when the radix is even, and the worst-case throughput for XYX-Routing is $2k/(k^2-1)$ when the radix is odd and $2/k$ when the radix is even. As shown in Section II-B, the network capacity for the mesh is the inverse of $\Upsilon$, namely, $4k/(k^2-1)$ when the radix is odd and $4/k$ when the radix is even. Thus, the worst-case throughput for XYX-Routing equals half of network capacity.

Claim 3. For the odd radix two-dimensional mesh, UTURN achieves a worst-case throughput bound of $(k+1)/(2k+1)$ of network capacity, which is higher than half of network capacity. For the even radix two-dimensional mesh, UTURN achieves the optimal worst-case throughput, namely, half of network capacity.

Proof: The worst-case channel load analysis for YXY-Routing is the same as for XYX-Routing because of symmetry. It follows that the worst-case channel load of YXY-Routing in the odd radix two-dimensional mesh is $(k-1)/2$ in the X-dimension and $(k^2-1)/(2k)$ in the Y-dimension. For the even radix two-dimensional mesh, the worst-case channel load of YXY-Routing is $k/2$ in both the X and Y dimensions.

UTURN consists of two parts: XYX-Routing with probability 0.5 and YXY-Routing with probability 0.5. Thus, for the odd radix two-dimensional mesh, the worst-case channel load of UTURN in the X-dimension is

$$0.5 \cdot \frac{k^2-1}{2k} + 0.5 \cdot \frac{k-1}{2} = \frac{(2k+1)(k-1)}{4k}.$$

Similarly, the worst-case channel load of UTURN in the Y-dimension is also $(2k+1)(k-1)/(4k)$. So the worst-case channel load of UTURN is $(2k+1)(k-1)/(4k)$ and the worst-case throughput is $4k/((2k+1)(k-1))$ in the odd radix two-dimensional mesh. For the even radix two-dimensional mesh, the worst-case channel loads of UTURN in the X-dimension and Y-dimension are both $0.5 \cdot (k/2) + 0.5 \cdot (k/2) = k/2$. So the worst-case channel load of UTURN is $k/2$ and the worst-case throughput is $2/k$ in the even radix two-dimensional mesh. As shown in Section II-B, the network capacity for the mesh is $4k/(k^2-1)$ when the radix is odd and $4/k$ when the radix is even.
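The averaging step in this proof can be checked with exact arithmetic. The sketch below assumes the per-dimension loads stated in Claims 1 and 2 (one minimal-length segment: $(k-1)/2$ for odd $k$, $k/2$ for even $k$; two uniform segments: twice the inverse capacity) and normalizes the result to network capacity. All function names are hypothetical and introduced only for this illustration.

```python
from fractions import Fraction


def inverse_capacity(k: int) -> Fraction:
    """Maximal channel load under uniform traffic (assumed form of Eq. 1)."""
    return Fraction(k, 4) if k % 2 == 0 else Fraction(k * k - 1, 4 * k)


def one_dim_minimal_load(k: int) -> Fraction:
    """Worst-case load of minimal-length routing in a 1-D mesh (Claim 1)."""
    return Fraction(k, 2) if k % 2 == 0 else Fraction(k - 1, 2)


def xyx_loads(k: int):
    """(X load, Y load) of XYX-Routing in a k-by-k mesh (Claim 2)."""
    return 2 * inverse_capacity(k), one_dim_minimal_load(k)


def uturn_normalized_throughput(k: int) -> Fraction:
    """Worst-case throughput of the 50/50 XYX/YXY mix, as a fraction of capacity."""
    x_xyx, y_xyx = xyx_loads(k)
    x_yxy, y_yxy = y_xyx, x_xyx  # YXY mirrors XYX by symmetry
    half = Fraction(1, 2)
    load_x = half * x_xyx + half * x_yxy
    load_y = half * y_xyx + half * y_yxy
    worst_load = max(load_x, load_y)
    # Normalized throughput = inverse capacity / worst-case channel load.
    return inverse_capacity(k) / worst_load


if __name__ == "__main__":
    for k in range(3, 10):
        print(k, uturn_normalized_throughput(k))
```

For odd `k` the function returns `(k + 1) / (2k + 1)`, and for even `k` it returns `1/2`, in line with the statement of Claim 3.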
Therefore, for an odd radix two-dimensional mesh, UTURN achieves a worst-case throughput bound of

$$\frac{4k}{(2k+1)(k-1)} \Big/ \frac{4k}{k^2-1} = \frac{k+1}{2k+1}$$

of network capacity, which is higher than half of network capacity for all radix $k$. For an even radix two-dimensional mesh, UTURN achieves a worst-case throughput of $(2/k)/(4/k) = 0.5$ of network capacity, which is optimal.

Fig. 2 shows that UTURN outperforms VAL, DOR and OTURN in worst-case throughput for different odd radix networks. The worst-case throughput in the even radix case is optimal, so we do not show the even radix results in Fig. 2. Unlike UTURN, the optimal routing (including TURN) proposed in [9] does not have a closed-form description and requires solving a separate linear program for each network radix. The linear program grows so fast that it is difficult to scale to large networks [7]. Note that UTURN achieves the same optimal worst-case throughput as the optimal routing that can be obtained with the method proposed in [9].

Fig. 2. Worst-case throughput for odd radix meshes (normalized throughput versus network radix $k$ for VAL, DOR, our algorithm, OTURN and optimal routing).

2) Latency Analysis: For latency analysis, we use the average hop count $H_{ave}(R)$ of a given routing algorithm $R$ to measure the average latency. We assume that the traffic between all source-destination pairs is the same when computing $H_{ave}(R)$ [6]. The average hop count for DOR in a $k$-ary 2-mesh network is $2(k^2-1)/(3k)$, where each component $(k^2-1)/(3k)$ represents the average hop count in each dimension using minimal-length routing [][6]. In UTURN, minimal-length routing is used in three segments. However, as mentioned in Section III-A, there is a degenerate case when $y_1 = y_2$, which has a probability of $1/k$. In this degenerate case, there is only one minimal-length routing segment. Therefore, the average hop count of UTURN is given as follows:

$$H_{ave}(\text{UTURN}) = \left(1 - \frac{1}{k}\right) \cdot 3 \cdot \frac{k^2-1}{3k} + \frac{1}{k} \cdot \frac{k^2-1}{3k} = \left(3 - \frac{2}{k}\right) \frac{k^2-1}{3k} \qquad (3)$$

In order to quantify the penalty factor in average latency for a given routing algorithm $R$ over DOR, we normalize the average hop count of $R$ to that of DOR. The penalty factor for UTURN is therefore $3/2 - 1/k$, which is less than 1.5 for all radix $k$. Compared with VAL, which has a penalty factor of 2, UTURN achieves better performance in both throughput and latency. UTURN is a non-minimal routing algorithm with approximately 1.5x the average hop count of OTURN and DOR. In comparison with DOR and OTURN, UTURN offers a latency-bandwidth tradeoff for mesh networks: better throughput at the expense of a higher latency cost.

Fig. 3. Traffic patterns which achieve the maximal load lower bound in DOR routing: (a) DOR-XY routing, (b) DOR-YX routing.

IV. ASYMMETRIC MESHES

Network Capacity: Section II-B discusses the network capacity for $k$-ary $n$-mesh networks, namely symmetric meshes. We can obtain the network capacity of asymmetric meshes using the same analysis method as for symmetric meshes [][6]. Suppose $k_{max} = \max(k_x, k_y)$ in the asymmetric mesh. The maximal sustainable channel loads under uniform traffic are

$$\Upsilon_{asym} = \begin{cases} \dfrac{k_{max}}{4}, & k_{max} \text{ is even} \\[4pt] \dfrac{k_{max}^2-1}{4k_{max}}, & k_{max} \text{ is odd} \end{cases} \qquad (4)$$

where $\Upsilon_{asym}$ is the inverse of the network capacity.

A. Throughput Analysis for Different Algorithms

In the following, we analyze the worst-case throughput of different routing algorithms in asymmetric meshes under the assumption $k_{max} = k_x$. For the case $k_{max} = k_y$, the algorithms simply exchange the operations between the X and Y dimensions.

VAL: For VAL routing, the two-stage minimal routing carries twice the uniform load. Thus, the worst-case channel load of VAL is $2\Upsilon_{asym}$, which results in the normalized worst-case throughput

$$T_{WC}(\text{VAL}) = \frac{1}{2}. \qquad (5)$$

DOR: For DOR routing, the worst-case throughput can differ for various $k_y$ in meshes where $k_{max} = k_x > k_y > 1$. However, we can obtain a bound on the maximal load that holds for different $k_y$:

$$L_{max}(\text{DOR}) \geq \begin{cases} \dfrac{k_{max}}{2}, & k_{max} \text{ is even} \\[4pt] \dfrac{1+k_{max}}{2}, & k_{max} \text{ is odd} \end{cases} \qquad (6)$$

The result for the even $k_{max}$ case is easy to find under the traffic in which the nodes in the left half are sources and the nodes in the right half are destinations. For the odd $k_{max}$ case, the result can be found under the traffic patterns shown in Fig. 3, where the red nodes represent sources and the green nodes represent destinations. For both DOR-XY and DOR-YX routing, the channel load represented by the black arrows achieves the maximal load bound of $(1+k_{max})/2$. Eqn. 6 only gives lower bounds, and there are worse cases for particular $k_y$. For example, when $k_x - k_y = 1$, the maximal loads are $k_{max} - 1$ for both even and odd $k_{max}$, as shown in Fig. 4. Therefore, according to Eqn. 6, the upper bound of the normalized worst-case throughput for DOR routing is

$$T_{WC}(\text{DOR}) \leq \begin{cases} \dfrac{1}{2}, & k_{max} \text{ is even} \\[4pt] \dfrac{k_{max}-1}{2k_{max}} < \dfrac{1}{2}, & k_{max} \text{ is odd} \end{cases} \qquad (7)$$

Fig. 4. Examples where the maximal loads are $k_{max} - 1$ in DOR routing: (a) DOR-XY routing (7×6), (b) DOR-YX routing (6×5).

OTURN: For OTURN, we can also obtain a maximal load bound as follows:

$$L_{max}(\text{OTURN}) \geq \frac{k_{max}}{2}, \quad k_{max} \text{ is even or odd.} \qquad (8)$$

Fig. 5 shows the traffic patterns for this load lower bound in both the even and odd $k_{max}$ cases. For the 7×6 mesh (Fig. 5(a)) and the 6×5 mesh (Fig. 5(b)), the channel loads represented by the black arrows are 3.5 and 3, respectively. Therefore, according to Eqn. 8, the upper bound of the normalized worst-case throughput for OTURN routing is

$$T_{WC}(\text{OTURN}) \leq \begin{cases} \dfrac{1}{2}, & k_{max} \text{ is even} \\[4pt] \dfrac{k_{max}^2-1}{2k_{max}^2} < \dfrac{1}{2}, & k_{max} \text{ is odd} \end{cases} \qquad (9)$$

Fig. 5. Traffic patterns which achieve the maximal load lower bound in OTURN routing: (a) OTURN routing (7×6), (b) OTURN routing (6×5).

UTURN: Section III-A describes the UTURN algorithm for the symmetric case. Following the analysis of Claim 2 in Section III-B, we can obtain the maximal channel loads in the X and Y dimensions ($L_x$ and $L_y$) in the $k_x \times k_y$ mesh ($k_x > k_y$) for YXY and XYX routing, which are shown in Table I. For meshes with $k_x > k_y$, the maximal channel load is decided by $L_x$ because $L_x > L_y$ for all the cases in Table I.
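The per-dimension loads of Table I can be tabulated with a short sketch. It assumes that the symmetric-mesh results of Claims 1 and 2 carry over per dimension: a dimension of radix $k$ that carries the two uniformly random segments sees $k/2$ (even $k$) or $(k^2-1)/(2k)$ (odd $k$), and a dimension that carries the single minimal-length segment sees $k/2$ (even $k$) or $(k-1)/2$ (odd $k$). The function names are hypothetical.

```python
from fractions import Fraction


def twice_uniform_load(k: int) -> Fraction:
    """Load on a radix-k dimension carrying two uniformly random segments."""
    return Fraction(k, 2) if k % 2 == 0 else Fraction(k * k - 1, 2 * k)


def minimal_load(k: int) -> Fraction:
    """Worst-case load of one minimal-length segment on a radix-k dimension."""
    return Fraction(k, 2) if k % 2 == 0 else Fraction(k - 1, 2)


def channel_loads(kx: int, ky: int) -> dict:
    """Return {scheme: (L_x, L_y)} for XYX- and YXY-routing in a kx-by-ky mesh."""
    return {
        "XYX": (twice_uniform_load(kx), minimal_load(ky)),
        "YXY": (minimal_load(kx), twice_uniform_load(ky)),
    }


if __name__ == "__main__":
    for kx, ky in [(7, 6), (6, 5), (8, 4)]:
        for scheme, (lx, ly) in channel_loads(kx, ky).items():
            print(f"{kx}x{ky} {scheme}: L_x={lx}, L_y={ly}, max={max(lx, ly)}")
```

Under these assumptions, for every mesh with `kx > ky` the X-dimension load is never smaller than the Y-dimension load, so the maximum of each pair is `L_x`.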
YXY-routing achieves excellent worst-case channel loads when $k_x > k_y$:

$$L_{max}(\text{YXY}) = \begin{cases} \dfrac{k_{max}}{2}, & k_{max} \text{ is even} \\[4pt] \dfrac{k_{max}-1}{2}, & k_{max} \text{ is odd} \end{cases} \qquad (10)$$

Thus, the normalized worst-case throughput for YXY-routing in the mesh with $k_x > k_y$ is

$$T_{WC}(\text{YXY}) = \begin{cases} \dfrac{1}{2}, & k_{max} \text{ is even} \\[4pt] \dfrac{k_{max}+1}{2k_{max}} > \dfrac{1}{2}, & k_{max} \text{ is odd} \end{cases} \qquad (11)$$

For meshes with $k_y > k_x$, XYX-routing achieves the same worst-case throughput as YXY-routing in meshes with $k_x > k_y$, namely, Eqn. 11. Therefore, UTURN only implements YXY-routing in $k_x > k_y$ meshes and only implements XYX-routing in $k_x < k_y$ meshes. Thus, UTURN achieves the worst-case throughput of Eqn. 11.

TABLE I. MAXIMAL CHANNEL LOADS IN X AND Y DIMENSIONS FOR XYX-ROUTING AND YXY-ROUTING.

| $k_x \times k_y$ mesh | XYX-routing | YXY-routing |
| even $k_x$, even $k_y$ | $L_x = k_x/2$, $L_y = k_y/2$ | $L_x = k_x/2$, $L_y = k_y/2$ |
| even $k_x$, odd $k_y$ | $L_x = k_x/2$, $L_y = (k_y-1)/2$ | $L_x = k_x/2$, $L_y = (k_y^2-1)/(2k_y)$ |
| odd $k_x$, even $k_y$ | $L_x = (k_x^2-1)/(2k_x)$, $L_y = k_y/2$ | $L_x = (k_x-1)/2$, $L_y = k_y/2$ |
| odd $k_x$, odd $k_y$ | $L_x = (k_x^2-1)/(2k_x)$, $L_y = (k_y-1)/2$ | $L_x = (k_x-1)/2$, $L_y = (k_y^2-1)/(2k_y)$ |

UTURN-A: UTURN achieves the optimal worst-case throughput for both odd and even radix networks and for both symmetric and asymmetric networks. In the case of asymmetric meshes, we can employ a variant of UTURN that we refer to as UTURN-A, which optimizes for average-case performance in asymmetric meshes: it can achieve better average-case throughput than UTURN at the expense of worst-case throughput optimality. UTURN-A is implemented by using XYX-routing and YXY-routing with the same probability. According to Table I, UTURN-A achieves the following worst-case channel loads when $k_x > k_y$:

$$L_{max}(\text{UTURN-A}) = L_x(\text{UTURN-A}) = \tfrac{1}{2} L_x(\text{XYX}) + \tfrac{1}{2} L_x(\text{YXY}) = \begin{cases} \dfrac{k_{max}}{2}, & k_{max} \text{ is even} \\[4pt] \dfrac{(k_{max}-1)(2k_{max}+1)}{4k_{max}}, & k_{max} \text{ is odd} \end{cases} \qquad (12)$$

For meshes with $k_x < k_y$, UTURN-A achieves the same maximal channel load as in the $k_x > k_y$ case, namely, Eqn. 12, except that $k_{max} = k_y$. So the normalized worst-case throughput is

$$T_{WC}(\text{UTURN-A}) = \begin{cases} \dfrac{1}{2}, & k_{max} \text{ is even} \\[4pt] \dfrac{k_{max}+1}{2k_{max}+1} > \dfrac{1}{2}, & k_{max} \text{ is odd} \end{cases} \qquad (13)$$

As can be seen in Eqn. 13, the worst-case throughput of UTURN-A is the same as that of UTURN for the even radix case, which is optimal. However, in the odd radix case, although the worst-case throughput of UTURN-A is better than 1/2 of network capacity, it is only near-optimal in that UTURN-A cannot achieve the worst-case throughput optimal result of UTURN shown in Eqn. 11.

B. Throughput Comparison for Different Algorithms

According to Equations 5, 7, 9, 11 and 13, our UTURN and UTURN-A algorithms achieve better worst-case throughput than VAL, DOR and OTURN in asymmetric meshes. The aforementioned theoretical analysis shows that UTURN and UTURN-A achieve the best worst-case throughput of 1/2 of network capacity in asymmetric meshes with even $k_{max}$. For asymmetric meshes with odd $k_{max}$, UTURN and UTURN-A achieve a worst-case throughput better than 1/2 of network capacity. Fig. 6 shows the normalized worst-case throughput of different routing algorithms for different asymmetric mesh sizes. It shows that our UTURN and UTURN-A algorithms outperform existing algorithms such as VAL, DOR and OTURN in worst-case throughput, which also validates the aforementioned theoretical worst-case throughput analysis for asymmetric meshes.

Fig. 6. Worst-case throughput for different routing algorithms in different asymmetric mesh sizes.

As for average-case throughput, our algorithms also achieve excellent performance. Fig. 7 shows the average-case throughput of different routing algorithms for different asymmetric mesh sizes. Compared with DOR, OTURN and VAL, UTURN-A achieves better average-case throughput by 4.6%, 0.8% and 4.5%, respectively. Compared with DOR, OTURN and VAL, UTURN achieves better average-case throughput by .9%, 8.5% and 8.8%, respectively. Compared with UTURN-A, UTURN achieves better worst-case throughput but worse average-case throughput.

Fig. 7. Average-case throughput for different routing algorithms in different asymmetric mesh sizes.

V. EVALUATION FOR SYMMETRIC MESHES

In this section, we compare the throughput of UTURN with that of VAL, DOR and OTURN with respect to each traffic matrix shown in Table II. Table III and Table IV provide the evaluation results for different odd and even radix symmetric networks, respectively. As shown in Table III, UTURN outperforms all other routing algorithms in worst-case throughput for odd radix mesh networks and achieves a new optimal worst-case throughput bound of $(k+1)/(2k+1)$, which is higher than 1/2 of network capacity. For the even radix mesh networks shown in Table IV, UTURN achieves the best worst-case throughput, namely, half of network capacity. This also validates the throughput analysis of UTURN in Section III-B. For average-case throughput, UTURN also outperforms VAL, DOR, and OTURN, on average over the six evaluated network sizes (both odd and even radix), by 7.%, 39.7% and 8.8%, respectively.
UTURN also performs very well in both odd and even radix networks under adversarial traffic patterns, namely, Transpose and DOR-WC. For Complement and Nearest-Neighbor traffic, UTURN outperforms VAL, but does worse than DOR and OTURN. For ideal uniformly Random traffic, DOR achieves better throughput than UTURN, but DOR is significantly worse in average-case and worst-case throughput. In summary, our evaluation shows that UTURN performs very well across different network sizes and traffic matrices.

TABLE II. TRAFFIC MATRICES EVALUATED.

| Traffic | Description |
|---|---|
| Worst-Case | Worst-case traffic matrix that results in lowest throughput |
| Average-Case | Average throughput over a million random traffic matrices |
| Transpose | Packets route from $(x, y)$ to $(y, x)$ |
| Random | Destination node chosen uniformly at random |
| DOR-WC | Worst-case traffic for DOR. Packets route from $(x, y)$ to $(k-y-1, k-x-1)$ |
| Complement | Packet at $(x, y)$ sent to $(k-x-1, k-y-1)$ |
| Nearest-Neighbor | Packets sent to nodes one hop away with equal probability |

TABLE III. NORMALIZED WORST-CASE & AVERAGE-CASE THROUGHPUT FOR ODD CASE.

| 3x3 mesh | VAL | DOR | OTURN | UTURN |
|---|---|---|---|---|
| Worst-case | 0.5 | 0.33 | 0.44 | 0.57 |
| Average-case | 0.5 | 0.405 | 0.477 | 0.604 |
| Transpose | 0.5 | 0.33 | 0.67 | 0.80 |
| Random | 0.5 | 1 | 1 | 0.7 |
| DOR-WC | 0.5 | 0.33 | 0.67 | 0.80 |
| Complement | 0.5 | 0.67 | 0.67 | 0.57 |
| Nearest-Neighbor | 0.5 | 1.33 | 1.33 | 0.75 |

| 5x5 mesh | VAL | DOR | OTURN | UTURN |
|---|---|---|---|---|
| Worst-case | 0.5 | 0.3 | 0.48 | 0.55 |
| Average-case | 0.5 | 0.44 | 0.59 | 0.63 |
| Transpose | 0.5 | 0.3 | 0.6 | 0.75 |
| Random | 0.5 | 1 | 1 | 0.685 |
| DOR-WC | 0.5 | 0.3 | 0.6 | 0.75 |
| Complement | 0.5 | 0.6 | 0.6 | 0.55 |
| Nearest-Neighbor | 0.5 | 2.4 | 2.4 | .7 |

| 7x7 mesh | VAL | DOR | OTURN | UTURN |
|---|---|---|---|---|
| Worst-case | 0.5 | 0.286 | 0.49 | 0.53 |
| Average-case | 0.5 | 0.46 | 0.550 | 0.640 |
| Transpose | 0.5 | 0.286 | 0.57 | 0.73 |
| Random | 0.5 | 1 | 1 | 0.686 |
| DOR-WC | 0.5 | 0.286 | 0.57 | 0.73 |
| Complement | 0.5 | 0.57 | 0.57 | 0.533 |
| Nearest-Neighbor | 0.5 | 3.4 | 3.4 | .3 |

VI. CONCLUSION

UTURN is a closed-form non-minimal oblivious routing algorithm for mesh networks. We prove that it achieves a new worst-case throughput bound of $(k+1)/(2k+1)$ of network capacity for odd radix, which is higher than 1/2 of network capacity. For even radix meshes, UTURN achieves optimal worst-case throughput. Thus, it outperforms existing algorithms such as VAL, DOR and OTURN in worst-case throughput. In addition, when compared with the optimal routings that can be obtained with the method proposed in [9] for odd radix meshes, UTURN achieves the same optimal worst-case throughput. For average-case throughput, UTURN also outperforms VAL, DOR and OTURN, on average over the six simulated cases, by 7.%, 39.7% and 8.8%, respectively. Moreover, we provide a closed-form expression for the average hop count of UTURN in the latency analysis. In comparison with DOR and OTURN, UTURN achieves higher throughput at the expense of an average hop count that is approximately 1.5x minimal.
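The hop-count overhead quoted above can be sanity-checked with a small enumeration. The sketch below is not the paper's closed-form expression. It assumes that XYX-routing sends a packet along X to a uniformly random intermediate column, then along Y to the destination row, then along X to the destination column. It also assumes uniform random traffic in which a node may address itself. The routing definition itself lies outside this section, so both points are assumptions, and `hop_ratio` is a hypothetical name.

```python
from fractions import Fraction
from itertools import product


def hop_ratio(k: int) -> Fraction:
    """Ratio of mean XYX hop count to mean minimal hop count on a k x k mesh.

    Requires k >= 2 so that the minimal hop total is non-zero.
    """
    xyx_hops = 0
    min_hops = 0
    coords = range(k)
    # Enumerating every source, destination and intermediate column once
    # is equivalent to drawing all three uniformly at random.
    for xs, ys, xd, yd, xi in product(coords, repeat=5):
        # Assumed XYX path: X to column xi, Y to row yd, X to column xd.
        xyx_hops += abs(xs - xi) + abs(ys - yd) + abs(xi - xd)
        # Minimal (Manhattan) path between source and destination.
        min_hops += abs(xs - xd) + abs(ys - yd)
    return Fraction(xyx_hops, min_hops)


if __name__ == "__main__":
    for k in (3, 4, 5, 8):
        print(k, hop_ratio(k))
```

Under these assumptions the ratio is exactly 3/2 for every radix, because each of the three phases covers the same mean distance as one dimension of a minimal route. YXY-routing gives the same ratio by symmetry.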
For asymmetric meshes, both theoretical analysis and simulation show that our UTURN and UTURN-A algorithms achieve better worst-case and average-case throughput than the existing algorithms VAL, DOR and OTURN.

TABLE IV. NORMALIZED WORST-CASE & AVERAGE-CASE THROUGHPUT FOR EVEN CASE.

| 4x4 mesh | VAL | DOR | OTURN | UTURN |
|---|---|---|---|---|
| Worst-case | 0.5 | 0.33 | 0.5 | 0.5 |
| Average-case | 0.5 | 0.48 | 0.54 | 0.64 |
| Transpose | 0.5 | 0.33 | 0.67 | 0.80 |
| Random | 0.5 | 1 | 1 | 0.7 |
| DOR-WC | 0.5 | 0.33 | 0.67 | 0.80 |
| Complement | 0.5 | 0.5 | 0.5 | 0.5 |
| Nearest-Neighbor | 0.5 | 2 | 2 | |

| 6x6 mesh | VAL | DOR | OTURN | UTURN |
|---|---|---|---|---|
| Worst-case | 0.5 | 0.3 | 0.5 | 0.5 |
| Average-case | 0.5 | 0.47 | 0.556 | 0.65 |
| Transpose | 0.5 | 0.3 | 0.6 | 0.75 |
| Random | 0.5 | 1 | 1 | 0.68 |
| DOR-WC | 0.5 | 0.3 | 0.6 | 0.75 |
| Complement | 0.5 | 0.5 | 0.5 | 0.5 |
| Nearest-Neighbor | 0.5 | 3 | 3 | .7 |

| 8x8 mesh | VAL | DOR | OTURN | UTURN |
|---|---|---|---|---|
| Worst-case | 0.5 | 0.286 | 0.5 | 0.5 |
| Average-case | 0.5 | 0.479 | 0.57 | 0.65 |
| Transpose | 0.5 | 0.286 | 0.57 | 0.73 |
| Random | 0.5 | 1 | 1 | 0.67 |
| DOR-WC | 0.5 | 0.286 | 0.57 | 0.73 |
| Complement | 0.5 | 0.5 | 0.5 | 0.5 |
| Nearest-Neighbor | 0.5 | 4 | 4 | .43 |

VII. ACKNOWLEDGMENTS

This work was partially supported by the Creative Research Groups of National Natural Science Foundation (NNSF) under grant No. 6000 and 63305, the National S&T Major Project under No. 0ZX03004-00-0 and 009ZX03006-007-0, Research Fund of Tsinghua University under No. 008099, and ZTE Corporation. The authors would also like to acknowledge the support of NSF CCF-6667.

REFERENCES

[1] W. Dally and B. Towles. Principles and practices of interconnection networks. Morgan Kaufmann, 2004.
[2] S. Guang, L. Shijun, J. Depeng, L. Yong, S. Li, Y. Zhang, and Z. Lieguang. Performance-aware hybrid algorithm for mapping IPs onto mesh-based network on chip. IEICE Transactions on Information and Systems, 94(5):000 007, 2011.
[3] T. Nesson and S. Johnsson. ROMM routing on mesh and torus networks. In Proceedings of the seventh annual ACM symposium on Parallel algorithms and architectures, pages 75 87. ACM, 1995.
[4] D. Seo, A. Ali, W. Lim, N. Rafique, and M. Thottethodi. Near-optimal worst-case throughput routing for two-dimensional mesh networks. In ACM SIGARCH Computer Architecture News, volume 33, pages 43 443. IEEE Computer Society, 2005.
[5] H. Sullivan and T. Bashkow. A large scale, homogeneous, fully distributed parallel machine, I. In ACM SIGARCH Computer Architecture News, volume 5, pages 05 7. ACM, 1977.
[6] R. Sunkam Ramanujam and B. Lin. Randomized partially-minimal routing on three-dimensional mesh networks. Computer Architecture Letters, volume 7, pages 37 40, 2008.
[7] R. Sunkam Ramanujam and B. Lin. Weighted random oblivious routing on torus networks. In Proceedings of the 5th ACM/IEEE Symposium on Architectures for Networking and Communications Systems, pages 04. ACM, 2009.
[8] B. Towles and W. Dally. Worst-case traffic for oblivious routing functions. In Proceedings of the fourteenth annual ACM symposium on Parallel algorithms and architectures, pages 8. ACM, 2002.
[9] B. Towles, W. Dally, and S. Boyd. Throughput-centric routing algorithm design. In Proceedings of the fifteenth annual ACM symposium on Parallel algorithms and architectures, pages 00 09. ACM, 2003.
[10] L. Valiant and G. Brebner. Universal schemes for parallel communication. In Proceedings of the thirteenth annual ACM symposium on Theory of computing, pages 63 77. ACM, 1981.