Deadlock-free XY-YX router for on-chip interconnection network

Similar documents
Design of a router for network-on-chip. Jun Ho Bahn,* Seung Eun Lee and Nader Bagherzadeh

BARP-A Dynamic Routing Protocol for Balanced Distribution of Traffic in NoCs

NOC Deadlock and Livelock

Deadlock and Livelock. Maurizio Palesi

Evaluation of NOC Using Tightly Coupled Router Architecture

Routing Algorithms, Process Model for Quality of Services (QoS) and Architectures for Two-Dimensional 4 4 Mesh Topology Network-on-Chip

A Dynamic NOC Arbitration Technique using Combination of VCT and XY Routing

Lecture 3: Flow-Control

HARDWARE IMPLEMENTATION OF PIPELINE BASED ROUTER DESIGN FOR ON- CHIP NETWORK

OASIS NoC Architecture Design in Verilog HDL Technical Report: TR OASIS

FPGA based Design of Low Power Reconfigurable Router for Network on Chip (NoC)

Demand Based Routing in Network-on-Chip(NoC)

Lecture 18: Communication Models and Architectures: Interconnection Networks

Design and Implementation of Buffer Loan Algorithm for BiNoC Router

Design and Implementation of Low Complexity Router for 2D Mesh Topology using FPGA

A VERIOG-HDL IMPLEMENTATION OF VIRTUAL CHANNELS IN A NETWORK-ON-CHIP ROUTER. A Thesis SUNGHO PARK

DESIGN A APPLICATION OF NETWORK-ON-CHIP USING 8-PORT ROUTER

Area and Power-efficient Innovative Network-on-Chip Architecture

Lecture 12: Interconnection Networks. Topics: communication latency, centralized and decentralized switches, routing, deadlocks (Appendix E)

CONGESTION AWARE ADAPTIVE ROUTING FOR NETWORK-ON-CHIP COMMUNICATION. Stephen Chui Bachelor of Engineering Ryerson University, 2012.

Architecture and Design of Efficient 3D Network-on-Chip for Custom Multi-Core SoC

Routing Algorithm. How do I know where a packet should go? Topology does NOT determine routing (e.g., many paths through torus)

International Journal of Research and Innovation in Applied Science (IJRIAS) Volume I, Issue IX, December 2016 ISSN

Lecture 13: Interconnection Networks. Topics: lots of background, recent innovations for power and performance

High Performance Interconnect and NoC Router Design

Extended Junction Based Source Routing Technique for Large Mesh Topology Network on Chip Platforms

TDT Appendix E Interconnection Networks

Low Cost Network on Chip Router Design for Torus Topology

Networks: Routing, Deadlock, Flow Control, Switch Design, Case Studies. Admin

CHAPTER 6 FPGA IMPLEMENTATION OF ARBITERS ALGORITHM FOR NETWORK-ON-CHIP

Lecture: Interconnection Networks. Topics: TM wrap-up, routing, deadlock, flow control, virtual channels

Lecture 24: Interconnection Networks. Topics: topologies, routing, deadlocks, flow control

Efficient And Advance Routing Logic For Network On Chip

Flow Control can be viewed as a problem of

Lecture 16: On-Chip Networks. Topics: Cache networks, NoC basics

JUNCTION BASED ROUTING: A NOVEL TECHNIQUE FOR LARGE NETWORK ON CHIP PLATFORMS

SoC Design. Prof. Dr. Christophe Bobda Institut für Informatik Lehrstuhl für Technische Informatik

Ultra-Fast NoC Emulation on a Single FPGA

Dynamic Router Design For Reliable Communication In Noc

ScienceDirect. Packet-based Adaptive Virtual Channel Configuration for NoC Systems

DESIGN AND IMPLEMENTATION ARCHITECTURE FOR RELIABLE ROUTER RKT SWITCH IN NOC

Comparison of Deadlock Recovery and Avoidance Mechanisms to Approach Message Dependent Deadlocks in on-chip Networks

DLABS: a Dual-Lane Buffer-Sharing Router Architecture for Networks on Chip

Module 17: "Interconnection Networks" Lecture 37: "Introduction to Routers" Interconnection Networks. Fundamentals. Latency and bandwidth

Basic Switch Organization

Fault-adaptive routing

Global Adaptive Routing Algorithm Without Additional Congestion Propagation Network

EE 382C Interconnection Networks

Lecture: Interconnection Networks

EC 513 Computer Architecture

PDA-HyPAR: Path-Diversity-Aware Hybrid Planar Adaptive Routing Algorithm for 3D NoCs

Fault Tolerant and Secure Architectures for On Chip Networks With Emerging Interconnect Technologies. Mohsin Y Ahmed Conlan Wesson

AS SILICON technology enters the nanometer-scale era,

Basic Low Level Concepts

Asynchronous Bypass Channel Routers

Dynamic Stress Wormhole Routing for Spidergon NoC with effective fault tolerance and load distribution

Trade Offs in the Design of a Router with Both Guaranteed and Best-Effort Services for Networks on Chip

Lecture 12: Interconnection Networks. Topics: dimension/arity, routing, deadlock, flow control

Lecture 22: Router Design

ISSN Vol.03, Issue.02, March-2015, Pages:

Design and implementation of deadlock free NoC Router Architecture

Prediction Router: Yet another low-latency on-chip router architecture

A Novel Energy Efficient Source Routing for Mesh NoCs

A Modified NoC Router Architecture with Fixed Priority Arbiter

Networks-on-Chip Router: Configuration and Implementation

Real Time NoC Based Pipelined Architectonics With Efficient TDM Schema

NoC Test-Chip Project: Working Document

Fault-tolerant & Adaptive Stochastic Routing Algorithm. for Network-on-Chip. Team CoheVer: Zixin Wang, Rong Xu, Yang Jiao, Tan Bie

Lecture 23: Router Design

FCUDA-NoC: A Scalable and Efficient Network-on-Chip Implementation for the CUDA-to-FPGA Flow

Power and Performance Efficient Partial Circuits in Packet-Switched Networks-on-Chip

Encoding Scheme for Power Reduction in Network on Chip Links

SURVEY ON LOW-LATENCY AND LOW-POWER SCHEMES FOR ON-CHIP NETWORKS

Design of Synchronous NoC Router for System-on-Chip Communication and Implement in FPGA using VHDL

Fault-Tolerant Multiple Task Migration in Mesh NoC s over virtual Point-to-Point connections

Configurable Router Design for Dynamically Reconfigurable Systems based on the SoCWire NoC

ISSN Vol.03,Issue.06, August-2015, Pages:

4. Networks. in parallel computers. Advances in Computer Architecture

IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS 1

EE482, Spring 1999 Research Paper Report. Deadlock Recovery Schemes

Generic Methodologies for Deadlock-Free Routing

On an Overlaid Hybrid Wire/Wireless Interconnection Architecture for Network-on-Chip

A Reconfigurable Crossbar Switch with Adaptive Bandwidth Control for Networks-on

OASIS Network-on-Chip Prototyping on FPGA

Switching/Flow Control Overview. Interconnection Networks: Flow Control and Microarchitecture. Packets. Switching.

A Hybrid Interconnection Network for Integrated Communication Services

OpenSMART: Single-cycle Multi-hop NoC Generator in BSV and Chisel

Quest for High-Performance Bufferless NoCs with Single-Cycle Express Paths and Self-Learning Throttling

CAD System Lab Graduate Institute of Electronics Engineering National Taiwan University Taipei, Taiwan, ROC

Implementation of PNoC and Fault Detection on FPGA

Interconnection Networks: Routing. Prof. Natalie Enright Jerger

Lecture 25: Interconnection Networks, Disks. Topics: flow control, router microarchitecture, RAID

ACCORDING to the International Technology Roadmap

Network-on-Chip Architecture

Architecture and Routing in NOC Based FPGAs

Buffer Pocketing and Pre-Checking on Buffer Utilization

A dynamic distribution of the input buffer for on-chip routers Fan Lifang, Zhang Xingming, Chen Ting

Early Transition for Fully Adaptive Routing Algorithms in On-Chip Interconnection Networks

Application Specific Buffer Allocation for Wormhole Routing Networks-on-Chip

A hardware operating system kernel for multi-processor systems

Transcription:

LETTER IEICE Electronics Express, Vol.10, No.20, 1 5 Deadlock-free XY-YX router for on-chip interconnection network Yeong Seob Jeong and Seung Eun Lee a) Dept of Electronic Engineering Seoul National Univ of Science and Technology 232 Gongreung ro Nowon gu Seoul 139 743 Korea a) seung lee seoultech ac kr Abstract: Specifically, based on the observation that a response is always preceded by a request in multi-processor SoCs, this letter proposes a novel deadlock-free XY-YX router for on-chip network performance improvement. In order to avoid deadlock, we add additional physical channels in the horizontal direction and optimize the priority of output channel allocation. Simulation results show the enhancement in the throughput of an NoC. Keywords: network-on-chip, interconnection network, multi-processor SoC Classification: Integrated circuits References [1] X. Li, Y. Cao, L. Wang and T. Cai: Wuhan University Journal of Natural Sciences 14 [4] (2009) 343. [2] P. Ghosal and T. S. Das: Emerging Applications of Information Technology (EAIT) (2012) 471. [3] S. E. Lee, J. H. Bahn and N. Bagherzadeh: International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD) (2007) 211. [4] Z. Fang, E. G. Hallnor, B. Li, M. Leddige, D. Dai, S. E. Lee, S. Makineni and R. Iyer: IEEE Comput. Archit. Lett. 9 [2] (2010) 49. [5] D. Seo, A. Ali, W. Lim, N. Rafique and M. Thottehodi: International Symposium on Computer Architecture (ISCA) (2005) 432. [6] R. Gindin, I. Cidon and I. Keidar: First International Symposium on Networks-on-Chips (NOCS) (2007) 253. 1 Introduction Network-on-chips (NoCs) have been studied as an alternative for bus architecture to communicate among large-scale IPs efficiently. While designing a NoC structure, a wide variety of topologies should be considered that can comprise a network and a routing algorithm to select a path, from a source to a destination, for effective communication. Traditional XY routing, which is often used in torus and mesh topology, has the advantage of deadlock- and livelock-free characteristics and it is easy to design. However, there is a drawback that the traffic is not 1

distributed evenly during communication, resulting in congestion. Therefore, modified XY routing techniques that switch between deterministic or adaptive mode depending on the circumstances of the congested network have been proposed [1, 2, 3]. Research combining XY routing and YX routing has recently been performed [4, 5, 6]. The objective of these studies is to determine the ratio of the two optimal routing techniques considering the traffic circumstances in order to obtain better performance when implementing the combined XY-YX routing algorithm. This letter proposes a novel deadlock-free XY-YX router architecture combining the XY and YX routing in accordance with the packet information and analyzes the performance of our proposal according to the XY and YX packet ratios. 2 Conventional XY-YX routing architectures XY-YX routing can be used in various applications. As an example, it has the benefits of performance improvement and power reduction by predicting response packet paths in advance and eliminating wake-up delay [4]. However, designers of XY-YX combined routing networks should consider the fact that its performance varies depending on the degree of the traffic congestion. In other words, efforts to adjust the transmission ratios of the XY and YX packets are required in order to combine the two routing techniques. Moreover, deadlock, which occurs when combining the two routing techniques, also has to be prevented in the network. The algorithms in [5, 6] try to control the transmission ratios of XY and YX packets to maintain optimal load balancing. This technique requires multiple virtual channels per physical channel to resolve the deadlock. Our proposal avoids deadlock by adding two disjoint horizontal channels instead of using virtual channels. Though our approach for deadlock freedom requires additional resources to build two physical channels, it can reduce the complexity in a routing algorithm, which should control multiplexing of virtual channels to escape a deadlock situation if virtual channels are utilized for this purpose. Additionally, the overhead to add physical channels can counterbalance the cost to allocate virtual channel buffers and associated control logic. Therefore, our approach of providing physical channels in the horizontal direction can be beneficial in its own way. The use of horizontal channels is constrained by the direction of delivered data. That is, each horizontal channel is exclusively used depending on XY or YX routing of the delivered packet. 3 Deadlock free XY-YX router design Figure 1 illustrates a single router with seven input and six output channel ports employed in every node of the network. There are two east and two west channel pairs reaching each router, providing two disjoint subnetworks for deadlock freedom. The proposed deadlock-free XY-YX router consists of an arbiter and a crossbar switch and supports wormhole flow control for packet transmission in a mesh topology. There is an input FIFO queue per each input channel. All incoming packets into the router have a single-bit signal to determine whether XY or YX routing is applied. The arbiter allocates the outgoing channel for incoming packets and sends 2

Fig. 1. Deadlock-free XY-YX router architecture signals to the crossbar switch. The proposed router is easy to implement based on an existing XY router and has higher performance with less overhead. As shown in [4], the XY-YX router can also be used as a lowpower architecture so that it has a power-efficient feature at the system level. For the outgoing channel allocation, the router applies a fixed priority scheme to reduce the complexity of the router. Initially, the possible incoming channels have a descending order of priority in a clock-wise direction for each outgoing channel (referred to as origin hereafter). However, we have observed that vertical channels suffer from high congestion due to the additional channels in the horizontal direction. Thus, the output selection policy is modified to have higher priority in vertical channels than in horizontal channels (referred to as modified hereafter), forwarding packets coming from the north and south, firstly. In addition, in order to prevent a situation in which certain packets remain in the input queue for a long time, we adopt an aging counter, which increases the priority of the packet over time. 4 Experimental results In order to evaluate the performance of our XY-YX routing algorithm, we have developed a router written in Verilog TM HDL. For the measurement of throughput and adjusting incoming traffic, we adopted a standard interconnection network measurement setup where the packet generation is placed in front of an infinite depth source queue, and the input timing of each packet is measured whenever it is generated. In this simulation, the packet length is fixed to eight flits (one head flit and seven body flits) even though the packet format supports packets of various sizes. Each graph represents offered traffic (flit/node/cycle) on the X-axis and average latency (cycles) on the Y-axis. Figure 2 illustrates the performance of the origin priority of XY-YX routing (origin), the modified priority of XY-YX routing (modified), and the existing XY routing according to XY-YX data ratios in 8 8 mesh topology. The origin router exhibits better performance than existing XY routing and the modified router outperforms in terms of throughput for all traffic patterns, demonstrating the feasibility and superiority of our proposal. The work in [5] tried to balance the XY and YX traffic in order to 3

Fig. 2. The 8 8 mesh network comparison based on XY- YX ratios: xy-50% vs. yx-50% (Fig. 2 a), xy-10% vs. yx-90% (Fig. 2 b), xy-30% vs. yx-70% (Fig. 2 c), and xy-70% vs. yx-30% (Fig. 2 d) improve the performance of the XY-YX router. Splitting the traffic evenly between XY and YX routes in toggle XY (TXY) routing does not always achieve optimal load balancing. Moreover, TXY leads to out-of-order arrivals, requiring large re-order buffers. In order to ensure the in-order arrivals, [6] proposed the weighted ordered toggle (WOT) algorithm, which assigns XY and YX routes to source-destination pairs in a way that reduces the maximum network capacity for a given traffic pattern. In the routing algorithm in [6], the routes are calculated by using the information of the actual traffic pattern. Algorithms in [5, 6] try to balance the traffic of XY and YX packets to maintain optimal load balancing in XY-YX routing in 4

order to improve the performance of the router. In our experiment, we extracted the throughput of the router varying the ratios of XY and YX packets in the network. As shown in Figure 2, our modified routing algorithm outperforms the origin XY-YX routing algorithm over all transmission ratios of XY and YX packets, demonstrating better performance than in [5, 6]. Moreover, our router supports in-order arrivals without virtual channels, reducing the hardware cost to avoid deadlock. In summary, the proposed router architecture enhances the throughput of the existing XY-YX routing algorithms, reducing the additional hardware cost in avoiding deadlock. 5 Conclusions In this letter, an XY-YX router architecture for flexible on-chip interconnection is proposed and implemented. The main idea is the use of XY and YX routing for different packets and avoiding deadlock by adding two disjointed horizontal channels instead of using virtual channels, thereby reducing hardware cost. Furthermore, the priority of outgoing channel allocation was optimized in order to increase the throughput of the NoC, thereby increasing the accepted load. Experimental results demonstrated the feasibility and superiority of our proposal. Acknowledgments This work was supported in part by Seoul National University of Science and Technology, Korea and IDEC (EDA Tool). 5