Power and Area Efficient NOC Router Through Utilization of Idle Buffers

Similar documents
A Dynamic NOC Arbitration Technique using Combination of VCT and XY Routing

A Novel Topology-Independent Router Architecture to Enhance Reliability and Performance of Networks-on-Chip

Design and Implementation of Buffer Loan Algorithm for BiNoC Router

ISSN Vol.03,Issue.06, August-2015, Pages:

A Literature Review of on-chip Network Design using an Agent-based Management Method

Evaluation of NOC Using Tightly Coupled Router Architecture

Design of Adaptive Communication Channel Buffers for Low-Power Area- Efficient Network-on. on-chip Architecture

CAD System Lab Graduate Institute of Electronics Engineering National Taiwan University Taipei, Taiwan, ROC

Dynamic Packet Fragmentation for Increased Virtual Channel Utilization in On-Chip Routers

Fault Tolerant and Secure Architectures for On Chip Networks With Emerging Interconnect Technologies. Mohsin Y Ahmed Conlan Wesson

FPGA based Design of Low Power Reconfigurable Router for Network on Chip (NoC)

FCUDA-NoC: A Scalable and Efficient Network-on-Chip Implementation for the CUDA-to-FPGA Flow

Deadlock-free XY-YX router for on-chip interconnection network

A Novel Energy Efficient Source Routing for Mesh NoCs

Low Cost Network on Chip Router Design for Torus Topology

Design and Implementation of a Packet Switched Dynamic Buffer Resize Router on FPGA Vivek Raj.K 1 Prasad Kumar 2 Shashi Raj.K 3

SERVICE ORIENTED REAL-TIME BUFFER MANAGEMENT FOR QOS ON ADAPTIVE ROUTERS

HARDWARE IMPLEMENTATION OF PIPELINE BASED ROUTER DESIGN FOR ON- CHIP NETWORK

Ultra-Fast NoC Emulation on a Single FPGA

Design and Analysis of On-Chip Router for Network On Chip

Fault-adaptive routing

ERA: An Efficient Routing Algorithm for Power, Throughput and Latency in Network-on-Chips

JUNCTION BASED ROUTING: A NOVEL TECHNIQUE FOR LARGE NETWORK ON CHIP PLATFORMS

Joint consideration of performance, reliability and fault tolerance in regular Networks-on-Chip via multiple spatially-independent interface terminals

PERFORMANCE ANALYSES OF SPECULATIVE VIRTUAL CHANNEL ROUTER FOR NETWORK-ON-CHIP

Efficient Multicast Communication using 3d Router

NEtwork-on-Chip (NoC) [3], [6] is a scalable interconnect

Design of Reconfigurable Router for NOC Applications Using Buffer Resizing Techniques

Design of an Efficient Communication Protocol for 3d Interconnection Network

Connection-oriented Multicasting in Wormhole-switched Networks on Chip

Interconnection Networks: Topology. Prof. Natalie Enright Jerger

Network-on-Chip Architecture

4. Networks. in parallel computers. Advances in Computer Architecture

Quest for High-Performance Bufferless NoCs with Single-Cycle Express Paths and Self-Learning Throttling

OASIS Network-on-Chip Prototyping on FPGA

Efficient And Advance Routing Logic For Network On Chip

High Data Transmission with Shared Buffer Routers Using RoShaQ Architecture

A VERIOG-HDL IMPLEMENTATION OF VIRTUAL CHANNELS IN A NETWORK-ON-CHIP ROUTER. A Thesis SUNGHO PARK

Thomas Moscibroda Microsoft Research. Onur Mutlu CMU

Design and Implementation of Low Complexity Router for 2D Mesh Topology using FPGA

BARP-A Dynamic Routing Protocol for Balanced Distribution of Traffic in NoCs

udirec: Unified Diagnosis and Reconfiguration for Frugal Bypass of NoC Faults

The Design and Implementation of a Low-Latency On-Chip Network

Network on Chip Architecture: An Overview

Deadlock-Avoidance Technique for Fault-Tolerant 3D-OASIS-Network-on-Chip

Fault-Tolerant Multiple Task Migration in Mesh NoC s over virtual Point-to-Point connections

SoC Design. Prof. Dr. Christophe Bobda Institut für Informatik Lehrstuhl für Technische Informatik

Improving Fault Tolerance of Network-on-Chip Links via Minimal Redundancy and Reconfiguration

FT-Z-OE: A Fault Tolerant and Low Overhead Routing Algorithm on TSV-based 3D Network on Chip Links

Partitioning Methods for Multicast in Bufferless 3D Network on Chip

IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS 1

Network-on-chip (NOC) Topologies

Achieving Lightweight Multicast in Asynchronous Networks-on-Chip Using Local Speculation

Extended Junction Based Source Routing Technique for Large Mesh Topology Network on Chip Platforms

A Survey of Techniques for Power Aware On-Chip Networks.

Global Adaptive Routing Algorithm Without Additional Congestion Propagation Network

EE482, Spring 1999 Research Paper Report. Deadlock Recovery Schemes

Lecture 12: Interconnection Networks. Topics: communication latency, centralized and decentralized switches, routing, deadlocks (Appendix E)

Real Time NoC Based Pipelined Architectonics With Efficient TDM Schema

A Thermal-aware Application specific Routing Algorithm for Network-on-chip Design

ScienceDirect. Packet-based Adaptive Virtual Channel Configuration for NoC Systems

ISSN Vol.03, Issue.02, March-2015, Pages:

TECHNOLOGY scaling is expected to continue into the deep

Multi-level Fault Tolerance in 2D and 3D Networks-on-Chip

Lecture 23: Router Design

Routing Algorithms, Process Model for Quality of Services (QoS) and Architectures for Two-Dimensional 4 4 Mesh Topology Network-on-Chip

Lecture 24: Interconnection Networks. Topics: topologies, routing, deadlocks, flow control

A thesis presented to. the faculty of. In partial fulfillment. of the requirements for the degree. Master of Science. Yixuan Zhang.

OpenSMART: Single-cycle Multi-hop NoC Generator in BSV and Chisel

Efficient Throughput-Guarantees for Latency-Sensitive Networks-On-Chip

Akash Raut* et al ISSN: [IJESAT] [International Journal of Engineering Science & Advanced Technology] Volume-6, Issue-3,

SURVEY ON LOW-LATENCY AND LOW-POWER SCHEMES FOR ON-CHIP NETWORKS

Analyzing the Performance of NoC Using Hierarchical Routing Methodology

Basic Low Level Concepts

FPGA BASED ADAPTIVE RESOURCE EFFICIENT ERROR CONTROL METHODOLOGY FOR NETWORK ON CHIP

Flow Control can be viewed as a problem of

High Performance Interconnect and NoC Router Design

A dynamic distribution of the input buffer for on-chip routers Fan Lifang, Zhang Xingming, Chen Ting

LOW POWER REDUCED ROUTER NOC ARCHITECTURE DESIGN WITH CLASSICAL BUS BASED SYSTEM

Power and Performance Efficient Partial Circuits in Packet-Switched Networks-on-Chip

NoC Test-Chip Project: Working Document

Lecture 18: Communication Models and Architectures: Interconnection Networks

PRIORITY BASED SWITCH ALLOCATOR IN ADAPTIVE PHYSICAL CHANNEL REGULATOR FOR ON CHIP INTERCONNECTS. A Thesis SONALI MAHAPATRA

Phastlane: A Rapid Transit Optical Routing Network

Lecture 13: Interconnection Networks. Topics: lots of background, recent innovations for power and performance

STLAC: A Spatial and Temporal Locality-Aware Cache and Networkon-Chip

Low-Power Interconnection Networks

Improving Routing Efficiency for Network-on-Chip through Contention-Aware Input Selection

Architecture and Design of Efficient 3D Network-on-Chip for Custom Multi-Core SoC

Abstract. Paper organization

Fine-Grained Bandwidth Adaptivity in Networks-on-Chip Using Bidirectional Channels

Pseudo-Circuit: Accelerating Communication for On-Chip Interconnection Networks

Quality-of-Service for a High-Radix Switch

Topologies. Maurizio Palesi. Maurizio Palesi 1

ReNoC: A Network-on-Chip Architecture with Reconfigurable Topology

High-Level Simulations of On-Chip Networks

WITH THE CONTINUED advance of Moore s law, ever

AREA-EFFICIENT DESIGN OF SCHEDULER FOR ROUTING NODE OF NETWORK-ON-CHIP

A Comprehensive Digital Cross Connect (DCS) for Buffered and Unbuffered Switching Applications

Lecture 22: Router Design

Transcription:

Power and Area Efficient NOC Router Through Utilization of Idle Buffers Mr. Kamalkumar S. Kashyap 1, Prof. Bharati B. Sayankar 2, Dr. Pankaj Agrawal 3 1 Department of Electronics Engineering, GRRCE Nagpur India 2 Department of Electronics Engineering, GRRCE Nagpur India 3 Department of Electronics Engineering, RCOEM Nagpur India ABSTRACT: The network-on-chip (NOC) has been proposed for system-on chip (SOC) based applications to achieve the better performance with low power consumption as compaired with typical onchip bus architecture. Power consumption can be decreased by reducing the size of the router buffers. However, as reducing router buffers alone will significantly reduces performance, we compensate by utilizing the newly proposed dual-function virtual channel buffers that allow flits to be stored on wires when required. This paper presents a novel virtual-channel (VC) sharing technique for NOC architecture. The proposed architecture improves the utilization of resources to enhance the performance with minimal overheads. Resource sharing for on-chip network is critical to reduce the chip area and power consumption. Thus virtual channel buffer sharing by other router ports has been proposed to enhance the performance of on-chip communication. Keywords: Networks-on-Chip (NOC), Virtual Channel (VC), Processing Element Recovery (PE), Partial Virtual Channel Sharing (PVC). [1] INTRODUCTION A typical NOC based system consists of processing elements (PE), network interfaces (NI), routers and channels. The router further contains switch, buffers and routing logic as shown in Figure 1. Buffers consume near about 64% of the total node (router + link) leakage power for all process technologies, which makes it the largest power consumer in any NOC system. All links in NOC can be simultaneously used for data transmission, which provides a high level of parallelism. It is an attractive solution to replace the conventional communication architectures such as shared buses or point-to-point dedicated links by NOC. NOC provides better scalability than on-chip buses because as more resources are introduced to a system, also more routers and links are introduced to connect them to the network [4]. Kashyap, Sayankar and Agrawal 31

Power and Area Efficient NOC Router Through Utilization of Idle Buffers Buffers consume the largest fraction of dynamic and leakage power of the NOC node (router + link) [3]. Storing a packet in buffer consumes far more power as compared to its transmission [3]. Thus, increasing the utilization of buffers and reduction in their number and size with efficient autonomic control reduces the area and power consumption. Also the Wormhole flow control [2] has been proposed to reduce the buffer requirements and enhance the system throughput. Figure: 1. Conventional virtual channel router architecture. However, one packet may be occupy or cover several intermediate switches at the same time. This introduces the problem of deadlocks and livelocks [8]. Thus to avoid this problem the use of virtual channel is introduced. Virtual channel flow control exploits an array of buffers at each input port. By allocating different packets to each of these buffers, flits from multiple packets may be sent in an interleaved manner over a single physical channel. This improves the throughput and reduces the average packet latency. [2] RELATED WORK Buffer management is not a recent issue but still needs attention. This comes as a performance improvement to utilize the unused buffers at some time instant. Lately, buffer sharing has been analyzed and compared to the existing router architectures.[1] This paper proposed combine two techniques of adaptive channel buffers and router pipeline bypassing to simultaneously reduce power consumption and improve performance. This paper simulation results of the proposed methodology combining the two techniques, yield a overall power reduction of 62% over the baseline and improve performance (throughput and latency) by more than 10%. 32

Increasing the utilization of buffers and reduction in their number and size with efficient autonomic control reduces the area and power consumption. In this paper, Wormhole flow control has been proposed to reduce the buffer requirements and enhance the system throughput[4]. Lan et. al [6] addresses the buffer utilization by making the channels bidirectional and shows significant improvement in system performance. But in this case, each channel controller has two additional tasks: dynamically configuring the channel direction and to allocate the channel to one of the routers, sharing the channel. Also, there is a 40% area overhead over the typical NOC router architecture due to double crossbar design and control logic. The extra tasks for channel controller and increased crossbar size contribute to the power consumption as well Nicopoulos et al.[9] presents a Dynamic Virtual Channel Regulator (ViChaR) for NOC routers. The authors address the buffer utilization by using the unified buffered structure (UBS) instead of individual and statically partitioned FIFO buffers. It provides each individual router port with a variable number of VCs according to the traffic load. The architecture provides around 25% improvement in system performance at a small cost of power consumption. The architecture enhances the buffer utilization under heavy traffic load at the port. If there is no load at the port, the buffer resources cannot be utilized by other loaded ports. Ramanujam et. al [10] introduces a distributed shared buffer (DSB) NOC router architecture. The proposed architecture shows a significant improvement in throughput at the expense of area and power consumption due to extra crossbar and complex arbitration scheme. Kodi et. al [11] illustrates the impact of repeater insertion on inter-router links with adaptive control and eliminating some of the buffers in the router. The approach saves appreciable amount of power and area without significant degradation in the throughput and latency. But there is still some scope to increase the buffer utilization inside the router by using the architecture which we propose here. The main motivation of this work is to propose a NOC architecture with considerations of all the issues discussed above. A NOC architecture with an ideal tradeoff between the performance and the mentioned issues guides to propose the partial sharing of VC buffers. Kashyap, Sayankar and Agrawal 33

Power and Area Efficient NOC Router Through Utilization of Idle Buffers Figure: 2. Base router architecture The drawback of using VCs stands in a more complex control protocol, as data corresponding to different messages which is multiplexed on the physical channel must be eventually separated [8]. Another important issue that needs attention is the utilization tradeoff. VCs are proposed to increase the utilization of physical channels. By inserting the VC buffers, we increase the physical channel utilization but utilization of inserted VC buffers is not considered. It can be observed that if there is no communication on some channel at some time instant and at the same time, neighboring channel is overloaded, free buffers of one channel cannot contribute for congestion control by sharing the load of neighboring channel. [3] PROPOSED ARCHITECTURE To enhance the performance of typical VC architectures, new VC buffers should be inserted because a congested port cannot utilize the resources of neighbouring free port. VC utilization can be enhanced by sharing free buffers of a port with neighbouring overloaded input ports. The improved VC utilization directs to reduce the number of VC buffers and sustain the system performance. VC utilization can be further improved by sharing them among all the input ports. However, full sharing increases the control logic complexity and power consumption due to a larger input crossbar. Thus a tradeoff between resource utilization, design complexity and power consumption is needed. Instead of sharing the VC buffers among all the input ports, PVS approach shares the VC buffers among few input ports according to the communication requirements. With this technique, the buffer utilization is increased and utilization level approaches close to the fully shared architecture without significant silicon area and power consumption overhead because of reduced input crossbar size. Overall, PVS approach is a tradeoff between system 34

throughput, resource utilization and power consumption. The PVS architecture has been explained in detail by [9]. NI is a generic interface, which needs to be standardized. Main tasks of the NI are packetization and de-packetization of data. Different services can be introduced in NI architecture, such as multicasting and error monitoring. The queuing buffer for the PE can be considered as the part of NI on input port of the switch. Thus each PE has its own dedicated buffer, which simplifies the control logic and enhance the throughput without any area overhead. PI is the core specific interface, which acts as a clock synchronizer. In PVS approach, the input crossbars and control logic are responsible for buffer allocation and receiving the data packets. Input architecture uses the distributed routing logic where as central VC allocator is used for the group of channels sharing the buffers. The proposed architecture, sharing of buffers between neighbouring input ports, is shown in Fig.2. The router architecture can be divided into two parts, the Input and the Output. The Input part is responsible for buffer allocation and receiving the packets from neighbouring routers. The Output part computes the route and transmits the packets accordingly. Within the router, the Input and the Output parts do not communicate at all. Both parts simply write and read from the buffers. Fig.3 shows the signals of both parts to write and read from buffers and also the external interface of router. Kashyap, Sayankar and Agrawal 35

Power and Area Efficient NOC Router Through Utilization of Idle Buffers Figure: 3. Proposed output part of router architecture Figure: 4. System Level Input and Output Controllers Signalling Specifications of Router Architecture. For two channels sharing the buffers, input PVS architecture is shown in Figure.3. The VC allocator for one group of channels sharing the buffers operates according to the buffers status and does not communicate with other VC allocators. The task of VC allocator is to keep track of free buffers and allocate them to the incoming traffic. After allocation, routing logic computes the route for the packet and selects the port on output crossbar for packet transmission. The routing logic is a fault tolerant approach because of distributed nature. If one routing element is faulty, only the corresponding buffers and channels are affected. Distributed routing logics also reduce the communication overhead and power consumption. [4] DISCUSSION In typical NOC mesh, processing nodes can be divided into three different categories according to the position : corner, edge and central nodes. Routers on the basis of these categories require three, four and five I/O ports respectively. Because of the distributed and 36

independent control logic, router can be configured to fulfil these requirements without any extra effort by simply removing the extra hardware. Thus the architecture is fully optimized for any number of I/O ports. The number of I/O ports can be increased to any number by simply duplicating the hardware on input side and increasing the crossbar size on output side. At the same time, the buffer utilization is increased which significantly reduces the new buffer requirements. Thus decreasing the buffer size by 4 buffer slots (25%) leads to a power savings of 25.72% as compared to the baseline. Power reduces by 40.77% when the buffer size is reduced to 50% of the baseline [3]. The proposed architecture can be integrated into the existing automated NOC design flow without requiring extra effort. An automated design flow can utilize the adoptability feature of this architecture to generate an optimized router module according to the application and design requirements. Therefore, the proposed architecture can play a significant role in maintaining the performance of NOC based systems regardless of the topology. [5] CONCLUSIONS AND FUTURE WORKS PVS-NOC architecture has better trade-off between resource utilization, system performance and power consumption than conventional VC based architectures. Thus the proposed architecture was simulated with uniform, transpose, and negative exponential distribution (NED) synthetic benchmarks. The performance was compaired with typical virtual channel (VC) based NOC architecture and FVS-NOC architecture. Simulation results showed that average packet latency of PVS NOC was significantly lower compared to typical VC based NOC architecture. Thus typical VC based network architecture are mostly preferred. For video conference encoder application, PVS-NOC showed a better tradeoff between silicon area and power consumption as compared to the typical NOC architecture. Thus we are going to see a similar approach that can be taken for the further development of NOC systems, in order to raise the level of abstraction towards application layers. A very important step is to utilize the buffers for routing purposes within the cluster without injecting the packets to the network. It will help to compensate the overheads and enhance the system performance. Apart from presented results, power consumption analysis and realistic traffic analysis are future topics for analysis and implementation for the proposed router architecture. REFERENCES [1] Power and Area Efficient Design of Network-on-Chip Router Through Utilization of Idle Buffers Khalid Latif1;3, Tiberiu Seceleanu2, Hannu Tenhunen, 2010 17 th IEEE International Conference and Workshops on Engineering of Computer based system. [2] Generic low-latency Network on chip router architecture for FPGA computing Systems Ye Lu, John McCanny, Sakir Sezer 2011 21st International Conference on Field Programmable Logic and Applications. Kashyap, Sayankar and Agrawal 37

Power and Area Efficient NOC Router Through Utilization of Idle Buffers [3] Design of Energy-Efficient Channel Buffers with Router Bypassing for Network-on-Chips (NOCs) Avinash Kodi, Ahmed Louri, Janet Wang2009 10th International conference on Quality Electronic Design. [4] Designing a High Performance and Reliable Networks-on-Chip using Network Interface Assisted Routing Strategy Khalid Latif, Amir-Mohammad Rahmani, Tiberiu Seceleanu, Hannu Tenhunen 2012 15th Euromicro Conference on Digital System Design. [5] Enhancing Performance Sustainability of Fault Tolerant Routing Algorithms in NOC-Based Architectures Khalid Latif, Amir-Mohammad Rahmani, Kameswar Rao Vaddina, Tiberiu Seceleanu, Pasi Liljeberg, Hannu Tenhunen 2011 14th Euromicro Conference on Digital System Design. Congestion Mitigation Using Flexible Router Architecture for Network-on- Chip Mostafa S. Sayed, A. Shalaby, M. El-Sayed Ragab, Victor Goulart in Ieee 2012. [6] A Modular Router Architecture Desgin For Network on Chip Brahim Attia, Wissem Chouchene, Abdelkrim Zitouni, Noureddine Abid,and Rached Tourki 2011 8th International Multi-Conference on Systems, Signals & Devices. [7] Application-Specific Network-on-Chip Architecture Based on Small-World Peilei Bao, Huaxi Gu, Bin Li, and Keming Du 2012 IEEE International Conference on Information Science and Technology Wuhan, Hubei, China; March 23-25, 2012 [8] Y. C. Lan et al. BiNOC: A bidirectional NOC architecture withdynamic self-reconfigurable channel. In Proceedings of International Symposium on Networks-on-Chip (NOCS), pages 266 275, 2009. [9] R. S. Ramanujam et el. Design of a high-throughput distributed shared-buffer NOC router. In Proceedings of the ACM/IEEE International Symposium on Networks-on-Chip (NOCS), pages 69 78, 2010. [10] Enhancing Performance of NOC-Based Architectures using Heuristic Virtual-Channel Sharing Approach Khalid Latif1;3, Amir-Mohammad Rahmani1;3, Kameswar Rao Vaddina2011 35th IEEE Annual Computer Software and Applications Conference. [11] A. Kodi, A. Louri, J. Wang. Design of energy-efficient channel buffers with router bypassing for network-onchips (NOCs). Proceedings of International Symposium on Quality of Electronic Design (ISQED), pp.826-832, March 2009. Author[s] brief Introduction: Name: Mr. Kamalkumar S. Kashyap PG Student, GHRCE Nagpur Corresponding Address- 38

Samata Colony, Tilak Nagar, Bramhapuri, Dist. Chandrapur (MH) Pin No. 441206 Mob. No. 9665709380; 9405150701 Kashyap, Sayankar and Agrawal 39