Temperature and Traffic Information Sharing Network in 3D NoC

October 21-23, 2015, San Francisco, USA

Mingxing Li, Ning Wu, Gaizhen Yan and Lei Zhou

Abstract: Monitoring Network on Chip (NoC) status, such as the traffic and temperature distribution, plays an important role in improving network performance. As the network scale grows, it becomes increasingly difficult to monitor the global network status in time. To keep status transmission from adding too much network load while limiting the transmission cost, we construct a cost-effective auxiliary network on top of a 3D mesh NoC. The router microarchitecture and the packet structure are modified to meet this requirement, and a routing algorithm based on multicast communication is proposed to improve transmission efficiency. Experiments show that, in a 9x9x4 3D NoC under uniform traffic, our global status sharing scheme reduces transmission latency by 25% compared to Global Congestion Awareness (GCA). Status information can also be transmitted more frequently than with GCA, with less impact on the transmission of ordinary packets.

Index Terms: Network on Chip, status monitor, global sharing scheme

Manuscript received July 10, 2015; revised July 30, 2015. This work was supported in part by the National Natural Science Foundation of China under Grants 6376025 and 60608, and the Industry-academic Joint Technological Innovations Fund Project of Jiangsu under Grant BY203003-.
M. Li is with the College of Electronic and Information Engineering, Nanjing University of Aeronautics and Astronautics, Nanjing, 210016, China (e-mail: 074762836@qq.com).
N. Wu is with the College of Electronic and Information Engineering, Nanjing University of Aeronautics and Astronautics, Nanjing, 210016, China (e-mail: wunee@nuaa.edu.cn).
G. Yan is with the College of Electronic and Information Engineering, Nanjing University of Aeronautics and Astronautics, Nanjing, 210016, China, and with the College of Mathematics, Physics and Information Engineering, Anhui Science and Technology University, Fengyang, 233100, China (e-mail: xucs_yan@26.com).
L. Zhou is with the College of Information Engineering, Yangzhou University, Yangzhou, 225009, China (e-mail: tomcat800607@26.com).

I. INTRODUCTION

With the increasing number of cores on a single chip, buses and point-to-point connections can no longer meet requirements such as high performance, low latency and power efficiency. Network on Chip (NoC) [1] has become a scalable and flexible interconnect solution for current multi-core architectures [2]. As technology evolves, 3D Integrated Circuits (ICs) are regarded as a promising technology for integrating more devices [3], and 3D NoC systems have therefore been widely discussed recently. With the continuous decrease of chip area and feature size and the growth of communication requirements, on-chip heat dissipation and congestion problems can no longer be ignored [4]. Temperature and traffic management is a key means of improving NoC performance, and within it the temperature and traffic monitoring mechanism is of great importance. However, as the scale of 3D NoC increases, real-time and effective monitoring of the network becomes very difficult.

Recently, several distributed routing schemes that rely on regional or global congestion information have been proposed [5-7]. Gratz et al. introduce Regional Congestion Awareness (RCA), an approach that propagates congestion information across the network in a scalable manner to improve the ability of adaptive routers to spread network load [7]. RCA does not require centralized tables, all-to-all communication, or in-band signaling that contributes to congestion. RCA propagates congestion information among adjacent nodes in the direction opposite to that of normal packets. The congestion information each node receives is the sum of the congestion along the direction in which the information is transmitted. However, a node cannot obtain global congestion information, so an adaptive routing algorithm cannot evaluate the performance of all possible paths.

To expand the view of the network, Ramakrishna et al. introduce Global Congestion Awareness (GCA) [8]. GCA is a globally-aware routing scheme for NoC that provides each router with a timely and complete view of the congestion status of the whole network. GCA creates, for each node, a centralized table that reflects the traffic distribution of the whole network, and collects the congestion information of all nodes by all-to-all communication. However, this information sharing scheme increases the amount of data and the transmission delay in the network. Moreover, the transmission of the congestion information occupies a lot of bandwidth and increases the risk of congestion.

Chao et al. propose a mechanism to propagate the temperature information of the whole network [9]. They simplify the temperature information into a trigger flag which is 0 or 1. During global sharing, the temperature information of the whole network can be propagated within four flits. This global sharing scheme greatly reduces the additional data in the network. However, the temperature information is condensed: every value stored in the temperature map can only represent whether a router exceeds a temperature threshold and stops packet exchange. This scheme may not meet the accuracy requirements of adaptive routing algorithms that select the coldest path.

In this paper, we adopt the 3D mesh topology as a prototype. To overcome the defects of traditional globally-aware schemes, this paper proposes the Temperature and Traffic Information Sharing Network (TTISN). Additionally, we propose the Temperature and Traffic Information Transmission Routing Algorithm (TTITR), which is based on multicast communication, to release analysis results in the TTISN.

The rest of this paper is organized as follows. Section II describes the architecture of the TTISN, the architecture of the router, the format of the status packets and the multicast-supporting routing policy. Section III presents the simulation experiments and results of TTISN. Section IV concludes the paper.

II. DESIGN OF THE SHARING POLICY

A. Architecture

For large-scale 3D NoCs with hundreds of cores, traditional globally-aware schemes can aggravate congestion. Because they deliver a large amount of temperature or traffic information, globally-aware schemes can cause large transmission delays.

A typical 3D mesh NoC architecture is shown in Figure 1. The size of the NoC is up to 9*9*4, which means traditional globally-aware schemes have to gather the temperature or traffic information of 324 nodes and release all of it to each node. This increases the traffic load and the latency of the network. When the network load is heavy, the information is greatly delayed, which reduces the temperature or traffic balancing ability of adaptive routing policies that rely on it. Therefore, a transfer strategy is needed that occupies less network bandwidth and incurs lower delay when transmitting network status information.

Fig. 1. Distribution of aggregation nodes in NoC

Future many-core architectures with hundreds of cores can be managed more efficiently in a decentralized way [10]. In this work, we divide the NoC into regions. All status information is collected locally and exchanged among regions. We propose a special network, called TTISN, which is used to transmit the network status information. This network is an auxiliary network built on top of the NoC. As shown in Figure 2, every layer of the three-dimensional NoC is divided into regions. In every region we choose one node, the aggregation node, to collect status data from all the nodes of the region. All the chosen nodes are then linked to each other to form a new network. In this auxiliary network, the temperature or traffic information is transmitted along a Hamilton path. By constructing the auxiliary network, the maximum hop count and the traffic load can be reduced.

Fig. 2. Top view of TTISN (legend: normal node, aggregation node, link between normal nodes, link between aggregation nodes)

For most adaptive temperature-balanced or traffic-balanced routing algorithms, the average layer temperature or average layer traffic is the only information affecting vertical routing. Thus, a global sharing mechanism only needs to transmit the average status information of the different layers. When the status information has been gathered, the last aggregation node obtains the average temperature or average traffic of a layer. The packets containing the average values are then exchanged vertically over the original network.

To build an auxiliary network for status information propagation, a specialized router architecture is required for the aggregation nodes. All the status information within the subnet is transmitted to the aggregation node and then packed into a status packet. Therefore, the aggregation router requires a status message buffer and a status information packer. Additionally, the aggregation node router has a dedicated Routing Control Logic 2 for routing in the auxiliary network. The router architecture is shown in Figure 3.

Fig. 3. Aggregation node router architecture (blocks: routing control logic, routing control logic 2, crossbar, crossbar 2, TEM/TRA packer; ports: East, West, North, South, Up, Down, Local, East2, West2, North2, South2)

B. Packet Structure

There are five kinds of data packets: the ordinary data packet, the temperature information upload packet, the temperature information download packet, the traffic information upload packet and the traffic information download packet. The temperature information upload packet is used to upload the temperature information of each node to the aggregation node. The temperature information download packet is downloaded by each node to refresh its temperature map of the whole network. The traffic information packets are analogous to the temperature information packets and likewise come in two types.

The packet structure is shown in Figure 4. To distinguish between the different packet types, bits 0-2 of the head flit indicate the packet type: 100 denotes the temperature information upload packet, 101 the temperature information download packet, 110 the traffic information upload packet and 111 the traffic information download packet. All other values indicate an ordinary data packet. Since there are 324 nodes in the network, 9 bits are needed to represent a node ID. Bits 3-11 hold the source node ID and bits 12-20 hold the destination node ID. The packet length is stored in bits 21-24. Bits 25-31 are reserved and are generally 0.

Fig. 4. Packet structure (header flit, data flits, tail flit). Header fields:

    Bits     Field
    0-2      Packet Type
    3-11     Source ID
    12-20    Destination ID
    21-24    Packet Length
    25-31    Reserved

Fig. 5. Routing path in a layer: (a) upload and download packet paths in the subnet; (b) upload and download packet paths in the auxiliary network

When temperature or traffic upload packets are transmitted in the auxiliary network, the status information of the local subnet is appended to the end of the upload packet. Eventually, a complete network status information packet is formed when the upload packet reaches the last aggregation node. At that point, the last bit of the Packet Type field is set to 1 to form the download packet.
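
The header layout described above can be illustrated with a small packing and unpacking routine. This is only a minimal sketch under stated assumptions: the type codes 100, 101, 110 and 111 are read from the partly garbled source, bit 0 is taken to be the least significant bit of a 32-bit head flit, and the names `PacketType`, `FIELDS`, `pack_header`, `unpack_header` and `to_download` are hypothetical helpers, not part of the paper or of Noxim.

```python
from enum import IntEnum


class PacketType(IntEnum):
    # Assumed 3-bit type codes; any other value means an ordinary data packet.
    ORDINARY = 0b000
    TEMP_UPLOAD = 0b100
    TEMP_DOWNLOAD = 0b101
    TRAFFIC_UPLOAD = 0b110
    TRAFFIC_DOWNLOAD = 0b111


# Field name -> (lowest bit, width). Bit 0 is assumed to be the LSB.
FIELDS = {
    "ptype": (0, 3),    # bits 0-2
    "src": (3, 9),      # bits 3-11, 9 bits cover the 324 node IDs
    "dst": (12, 9),     # bits 12-20
    "length": (21, 4),  # bits 21-24
}


def pack_header(ptype, src, dst, length):
    """Build the 32-bit head flit; reserved bits 25-31 are left at 0."""
    word = 0
    values = {"ptype": int(ptype), "src": src, "dst": dst, "length": length}
    for name, value in values.items():
        low, width = FIELDS[name]
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name}={value} does not fit in {width} bits")
        word |= value << low
    return word


def unpack_header(word):
    """Split a head flit back into its fields."""
    return {
        name: (word >> low) & ((1 << width) - 1)
        for name, (low, width) in FIELDS.items()
    }


def to_download(ptype):
    """Set the last bit of an upload type code to obtain the download code."""
    return PacketType(int(ptype) | 0b001)


header = pack_header(PacketType.TEMP_UPLOAD, src=5, dst=40, length=3)
print(unpack_header(header))
print(to_download(PacketType.TEMP_UPLOAD).name)
```

With this encoding, converting an upload packet into a download packet is a single bit operation on the type field, which matches the way the last aggregation node is described as forming the download packet.
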
The download packet must include the status information of the whole network so that it can update the temperature or traffic information table of every node in the network.

C. Routing Algorithm

Because the data packets differ in function and transmission direction, we design a routing algorithm suitable for delivering the various kinds of packets. The transmission of a network status information packet is divided into two phases: the data upload phase and the data download phase. Each phase in turn consists of transmission within the subnet and transmission within the auxiliary network.

In the upload phase, the network status information packet is sent to the aggregation node using the XY routing algorithm. After receiving the status information of the nodes in the subnet, the aggregation node saves it to a data set. When the status information is transmitted in the auxiliary network, the router uses the Hamilton routing algorithm to select the path, and the status information of the local subnet is added to the upload packet. Eventually, all the status information is gathered and saved to the temperature or traffic information download packet.

In the download phase, the status information packets are transmitted in multicast mode. When transmitted in the auxiliary network, the status information packets are copied and sent to the next aggregation node. The copies are transmitted to each node in the subnet, and the status tables are updated.

The routing path for the status information in a layer is shown in Figure 5. Figure 5(a) shows the routing path in the subnet and Figure 5(b) shows the routing path in the auxiliary network. The routing algorithm used to transmit temperature and traffic information is shown in Figure 6. Its parameters are defined as follows: (Xc,Yc,Zc) is the coordinate of the current node, (Xd,Yd,Zd) the coordinate of the destination node, (Xg,Yg,Zg) the coordinate of the aggregation node, CT the type of the current node, and PT the type of the received packet.

Pseudo code of thermal and traffic information transmission routing algorithm

    input:  CurrentNode (Xc,Yc,Zc)
            AggregationNode (Xg,Yg,Zg)
            DestinationNode (Xd,Yd,Zd)
            CurrentNodeType CT
            PacketType PT
    output: NextNode

    if PT is upload packet then
        if CT is aggregation node then
            if this aggregation node is not the last one then
                add the local subnet status information to the packet and
                transmit the new packet to the next aggregation node
            else
                construct the download packet
            end if
        else
            transmit the packet to the aggregation node along the XY path
        end if
    else if PT is download packet then
        if CT is aggregation node then
            if this aggregation node is not the first one then
                copy the packet to the local subnet and then transmit it
                to the previous aggregation node
            else
                copy the packet to the local subnet
            end if
        else
            transmit the packet to the normal node along the Hamilton path
        end if
    else
        transmit the packet by using the normal routing algorithm
    end if

Fig. 6. Pseudo code of routing algorithm

III. EXPERIMENTS AND RESULTS

In this section, we evaluate the performance of TTISN. The performance evaluation experiments are implemented on the Noxim platform, which requires modifying the structure of the NoC and designing the structure of the router. We build a 3D mesh NoC of size 9x9x4 on top of Noxim and build the auxiliary network within it. The simulation experiments investigate the role of TTISN in reducing the traffic caused by status information transmission. The Noxim simulation configuration parameters are shown in Table I.

TABLE I
NOXIM SIMULATION CONFIGURATION

    Characteristic         Configuration
    Topology               3D mesh
    Size                   9*9*4
    Switching mechanism    Wormhole
    Routing algorithm      XYZ
    Traffic workload       Uniform
    Virtual channel        2
    Buffer depth           32
    Packet length          3~24
    Data width             32
    Simulation period      100,000

Fig. 7. Load latency at different injection rate

Fig. 8. Load latency at different status information injection period

The first experiment examines how the two status information transmission strategies affect the network under different injection rates. The injection period of temperature and traffic information is 500 cycles and the length of ordinary data packets is 3~8. By varying the injection rate, the average latency of the different transmission strategies at each injection rate is obtained. As shown in Figure 7, transmitting temperature and traffic information in TTISN increases the transmission delay by only about 5%. Compared to GCA, TTISN reduces the network delay by about 25%.

The second experiment investigates the influence of the status information injection period on network transmission. Figure 8 shows the impact of the status information injection period on network transmission when the injection rate is 0.05. When network status information is not transmitted frequently, it has little impact on ordinary traffic. However, the performance difference between GCA and TTISN becomes obvious when the status information is transmitted more frequently. When the injection period of the status information reaches 450 cycles, the average latency increases rapidly. The network delay of TTISN remains stable between 35 and 60 while the status information injection period is longer than 200 cycles.

IV. CONCLUSION

In this paper, we modify the architecture of the 3D NoC to construct a global sharing network for temperature or traffic information. Based on the modified network, we implement a global sharing routing algorithm based on a multicast mechanism.
Experimental results show that in the 9x9x4 3D NoC, under uniform traffic, our global sharing scheme reduces transmission latency by 25% compared to GCA and can transmit the status information more frequently than GCA with less impact on the transmission of ordinary packets.

REFERENCES

[1] L. Benini and G. De Micheli, "Networks on chips: a new SoC paradigm," IEEE Computer, vol. 35, pp. 70-78, Jan. 2002.
[2] J. Heisswolf, A. Weichslgartner and A. Zaib, "Hardware Supported Adaptive Data Collection for Networks on Chip," 27th IEEE International Parallel and Distributed Processing Symposium Workshops & PhD Forum (IPDPSW), pp. 53-62, 2013.
[3] Y.R. Huang, J.H. Pan and Y.C. Lu, "Thermal-Aware Router-Sharing Architecture for 3D Network-on-Chip Designs," in Circuits and Systems (APCCAS): IEEE Asia Pacific Conference, pp. 1087-1090, Dec. 2010.
[4] Z. Wu, N. Wu and L. Zhou, "Adaptive Thermal and Traffic Balanced Routing Algorithm for 3D NoC," in Proceedings of the World Congress on Engineering and Computer Science 2014, pp. 30-34.
[5] R. S. Ramanujam and B. Lin, "Destination-based adaptive routing on 2D mesh networks," in Proc. 6th ACM/IEEE Symposium on Architectures for Networking and Communications Systems, 2010.
[6] L.P. Tedesco, T. Rosa and F. Clermidy, "Implementation and evaluation of a congestion aware routing algorithm for networks-on-chip," in Proc. 23rd Symposium on Integrated Circuits and System Design, 2010.
[7] P. Gratz, B. Grot and S.W. Keckler, "Regional Congestion Awareness for Load Balance in Networks-on-Chip," in Proc. IEEE International Symposium on High-Performance Computer Architecture, 2008, pp. 203-214.
[8] M. Ramakrishna, P.V. Gratz and A. Sprintson, "GCA: Global Congestion Awareness for Load Balance in Networks-on-Chip," Networks on Chip (NoCS), Seventh IEEE/ACM International Symposium on, pp. 1-8.
[9] C. H. Chao, K. Y. Jheng, H. Y. Wang, J. C. Wu and An-Yeu Wu, "Traffic- and thermal-aware run-time thermal management scheme for 3D NoC systems," in Proceedings of the ACM/IEEE International Symposium on Networks-on-Chip (NoCS), pp. 223-230, May 2010.
[10] S. Kobbe, L. Bauer, D. Lohmann, W. Schröder-Preikschat and J. Henkel, "DistRM: Distributed resource management for on-chip many-core systems," in Hardware/Software Codesign and System Synthesis (CODES+ISSS), 2011 Proceedings of the 9th International Conference on, pp. 119-128, 2011.
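
As a supplementary illustration of the routing algorithm of Section II.C (Fig. 6), the branch structure can be sketched in executable form. This is a simplified sketch under several assumptions that are not taken from the paper: it models a single layer with 2D coordinates, the aggregation nodes are given as a list already ordered along the Hamilton path, and the names `PT`, `xy_next_hop` and `ttitr_step` are hypothetical. Delivery inside the subnet during the download phase is returned as an action only, since the path inside the subnet is not modeled here.

```python
from enum import Enum


class PT(Enum):
    UPLOAD = "upload"
    DOWNLOAD = "download"
    ORDINARY = "ordinary"


def xy_next_hop(cur, dst):
    """One step of dimension-ordered XY routing: correct X first, then Y."""
    (xc, yc), (xd, yd) = cur, dst
    if xc != xd:
        return (xc + (1 if xd > xc else -1), yc)
    if yc != yd:
        return (xc, yc + (1 if yd > yc else -1))
    return cur


def ttitr_step(pt, cur, agg_path, my_agg):
    """Return (action, next_node) for one routing decision.

    agg_path: aggregation nodes of the layer in Hamilton-path order (assumed).
    my_agg:   aggregation node of the region that contains cur (assumed known).
    """
    is_agg = cur in agg_path
    if pt is PT.UPLOAD:
        if is_agg:
            i = agg_path.index(cur)
            if i < len(agg_path) - 1:
                # Append local subnet status, then move along the path.
                return ("append_local_status", agg_path[i + 1])
            # Last aggregation node: the gathered status becomes the download packet.
            return ("build_download_packet", None)
        # Normal node: forward towards its aggregation node on the XY path.
        return ("forward_xy", xy_next_hop(cur, my_agg))
    if pt is PT.DOWNLOAD:
        if is_agg:
            i = agg_path.index(cur)
            # Multicast: copy into the local subnet, then walk the path backwards.
            return ("copy_to_subnet", agg_path[i - 1] if i > 0 else None)
        # Normal node: delivery inside the subnet (path not modeled in this sketch).
        return ("deliver_in_subnet", None)
    return ("normal_routing", None)


path = [(1, 1), (4, 1), (7, 1)]  # hypothetical aggregation nodes of one row of regions
print(ttitr_step(PT.UPLOAD, (0, 0), path, my_agg=(1, 1)))
print(ttitr_step(PT.UPLOAD, (7, 1), path, my_agg=(7, 1)))
print(ttitr_step(PT.DOWNLOAD, (7, 1), path, my_agg=(7, 1)))
```

Keeping the aggregation nodes in a single ordered list makes the upload phase a forward walk and the download phase a backward walk over the same list, which mirrors the "next" and "previous" aggregation node wording of Fig. 6.
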