Study of Dynamically-Allocated Multi-Queue Buffers for NoC Routers


Yung-Chou Tsai, Department of Electrical Engineering, National Tsing Hua University, Hsinchu, Taiwan
Yarsun Hsu, Department of Electrical Engineering, National Tsing Hua University, Hsinchu, Taiwan

Abstract—A large portion of the area and power in Network-on-Chip (NoC) routers is consumed by buffers, and hence these costly storage resources must be utilized well. However, much of the early related literature no longer suits modern NoC router architectures or the complicated traffic loads they carry. In this work, we refine the dynamically-allocated multi-queue (DAMQ) buffer organization and propose a new one that can accommodate more packets than the number of virtual channels, named DAMQ with multiple packets (DAMQ-MP). The DAMQ-MP scheme can solve certain data transmission issues under some circumstances, such as heavy network congestion or short packets, to improve performance. We also introduce two methods applicable to DAMQ-based buffers: adding priorities for switch allocation and reserving virtual channels for high-priority packets. Experimental results show that DAMQ-MP routers can achieve up to 24.52% higher saturated throughput than their SAMQ and DAMQ counterparts.

Keywords-buffer; virtual channel; router; Network-on-Chip;

I. INTRODUCTION

Buffers are usually added in NoC router nodes to provide temporary storage where arriving data can wait for the resources needed to traverse these nodes. In most cases, statically-allocated multi-queue (SAMQ) buffers in input-queued switches are adopted due to their simple flow control mechanism and small hardware overhead. However, each queue of a SAMQ buffer owns a fixed amount of buffer space and never shares it with other queues. This kind of static allocation of input buffers results in wasted storage resources for lack of flexibility in buffer management.
For the same input buffer, some queues may be overloaded and in need of more buffer space while others are nearly empty with plenty of buffer space unused. Since buffers consume a great portion of the power and area in a NoC router [1], these costly buffer resources must be utilized effectively when designing the router architecture. Tamir and Frazier [2] first proposed a novel buffer organization named DAMQ to provide better flexibility of buffer utilization. The DAMQ organization dynamically partitions buffer storage with linked lists to handle variable-length packets and achieve higher performance. Another approach using self-compacting buffers (SCB) to implement DAMQ switches was introduced in [3] to reduce hardware overhead and complexity. Liu et al. [4] improved DAMQ switches adopting SCB by letting two sets of virtual channels in different dimensions share buffer space. A special architecture called the high-performance input-queued switch (HIPIQS) also uses a DAMQ organization, pipelined access to multi-bank input buffers with chunks, and many small additional cross-point buffers, to deliver high performance [5]. However, current NoC architecture designs prevalently use wormhole switching to relax the constraints on buffer size, virtual channel flow control to avoid deadlocks and head-of-line (HoL) blocking, and mesh-based topologies to fit chip layouts. The approaches mentioned above no longer suit the requirements of modern NoCs and therefore have to be resurveyed. Rezazad et al. [6] showed that the optimal number of virtual channels and buffer length of mesh-based interconnection networks highly depend on the traffic pattern: under light traffic, the buffer structure should extend virtual channel depth for continual transfers to improve latency; under heavy traffic, it should provide many virtual channels for congestion avoidance to increase throughput.
Based on these concepts, two buffer structures that dynamically change their number of virtual channels were proposed [7][8]. However, both structures must put a lot of effort into managing the varying number of virtual channels, such as maintaining tracking tables, and that makes them hard to scale. In addition, the arbiters used in virtual channel allocation must be simplified to prevent the hardware overhead from increasing dramatically as the number of virtual channels goes up. In this paper, we propose the DAMQ-MP scheme, which allows more packets than the number of virtual channels to coexist inside a DAMQ-based input buffer. Because this scheme breaks the limitation of one packet per virtual channel in conventional virtual channel routers, it can overcome the issues resulting from waiting for switching packets, handling short packets, and HoL blocking. We believe that DAMQ-MP is a simple and feasible method to improve network performance and buffer utilization without sophisticated hardware design modifications. Besides, it is well suited to applications such as reserving virtual channels for high-priority packets.

II. MOTIVATION

A. Packet Switching Latencies

In conventional SAMQ and DAMQ buffers, one virtual channel typically holds only one packet at a time for easy control and to prevent HoL blocking. Therefore, if all virtual channels contain packets whose tail flits have entered but not yet left, the remaining free buffer resources are wasted until some packet is gone and another new packet arrives, as illustrated in Figure 1a. Even though the DAMQ mechanism lets the remaining buffer resources be used by packets that have not completed their delivery, as shown in Figure 1b, bringing in more flits of these long packets helps little with the gaps left after any other packet leaves. The duration between the departure of the tail flit of the current packet and that of the head flit of the next packet is fairly long. It includes the time for the following procedures: notifying the upstream router that this virtual channel has become free, allocating a new packet to this virtual channel, delivering the flits through the switch and physical link into this virtual channel, performing routing computation and output virtual channel allocation at the downstream router, and possibly incurring stalls due to contention. This kind of packet-switching latency cannot be neglected, because data transmission within the related virtual channel stops and contributes nothing to throughput during this period. Usually packet-switching latencies can be hidden by using multiple virtual channels: some virtual channels may still work well while others are reloading new packets. However, if the interconnection network is seriously congested and most virtual channels in the routers are halted by contention stalls, any loss of virtual channels caused by switching packets makes the situation worse, and the influence of these packet-switching latencies becomes more apparent.
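To make this dead time concrete, the stages listed above can be summed in a simple cycle-count model. This is only an illustrative sketch: the stage names and every cycle value are assumptions for exposition, not measurements from this paper.

```python
# Hedged model of the packet-switching latency between a tail flit leaving a
# virtual channel and the next packet's head flit becoming ready to compete
# for the switch. All cycle counts below are assumed, illustrative values.
STAGES = {
    "credit_return_to_upstream": 1,  # notify upstream that the VC is free
    "vc_allocation_upstream":    1,  # allocate the new packet to this VC
    "switch_and_link_traversal": 2,  # head flit crosses switch + physical link
    "route_computation":         1,  # RC at the downstream router
    "output_vc_allocation":      1,  # VA at the downstream router
}

def packet_switching_latency(contention_stalls=0):
    """Cycles during which the VC contributes nothing to throughput."""
    return sum(STAGES.values()) + contention_stalls

print(packet_switching_latency())   # 6 cycles in the uncontended case
print(packet_switching_latency(4))  # 10 cycles with four stall cycles
```

Under congestion the `contention_stalls` term dominates, which is exactly the regime where losing a virtual channel to packet switching hurts most.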
The way to solve this issue is to shorten the packet-switching latencies; one straightforward method is to bring the next incoming packet into the input buffer in advance, standing by without waiting for a virtual channel to be released. This helps halted virtual channels return to work as soon as possible, so that more packets are available to traverse the switch fabric.

B. Traffic Loads with Short Packets

In some scenarios, interconnection networks carry traffic loads that mix long data packets with short control packets (e.g., commands, acknowledgments, etc.). Each short packet still occupies one virtual channel but brings only a few flits, leaving the physical channel and unused buffers idle most of the time. Short packets also make virtual channels switch more frequently than long ones, and have more chances of leaving virtual channels halted while waiting for the arrival of new packets. When encountering blocking, each blocked short packet stays in its virtual channel and is thus distributed into one of many different routers. This situation can be improved by allowing several short packets that have stopped forwarding to be compacted into a few routers, as long as enough buffer space is available in those routers. Having more packets contained in the input buffers not only leads to a higher buffer utilization rate but also releases more virtual channel entities for transmitting more packets in the whole interconnection network.

Figure 1. Comparison of buffer organizations: (a) SAMQ, (b) DAMQ, (c) SAMQ-MP, (d) DAMQ-MP. Each flit is labeled with its type (H, B, and T standing for head flit, body flit, and tail flit respectively) and the packet it belongs to (i.e., the smaller letter within the parentheses).
C. Dynamic Input Buffer Management

Based on the reasons mentioned above, making an input buffer accommodate more packets than its number of virtual channels can bring some advantages. However, in order to keep the associated packet information as the number of packets increases, extra hardware such as registers for state fields, pointers, and control logic must be added. Applying this method to the SAMQ organization, yielding the SAMQ with multiple packets (SAMQ-MP) organization shown in Figure 1c, is obviously unsuitable and impractical. First of all, letting the same virtual channel be shared by multiple packets results in HoL blocking. Although moving a blocked packet from its current virtual channel to another free virtual channel can overcome the HoL blocking, the price in extra hardware cost and complexity is too high. In addition, just as buffer space in SAMQ buffers is partitioned statically, these extra registers for state fields and pointers are wasted along with the reserved buffer resources whenever their associated packets in the dedicated virtual channels are absent. On the contrary, the DAMQ organization is more suitable for this scheme due to its linked-list buffer structure, as shown in Figure 1d. In the original DAMQ buffer, each set of state fields and pointers is dedicated to one virtual channel. However, if the number of packets in a buffer is no longer limited by the number of virtual channels but by a quota of packets, an input buffer must reserve one set of state fields and pointers for each packet, not for each virtual channel.
Because every packet is stored as a linked list and buffer space as well as virtual channels are shared among packets, even if the number of packets in use is far less than the reserved quota of packets in a buffer, no buffer resources are wasted on unused packets except the associated sets of packet state fields and pointers.
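As a concrete illustration of this organization, the following is a minimal sketch of a shared flit memory managed as per-packet linked lists plus a free list, so that storage is reserved per packet rather than per virtual channel. The class and method names (`DAMQBuffer`, `push_flit`, `pop_flit`) are our own invention, not the paper's RTL.

```python
# Hedged sketch of a DAMQ-style input buffer: one shared flit memory,
# a free list of unused slots, and per-packet head/tail pointers.
class DAMQBuffer:
    def __init__(self, capacity):
        self.flit = [None] * capacity                   # shared flit storage
        self.next = list(range(1, capacity)) + [None]   # linked-list next pointers
        self.free_head = 0                              # head of the free list
        self.head = {}                                  # packet id -> oldest slot
        self.tail = {}                                  # packet id -> newest slot

    def push_flit(self, pkt, data):
        """Append a flit to a packet's linked list; False if the buffer is full."""
        if self.free_head is None:
            return False
        slot = self.free_head
        self.free_head = self.next[slot]
        self.flit[slot] = data
        self.next[slot] = None
        if pkt in self.tail:
            self.next[self.tail[pkt]] = slot            # link behind the packet's tail
        else:
            self.head[pkt] = slot                       # first flit of a new packet
        self.tail[pkt] = slot
        return True

    def pop_flit(self, pkt):
        """Remove and return a packet's oldest flit; the slot rejoins the free list."""
        slot = self.head.get(pkt)
        if slot is None:
            return None
        data = self.flit[slot]
        nxt = self.next[slot]
        self.next[slot] = self.free_head                # recycle the slot
        self.free_head = slot
        if nxt is None:
            del self.head[pkt], self.tail[pkt]          # packet fully drained
        else:
            self.head[pkt] = nxt
        return data
```

Because slots are shared, a busy packet can grow its queue while idle ones hold nothing, and in DAMQ-MP the number of packet IDs, not the number of virtual channels, bounds how many such lists coexist.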

III. THE DAMQ-MP SCHEME

A. Overview of DAMQ Buffers

A DAMQ buffer is structured as linked lists that dynamically adjust the depths of the virtual channels to utilize the input buffer more efficiently. In this way, the precious memory resources are shared by all virtual channels in the same input unit, and busy virtual channels can get more buffer space than idle ones. An evident difference between the SAMQ and DAMQ organizations is the mechanism of credit management. The DAMQ router must collect all credit information and put it together in the virtual channel allocator for centralized credit management with several extra counters. In practice, some restrictions also need to be applied to the DAMQ buffer allocation policies to avoid deadlock and load imbalance.

B. DAMQ-MP Router Architecture

In DAMQ-MP buffers, whenever the tail flit of a packet enters the input buffer via one of the virtual channel entries connected to the physical link, the packet will never use this virtual channel entry again and can therefore tear down the corresponding allocation relationship by asserting the notification signal to its upstream router. A new packet from the upstream router can then be sent out and received via this free virtual channel entry. The newly arriving packet begins its routing computation, and all following flits belonging to this packet are linked together. After the result of the routing computation comes out and is stored in its corresponding state field, the packet occupies one of the unused virtual channel exits connected to the switch fabric to request an output virtual channel and switch traversal bandwidth for departure. Here we use the terms entry and exit to distinguish the virtual channels for receiving packets from the virtual channels for sending packets. The virtual channel entries and exits are responsible only for the usage rights of the physical link and the switch fabric connected to the input buffer, respectively.
In the original DAMQ organization, one packet enters, occupies, and leaves its virtual channel, and the entry and exit it uses both belong to that same virtual channel; in the DAMQ-MP organization, however, an entry or an exit is just an access gateway to write or read flits, and entries and exits need not coexist in pairs. It is possible that a packet (like Packet D in Figure 2a), which has brought all its flits inside the DAMQ-MP buffer through an entry, merely stays there and waits for an exit to depart. As illustrated in Figure 2b, DAMQ-MP buffers, just like DAMQ buffers, need several pairs of head and tail pointers, one extra free-list pointer, and counters for all linked lists. Besides the original state fields used by the virtual channel exits to store their virtual channel status and assigned output virtual channel number, additional state fields for all packets are needed to keep the routing computation results and some other associated attributes, for instance the allocation priority of the packet. Essentially, every packet in a DAMQ-MP buffer exists individually and must possess a unique packet ID number. These packet ID numbers are uniformly assigned and managed by the centralized virtual channel allocator, and they are always carried with the head flits of their packets to the next routers for recognition while manipulating the input buffers, just like the virtual channel ID number carried with every flit.

Figure 2. DAMQ-MP buffers and linked-list memory space.

In order to keep track of which packets are currently using which access gateways to move in and out of the input buffer, each virtual channel entry and exit has to record the packet ID number of the packet that resides in it.
These packet ID numbers are stored in two arrays of registers named vc_entry and vc_exit respectively. Meanwhile, another array of registers named vc_line records the arrival order of the packets waiting for free virtual channel exits, and these registers help the input unit decide the allocation order of the virtual channel exits. There are two copies of these three register arrays for managing one input buffer: one is built inside the input buffer itself for buffer storage handling, and the other is located in the virtual channel allocator of its upstream router for buffer storage allocation. The information in these three register arrays (i.e., packet ID numbers) is continuously updated to keep the copies consistent according to the transitions of the vc_free and pkt_left signals as well as the head flits of packets. The vc_free signals are identical to the same signals in the DAMQ router, notifying the upstream router that there is a free virtual channel ready to be used again, but now they refer only to the occupation of the virtual channel entries in a DAMQ-MP input buffer. The pkt_left signals are similar to the vc_free signals: they indicate the departures of packets and respond only to the availability of the virtual channel exits by notifying the upstream router. The information about mapping a packet to a virtual channel entry is carried with the head flit of the packet and passed to the input unit of its downstream router when the head flit is transmitted. Comparing the proposed DAMQ-MP router with the original DAMQ router, they have almost identical hardware components and behavior in appearance, but there are two major differences inside the input units and the virtual channel allocator. The first difference is the novel packet number system. Every packet existing in an input buffer must have a unique ID number to distinguish itself from others.
Any operations on packets need these packet ID numbers. All packet ID numbers belonging to the same input buffer are centrally managed and distributed by the virtual channel allocator of

its upstream router. When a packet in a virtual channel exit requests an output virtual channel in the input buffer of its downstream router, it also needs to be assigned an available packet ID number for that input buffer. This assigned packet ID number is used until the associated packet has completely left the input buffer, and then the input unit must notify the virtual channel allocator of its upstream router via the pkt_left signal that this packet ID number is no longer used and available again. The second difference is that the DAMQ-MP router introduces the concept of decoupling the virtual channel entries and exits. Although a DAMQ or DAMQ-MP router has the same number of virtual channel entries and exits, a packet in the DAMQ router passes through the input buffer via an entry and an exit belonging to the identical virtual channel, whereas any pair of a virtual channel entry and a virtual channel exit in the DAMQ-MP input buffer is unrelated and handles packets independently. Managing all of these virtual channel entries and exits in DAMQ-MP buffers relies on the above-mentioned three register arrays (i.e., vc_entry, vc_exit, and vc_line) as well as the two notification signals (i.e., vc_free and pkt_left).

C. Characteristics

Because of the intrinsic characteristics of the linked-list data structure, a DAMQ router can be enhanced to hold more packets more easily than a SAMQ router. As mentioned previously, one major cause of the performance improvement is that the DAMQ-MP mechanism reduces the idle time of reloading a new packet into a halted virtual channel exit whose contained packet has just left. Another cause is that DAMQ-MP buffers let all virtual channel entries keep receiving packets all the time if there are enough packets coming from the upstream router, especially for short packets.
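The decoupled entry/exit bookkeeping described above can be sketched as a small input-unit model that pairs a packet-ID pool with the vc_entry, vc_exit, and vc_line structures. The method names and the plain FIFO policy for vc_line are our assumptions (the cut-in-line variant for high-priority packets is discussed later), and routing computation is omitted for brevity.

```python
# Hedged sketch of DAMQ-MP input-unit bookkeeping: packet IDs are drawn from
# a pool managed by the upstream virtual channel allocator; entries free as
# soon as a tail flit arrives (vc_free), exits free when a packet departs
# (pkt_left), and vc_line holds packets waiting for an exit in arrival order.
class InputUnit:
    def __init__(self, num_vcs, max_packets):
        self.free_ids = list(range(max_packets))  # pool of unused packet IDs
        self.vc_entry = [None] * num_vcs          # packet being received per entry
        self.vc_exit = [None] * num_vcs           # packet departing per exit
        self.vc_line = []                         # packets waiting for a free exit

    def packet_arrives(self, entry):
        """Head flit arrives: bind a unique packet ID to this entry."""
        pkt = self.free_ids.pop(0)
        self.vc_entry[entry] = pkt
        return pkt

    def tail_received(self, entry):
        """Tail flit in: the entry frees immediately (vc_free to upstream)."""
        pkt = self.vc_entry[entry]
        self.vc_entry[entry] = None
        self.vc_line.append(pkt)                  # queue for a virtual channel exit
        return pkt

    def exit_freed(self, exit_):
        """A packet left via this exit: recycle its ID (pkt_left) and refill."""
        done = self.vc_exit[exit_]
        if done is not None:
            self.free_ids.append(done)
        self.vc_exit[exit_] = self.vc_line.pop(0) if self.vc_line else None
        return done
```

Note how a packet can sit in vc_line with no entry and no exit bound to it, which is precisely the decoupling that lets more packets than virtual channels coexist.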
Therefore, the DAMQ-MP organization is beneficial in the cases of short packets, large buffer capacity (relative to packet lengths), heavy traffic congestion, and a small number of virtual channels. Because the DAMQ-MP scheme handles buffer resource allocation in units of packets instead of virtual channels, the added hardware costs are just several registers for pointers, counters, state fields, and virtual channel mapping arrays, as well as the extra signals and control logic for packet system management and virtual channel decoupling. Owing to the additional mapping transformations from virtual channels to packets and vice versa, the data access delay may be prolonged and degrade performance. Certainly, we can add more sophisticated hardware such as register pre-fetch units to avoid or compensate for the performance loss; these are trade-offs in designing DAMQ-MP routers. Another possible problem is that too many packets residing in a crowded but small DAMQ-MP buffer compress the space each packet can possibly get. If many flits belonging to one packet already exist in the input buffer, bringing in one more of its remaining flits still cannot move it forward until all of its preceding flits have left. Therefore, one simple method is to add priorities to the switch allocation process so as to always first bring in a flit that is most likely to depart the input buffer soon. The priority levels can be estimated by counting the minimum number of flits that are in front of the candidate flit and may leave the input buffer before it. The switch allocator then sets the priorities of all requesting packets based on the counting results. Imposing these priorities on the switch allocation can effectively solve the load imbalance problem occurring inside a DAMQ-based input buffer and keep data transmission uninterrupted.
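The priority estimate described above can be sketched as follows. As a simplification we count only the flits already buffered ahead of the candidate in its own linked list, and the three-level thresholds (0 to 3, 4 to 5, greater than 5) are taken from the experimental section; the function names are ours.

```python
# Hedged sketch of switch-allocation priorities: a requesting packet whose
# candidate flit has few flits ahead of it is most likely to depart soon,
# so it gets the best (lowest) priority level.
def flits_ahead(candidate_pkt, exit_queues):
    """Simplified count: flits already buffered ahead of the candidate flit
    in its own packet's linked list."""
    return len(exit_queues[candidate_pkt])

def priority_level(count):
    """Three levels, using the 0-3 / 4-5 / >5 thresholds from Section IV-D."""
    if count <= 3:
        return 0   # highest priority: almost out of flits, needs supplies
    if count <= 5:
        return 1
    return 2       # plenty of flits buffered already; can wait

def pick_winner(requests, exit_queues):
    """Switch allocator: grant the request with the best priority level."""
    return min(requests, key=lambda p: priority_level(flits_ahead(p, exit_queues)))
```

A real allocator would break ties among equal levels with its usual arbitration (e.g., round-robin); `min` here simply takes the first best request.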
D. Case Discussion: Cut-in-Line for High-Priority Packets

In most cases, packets encapsulating control information are shorter than packets encapsulating raw data. In addition, these control packets are usually much more important than raw-data packets and demand faster transmission by possessing a higher priority level. When all packets share the same network fabric, the simplest way to prevent high-priority packets from being blocked by other packets is to reserve some network resources, for instance at least one virtual channel, for high-priority packets so that fast transfer routes are available at all times. The proposed DAMQ-MP scheme is especially suitable for such a reservation method due to its linked-list data structure and the short lengths of high-priority packets. Because the occurrence probability and the amount of high-priority packets are both very small, the high-priority dedicated virtual channels can keep the minimum number of reserved flit buffers while idle most of the time and can dynamically be given more flit buffers while in use. Furthermore, if a DAMQ-MP input buffer has more than one high-priority packet inside, switching between high-priority packets in the virtual channel exits can complete immediately without leaving any exit idle. Because many packets appear at the same time in the DAMQ-MP input buffer, a conceptual cut-in-line behavior can happen to shorten the waiting time of the high-priority packets for the virtual channel exits. As shown in Figure 3, when a virtual channel exit is freed up (by Packet B), the control logic has to choose the foremost of the high-priority packets (Packet A) to take that virtual channel exit, whether or not any low-priority packets (such as Packet D) are in front of the chosen high-priority packet. This kind of cut-in-line behavior can easily be made by inserting a newly arriving high-priority packet into the place after all present high-priority packets and before all other packets in the waiting line vc_line, rather than directly adding it to the place right behind the last packet in the waiting line.

Figure 3. Examples of the cut-in-line behavior. Each flit is labeled with its priority (H and L standing for high-priority and low-priority respectively) and the packet it belongs to (i.e., the smaller letter within the parentheses).

IV. EXPERIMENTAL RESULTS

We build a cycle-accurate flit-level simulator in SystemC to carry out all experiments. The simulated interconnection network is an 8 x 8 mesh topology adopting a 4-stage router pipeline, X-Y routing, and wormhole switching flow control. We set a warm-up phase of 1, cycles to let the network reach its steady state, and then sample data during a measurement phase of 1, cycles. Unless otherwise specified, all buffer schemes are tested under the synthetic uniformly distributed random traffic pattern, and the maximum number of packets in a DAMQ-MP buffer is unlimited. The latency of a packet is the time interval measured from the time the head flit of the packet is generated by the traffic generator of a source node to the time the last flit of the packet leaves the network. The throughput is the average amount of traffic accepted by a destination node per cycle.

Figure 4. Performance of routers with 12-flit buffers carrying traffic loads of 8-flit packets under different traffic patterns: (a) Uniform Random, (b) Bit Complement, (c) Transpose.
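The cut-in-line rule of Section III-D amounts to a simple insertion policy over the vc_line waiting list: a new high-priority packet goes after any high-priority packets already waiting, but ahead of every low-priority one. The function name and the priority map are illustrative assumptions.

```python
# Hedged sketch of the cut-in-line insertion into vc_line (Section III-D).
# `prio` maps a packet ID to 'H' (high priority) or 'L' (low priority).
def insert_into_vc_line(vc_line, pkt, prio):
    if prio[pkt] == "H":
        pos = 0
        while pos < len(vc_line) and prio[vc_line[pos]] == "H":
            pos += 1                # keep FIFO order among high-priority packets
        vc_line.insert(pos, pkt)    # cut in ahead of all low-priority packets
    else:
        vc_line.append(pkt)         # low-priority packets wait at the back
    return vc_line
```

With Packet D (low priority) already waiting, a newly arriving high-priority Packet A lands at the front, a second high-priority packet queues behind A but still ahead of D, and low-priority arrivals keep joining the back.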
A. Traffic Pattern

Figure 4 shows the performance of routers with a buffer capacity of 12 flits and 3 virtual channels conveying 8-flit packets under different traffic patterns. No matter what the traffic pattern is, the DAMQ-MP organization always has the best performance among the three buffer structures. Especially when the network starts to saturate, the dynamic buffer allocation and the reduction of packet-switching latencies in the DAMQ-MP organization let the whole network accommodate more flits and reach a higher throughput. Owing to the balanced loads of the uniform random traffic distribution, the saturated throughput of the DAMQ-MP routers is steadily about 8.53% higher than that of the other two. Even under the bit complement and transpose traffic patterns, the peak throughput improvements are still 9.56% and 1.56% respectively.

B. Packet Length and Buffer Structure

Compared with the result under random traffic in Figure 4a, we perform a series of experiments changing the ratio of packet length to buffer capacity, which determines the expected number of packets that can reside in a buffer. As the results in Figure 5a show, when the packet length is small relative to the buffer capacity, the DAMQ-MP routers can hold as many short packets as possible, but the SAMQ and DAMQ ones cannot due to the limitation of one packet per virtual channel. The saturated throughput of the DAMQ-MP routers is 24.52% higher than that of the other two for 4-flit packets. If the buffer size is then increased to 18 flits with the same number of virtual channels, the performance gap between the DAMQ-MP routers and the other two expands as the input buffer capacity grows. As shown in Figure 5b, the improvement of the DAMQ-MP routers relative to the DAMQ ones in saturated throughput rises to 11.24%. This proves that providing more buffer space for DAMQ-MP routers to accommodate more packets at the same time lets them achieve better performance.
If the buffers keep the same average of 4 flits per virtual channel, the performance improvement of DAMQ-MP over the others shrinks as more virtual channels are added. As shown in Figure 5c, the improvement in saturated throughput drops to 4.79%. The multiple-packet mechanism in the DAMQ-MP routers helps the input unit quickly reload new packets into any idle

Figure 5. Performance of routers with short packets, large buffer size, and more virtual channels: (a) 4-flit packets, (b) 18-flit buffer size, (c) 4 virtual channels.

virtual channel exits and keep as many virtual channels running as possible. However, if there are already many virtual channels within a buffer, these efforts of the DAMQ-MP scheme become less apparent relative to the DAMQ organization.

C. Maximum Number of Packets

Intuitively, the setting of the maximum-number-of-packets parameter must depend on the buffer capacity and the packet length. Hence, we choose a large 32-flit buffer with 4 virtual channels for this experiment. In Figure 6a, the results show that there is almost no performance improvement when the maximum number of packets is greater than 6. This is because the probability that more than one virtual channel entry freed up by these long packets becomes available at the same moment is pretty low. Therefore, choosing a maximum number of packets one or two larger than the number of virtual channels as standby is basically enough in most common cases, which means that building the DAMQ-MP router needs only a few additional registers and control logics.
D. Priorities for Switch Allocation

We pick a small 12-flit buffer with 3 virtual channels to test the proposed method of adding priorities for switch allocation. The main purpose of this method is to steer the limited free buffer resources to packets that are almost running out of flits and in need of supplies. The priorities are divided into three levels according to counting values of 0 to 3, 4 to 5, and greater than 5. In Figure 6b, this scheme seems to have no effect on the DAMQ routers, but it improves the performance of the DAMQ-MP routers when the traffic loads are highly congested. Although the saturated throughput of the DAMQ-MP routers increases only 1.65% after applying the priority scheme, this simple method can indeed prevent load imbalance and be easily integrated into any router designed to handle packet priorities.

E. Virtual Channel Reservation for High-Priority Packets

As discussed in the previous case study, DAMQ-based buffers should fit the method of reserving certain high-priority dedicated virtual channels better than SAMQ-based ones. Furthermore, high-priority packets inside DAMQ-MP buffers have bigger chances of cutting in line to pass through as fast as possible. Hence we design this experiment to compare the performance of DAMQ and DAMQ-MP routers with or without virtual channel reservations for high-priority packets. The buffers are set to 16 flits with 4 virtual channels, and the traffic loads are made up of 1% 1-flit high-priority and 99% 8-flit low-priority packets. Under the high-priority reservation scheme, at least one of the virtual channels is reserved for high-priority packets; otherwise, all virtual channels are identical and can be used by any kind of packet. In Figure 6c, the average latencies of high-priority and low-priority packets saturate simultaneously because both kinds of packets injected by the same router node come from the identical source queue.
There are no differences among the performance results of these four sets while the packet injection rates are low. This observation proves that the dynamic buffer allocation mechanism of the DAMQ and DAMQ-MP organizations indeed reduces the performance impact of taking some virtual channels for high-priority reservation. Besides, the reservation scheme helps the interconnection network keep the average latency of high-priority packets almost constant until the injection rate reaches a higher value. The curves of average latency for low-priority packets in both organizations adopting the reservation scheme rise earlier than their counterparts without the scheme. It is

because the maximum number of virtual channels that low-priority packets can use decreases under the reservation scheme; this also affects the saturated accepted throughputs. DAMQ-MP routers can sustain low-latency transfer service for high-priority packets at a higher traffic injection rate than DAMQ routers once the reservation scheme is applied. This is because a DAMQ-MP buffer not only accommodates more high-priority packets but also lets cut-in-line behavior speed up high-priority packets passing through the buffer internally.

Figure 6. Performance of DAMQ-based routers with different schemes; HP and LP stand for high-priority and low-priority, respectively. (a) Maximum number of packets. (b) Priorities for switch allocation. (c) VC reservation for HP packets.

V. CONCLUSION

In this paper, we introduced a novel DAMQ-MP organization for input buffers, which can accommodate more packets than the number of virtual channels.
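To make the organization concrete, here is a minimal behavioral model (our own illustrative assumption, not the authors' implementation; the class and parameter names such as `DamqMpBuffer` and `max_packets_per_vc` are invented for this sketch) of a DAMQ-style shared flit pool in which one virtual-channel queue may hold several packets:

```python
# Sketch: a DAMQ-style input buffer keeps one shared pool of flit slots;
# each VC is a queue of packets drawn from that pool.  The DAMQ-MP twist
# modeled here is that a VC queue may hold several packets, not just one.
from collections import deque

class DamqMpBuffer:
    def __init__(self, num_slots, num_vcs, max_packets_per_vc=2):
        self.free = deque(range(num_slots))           # shared free-slot list
        self.vcs = [deque() for _ in range(num_vcs)]  # each entry: slot list of one packet
        self.max_pkts = max_packets_per_vc

    def can_accept(self, vc, pkt_len):
        return len(self.vcs[vc]) < self.max_pkts and len(self.free) >= pkt_len

    def enqueue(self, vc, pkt_len):
        """Claim pkt_len shared slots for a new packet on this VC."""
        if not self.can_accept(vc, pkt_len):
            return False
        self.vcs[vc].append([self.free.popleft() for _ in range(pkt_len)])
        return True

    def dequeue(self, vc):
        """Release the head packet's slots back to the shared pool."""
        self.free.extend(self.vcs[vc].popleft())
```

With 12 shared slots and a per-VC cap of two packets, one virtual channel can buffer two 4-flit packets back to back, which a plain DAMQ organization with its one-packet-per-VC restriction cannot do.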
Because DAMQ-MP buffers are not constrained by the one-packet-per-virtual-channel limitation of the older SAMQ and DAMQ buffers, they can make good use of scarce buffer storage and physical link resources. In addition, the DAMQ-MP scheme can bring standby packets forward in advance, reducing the waiting latencies of packets switching at virtual channel exits. The original DAMQ organization solves the problem of shallow virtual channel depth in small SAMQ input buffers, and the proposed DAMQ-MP scheme further solves the problem of too few working virtual channels in DAMQ routers under heavy traffic loads. We also discussed two methods applicable to DAMQ-based buffers: adding priorities for switch allocation and reserving virtual channels exclusively for high-priority packets. We performed experiments on the three buffer organizations and showed that the DAMQ-MP scheme offers advantages in many scenarios. These experimental results showed that DAMQ-MP routers can achieve up to 24.52% higher saturated throughput than their SAMQ and DAMQ counterparts.

ACKNOWLEDGMENT

The authors thank the NSC for its support under two grants and the MOEA for its support under grant 12-C S1-22.

REFERENCES

[1] Y. Hoskote, S. Vangal, A. Singh, N. Borkar, and S. Borkar, "A 5-GHz Mesh Interconnect for a Teraflops Processor," IEEE Micro, vol. 27, no. 5, 2007.
[2] Y. Tamir and G. L. Frazier, "Dynamically-Allocated Multi-Queue Buffers for VLSI Communication Switches," IEEE Transactions on Computers, vol. 41, no. 6, 1992.
[3] J. Park, B. W. O'Krafka, S. Vassiliadis, and J. Delgado-Frias, "Design and Evaluation of a DAMQ Multiprocessor Network with Self-Compacting Buffers," in Proceedings of the ACM/IEEE Conference on Supercomputing, 1994.
[4] J. Liu and J. G. Delgado-Frias, "A Shared Self-Compacting Buffer for Network-on-Chip Systems," in Proceedings of the IEEE International Midwest Symposium on Circuits and Systems, pp. 26-30, 2006.
[5] R. Sivaram, C. B. Stunkel, and D. K. Panda, "HIPIQS: A High-Performance Switch Architecture Using Input Queuing," IEEE Transactions on Parallel and Distributed Systems, vol. 13, no. 3, 2002.
[6] M. Rezazad and H. Sarbazi-Azad, "The Effect of Virtual Channel Organization on the Performance of Interconnection Networks," in Proceedings of the 19th IEEE International Parallel and Distributed Processing Symposium, 2005.
[7] C. A. Nicopoulos, D. Park, J. Kim, N. Vijaykrishnan, M. S. Yousif, and C. R. Das, "ViChaR: A Dynamic Virtual Channel Regulator for Network-on-Chip Routers," in Proceedings of the International Symposium on Microarchitecture, 2006.
[8] M. Lai, Z. Wang, L. Gao, H. Lu, and K. Dai, "A Dynamically-Allocated Virtual Channel Architecture with Congestion Awareness for On-Chip Routers," in Proceedings of the ACM/IEEE Design Automation Conference, 2008.


More information

Chapter 4 NETWORK HARDWARE

Chapter 4 NETWORK HARDWARE Chapter 4 NETWORK HARDWARE 1 Network Devices As Organizations grow, so do their networks Growth in number of users Geographical Growth Network Devices : Are products used to expand or connect networks.

More information

Pseudo-Circuit: Accelerating Communication for On-Chip Interconnection Networks

Pseudo-Circuit: Accelerating Communication for On-Chip Interconnection Networks Department of Computer Science and Engineering, Texas A&M University Technical eport #2010-3-1 seudo-circuit: Accelerating Communication for On-Chip Interconnection Networks Minseon Ahn, Eun Jung Kim Department

More information

Analyzing the Receiver Window Modification Scheme of TCP Queues

Analyzing the Receiver Window Modification Scheme of TCP Queues Analyzing the Receiver Window Modification Scheme of TCP Queues Visvasuresh Victor Govindaswamy University of Texas at Arlington Texas, USA victor@uta.edu Gergely Záruba University of Texas at Arlington

More information

Boosting the Performance of Myrinet Networks

Boosting the Performance of Myrinet Networks IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, VOL. XX, NO. Y, MONTH 22 1 Boosting the Performance of Myrinet Networks J. Flich, P. López, M. P. Malumbres, and J. Duato Abstract Networks of workstations

More information

An Efficient Scheduling Scheme for High Speed IEEE WLANs

An Efficient Scheduling Scheme for High Speed IEEE WLANs An Efficient Scheduling Scheme for High Speed IEEE 802.11 WLANs Juki Wirawan Tantra, Chuan Heng Foh, and Bu Sung Lee Centre of Muldia and Network Technology School of Computer Engineering Nanyang Technological

More information

Wide area networks: packet switching and congestion

Wide area networks: packet switching and congestion Wide area networks: packet switching and congestion Packet switching ATM and Frame Relay Congestion Circuit and Packet Switching Circuit switching designed for voice Resources dedicated to a particular

More information

WITH THE CONTINUED advance of Moore s law, ever

WITH THE CONTINUED advance of Moore s law, ever IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, VOL. 30, NO. 11, NOVEMBER 2011 1663 Asynchronous Bypass Channels for Multi-Synchronous NoCs: A Router Microarchitecture, Topology,

More information

A VERIOG-HDL IMPLEMENTATION OF VIRTUAL CHANNELS IN A NETWORK-ON-CHIP ROUTER. A Thesis SUNGHO PARK

A VERIOG-HDL IMPLEMENTATION OF VIRTUAL CHANNELS IN A NETWORK-ON-CHIP ROUTER. A Thesis SUNGHO PARK A VERIOG-HDL IMPLEMENTATION OF VIRTUAL CHANNELS IN A NETWORK-ON-CHIP ROUTER A Thesis by SUNGHO PARK Submitted to the Office of Graduate Studies of Texas A&M University in partial fulfillment of the requirements

More information

Achieving Lightweight Multicast in Asynchronous Networks-on-Chip Using Local Speculation

Achieving Lightweight Multicast in Asynchronous Networks-on-Chip Using Local Speculation Achieving Lightweight Multicast in Asynchronous Networks-on-Chip Using Local Speculation Kshitij Bhardwaj Dept. of Computer Science Columbia University Steven M. Nowick 2016 ACM/IEEE Design Automation

More information

CAD System Lab Graduate Institute of Electronics Engineering National Taiwan University Taipei, Taiwan, ROC

CAD System Lab Graduate Institute of Electronics Engineering National Taiwan University Taipei, Taiwan, ROC QoS Aware BiNoC Architecture Shih-Hsin Lo, Ying-Cherng Lan, Hsin-Hsien Hsien Yeh, Wen-Chung Tsai, Yu-Hen Hu, and Sao-Jie Chen Ying-Cherng Lan CAD System Lab Graduate Institute of Electronics Engineering

More information

Design of Reconfigurable Router for NOC Applications Using Buffer Resizing Techniques

Design of Reconfigurable Router for NOC Applications Using Buffer Resizing Techniques Design of Reconfigurable Router for NOC Applications Using Buffer Resizing Techniques Nandini Sultanpure M.Tech (VLSI Design and Embedded System), Dept of Electronics and Communication Engineering, Lingaraj

More information

CSE 123A Computer Networks

CSE 123A Computer Networks CSE 123A Computer Networks Winter 2005 Lecture 14 Congestion Control Some images courtesy David Wetherall Animations by Nick McKeown and Guido Appenzeller The bad news and the good news The bad news: new

More information

Congestion in Data Networks. Congestion in Data Networks

Congestion in Data Networks. Congestion in Data Networks Congestion in Data Networks CS420/520 Axel Krings 1 Congestion in Data Networks What is Congestion? Congestion occurs when the number of packets being transmitted through the network approaches the packet

More information

What Is Congestion? Effects of Congestion. Interaction of Queues. Chapter 12 Congestion in Data Networks. Effect of Congestion Control

What Is Congestion? Effects of Congestion. Interaction of Queues. Chapter 12 Congestion in Data Networks. Effect of Congestion Control Chapter 12 Congestion in Data Networks Effect of Congestion Control Ideal Performance Practical Performance Congestion Control Mechanisms Backpressure Choke Packet Implicit Congestion Signaling Explicit

More information

On Topology and Bisection Bandwidth of Hierarchical-ring Networks for Shared-memory Multiprocessors

On Topology and Bisection Bandwidth of Hierarchical-ring Networks for Shared-memory Multiprocessors On Topology and Bisection Bandwidth of Hierarchical-ring Networks for Shared-memory Multiprocessors Govindan Ravindran Newbridge Networks Corporation Kanata, ON K2K 2E6, Canada gravindr@newbridge.com Michael

More information

Efficient Throughput-Guarantees for Latency-Sensitive Networks-On-Chip

Efficient Throughput-Guarantees for Latency-Sensitive Networks-On-Chip ASP-DAC 2010 20 Jan 2010 Session 6C Efficient Throughput-Guarantees for Latency-Sensitive Networks-On-Chip Jonas Diemer, Rolf Ernst TU Braunschweig, Germany diemer@ida.ing.tu-bs.de Michael Kauschke Intel,

More information

Addresses in the source program are generally symbolic. A compiler will typically bind these symbolic addresses to re-locatable addresses.

Addresses in the source program are generally symbolic. A compiler will typically bind these symbolic addresses to re-locatable addresses. 1 Memory Management Address Binding The normal procedures is to select one of the processes in the input queue and to load that process into memory. As the process executed, it accesses instructions and

More information

Design and Implementation of a Packet Switched Dynamic Buffer Resize Router on FPGA Vivek Raj.K 1 Prasad Kumar 2 Shashi Raj.K 3

Design and Implementation of a Packet Switched Dynamic Buffer Resize Router on FPGA Vivek Raj.K 1 Prasad Kumar 2 Shashi Raj.K 3 IJSRD - International Journal for Scientific Research & Development Vol. 2, Issue 02, 2014 ISSN (online): 2321-0613 Design and Implementation of a Packet Switched Dynamic Buffer Resize Router on FPGA Vivek

More information

Mark Sandstrom ThroughPuter, Inc.

Mark Sandstrom ThroughPuter, Inc. Hardware Implemented Scheduler, Placer, Inter-Task Communications and IO System Functions for Many Processors Dynamically Shared among Multiple Applications Mark Sandstrom ThroughPuter, Inc mark@throughputercom

More information