MMR: A High-Performance Multimedia Router - Architecture and Design Trade-Offs


Jose Duato (1), Sudhakar Yalamanchili (2), M. Blanca Caminero (3), Damon Love (2), Francisco J. Quiles (3)

(1) J. Duato is with the Dept. of Information Systems and Computer Architecture, Universidad Politecnica de Valencia, P.O.B. 2212, Valencia, SPAIN, {jduato@gap.upv.es}
(2) D. Love and S. Yalamanchili are with the School of Electrical and Computer Engineering, Georgia Institute of Technology, {dlove, sudha}@ece.gatech.edu
(3) B. Caminero and F. J. Quiles are with the Dept. of Computer Science, Escuela Politecnica Superior de Albacete, SPAIN, {blanca, paco}@info-ab.uclm.es

Abstract

This paper presents the architecture of a router designed to efficiently support traffic generated by multimedia applications. The router is targeted for use in clusters and LANs rather than in WANs, the latter being served by communication substrates such as ATM. The distinguishing features of the proposed router architecture are the use of small fixed-size buffers, a large number of virtual channels, link-level virtual channel flow control, support for dynamic modification of connection bandwidth and priorities, and coordinated scheduling of connections across all output channels. The paper begins with a discussion of the design choices and architectural trade-offs made in the current MultiMedia Router (MMR) project. The performance evaluation section presents some preliminary results of the coordinated scheduling of constant bit rate (CBR) traffic streams.

1. Introduction

In the past few years we have seen an explosive growth in network-based multimedia applications. Example applications include web servers, video-on-demand servers, telemedicine, immersive environments, interactive simulations, and collaborative design environments. The data are often distributed and these applications individually require substantial bandwidth to meet real-time interactive constraints.
The network bandwidth must also be shared by other applications that may not have as demanding constraints. The physical constraints of cluster/LAN interconnects, as well as the applications that utilize them, produce different trade-offs in router design from those made in wide area networks (WANs). As a result we arrive at different and more effective architectural solutions for cluster/LAN routers. The key issue is the ability to provide quality of service (QoS) guarantees at multiprocessor cut-through latencies. This makes it difficult to use existing substrates such as Gigabit Ethernet and ATM [22], in which message traffic encounters relatively large latencies as compared to networks such as Myrinet [5] or Tandem ServerNet [16,18]. Traditional router technology developed for high-speed multiprocessor networks is optimized for low latency and for best-effort traffic. However, these networks are not designed to permit concurrent guarantees for communication performance for multiple applications.

The primary objective of the Multimedia Router (MMR) project is the design and implementation of a single-chip router optimized for multimedia applications. The goal is to provide architectural support to enable a range of quality of service (QoS) guarantees at latencies comparable to state-of-the-art multiprocessor cut-through routers. To achieve this goal we must provide solutions to many difficult hardware resource management and scheduling problems while constraining required resources to permit effective single-chip implementations. This paper presents some specific trade-offs made in the architecture of the MMR single-chip multimedia router and the results of some preliminary simulation experiments.

2. Application Requirements

The main distinguishing features of multimedia communication environments are: very long data streams; a wide range of bandwidth requirements; a large number of connections; jitter sensitivity; latency tolerance; and short control messages.

Multimedia traffic may also coexist with best-effort traffic generated by other applications. The MMR should handle this hybrid traffic efficiently, satisfying the QoS requirements of multimedia traffic, minimizing the average latency of best-effort traffic, and maximizing link utilization. Constant bit rate (CBR) connections are relatively easy to handle. Bandwidth requirements can be met by allocating the requested bandwidth while establishing the connection. Variable bit rate (VBR) connections are more difficult to handle. When establishing a VBR connection, only approximate knowledge of the average and maximum bandwidth requirements may be available. For example, it is possible to compute the average and maximum bandwidth requirements for a stored compressed movie. However, it is not possible to obtain exact values for real-time transmission of compressed video.

Thus, the designer has to carefully select several qualitative and quantitative network design parameters. Qualitative parameters include network topology, switching technique, routing algorithm, admission control strategy, bandwidth allocation algorithm, link/switch scheduling algorithm, buffer organization, buffer management, and crossbar organization. Quantitative parameters include network size, link bandwidth, router degree (number of ports), router clock frequency, buffer size, and number of virtual channels.

[Figure 1. The Architecture of the MultiMedia Router (MMR). Recoverable labels: physical input channels, phit buffers, VCM+LS blocks (VCM: Virtual Channel Memory, LS: Link Scheduler), crossbar switch, switch scheduler, routing and arbitration unit, phit buffers, physical output channels.]

3. Router Architecture

Figure 1 illustrates the architecture of the MMR. The following subsections describe some basic trade-offs that were made in the individual components.

3.1 Switching Technique

Multimedia traffic often requires a bounded jitter on long data streams. 
Switching techniques that reserve resources on the fly, like wormhole switching, are not well suited for such communication because the time a message is blocked waiting for a busy resource is not bounded. Connection oriented schemes with scheduling support within the routers are better suited to meeting jitter requirements over long data streams. On the other hand, control messages and best-effort traffic will benefit from low-latency switching techniques like wormhole or virtual cut-through (VCT) switching. The overhead of connection oriented schemes is excessive for these messages. A good trade-off can be achieved by using a hybrid switching technique [12, 29]. Long data streams can be transmitted by using circuit switching or a variant thereof by first establishing a connection from source to destination and then forwarding the data. Control messages and best-effort traffic can be transmitted by using wormhole or VCT switching. However, we would like both types of messages to use the same pool of link and buffer resources at a node without partitioning of resources among switching classes (e.g., as in [23]). Among the connection-oriented schemes, we have selected pipelined circuit switching (PCS) because it is simple, can be combined with wormhole or VCT [3], uses flow control to prevent data losses, requires small buffer storage associated with each virtual channel, and the overhead produced by control information is relatively low compared to LAN connection oriented schemes such as ATM. PCS for long data streams is combined with VCT for control and best-effort messages. In both cases the data is organized as a sequence of flow control digits or flits [9]. While the use of large flits amortizes flow control and scheduling delays, it also increases latency and buffer storage requirements. Latency can be reduced by pipelining flit transmission at a finer granularity. 
Pipelining can be done at the phit level, as in the Cray T3D router [25], where a phit is the amount of information that can be transmitted in parallel through a communication link. As serial links are frequent in LAN environments, we assume that pipelining is performed at the word level, where word size is equal to the width of the router internal data paths.

3.2 Buffer Organization

[Figure 2. Organization of Virtual Channel Memory. Recoverable labels: control word decoder, demux, buffers, RAM modules, address generator.]

When a connection is established, a virtual channel is reserved at each link in the path from source to destination. In order to support a large number of connections concurrently, buffers at each link must be organized as a large set of virtual channels. Virtual channels have been traditionally organized as a set of queues linked by a multiplexor. As indicated in [8], router delays can increase substantially when a large number of virtual channels are multiplexed onto physical links. This is due in part to the multiplexor and virtual channel controller delays. Moreover, fully de-multiplexed crossbars [9] (i.e., one virtual channel per crossbar port) become prohibitively expensive in silicon area as the number of virtual channels increases. Thus, we pursue a different buffer organization to support a large number of virtual channels.

The major buffer organizations that must be considered are central buffers, output buffers, and input buffers. We have considered these organizations and our current analysis [13] argues for input buffers modified as follows to remove head-of-line blocking. The MMR will use virtual channels organized as a set of interleaved RAM modules. Each flit is low-order interleaved across memory modules. Flits belonging to the same virtual channel are stored in adjacent sets of memory locations. The number of memory modules and flit size must be selected to balance memory access time, link speed, and crossbar switching delay, while masking flow control and scheduling delays. The read address required to retrieve flits is supplied by the link scheduler. The write address is obtained from the virtual channel flow control circuitry indicating the virtual channel identifier for the next flit transmission. 
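The interleaved organization described above can be illustrated with a small address-mapping sketch. This is only one possible layout under stated assumptions: the paper does not give the address generator's formula, and the names `VcmLayout`, `num_modules`, `words_per_flit`, and `flits_per_vc` are hypothetical. The sketch places consecutive words of a flit in consecutive RAM modules (low-order interleaving) and keeps the flits of one virtual channel in adjacent rows.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class VcmLayout:
    """Hypothetical layout of a virtual channel memory (not from the paper)."""
    num_modules: int     # number of interleaved RAM modules
    words_per_flit: int  # flit size measured in router-internal words
    flits_per_vc: int    # buffer depth of each virtual channel, in flits

    def rows_per_flit(self) -> int:
        # One row spans all modules, so a flit needs ceil(words / modules) rows.
        return -(-self.words_per_flit // self.num_modules)

    def locate(self, vc: int, flit_slot: int, word: int) -> Tuple[int, int]:
        """Return (module, row) holding one word of a buffered flit."""
        if not 0 <= flit_slot < self.flits_per_vc:
            raise ValueError("flit slot out of range")
        if not 0 <= word < self.words_per_flit:
            raise ValueError("word index out of range")
        module = word % self.num_modules  # low-order interleaving
        # Flits of the same virtual channel occupy adjacent groups of rows.
        base_row = (vc * self.flits_per_vc + flit_slot) * self.rows_per_flit()
        return module, base_row + word // self.num_modules


layout = VcmLayout(num_modules=4, words_per_flit=8, flits_per_vc=2)
first_word = layout.locate(vc=0, flit_slot=0, word=0)
sixth_word = layout.locate(vc=0, flit_slot=0, word=5)
```

With this layout, up to `num_modules` consecutive words of a flit can be accessed in parallel, which is the property that lets memory access time be balanced against link speed.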
As shown in Figure 1, small phit buffers are used for link buffers and are deep enough to store all the phits that arrive during a decoding period (i.e., during the computation of the memory address to store those phits). Phit buffers also allow low-latency routing of short messages using VCT, provided that there is no contention (i.e., the requested output link is free). Similarly, phit buffers allow fast processing of probes and acknowledgments when establishing a connection. There is also some state information stored with each virtual channel that is used for scheduling. The collective resources are referred to as virtual channel memory (VCM). By designing pipelined memory buffer systems we can match increasing external link speeds to decreasing intra-router delays. The variable parameters that can be adjusted include flit sizes, number of memory banks and the virtual channel depth.

3.3 Switch Organization

In most routers the internal switch is implemented as one of three kinds of crossbar: multiplexed, partially multiplexed, or fully de-multiplexed [9]. Although some routers use a fully de-multiplexed crossbar [1], this organization becomes prohibitive when the number of virtual channels is large. Even for a relatively small number of virtual channels, some commercial routers use a multiplexed crossbar [6]. The MMR uses a multiplexed crossbar where the internal switch is a crossbar with as many ports as communication links. It reduces silicon area by factors of V and V^2, respectively, with respect to a partially multiplexed and a fully de-multiplexed crossbar, where V is the number of virtual channels per link. The main drawback of a multiplexed crossbar is that arbitration is needed every time an input link switches from one virtual channel to another. Arbitration is required at the input side to select a virtual channel from each input link. 
Arbitration is also required at the output side because several virtual channels may request the same output link although this can be hidden by overlapping with the transmission of a previous flit if flits are large enough. Switch reconfiguration overhead can also be similarly hidden. An interesting feature of multiplexed crossbars is that buffers are not required at the output side. As switch output ports are directly connected to output links, flits are directly transmitted through the switch and the corresponding output link. However, a few phit buffers can be used to pipeline information through the switch and the link. Finally, serialization is required if internal data paths are wider than physical links.
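The area reduction factors quoted in this section can be checked with a short crosspoint count. The derivation below is a sketch under two assumptions that are not stated in the text: silicon area is taken to be proportional to the number of crosspoints, and N (a symbol introduced here) denotes the number of physical links. A partially multiplexed crossbar is assumed to have one input port per virtual channel and one output port per link.

```latex
\begin{align*}
A_{\text{mux}}     &\propto N \cdot N = N^{2}, \\
A_{\text{partial}} &\propto (NV) \cdot N = N^{2}V, \\
A_{\text{full}}    &\propto (NV) \cdot (NV) = N^{2}V^{2}, \\
\frac{A_{\text{partial}}}{A_{\text{mux}}} &= V, \qquad
\frac{A_{\text{full}}}{A_{\text{mux}}} = V^{2}.
\end{align*}
```

For example, with N = 8 and V = 256 the three organizations would need 64, 16,384, and 4,194,304 crosspoints, respectively, under these assumptions.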

3.4 Packet and Flit Transmission

To fully exploit switch and link bandwidth while simplifying router design, the MMR synchronously assigns switch ports and output links to the requesting virtual channels. Flit transmission is organized as a sequence of flit cycles. During each flit cycle, all the input links with ready flits start by transmitting a control word containing the identifier of the virtual channel to which the next flit belongs. Then they synchronously transmit one flit through the switch and the corresponding output links. Concurrently, arbitration is performed for the assignment of switch ports and output links for the next flit cycle. Once the current flit transmission has finished, the switch is reconfigured. This operation requires one clock cycle. During reconfiguration, no data are transmitted. The MMR uses this cycle to perform pending transmissions of routing probes, backtracking probes, and acknowledgments for connection establishment [1]. Once the switch has been reconfigured, the next flit cycle starts. Although transmission is synchronous inside each router, it should be noted that different routers work asynchronously.

Synchronous flit transmission is efficient for data streams but not for control and best-effort messages. These messages are transmitted asynchronously through the switch using VCT switching. Given the large flit sizes ( bits), the following discussion assumes that packet size is equal to flit size. Note that the unit of flow control and buffer management in VCT is a packet. Therefore, flow control units have the same size in PCS and VCT, thus simplifying router design. Packet headers are routed as soon as they reach a router and are forwarded to the routing and arbitration unit. 
If the requested switch input port and output link are free (i.e., they are not transmitting any flit during the current flit cycle) and there are free virtual channels in the requested output link, control packets are immediately forwarded because they have a higher priority than data streams. This transmission is not synchronized at flit cycle boundaries. As the transmission of control packets may take longer than the rest of the current flit cycle, the corresponding switch port and output link will be considered busy during link arbitration for the next flit cycle. If the requested switch port and/or output link are busy but there are free virtual channels at the next router, a virtual channel is reserved and the packet is stored in the corresponding buffer at the current router. It will be synchronously scheduled together with flits from data streams. If there are no free virtual channels at the next router, the packet is blocked and stored in the corresponding buffer at the current router. Best-effort packets are also routed as soon as the header reaches the routing and arbitration unit. However, best-effort packets have lower priority than data streams. If the requested output link has free virtual channels at the next router, a virtual channel is reserved. Otherwise, the packet is blocked. In both cases, the packet is stored in the corresponding buffer at the current router. Best-effort packets are synchronously scheduled together with flits from data streams. When a control or a best-effort packet is completely transmitted, the corresponding virtual channel is released. 3.5 Routing and Arbitration Unit The routing and arbitration unit executes the routing algorithm. The routing algorithm determines the path followed by the probes when establishing a connection, and the path taken by the best-effort packets. 
For best effort packets, the MMR uses a fully adaptive routing algorithm that has been proposed for wormhole networks with irregular topology [26,27] and is valid for VCT switching. Exhaustive profitable backtracking (EPB) [17] will be used when establishing connections. This algorithm performs an exhaustive search of the minimal paths in the network until a valid path is found or the probe backtracks to the source node. In order to avoid searching the same links twice, a history store associated with each input virtual channel records all the output links that have already been searched [17]. An implementation of such a protocol has been described in [1]. The routing and arbitration unit keeps the channel mappings between input and output virtual channels for established connections [17]. Virtual channels are specified by indicating the physical link and the virtual channel on that link. Direct and reverse channel mappings are stored. Direct mappings are required to forward data flits. Reverse mappings are used by backtracking headers and returned acknowledgments. Mappings are also used to propagate status information. 4. Bandwidth Allocation and Link/Switch Scheduling The MMR supports QoS for virtual connections realized with virtual channels. Support for QoS guarantees within the MMR takes the form of solutions to three basic problems: bandwidth allocation, link scheduling, and switch scheduling. The bandwidth allocation scheme allocates bandwidth to each connection when it is established while link

and switch scheduling strategies must operate in a tightly coupled manner to make the most effective use of the network bandwidth. The major challenge in a single chip MMR is that these strategies must have compact and fast implementations.

4.1 Link Operation

Link bandwidth and switch port bandwidth are split into flit cycles: the time taken for a flit to be transmitted through the router and across the physical link. Flit cycles are grouped into rounds, also referred to as frames. The number of flit cycles in a round is an integer multiple K (K > 1) of the number of virtual channels per link. Bandwidth for a connection is allocated as an integer number of flit cycles/round. Thus, a greater value of K provides a higher flexibility for bandwidth allocation. However, it may increase jitter on a connection since rounds take longer to complete. Therefore, the selected value for K is a trade-off between flexibility and jitter.

The data structures used for supporting fast scheduling decisions are a set of status bit vectors, where each bit in a vector is associated with a single virtual channel. Bit vectors provide information about different conditions for all the virtual channels in the router. Examples of status bit vectors include: flits_available, input_buffer_full, CBR_service_requested, CBR_bandwidth_serviced, VBR_bandwidth_serviced, etc. A bit in one of these bit vectors is updated every time the status of a virtual channel changes. For example, if a given virtual channel has no flits available to be transmitted and a flit belonging to that virtual channel arrives, the corresponding flits_available status bit is updated in the corresponding bit vector. In general, status bit vectors are either associated with input or output virtual channels depending on the implementation of the link scheduling algorithm. The basic idea here is to trade space (silicon) for time (scheduling decisions). 
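As a minimal software sketch of the status bit vectors used in this section, each vector can be modeled as an integer with one bit per virtual channel, so that combining conditions is a single bitwise operation. The vector names follow this section; the helper functions, the value of `NUM_VCS`, and the sample contents are illustrative assumptions, and real hardware would evaluate all bits in parallel rather than in a loop.

```python
from typing import List

NUM_VCS = 8  # virtual channels on one input link (kept small for illustration)


def set_bit(vector: int, vc: int) -> int:
    """Mark virtual channel `vc` as satisfying the condition."""
    return vector | (1 << vc)


def clear_bit(vector: int, vc: int) -> int:
    """Mark virtual channel `vc` as no longer satisfying the condition."""
    return vector & ~(1 << vc)


def members(vector: int) -> List[int]:
    """List the virtual channels whose bit is set."""
    return [vc for vc in range(NUM_VCS) if (vector >> vc) & 1]


flits_available = 0
credits_available = 0
for vc in (0, 2, 5):            # channels that hold at least one flit
    flits_available = set_bit(flits_available, vc)
for vc in (2, 3, 5, 7):         # channels with buffer space downstream
    credits_available = set_bit(credits_available, vc)

# One AND yields every channel that can transmit in the next flit cycle.
ready = flits_available & credits_available
ready_channels = members(ready)

# A status change touches a single bit: channel 5 sends its last flit.
flits_available = clear_bit(flits_available, 5)
ready_after = members(flits_available & credits_available)
```

Here `ready_channels` is `[2, 5]` and `ready_after` is `[2]`, showing that a status change only requires updating one bit before the next combined query.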
Using status bit vectors each physical input link can quickly determine the set of input virtual channels at that link which satisfy some conditions with simple highly parallel bit operations. For example, we can quickly determine the virtual channels with flits_available and credits_available, by performing the logical AND of the corresponding bit vectors. Similarly, the sets of channels satisfying other more complex conditions can also be quickly obtained. 4.2 Bandwidth Allocation When a connection is being established in the MMR the source node generates a routing probe that tries to establish a connection by setting up a path from source to destination, reserving link bandwidth and buffer space along that path. If resource reservation is successful the connection is established and the request is granted. If resources cannot be reserved along the whole path, the connection fails and all the resources reserved during the construction of the path are released. Using a backtracking search, alternative paths through the network can be pursued. For CBR connections each probe carries information about the requested bandwidth measured in flit cycles/round. Each output link requires an associated register that keeps track of the total number of flit cycles/round that have been allocated. This register is incremented by the requested number of flit cycles when link bandwidth is allocated and decremented when a connection is removed. A CBR connection can only be allocated if the total number of flit cycles that have been allocated (including the current request) does not exceed the number of flit cycles in a round. Note that it is possible to reserve some bandwidth/round for best-effort traffic in order to prevent starvation of best-effort packets. For VBR connections, the problem is more difficult to solve since the amount of data varies over time. 
In order to deal with the varying requirements of different connections, a probe establishing a VBR connection will carry the permanent and peak bandwidth for that connection. These values may be estimates depending on the available knowledge of the connection behavior. To support bandwidth allocation for VBR connections, each output link requires an additional register to that used for CBR connections. This second register stores the total peak bandwidth requested by all of the connections using that link. These two registers are incremented by the permanent and peak bandwidth, respectively, when a connection is established and decremented by those values when a connection is removed. A VBR connection will only be accepted if i) the value of the first register plus the permanent bandwidth of the current connection does not exceed the number of flit cycles in a round, and ii) the value of the second register does not exceed the product of the number of flit cycles in a round and a concurrency factor. The concurrency factor is stored in a separate register and is set during power on. Note that the proposed bandwidth allocation mechanism does not guarantee that the connection will be assigned the requested peak bandwidth. Providing such a guarantee could waste a large fraction of

link bandwidth especially if peak bandwidths are worst case estimates. The probability that the peak bandwidth will be available at any point in time depends on the concurrency factor. The concurrency factor is a trade-off between the ability to make QoS guarantees, the number of connections that can be concurrently serviced, and link utilization.

During data transmission, a policing protocol operates by limiting the injection of new flits into the network in such a way that each connection does not use higher link bandwidth than that allocated to it when the connection was established. The injection of flits for best-effort packets is automatically limited since it only uses bandwidth that is available after satisfying the requirements of connections that are guaranteed some minimal QoS. The MMR uses flow control to prevent flits from being discarded. Additionally, flit buffers are relatively small, producing a fast propagation of flow control information. Eventually, flow control may propagate backward up to the network interface at the source node, limiting the injection of new flits. Policing within the MMR would only be required if there was no policing at the interface.

4.3 Link Scheduling

As described in the preceding discussion, the basic mechanism to support QoS in the MMR consists of allocating bandwidth to each connection when it is established and guaranteeing that the allocated bandwidth will be available during data transmission via link/switch scheduling. For CBR connections the link scheduler requires state to store the bandwidth allocated to each virtual channel. The link scheduling algorithm operates on a round basis and ensures that no virtual channel consumes more bandwidth than allocated. For VBR connections, link scheduling becomes more complex. Each virtual channel requires state information storing the permanent and peak bandwidth for that connection, respectively. 
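The two-register admission test of Section 4.2 can be sketched as follows. This is an illustration rather than the MMR implementation: the class and method names are hypothetical, bandwidths are expressed in flit cycles/round, and the sketch assumes that the peak-bandwidth check includes the peak of the connection being requested, a point the description leaves open.

```python
from dataclasses import dataclass


@dataclass
class OutputLink:
    """Per-output-link admission state (names are hypothetical)."""
    cycles_per_round: int         # flit cycles in one round
    concurrency_factor: float     # fixed at power-on
    permanent_allocated: int = 0  # first register: CBR plus VBR permanent
    peak_allocated: int = 0       # second register: sum of VBR peaks

    def admit_cbr(self, cycles: int) -> bool:
        # Accept only if the round is not oversubscribed by this request.
        if self.permanent_allocated + cycles > self.cycles_per_round:
            return False
        self.permanent_allocated += cycles
        return True

    def admit_vbr(self, permanent: int, peak: int) -> bool:
        fits_round = self.permanent_allocated + permanent <= self.cycles_per_round
        # Assumption: the current request's peak counts toward the limit.
        peak_limit = self.cycles_per_round * self.concurrency_factor
        fits_peak = self.peak_allocated + peak <= peak_limit
        if not (fits_round and fits_peak):
            return False
        self.permanent_allocated += permanent
        self.peak_allocated += peak
        return True

    def release_cbr(self, cycles: int) -> None:
        self.permanent_allocated -= cycles

    def release_vbr(self, permanent: int, peak: int) -> None:
        self.permanent_allocated -= permanent
        self.peak_allocated -= peak


link = OutputLink(cycles_per_round=100, concurrency_factor=1.5)
cbr_ok = link.admit_cbr(60)                     # 60 of 100 cycles used
vbr_ok = link.admit_vbr(permanent=30, peak=80)  # 90 permanent, 80 peak
vbr_rejected = not link.admit_vbr(10, 80)       # peaks would reach 160 > 150
```

A concurrency factor above 1 lets the sum of peak bandwidths exceed the round length, which is why the peak bandwidth is not guaranteed to be available at all times.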
Additionally, each virtual channel stores the priority associated with the data being transmitted. That priority can be dynamically modified by sending control words from the network interface, for example, based on application specific information. The link scheduling algorithm first assigns all the flit cycles in a round for CBR connections. Then, it assigns the permanent bandwidth to every VBR connection. Note that the bandwidth allocation mechanism guarantees that all the VBR connections can be assigned the permanent bandwidth. Now the link scheduling algorithm considers priorities. Starting from the highest-priority connection, bandwidth (flit cycles in a round) is allocated to each connection ensuring that no virtual channel consumes more than its peak bandwidth. Thus, at each intermediate router the link scheduler allocates flit cycles/round giving priority to CBR connections and permanent bandwidth on VBR connections followed by VBR connections in priority order. Conflicts at output ports will cause flits on some connections to be delayed. The MMR link scheduling approach recognizes that low-priority VBR connections may not be able to deliver all flits on time. However the presence of flow control enables delay information to propagate back to the source interface where appropriate action can be taken. For example, an interface may detect that a connection transmitting a low priority compressed video frame makes little progress at some point in time. The network interface may decide to abort the transmission of that frame. By doing so, less bandwidth is wasted in the transmission of a frame that will not meet the deadline. Also, note that the excess bandwidth (difference between peak and permanent bandwidths) requested by VBR connections is serviced in turn, completely servicing the excess bandwidth of one connection before moving to the next one. 
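The allocation order described above (CBR first, then VBR permanent bandwidth, then VBR excess bandwidth in priority order, one connection at a time) can be sketched as a per-round budget computation. The function name, the dictionary-based inputs, and the convention that a larger number means a higher priority are assumptions for illustration; the actual link scheduler works flit cycle by flit cycle on bit vectors rather than computing a whole round at once.

```python
from typing import Dict, Tuple


def schedule_round(
    cycles_per_round: int,
    cbr: Dict[int, int],
    vbr: Dict[int, Tuple[int, int, int]],
) -> Dict[int, int]:
    """Return flit cycles granted to each virtual channel for one round.

    cbr maps a virtual channel to its allocated cycles.
    vbr maps a virtual channel to (permanent, peak, priority).
    """
    grants = dict(cbr)                       # step 1: CBR connections
    remaining = cycles_per_round - sum(cbr.values())
    for vc, (permanent, _peak, _prio) in vbr.items():
        grants[vc] = permanent               # step 2: VBR permanent bandwidth
        remaining -= permanent
    if remaining < 0:
        raise ValueError("admission control should have rejected this load")
    # Step 3: serve excess bandwidth completely, highest priority first.
    for vc in sorted(vbr, key=lambda v: vbr[v][2], reverse=True):
        permanent, peak, _prio = vbr[vc]
        extra = min(peak - permanent, remaining)
        grants[vc] += extra
        remaining -= extra
    return grants


grants = schedule_round(
    cycles_per_round=20,
    cbr={0: 5, 1: 3},
    vbr={2: (2, 6, 1), 3: (3, 10, 9)},
)
```

In this example the high-priority channel 3 receives its full peak of 10 cycles, while channel 2 is left with only its permanent 2 cycles, matching the policy of completely servicing one connection's excess before moving to the next.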
The idea here is that it is preferable to service the excess bandwidth of most VBR connections completely at the risk of not servicing some of them at all. Certainly other service disciplines are possible. Finally, we note that using control words along a connection we can dynamically vary the bandwidth requirements of a connection. This may be initiated by the source interface of a connection in response to external (CPU initiated) events or in response to actual performance that is experienced on a connection. The response may involve a change in data rate, selective dropping of data packets, or injection limitation. By using command encodings similar to those used in Myrinet [5] we can encapsulate dynamic bandwidth management in the flow control mechanism. The complex bandwidth control functions can be implemented in the network interfaces or source CPUs where there is a great deal of flexibility and application specific information. In return, the routers remain compact and fast. 4.4 Switch Scheduling Switch scheduling refers to the process of determining which input ports are connected to which output ports in a flit cycle for the transmission of a single flit per port. Switch scheduling must be tightly coupled with link scheduling. Ideally virtual channels on input links must be

selected in such a way that there are no conflicts in the use of switch output ports (note that each switch output port is connected to a physical link). For each round, the scheduling algorithm is invoked every flit cycle servicing connections and best-effort traffic.

Switch scheduling schemes can be classified as input-driven or output-driven. Input-driven schemes first consider the set of virtual channels in each input link. For each set, the link scheduling algorithm determines the virtual channel(s) that should transmit a flit during the next flit cycle. The direct channel mapping store indicates the requested output link for each selected virtual channel. As requests from different input links may contend for the same output link, some arbitration is required at each output port. This is the scheme used in the Intel Cavallino router [6]. On the other hand, output-driven schemes consider the set of input virtual channels requesting a given output link. For each set, the link scheduling algorithm determines the virtual channel that should transmit a flit during the next flit cycle. As contention may occur for the use of switch input ports, some arbitration is required for the assignment of switch input ports.

For fully de-multiplexed switches output-driven schemes provide superior performance [15]. However, for a large number of virtual channels, a fully de-multiplexed crossbar is infeasible. For multiplexed crossbars the choice between input-driven and output-driven scheduling is not clear. In the former the state information associated with competing channels is located in the same router. This potentially enables intelligent decisions in arbitrating between competing requests from input ports. In output-driven schemes the state information associated with competing channels is naturally in distinct routers. 
Separating the state information from the buffer location would appear to result in a more complex flow of control information, with the resulting overhead. Therefore, the MMR uses input-driven switch scheduling.

In the MMR we consider switch scheduling algorithms that attempt to schedule all ports concurrently and synchronously set the switch. These scheduling decisions can be classified according to the mechanism for arbitrating among multiple requests for an output port. Arbitration can be performed by using static priorities, dynamic priorities or random selection. The MMR utilizes a dynamic priority biasing scheme motivated by the priority biasing scheme proposed in [7,2]. The priorities of the flits at the head of an input virtual channel are updated periodically, as often as every flit cycle. The unique aspect of this scheme is that the rate at which these priorities grow is a function of the QoS metric used for the corresponding connection. The result is a more equitable distribution of bandwidth across connections in a manner that is dependent upon the type of service guarantees rather than simply the time spent by the packet in the network. Dynamic re-computation of priorities can be a time consuming task and the challenge is to develop solutions that can be implemented efficiently in area and time.

Finally, switch scheduling algorithms can be classified according to the number of candidates offered by the link scheduling algorithm from each group of virtual channels on an input link. For example, instead of selecting a single virtual channel from each input link, the router can select a set of candidates. This set is simply obtained as the result of some operations with bit vectors (for instance, the set of input virtual channels at that link with flits_available, credits_available for flit transmission, CBR_service_requested and not CBR_Completely_Serviced). Using bit vectors, identification of the set of channels can be quite fast. 
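The effect of offering several candidates per input link can be illustrated with a small input-driven arbiter. The greedy highest-priority-first matching used here is a stand-in chosen for brevity, not the MMR arbiter, and the function name and data layout are hypothetical. Each input link offers a list of (priority, virtual channel, output port) candidates, and at most one flit is assigned per input port and per output port.

```python
from typing import Dict, List, Tuple

Candidate = Tuple[float, int, int]  # (priority, virtual channel, output port)


def schedule_switch(
    candidates: Dict[int, List[Candidate]],
) -> Dict[int, Tuple[int, int]]:
    """Map each served input port to the (virtual channel, output port) it wins."""
    requests = sorted(
        (
            (prio, inp, vc, out)
            for inp, offered in candidates.items()
            for prio, vc, out in offered
        ),
        key=lambda request: request[0],
        reverse=True,  # highest priority is considered first
    )
    assignment: Dict[int, Tuple[int, int]] = {}
    used_outputs = set()
    for _prio, inp, vc, out in requests:
        if inp in assignment or out in used_outputs:
            continue  # input already served or output already taken
        assignment[inp] = (vc, out)
        used_outputs.add(out)
    return assignment


# One candidate per input: both inputs want output 2, so input 1 stays idle.
single = schedule_switch({0: [(0.9, 10, 2)], 1: [(0.5, 20, 2)]})

# Two candidates per input: input 1 falls back to output 1 and is served.
double = schedule_switch(
    {0: [(0.9, 10, 2), (0.1, 11, 3)], 1: [(0.5, 20, 2), (0.4, 21, 1)]}
)
```

With one candidate per input only one of the two ports is used in the flit cycle, whereas with two candidates both are used, at the cost of a longer list of requests to arbitrate.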
Having more candidates available per switch port increases the probability of fully utilizing the switch bandwidth in a flit cycle. However, the process of arbitrating among multiple requests is now more complex and time consuming. The challenge is in deciding how many candidates should be considered at an input port to maximize switch bandwidth with minimal impact on switch cycle time.

Concurrently with the transmission of a flit, the scheduling algorithm computes the set of virtual channels that will transmit a flit during the next flit cycle. The switch is then reconfigured according to the computed input-output port assignment and the next flit cycle starts. This process is repeated until the round is completed.

In summary, the MMR scheduler attempts to maximize the probability of assigning virtual channels to every output link during each flit cycle by using sets of candidates (4-8) at each input port and fast priority biasing schemes. Our inclination is to favor simpler, faster schedulers over more complex schedulers that might make better decisions but lengthen the switch cycle time.

5. Simulation Studies

This section provides some preliminary results for link and switch scheduling. The goal here was to determine whether the partitioning of scheduling functionality in the MMR (as shown in Figure 1), the use of large flits, and support for a large number of virtual channels could produce good jitter and delay characteristics in a cut-through router architecture. Encouraging results would point us in the direction for further refinement.
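To make the Section 4 summary concrete, the following Python sketch shows one input-driven arbitration step in which each input port offers its k highest-priority candidates. It is a hedged illustration, not the MMR arbiter: the greedy highest-priority-first conflict resolution, the function name `schedule` and the example request table are all assumptions introduced here.

```python
# Minimal sketch: one flit cycle of input-driven switch scheduling.
# Assumption: conflicts are resolved greedily, highest priority first.

def schedule(requests, k):
    """requests maps input port -> list of (priority, output port),
    one entry per ready virtual channel. Returns input -> output grants."""
    offered = []
    for inp, vcs in requests.items():
        # Link scheduling: offer the k highest-priority candidates.
        for prio, out in sorted(vcs, reverse=True)[:k]:
            offered.append((prio, inp, out))

    grants = {}
    busy_outputs = set()
    # Switch scheduling: each input and each output is used at most once.
    for prio, inp, out in sorted(offered, reverse=True):
        if inp not in grants and out not in busy_outputs:
            grants[inp] = out
            busy_outputs.add(out)
    return grants

# Three inputs whose best candidates all request output port 2.
requests = {
    0: [(0.9, 2), (0.5, 0)],
    1: [(0.8, 2), (0.4, 1)],
    2: [(0.7, 2), (0.3, 3)],
}

print(schedule(requests, 1))  # prints {0: 2}
print(schedule(requests, 2))  # prints {0: 2, 1: 1, 2: 3}
```

With a single candidate per input, only one of the three inputs transmits. With two candidates, the blocked inputs fall back to their second choice and all three transmit, which is the utilization effect the text attributes to larger candidate sets.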

Simulation experiments were conducted using a C++ discrete event simulator that models a single router. The following experiments represent an 8x8 router with 256 virtual channels/input port, 1.24 Gbps physical links and 128-bit flits. The behavior for slower link speeds, such as 622 Mbps and 155 Mbps, was qualitatively the same and therefore these latter results are excluded. The simulations were run until steady state was reached and statistics gathered over approximately 1, router cycles. Connections were randomly selected from the set (64 Kbps, 128 Kbps, 1.54Mbps, 2Mbps, 5Mbps, 1Mbps, 2Mbps, 55Mbps, 12Mbps) and assigned to random input and output ports on the router. The offered load is computed as the percentage of switch bandwidth demanded by all connections through the router.

These preliminary experiments were conducted on CBR connections and did not include best-effort and control messages. Admission control can guarantee that connections are established only if bandwidth is available on a link and the inter-arrival time on a connection is constant. This study was limited to CBR connections to enable easier interpretation of the interaction between link and switch scheduling and to gain some insight into potential bottlenecks. Delay is computed as the difference between the time a flit is ready to be transmitted through the switch and the time it actually leaves the switch. The jitter on a connection is defined as the difference in the delays of successive flits on a connection. The flit cycle time is determined by the physical link speed and the flit size.

5.1 Scheduling Algorithms

We have studied a simple link scheduling algorithm that uses a biased priority based on the ratio of the delay experienced by a flit at the switch to the inter-arrival time on the connection. This priority is recomputed for the head flit of every connection each flit cycle. The priorities of high-speed connections clearly grow at a faster rate.
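The quantities defined above (flit cycle time, biased priority, and jitter) can be illustrated with a short Python sketch. The helper names are hypothetical. Expressing the inter-arrival time in flit cycles and taking jitter as the absolute difference of successive delays are assumptions made here for illustration; the text only specifies the delay-to-inter-arrival ratio and "the difference in the delays of successive flits".

```python
# Minimal sketch of the metrics used in the simulation study.

def flit_cycle_ns(flit_bits, link_gbps):
    """Time to transmit one flit: bits / (Gbit/s) gives nanoseconds."""
    return flit_bits / link_gbps

def inter_arrival_cycles(link_bps, connection_bps):
    """Flit cycles between successive flits of a CBR connection
    (assumption: connection flits have the same size as link flits)."""
    return link_bps / connection_bps

def biased_priority(delay_cycles, iat_cycles):
    """Ratio of the delay seen by the head flit to the inter-arrival time."""
    return delay_cycles / iat_cycles

def jitter(delays):
    """Absolute difference between the delays of successive flits."""
    return [abs(b - a) for a, b in zip(delays, delays[1:])]

link_bps = 1.24e9
print(round(flit_cycle_ns(128, 1.24), 1))  # prints 103.2

slow = inter_arrival_cycles(link_bps, 64e3)    # 64 Kbps connection
fast = inter_arrival_cycles(link_bps, 1.54e6)  # 1.54 Mbps connection

# After waiting the same 10 flit cycles, the faster connection has
# accumulated the higher priority.
print(biased_priority(10, fast) > biased_priority(10, slow))  # prints True

print(jitter([2, 2, 5, 3]))  # prints [0, 3, 2]
```

Because the priority divides by the inter-arrival time, a flit on a high-bandwidth connection gains priority faster per cycle of waiting than a flit on a low-bandwidth connection, which matches the behavior described in the text.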
For comparison purposes we include results from an algorithm that represents the scheduling in the Autonet switch [2,24]. This algorithm differs in how the candidates are selected at input links and in how conflicts for output ports are arbitrated. To capture ideal performance we also implemented a perfect switch to provide a lower bound on the delay and jitter, and an upper bound on switch utilization. When multiple inputs of a perfect switch request the same output port, the flits from these input ports are transmitted to that output port in one flit cycle. Effectively, the switch internal bandwidth is N times the link bandwidth for an NxN switch. There are no port conflicts in a perfect switch and therefore no switch scheduling overheads.

5.2 Simulation Results

[Plot: Jitter vs. Offered Load, 1.24 Gb Link. Vertical axis: Jitter (router cycles). Series: 1 C Biased, 2 C Biased, 1 C Fixed, 2 C Fixed.]

Figure 3. Jitter vs. Offered Load: Fixed and Biased Priorities

In general, we have observed that using a larger number of candidates is effective in increasing switch utilization, and that this utilization is not significantly affected by the priority scheme. This is because, while priorities affect which flits are transmitted in a flit cycle, they do not directly affect how many inputs can be concurrently transmitting in the same cycle. In contrast, the jitter and delay characteristics of individual connections are very sensitive to the priority scheme.

The jitter characteristics for fixed and biased priority schemes are illustrated in Figure 3. The vertical axis is represented in router cycles (equivalently, flit cycles), where a router cycle is the time to transmit a flit across the router or link. We chose to represent the jitter in terms of flit cycles since flits emerge from the network at flit cycle boundaries and jitter occurs as an integer number of flit cycles. With 1.24 Gbps links and 128-bit flits, a flit cycle is approximately 103 ns. We see that priority biasing consistently performs better, especially with a higher number of candidates.
These jitter values are averaged over a large range of connection speeds. Actual jitter values for high-speed connections will be even less and those for low-speed connections will be relatively higher. While we may not be too concerned with relatively higher jitter values on a 64 Kbps connection we expect that jitter values on a 1 Mbps connection will be of more concern. Thus, overall the proposed scheme appears very encouraging for use in multimedia routers.

[Plot: Jitter vs. Offered Load, 1.24 Gb Link. Vertical axis: Jitter (router cycles). Series: 4 C Biased, 8 C Biased, 4 C Fixed, 8 C Fixed.]

[Plots: Delay vs. Offered Load, 1.24 Gb Link. Vertical axis: Delay (microseconds). First panel series: 1 C Biased, 2 C Biased, 1 C Fixed, 2 C Fixed. Second panel series: 4 C Biased, 8 C Biased, 4 C Fixed, 8 C Fixed.]

Figure 4. Delay vs. Offered Load: Fixed and Biased Priorities

Figure 4 illustrates the behavior of delay expressed in microseconds as a function of offered load (note the plots for 1 and 2 candidates are clipped to avoid scaling problems). It is apparent that prior to saturation the delay characteristics using a biased priority are consistently better than those of the fixed priority scheme. The differences are particularly pronounced in the region just prior to saturation. For example, with two candidates and at 70% load, the biased scheme produces an average delay of .82 microseconds while with fixed priority we have ~5 microseconds. With 8 candidates, delays for biased priorities are consistently in the range of .4-.6 microseconds while the fixed priorities realize delays on the order of 1-2 microseconds. Saturation does not appear to occur before 95% load.

[Plots: Delay vs. Offered Load and Jitter vs. Offered Load, 1.24 Gb Link. Vertical axes: Delay (microseconds) and Jitter (router cycles). Series: biased, Fixed, DEC, Perfect.]

Figure 5 compares the delay and jitter characteristics of four algorithms. The plots for biased and fixed priority use 8 candidates. The results are quite favorable with the use of 8 candidates, closely tracking the performance of the perfect switch. While the Autonet algorithm realizes very good jitter characteristics at high loads (>80%), the biased priority scheme maintains extremely low jitter values ranging from .168 router cycles at 80% load to .51 router cycles at 95% load.

6. Concluding Remarks

The primary (longer term) objective of the Multimedia Router (MMR) project is the design and implementation of a single-chip router optimized for multimedia applications.
This paper focused on delineating the initial trade-offs that were made and on the scheduling framework being employed. Targeting 1-2 Gbps links and 128-bit flit sizes, the crossbar must be capable of computing switch settings once every 64 ns to 128 ns. At the heart of the proposed algorithm is a dynamic priority update or priority biasing scheme. Other novel features/goals of the MMR include the use of flow control for dynamic bandwidth management, fixed-size buffers for VBR traffic, a cache-like memory design for the virtual channel memory, and the partitioning of switch and link scheduling functionality.

Figure 5. Delay and Jitter vs. Offered Load: Fixed and Biased Priorities, Autonet, Perfect Switch

Preliminary performance evaluation results indicate that biased priorities consistently perform better than fixed priorities below switch saturation. These numbers are predicated on the availability of a router that can schedule a switch in a single cycle. In the example here this would imply a cycle time of 103 ns, which is matched to the rate at which 128-bit quantities arrive on a 1.24 Gbps link. The MMR architecture was formulated to permit concurrency and pipelining. The link schedulers can all operate in parallel and be pipelined with the switch scheduler. While we have so far been concerned with the study of the functional behavior of the switch scheduler, we now turn our attention to supporting VBR traffic and best-effort traffic, and to the hardware implementation needed to meet these timing constraints.

7. References

[1] J. D. Allen, et al., Ariadne - an adaptive router for fault-tolerant multicomputers, Proceedings of the 21st International Symposium on Computer Architecture, pp , April 1994.

[2] T. E. Anderson et al., High speed switch scheduling for local area networks, Technical Report SRC research report 99, DEC. Also in ACM Transactions on Computer Systems, November,
[3] B. V. Dao, J. Duato and S. Yalamanchili, Configurable flow control mechanisms for fault-tolerant routing, Proceedings of the 22nd International Symposium on Computer Architecture, pp , June
[4] S. Balakrishnan and F. Ozguner, A priority-based flow control mechanism to support real-time traffic in pipelined direct networks, Proceedings of the 1996 International Conference on Parallel Processing, vol. I, pp , August
[5] N. J. Boden, et al., Myrinet - A gigabit per second local area network, IEEE Micro, pp. 29-36, February
[6] J. Carbonaro and F. Verhoorn, Cavallino: The teraflops router and NIC, Proceedings of Hot Interconnects Symposium IV, August
[7] A. Chien and J. H. Kim, Approaches to Quality of Service in High Performance Networks, Lecture Notes in Computer Science: Proceedings of the Workshop on Parallel Computer Routing and Communication, Springer Verlag (pubs.), pp.1-2, June
[8] A. A. Chien, A cost and speed model for k-ary n-cube wormhole routers, Proceedings of Hot Interconnects 93, August
[9] W. J. Dally, Virtual-channel flow control, IEEE Transactions on Parallel and Distributed Systems, vol. 3, no. 2, pp , March
[10] W. J. Dally, et al., The Reliable Router: A reliable and high-performance communication substrate for parallel computers, Lecture Notes in Computer Science: Proceedings of the Workshop on Parallel Computer Routing and Communication, Springer Verlag (pubs.) pp , May
[11] J. Duato, A new theory of deadlock-free adaptive routing in wormhole networks, IEEE Transactions on Parallel and Distributed Systems, vol. 4, no. 12, pp , December
[12] J. Duato, P. Lopez, F. Silla and S. Yalamanchili, A high performance router architecture for interconnection networks, Proceedings of the 1996 International Conference on Parallel Processing, vol. 1, pp , August
[13] J. Duato, S. Yalamanchili, B. Caminero, D. Love, F. J. Quiles, MMR: Architecture and Trade-offs in a High Performance Multimedia Router, Technical Report, Computer Architecture and Systems Laboratory, Georgia Institute of Technology available from
[14] D. Ferrari and D. Verma, A scheme for real-time channel establishment in wide-area networks, IEEE Journal on Selected Areas in Communications, April 1990.
[15] M. L. Fulgham and L. Snyder, A comparison of input and output driven routers, Proceedings of Euro-Par 96, vol. 1, pp , August
[16] D. Garcia and W. Watson, ServerNet II, Lecture Notes in Computer Science, Proceedings of the Workshop on Parallel Computer Routing and Communication, Springer Verlag (pubs.), pp , June
[17] P. T. Gaughan and S. Yalamanchili, A family of fault-tolerant routing protocols for direct multiprocessor networks, IEEE Transactions on Parallel and Distributed Systems, vol. 6, no. 5, pp , May
[18] R. Horst, TNet: A Reliable System Area Network, IEEE Micro, pp , February
[19] M. G. H. Katevenis, et al., ATLAS I: A single-chip ATM switch for NOWs, Proceedings of the Workshop on Communications and Architectural Support for Network-based Parallel Computing, February
[20] J. H. Kim, Bandwidth and latency guarantees in low-cost, high-performance networks, Ph. D. Dissertation, University of Illinois at Urbana-Champaign,
[21] A. Mekkittikul and N. McKeown, A practical scheduling algorithm to achieve 100% throughput in input-queued switches, Proceedings INFOCOM, pp , April,
[22] M. Prycker, Asynchronous transfer mode: solution for broadband ISDN, Ellis Horwood Limited, Chichester, West Sussex, PO191EB, England,
[23] J. Rexford, J. Hall and K. G. Shin, A Router Architecture for Real-Time Point-to-Point Networks, Proceedings of the International Symposium on Computer Architecture, May
[24] M. D. Schroeder et al., Autonet: A high-speed, self-configuring local area network using point-to-point links, Technical Report SRC research report 59, DEC, April 1990.
[25] S. L. Scott and G. Thorson, Optimized routing in the Cray T3D, Lecture Notes in Computer Science: Proceedings of the Workshop on Parallel Computer Routing and Communication, Springer Verlag (pubs.), pp , May
[26] F. Silla, et al., Efficient adaptive routing in networks of workstations with irregular topology, Lecture Notes in Computer Science: Proceedings of the Workshop on Communications and Architectural Support for Network-based Parallel Computing, pp , February
[27] F. Silla and J. Duato, Improving the efficiency of adaptive routing in networks with irregular topology, Proceedings of the 1997 Conference on High Performance Computing, December
[28] C. B. Stunkel, et al., The SP2 high-performance switch, IBM Systems Journal, vol. 34, no. 2, pp , February
[29] Y. L. Chen and J.-C. Liu, A hybrid interconnection network for integrated communication services, Proceedings of the 11th International Parallel Processing Symposium, pp , April 1997.



More information

Improving Network Performance by Reducing Network Contention in Source-Based COWs with a Low Path-Computation Overhead Λ

Improving Network Performance by Reducing Network Contention in Source-Based COWs with a Low Path-Computation Overhead Λ Improving Network Performance by Reducing Network Contention in Source-Based COWs with a Low Path-Computation Overhead Λ J. Flich, P. López, M. P. Malumbres, and J. Duato Dept. of Computer Engineering

More information

ECE/CS 757: Advanced Computer Architecture II Interconnects

ECE/CS 757: Advanced Computer Architecture II Interconnects ECE/CS 757: Advanced Computer Architecture II Interconnects Instructor:Mikko H Lipasti Spring 2017 University of Wisconsin-Madison Lecture notes created by Natalie Enright Jerger Lecture Outline Introduction

More information

Fault Tolerant and Secure Architectures for On Chip Networks With Emerging Interconnect Technologies. Mohsin Y Ahmed Conlan Wesson

Fault Tolerant and Secure Architectures for On Chip Networks With Emerging Interconnect Technologies. Mohsin Y Ahmed Conlan Wesson Fault Tolerant and Secure Architectures for On Chip Networks With Emerging Interconnect Technologies Mohsin Y Ahmed Conlan Wesson Overview NoC: Future generation of many core processor on a single chip

More information

Modelling a Video-on-Demand Service over an Interconnected LAN and ATM Networks

Modelling a Video-on-Demand Service over an Interconnected LAN and ATM Networks Modelling a Video-on-Demand Service over an Interconnected LAN and ATM Networks Kok Soon Thia and Chen Khong Tham Dept of Electrical Engineering National University of Singapore Tel: (65) 874-5095 Fax:

More information

AN ASSOCIATIVE TERNARY CACHE FOR IP ROUTING. 1. Introduction. 2. Associative Cache Scheme

AN ASSOCIATIVE TERNARY CACHE FOR IP ROUTING. 1. Introduction. 2. Associative Cache Scheme AN ASSOCIATIVE TERNARY CACHE FOR IP ROUTING James J. Rooney 1 José G. Delgado-Frias 2 Douglas H. Summerville 1 1 Dept. of Electrical and Computer Engineering. 2 School of Electrical Engr. and Computer

More information

NoC Test-Chip Project: Working Document

NoC Test-Chip Project: Working Document NoC Test-Chip Project: Working Document Michele Petracca, Omar Ahmad, Young Jin Yoon, Frank Zovko, Luca Carloni and Kenneth Shepard I. INTRODUCTION This document describes the low-power high-performance

More information

A New Proposal to Fill in the InfiniBand Arbitration Tables Λ

A New Proposal to Fill in the InfiniBand Arbitration Tables Λ A New Proposal to Fill in the InfiniBand Arbitration Tables Λ F. J. Alfaro, JoséL.Sánchez ept. de Informática Escuela Politécnica Superior Universidad de Castilla-La Mancha 071- Albacete, Spain ffalfaro,

More information

Lecture 25: Interconnection Networks, Disks. Topics: flow control, router microarchitecture, RAID

Lecture 25: Interconnection Networks, Disks. Topics: flow control, router microarchitecture, RAID Lecture 25: Interconnection Networks, Disks Topics: flow control, router microarchitecture, RAID 1 Virtual Channel Flow Control Each switch has multiple virtual channels per phys. channel Each virtual

More information

6.9. Communicating to the Outside World: Cluster Networking

6.9. Communicating to the Outside World: Cluster Networking 6.9 Communicating to the Outside World: Cluster Networking This online section describes the networking hardware and software used to connect the nodes of cluster together. As there are whole books and

More information

Abstract. Paper organization

Abstract. Paper organization Allocation Approaches for Virtual Channel Flow Control Neeraj Parik, Ozen Deniz, Paul Kim, Zheng Li Department of Electrical Engineering Stanford University, CA Abstract s are one of the major resources

More information

Introduction. Introduction. Router Architectures. Introduction. Recent advances in routing architecture including

Introduction. Introduction. Router Architectures. Introduction. Recent advances in routing architecture including Router Architectures By the end of this lecture, you should be able to. Explain the different generations of router architectures Describe the route lookup process Explain the operation of PATRICIA algorithm

More information

University of Castilla-La Mancha

University of Castilla-La Mancha University of Castilla-La Mancha A publication of the Computing Systems Department Implementing the Advanced Switching Fabric Discovery Process by Antonio Robles-Gomez, Aurelio Bermúdez, Rafael Casado,

More information

Performance Analysis of Storage-Based Routing for Circuit-Switched Networks [1]

Performance Analysis of Storage-Based Routing for Circuit-Switched Networks [1] Performance Analysis of Storage-Based Routing for Circuit-Switched Networks [1] Presenter: Yongcheng (Jeremy) Li PhD student, School of Electronic and Information Engineering, Soochow University, China

More information

Evaluation of NOC Using Tightly Coupled Router Architecture

Evaluation of NOC Using Tightly Coupled Router Architecture IOSR Journal of Computer Engineering (IOSR-JCE) e-issn: 2278-0661,p-ISSN: 2278-8727, Volume 18, Issue 1, Ver. II (Jan Feb. 2016), PP 01-05 www.iosrjournals.org Evaluation of NOC Using Tightly Coupled Router

More information

Lecture 16: On-Chip Networks. Topics: Cache networks, NoC basics

Lecture 16: On-Chip Networks. Topics: Cache networks, NoC basics Lecture 16: On-Chip Networks Topics: Cache networks, NoC basics 1 Traditional Networks Huh et al. ICS 05, Beckmann MICRO 04 Example designs for contiguous L2 cache regions 2 Explorations for Optimality

More information

Fragmenting and Interleaving Real-Time and Nonreal-Time Packets

Fragmenting and Interleaving Real-Time and Nonreal-Time Packets CHAPTER 16 Fragmenting and Interleaving Real-Time and Nonreal-Time Packets Integrating delay-sensitive real-time traffic with nonreal-time data packets on low-speed links can cause the real-time packets

More information

Asynchronous Transfer Mode (ATM) ATM concepts

Asynchronous Transfer Mode (ATM) ATM concepts Asynchronous Transfer Mode (ATM) Asynchronous Transfer Mode (ATM) is a switching technique for telecommunication networks. It uses asynchronous time-division multiplexing,[1][2] and it encodes data into

More information

FDDI-M: A SCHEME TO DOUBLE FDDI S ABILITY OF SUPPORTING SYNCHRONOUS TRAFFIC

FDDI-M: A SCHEME TO DOUBLE FDDI S ABILITY OF SUPPORTING SYNCHRONOUS TRAFFIC FDDI-M: A SCHEME TO DOUBLE FDDI S ABILITY OF SUPPORTING SYNCHRONOUS TRAFFIC Kang G. Shin Real-time Computing Laboratory EECS Department The University of Michigan Ann Arbor, Michigan 48109 &in Zheng Mitsubishi

More information

Deadlock-free XY-YX router for on-chip interconnection network

Deadlock-free XY-YX router for on-chip interconnection network LETTER IEICE Electronics Express, Vol.10, No.20, 1 5 Deadlock-free XY-YX router for on-chip interconnection network Yeong Seob Jeong and Seung Eun Lee a) Dept of Electronic Engineering Seoul National Univ

More information

Interconnection Networks: Flow Control. Prof. Natalie Enright Jerger

Interconnection Networks: Flow Control. Prof. Natalie Enright Jerger Interconnection Networks: Flow Control Prof. Natalie Enright Jerger Switching/Flow Control Overview Topology: determines connectivity of network Routing: determines paths through network Flow Control:

More information

Achieving Distributed Buffering in Multi-path Routing using Fair Allocation

Achieving Distributed Buffering in Multi-path Routing using Fair Allocation Achieving Distributed Buffering in Multi-path Routing using Fair Allocation Ali Al-Dhaher, Tricha Anjali Department of Electrical and Computer Engineering Illinois Institute of Technology Chicago, Illinois

More information

ECE 669 Parallel Computer Architecture

ECE 669 Parallel Computer Architecture ECE 669 Parallel Computer Architecture Lecture 21 Routing Outline Routing Switch Design Flow Control Case Studies Routing Routing algorithm determines which of the possible paths are used as routes how

More information

Total-Exchange on Wormhole k-ary n-cubes with Adaptive Routing

Total-Exchange on Wormhole k-ary n-cubes with Adaptive Routing Total-Exchange on Wormhole k-ary n-cubes with Adaptive Routing Fabrizio Petrini Oxford University Computing Laboratory Wolfson Building, Parks Road Oxford OX1 3QD, England e-mail: fabp@comlab.ox.ac.uk

More information

Interconnection Networks: Topology. Prof. Natalie Enright Jerger

Interconnection Networks: Topology. Prof. Natalie Enright Jerger Interconnection Networks: Topology Prof. Natalie Enright Jerger Topology Overview Definition: determines arrangement of channels and nodes in network Analogous to road map Often first step in network design

More information

Bandwidth Aware Routing Algorithms for Networks-on-Chip

Bandwidth Aware Routing Algorithms for Networks-on-Chip 1 Bandwidth Aware Routing Algorithms for Networks-on-Chip G. Longo a, S. Signorino a, M. Palesi a,, R. Holsmark b, S. Kumar b, and V. Catania a a Department of Computer Science and Telecommunications Engineering

More information

In-Order Packet Delivery in Interconnection Networks using Adaptive Routing

In-Order Packet Delivery in Interconnection Networks using Adaptive Routing In-Order Packet Delivery in Interconnection Networks using Adaptive Routing J.C. Martínez, J. Flich, A. Robles, P. López, and J. Duato Dept. of Computer Engineering Universidad Politécnica de Valencia

More information

OFAR-CM: Efficient Dragonfly Networks with Simple Congestion Management

OFAR-CM: Efficient Dragonfly Networks with Simple Congestion Management Marina Garcia 22 August 2013 OFAR-CM: Efficient Dragonfly Networks with Simple Congestion Management M. Garcia, E. Vallejo, R. Beivide, M. Valero and G. Rodríguez Document number OFAR-CM: Efficient Dragonfly

More information

Fault-Tolerant Routing in Fault Blocks. Planarly Constructed. Dong Xiang, Jia-Guang Sun, Jie. and Krishnaiyan Thulasiraman. Abstract.

Fault-Tolerant Routing in Fault Blocks. Planarly Constructed. Dong Xiang, Jia-Guang Sun, Jie. and Krishnaiyan Thulasiraman. Abstract. Fault-Tolerant Routing in Fault Blocks Planarly Constructed Dong Xiang, Jia-Guang Sun, Jie and Krishnaiyan Thulasiraman Abstract A few faulty nodes can an n-dimensional mesh or torus network unsafe for

More information

Congestion Management in Lossless Interconnects: Challenges and Benefits

Congestion Management in Lossless Interconnects: Challenges and Benefits Congestion Management in Lossless Interconnects: Challenges and Benefits José Duato Technical University of Valencia (SPAIN) Conference title 1 Outline Why is congestion management required? Benefits Congestion

More information

A Dynamic NOC Arbitration Technique using Combination of VCT and XY Routing

A Dynamic NOC Arbitration Technique using Combination of VCT and XY Routing 727 A Dynamic NOC Arbitration Technique using Combination of VCT and XY Routing 1 Bharati B. Sayankar, 2 Pankaj Agrawal 1 Electronics Department, Rashtrasant Tukdoji Maharaj Nagpur University, G.H. Raisoni

More information

Performance of a Switched Ethernet: A Case Study

Performance of a Switched Ethernet: A Case Study Performance of a Switched Ethernet: A Case Study M. Aboelaze A Elnaggar Dept. of Computer Science Dept of Electrical Engineering York University Sultan Qaboos University Toronto Ontario Alkhod 123 Canada

More information

Design and Evaluation of a Parallel-Polled Virtual Output Queued Switch *

Design and Evaluation of a Parallel-Polled Virtual Output Queued Switch * Design and Evaluation of a Parallel-Polled Virtual Output Queued Switch * K. J. Christensen Department of Computer Science and Engineering University of South Florida Tampa, FL 3360 Abstract - Input-buffered

More information

BlueGene/L. Computer Science, University of Warwick. Source: IBM

BlueGene/L. Computer Science, University of Warwick. Source: IBM BlueGene/L Source: IBM 1 BlueGene/L networking BlueGene system employs various network types. Central is the torus interconnection network: 3D torus with wrap-around. Each node connects to six neighbours

More information

Overview Computer Networking What is QoS? Queuing discipline and scheduling. Traffic Enforcement. Integrated services

Overview Computer Networking What is QoS? Queuing discipline and scheduling. Traffic Enforcement. Integrated services Overview 15-441 15-441 Computer Networking 15-641 Lecture 19 Queue Management and Quality of Service Peter Steenkiste Fall 2016 www.cs.cmu.edu/~prs/15-441-f16 What is QoS? Queuing discipline and scheduling

More information

Efficient Throughput-Guarantees for Latency-Sensitive Networks-On-Chip

Efficient Throughput-Guarantees for Latency-Sensitive Networks-On-Chip ASP-DAC 2010 20 Jan 2010 Session 6C Efficient Throughput-Guarantees for Latency-Sensitive Networks-On-Chip Jonas Diemer, Rolf Ernst TU Braunschweig, Germany diemer@ida.ing.tu-bs.de Michael Kauschke Intel,

More information

Distributing Bandwidth Between Queues

Distributing Bandwidth Between Queues CHAPTER 5 Developing a queuing strategy is an important step in optimizing network functionality and services. Equally important is ensuring that bandwidth is shared fairly among the competing traffic

More information

Multicomputer distributed system LECTURE 8

Multicomputer distributed system LECTURE 8 Multicomputer distributed system LECTURE 8 DR. SAMMAN H. AMEEN 1 Wide area network (WAN); A WAN connects a large number of computers that are spread over large geographic distances. It can span sites in

More information

Network Control and Signalling

Network Control and Signalling Network Control and Signalling 1. Introduction 2. Fundamentals and design principles 3. Network architecture and topology 4. Network control and signalling 5. Network components 5.1 links 5.2 switches

More information

Interconnection Networks

Interconnection Networks Lecture 17: Interconnection Networks Parallel Computer Architecture and Programming A comment on web site comments It is okay to make a comment on a slide/topic that has already been commented on. In fact

More information

Journal of Electronics and Communication Engineering & Technology (JECET)

Journal of Electronics and Communication Engineering & Technology (JECET) Journal of Electronics and Communication Engineering & Technology (JECET) JECET I A E M E Journal of Electronics and Communication Engineering & Technology (JECET)ISSN ISSN 2347-4181 (Print) ISSN 2347-419X

More information

NOW Handout Page 1. Outline. Networks: Routing and Design. Routing. Routing Mechanism. Routing Mechanism (cont) Properties of Routing Algorithms

NOW Handout Page 1. Outline. Networks: Routing and Design. Routing. Routing Mechanism. Routing Mechanism (cont) Properties of Routing Algorithms Outline Networks: Routing and Design Routing Switch Design Case Studies CS 5, Spring 99 David E. Culler Computer Science Division U.C. Berkeley 3/3/99 CS5 S99 Routing Recall: routing algorithm determines

More information

Network Model for Delay-Sensitive Traffic

Network Model for Delay-Sensitive Traffic Traffic Scheduling Network Model for Delay-Sensitive Traffic Source Switch Switch Destination Flow Shaper Policer (optional) Scheduler + optional shaper Policer (optional) Scheduler + optional shaper cfla.

More information

6. Parallel Volume Rendering Algorithms

6. Parallel Volume Rendering Algorithms 6. Parallel Volume Algorithms This chapter introduces a taxonomy of parallel volume rendering algorithms. In the thesis statement we claim that parallel algorithms may be described by "... how the tasks

More information

Routing Algorithms. Review

Routing Algorithms. Review Routing Algorithms Today s topics: Deterministic, Oblivious Adaptive, & Adaptive models Problems: efficiency livelock deadlock 1 CS6810 Review Network properties are a combination topology topology dependent

More information

Top-Level View of Computer Organization

Top-Level View of Computer Organization Top-Level View of Computer Organization Bởi: Hoang Lan Nguyen Computer Component Contemporary computer designs are based on concepts developed by John von Neumann at the Institute for Advanced Studies

More information

Communication Performance in Network-on-Chips

Communication Performance in Network-on-Chips Communication Performance in Network-on-Chips Axel Jantsch Royal Institute of Technology, Stockholm November 24, 2004 Network on Chip Seminar, Linköping, November 25, 2004 Communication Performance In

More information

EE 6900: Interconnection Networks for HPC Systems Fall 2016

EE 6900: Interconnection Networks for HPC Systems Fall 2016 EE 6900: Interconnection Networks for HPC Systems Fall 2016 Avinash Karanth Kodi School of Electrical Engineering and Computer Science Ohio University Athens, OH 45701 Email: kodi@ohio.edu 1 Acknowledgement:

More information

This chapter provides the background knowledge about Multistage. multistage interconnection networks are explained. The need, objectives, research

This chapter provides the background knowledge about Multistage. multistage interconnection networks are explained. The need, objectives, research CHAPTER 1 Introduction This chapter provides the background knowledge about Multistage Interconnection Networks. Metrics used for measuring the performance of various multistage interconnection networks

More information

Module 16: Distributed System Structures

Module 16: Distributed System Structures Chapter 16: Distributed System Structures Module 16: Distributed System Structures Motivation Types of Network-Based Operating Systems Network Structure Network Topology Communication Structure Communication

More information

FPGA based Design of Low Power Reconfigurable Router for Network on Chip (NoC)

FPGA based Design of Low Power Reconfigurable Router for Network on Chip (NoC) FPGA based Design of Low Power Reconfigurable Router for Network on Chip (NoC) D.Udhayasheela, pg student [Communication system],dept.ofece,,as-salam engineering and technology, N.MageshwariAssistant Professor

More information

Interconnection Network

Interconnection Network Interconnection Network Recap: Generic Parallel Architecture A generic modern multiprocessor Network Mem Communication assist (CA) $ P Node: processor(s), memory system, plus communication assist Network

More information

Lecture 15: PCM, Networks. Today: PCM wrap-up, projects discussion, on-chip networks background

Lecture 15: PCM, Networks. Today: PCM wrap-up, projects discussion, on-chip networks background Lecture 15: PCM, Networks Today: PCM wrap-up, projects discussion, on-chip networks background 1 Hard Error Tolerance in PCM PCM cells will eventually fail; important to cause gradual capacity degradation

More information

Performance and Evaluation of Integrated Video Transmission and Quality of Service for internet and Satellite Communication Traffic of ATM Networks

Performance and Evaluation of Integrated Video Transmission and Quality of Service for internet and Satellite Communication Traffic of ATM Networks Performance and Evaluation of Integrated Video Transmission and Quality of Service for internet and Satellite Communication Traffic of ATM Networks P. Rajan Dr. K.L.Shanmuganathan Research Scholar Prof.

More information

Congestion Management in HPC

Congestion Management in HPC Congestion Management in HPC Interconnection Networks Pedro J. García Universidad de Castilla-La Mancha (SPAIN) Conference title 1 Outline Why may congestion become a problem? Should we care about congestion

More information

Lecture 24: Interconnection Networks. Topics: topologies, routing, deadlocks, flow control

Lecture 24: Interconnection Networks. Topics: topologies, routing, deadlocks, flow control Lecture 24: Interconnection Networks Topics: topologies, routing, deadlocks, flow control 1 Topology Examples Grid Torus Hypercube Criteria Bus Ring 2Dtorus 6-cube Fully connected Performance Bisection

More information

The GLIMPS Terabit Packet Switching Engine

The GLIMPS Terabit Packet Switching Engine February 2002 The GLIMPS Terabit Packet Switching Engine I. Elhanany, O. Beeri Terabit Packet Switching Challenges The ever-growing demand for additional bandwidth reflects on the increasing capacity requirements

More information

Network-on-chip (NOC) Topologies

Network-on-chip (NOC) Topologies Network-on-chip (NOC) Topologies 1 Network Topology Static arrangement of channels and nodes in an interconnection network The roads over which packets travel Topology chosen based on cost and performance

More information

SAMBA-BUS: A HIGH PERFORMANCE BUS ARCHITECTURE FOR SYSTEM-ON-CHIPS Λ. Ruibing Lu and Cheng-Kok Koh

SAMBA-BUS: A HIGH PERFORMANCE BUS ARCHITECTURE FOR SYSTEM-ON-CHIPS Λ. Ruibing Lu and Cheng-Kok Koh BUS: A HIGH PERFORMANCE BUS ARCHITECTURE FOR SYSTEM-ON-CHIPS Λ Ruibing Lu and Cheng-Kok Koh School of Electrical and Computer Engineering Purdue University, West Lafayette, IN 797- flur,chengkokg@ecn.purdue.edu

More information

Real-Time Mixed-Criticality Wormhole Networks

Real-Time Mixed-Criticality Wormhole Networks eal-time Mixed-Criticality Wormhole Networks Leandro Soares Indrusiak eal-time Systems Group Department of Computer Science University of York United Kingdom eal-time Systems Group 1 Outline Wormhole Networks

More information