Investigating Switch Scheduling Algorithms to Support QoS in the Multimedia Router


B. Caminero, C. Carrión, F. J. Quiles, J. Duato, S. Yalamanchili

Dept. of Computer Science. Escuela Politecnica Superior. Univ. de Castilla-La Mancha. 7 - Albacete, SPAIN. blanca, carmen,
Dept. of Computer Engineering (DISCA). Univ. Politecnica de Valencia. Valencia, SPAIN. jduato@gap.upv.es
School of Electrical and Computer Engineering. Georgia Institute of Technology. Atlanta, Georgia. sudha@ece.gatech.edu

Abstract

The primary objective of the Multimedia Router (MMR) project is the design and implementation of a compact router optimized for multimedia applications. The router is targeted for use in cluster and LAN interconnection networks, which impose different constraints than WANs and therefore call for different router solutions. The goal is to provide architectural support to enable a range of Quality of Service (QoS) guarantees at latencies comparable to state-of-the-art multiprocessor cut-through routers. One of the critical design parameters for achieving this is the switch scheduling algorithm. In an earlier work, the authors proposed an efficient crossbar arbitration scheme, the Candidate-Order Arbiter algorithm. In this paper, the performance obtained with this proposal is analyzed and compared to another well-known scheme. The results show that QoS may not be guaranteed by a switch scheduling algorithm that aims only to maximize crossbar utilization. Moreover, simulations show that our approach is able to guarantee high bandwidth utilization, up to 78%, while still providing QoS to both CBR and VBR traffic.

Keywords: Quality of Service, multimedia communications, router architecture, switch scheduler

1 Introduction

The Multimedia Router (MMR) project is aimed at the design and implementation of a compact router optimized for multimedia applications. The MMR should maximize link utilization until the network reaches saturation, and should satisfy the QoS requirements of a large number of multimedia connections while allocating the remaining bandwidth to best-effort traffic [5].

This work was partially supported by the Spanish CICYT under Grant TIC-5-C7

At the core of the design of the Multimedia Router there is one key element for providing QoS guarantees to the multimedia flows: the link and switch scheduling algorithm. On one hand, link scheduling is carried out by selecting, for each input link, a number of virtual channels whose head flits have the highest priorities. On the other hand, switch scheduling arbitrates among the conflicting demands for the crossbar and output resources, in order to produce a conflict-free matching between input and output ports. In this paper we focus on the arbitration task involved in switch scheduling, and describe our approach for addressing it [] [7] [8].

A successful arbitration scheme for the MMR must provide efficient and fair resource scheduling while still guaranteeing the QoS of the connections. We have proposed new algorithms based on both the output contention and the priority of the connections [3]. Nevertheless, research in switch scheduling algorithms is incomplete without a performance analysis that compares the previous proposals with the new ones. Therefore, we have compared the performance of different alternatives, showing that the switch scheduler must not only look for a maximal conflict-free matching.

The rest of the paper is organized as follows. First of all, the Multimedia Router architecture is outlined, and its main characteristics are described. Second, the bandwidth allocation and the link scheduling algorithm implemented in the MMR are explained. Next, a description of different switch scheduling algorithms is presented, including the Candidate-Order Arbiter algorithm proposed by the authors.
Then, simulation results are presented and discussed. Finally, in Section 6 we present the concluding remarks and some future work.

(C) IEEE

2 The Multimedia Router

The general organization of the Multimedia Router is shown in Figure 1. Some aspects of its architecture are summarized in the following paragraphs. The interested reader is referred to [6] for a more detailed description of the MMR, as well as the design trade-offs.

Figure 1. MMR Architecture. Components shown: physical input links, VCM+LS blocks (VCM: Virtual Channel Memory; LS: Link Scheduler), phit buffers, switch, switch scheduler, routing and arbitration unit, physical output links.

Input Buffers. The MMR is able to support a large number of connections. A virtual channel is provided for each of them, thus avoiding HOL-blocking []. As a consequence, a large number of buffers is needed. In order to optimize their implementation, the buffers are organized as modules of RAM memory, interleaved with a simple scheme (see Figure 2).

Figure 2. Virtual Channel Memory Organization. Components shown: control word decoder, demux, buffers, RAM modules, address generator, mux (driven from the link scheduler).

Switching Technique. The MMR uses a hybrid approach, where the most suitable switching technique is used for each kind of traffic: a connection-oriented scheme (Pipelined Circuit Switching [8]) for the multimedia flows, and Virtual Cut-Through for best-effort messages [6]. For both schemes, the flow control unit has the same size and will be referred to as a flit. Flits are synchronously forwarded through the crossbar.

Connection Set up. Each time a connection is to be established, the source node generates a routing probe that sets up a path from source to destination, reserving link bandwidth and buffer space. The probe carries information about the requested bandwidth, measured in flit cycles per round. A flit cycle is the time taken for a flit to be transmitted through the router and across the physical link. In the MMR, link bandwidth and switch port bandwidth are split into flit cycles, and flit cycles are grouped into rounds or frames. The number of flit cycles in a round has been fixed as an integer multiple of the number of virtual channels per link.
A CBR connection will only be accepted if the total number of flit cycles allocated to all the connections using that link does not exceed the number of flit cycles in a round. A VBR connection will be set up if both a) the sum of the permanent bandwidth of all the connections using the link does not exceed the number of flit cycles in a round, and b) the total peak bandwidth requested by all the connections does not exceed the product of the number of flit cycles in a round and a concurrency factor [3]. The concurrency factor is a trade-off between the ability to make QoS guarantees, the number of connections that can be concurrently serviced, and link utilization.

Flow Control. The MMR has been designed to avoid data losses. In order to achieve this, per-connection flow control is used. The selected scheme is credit-based flow control [] because it does not require large buffers when links are long. Note that InfiniBand [9] has also selected this flow control method. It introduces some control overhead, which is best amortized with large flits (4 bits).

Switch Organization. Due to the large number of virtual channels, the MMR uses a multiplexed crossbar with as many ports as physical channels. A drawback of this organization is that arbitration is needed every time an input link changes from one virtual channel to another. Arbitration is needed at the input side (link scheduling), to select one virtual channel from each physical channel, but it is also needed within the switch (switch scheduling), because several input channels might request the same output link. Arbitration is performed concurrently with flit transmission. Large flits also help to hide the arbitration delay and amortize the crossbar reconfiguration delay. The use of large flits would increase flit latency; however, this is avoided by pipelining flit transmission at the phit level.
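The admission rules stated under Connection Set up can be sketched as simple bookkeeping in flit cycles per round. This is only an illustrative sketch, not the MMR implementation: the class and method names are hypothetical, the concurrency factor value is a free parameter (the text gives no number), and it is assumed here that a CBR reservation counts toward both the permanent and the peak totals.

```python
from dataclasses import dataclass


@dataclass
class Link:
    """Bandwidth bookkeeping for one physical link, in flit cycles per round."""
    cycles_per_round: int      # number of flit cycles in a round
    concurrency_factor: float  # hypothetical value; not given in the text
    permanent: int = 0         # cycles reserved as CBR or permanent VBR bandwidth
    peak: int = 0              # sum of the peak cycles requested so far

    def admit_cbr(self, cycles: int) -> bool:
        """Accept a CBR connection only if the round is not oversubscribed."""
        if self.permanent + cycles > self.cycles_per_round:
            return False
        self.permanent += cycles
        self.peak += cycles  # assumption: a CBR peak equals its reserved rate
        return True

    def admit_vbr(self, permanent: int, peak: int) -> bool:
        """Accept a VBR connection only if conditions a) and b) both hold."""
        fits_permanent = self.permanent + permanent <= self.cycles_per_round
        fits_peak = (self.peak + peak
                     <= self.cycles_per_round * self.concurrency_factor)
        if not (fits_permanent and fits_peak):
            return False
        self.permanent += permanent
        self.peak += peak
        return True
```

With a larger concurrency factor more VBR connections are admitted on the same link, at the cost of a higher chance that their peaks coincide, which is the trade-off mentioned above.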
3 Resource Scheduling Algorithm

One of the problems that must be solved in the router architecture is the link and switch scheduling algorithm. This paper focuses on one specific problem, namely the ability to make effective resource scheduling decisions at router switching speed. In our proposal, link and switch scheduling are partitioned into three basic decisions: candidate selection (which implements link scheduling), port ordering, and arbitration (which together implement switch scheduling). In this section, we first detail the behavior of our link scheduling algorithm. Next, the objectives that should be addressed by switch scheduling are outlined. Our proposal for a switch scheduling algorithm is then described in more detail in the next section.

3.1 Link Scheduling Algorithm: Priority Biasing

Link scheduling is carried out by selecting the virtual channels whose head flits have the highest priorities, among all the virtual channels in a physical link (, = number of ports). The key point in this scheme is that priorities are biased according to the ratio between the QoS a flit is receiving and the QoS it should receive. This approach combines the effect of the scheduler (measured as the delay or the jitter experienced by a flit in the queue) with the QoS requirements (measured as the bandwidth requested by the connection). The authors have proposed a link and switch scheduling algorithm, based on the concept of biased priorities [5, 7, ], that is well suited to parallelization and pipelining [3].

One of the first biasing functions proposed for link scheduling was the Inter-Arrival Based Priority, IABP [3]. In this scheme, the priority of a flit is computed as the ratio between its queuing delay and the Inter-Arrival Time (IAT) specified for the connection. That is, the priority value of each flit is related to the QoS requested by the connection, and is biased by the QoS the flit is actually receiving, because the priority grows as the queuing delay grows. Moreover, priority grows faster for flits belonging to high-bandwidth connections, so they are more likely to be forwarded sooner through the switch. Results showed that this approach provides good performance for CBR traffic, differentiating the QoS received by each type of connection.
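As a small illustration of the IABP rule described above, the priority is the queuing delay divided by the IAT of the connection, and the link scheduler keeps the head flits with the highest values. The names `HeadFlit`, `iabp_priority` and `select_candidates`, as well as the parameter `k` (number of candidates kept per link), are hypothetical and only serve this sketch.

```python
from typing import List, NamedTuple


class HeadFlit(NamedTuple):
    vc: int               # virtual channel identifier
    queuing_delay: float  # time the head flit has waited, in router cycles
    iat: float            # inter-arrival time of the connection, same unit


def iabp_priority(flit: HeadFlit) -> float:
    """IABP biasing function: queuing delay divided by the connection IAT."""
    return flit.queuing_delay / flit.iat


def select_candidates(head_flits: List[HeadFlit], k: int) -> List[int]:
    """Return the k virtual channels whose head flits have the highest priority."""
    ranked = sorted(head_flits, key=iabp_priority, reverse=True)
    return [flit.vc for flit in ranked[:k]]
```

For the same queuing delay, a connection with a small IAT (high bandwidth) obtains a larger priority than one with a large IAT, which is the behavior described in the text.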
The main problem with the IABP algorithm is that it is a theoretical approach that is not suitable for a practical implementation, because computing the priorities requires a division for every virtual channel. Hardware dividers are slow and expensive, and hardly fit into our fast, compact router. It is therefore necessary to devise an alternative algorithm. The idea is to apply the same rationale introduced by IABP, that is, to relate the bandwidth required by the connection to the experienced queuing delay, but replacing the division with some other operation that is faster and cheaper to implement.

In a second proposal [4], called SIABP (Simple-IABP), the priority value of each flit is computed in a different way. The queuing delay of a flit is stored in a counter that is increased every router cycle, that is, the queuing delay is measured as a number of router cycles. The priority biasing function is implemented as follows. The initial value of the priority is the bandwidth required by the connection. Instead of representing this by the IAT of the connection, the number of slots per frame reserved to service the average bandwidth of the connection is used. The advantage of this approach is that this magnitude is an integer value. Note that computing the ratio between the queuing delay and the IAT (as IABP does) is equivalent to computing the product of the queuing delay and the bandwidth requirements. If the product is replaced with a shift operation, the implementation becomes much simpler. So, the priority value is shifted to the left (that is, it is multiplied by two) every time the queuing delay value reaches a new power of two, that is, every time a bit in the queuing delay counter is set for the first time since it was last reset. In this way, the QoS required (represented by the initial priority value) is also related to the QoS received by the flit (the queuing delay).
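The SIABP rule can be sketched with integer operations only. The sketch below assumes the reading given above, namely that the priority doubles each time a bit of the delay counter is set for the first time since the last reset; the names `siabp_priority` and `SiabpCounter` are hypothetical and the code is an illustration, not the hardware design evaluated in [4].

```python
def siabp_priority(slots_per_round: int, queuing_delay: int) -> int:
    """Closed form: one left shift per bit position the delay counter has used."""
    return slots_per_round << queuing_delay.bit_length()


class SiabpCounter:
    """Cycle-by-cycle view: a delay counter plus a record of the bits seen so far."""

    def __init__(self, slots_per_round: int) -> None:
        self.initial = slots_per_round  # slots per round reserved by the connection
        self.reset()

    def reset(self) -> None:
        """Called when a new flit reaches the head of the virtual channel."""
        self.delay = 0
        self.seen_bits = 0
        self.priority = self.initial

    def tick(self) -> int:
        """Advance one router cycle and return the updated priority."""
        self.delay += 1
        new_bits = self.delay & ~self.seen_bits
        if new_bits:  # a counter bit is set for the first time since the reset
            self.priority <<= 1
            self.seen_bits |= new_bits
        return self.priority
```

The priority therefore grows with the logarithm of the queuing delay rather than linearly, so SIABP approximates the IABP ordering rather than reproducing it exactly.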
Thus, the SIABP function can be implemented in hardware with just a shifter and some combinational logic, while the IABP function needs a floating point divider. The reduction in hardware complexity and delay achieved with the SIABP algorithm was determined by using VHDL design tools. Results showed a reduction of times in terms of silicon area, and of 38 times in terms of delay, while still satisfying the QoS needs of the multimedia flows [4].

3.2 Switch Scheduling Algorithms

The switch scheduling challenge is to compute switch settings that establish connections between input and output ports at speeds comparable to the time needed to transfer a flit through the switch. Note that multimedia flows can be delayed due to conflicting demands for output ports. Hence, the crossbar arbitration must be carefully designed in order to meet the QoS requirements of the connections. That is, the choice of the switch scheduling algorithm is a critical parameter for the MMR. Recall that in the MMR, an efficient scheme must maximize the throughput of the network while guaranteeing the QoS requirements of the connections.

Although the arbitration problem has been investigated by many researchers [] [4] [7], most of the proposals were designed for networks of multiprocessor systems, and therefore the metrics applied to measure their quality are different. Most of them aim at a low cost in terms of silicon area or at a maximal input-output matching, but they take into account neither the priority of the traffic nor the QoS offered to each flow. Thus, in the next section we describe a switch scheduling algorithm specifically designed to take into account the QoS parameters of the connections.

4 The Candidate-Order Arbiter (COA)

As described in previous sections, candidate selection is performed during link scheduling, that is, the flits with the highest priorities are forwarded to the switch scheduler.
The switch scheduler computes a conflict-free matching among input and output ports, in a way motivated by the opportunities for parallelization and pipelining. Every input link produces a candidate vector, where the requested output port and the priority are stored for every candidate. All the candidates selected by every port are arranged into a selection matrix, which has one row per candidate level and output port, and one column per input port. The first group of rows stores the requests made by the highest priority candidate of every input link, that is, the level one candidate requests. The next group of rows stores the level two candidate requests, and so on. Then, a conflict vector is computed. This vector has one item per row of the selection matrix, and stores the number of non-null entries in that row. That is, the conflict vector identifies the number of conflicts that every output link has at every candidate level. Figure 3 shows an example selection matrix with its corresponding conflict vector.

Figure 3. An example selection matrix and its corresponding conflict vector (rows: output ports for the level 1 and level 2 candidates; columns: input ports).

The next phase is port ordering. The ordering function selects output ports first by level and then in increasing order of conflict within a level. Ties are broken by randomly selecting one of the ports. The rationale is that ports with the most conflicts should be matched last, since those ports have the most opportunities to be matched to an input port.

The last phase is arbitration: once an output port has been selected for matching, if there are several requests for it, one of them must be chosen. The criterion is to select the candidate with the highest priority. Each time an input port and an output port are matched, all the requests involving those ports are dropped, and the selection matrix and conflict vector are recomputed. At the end of the process, a conflict-free matching between input and output ports of the switch will have been obtained. Also, at most one virtual channel from every physical link will have been selected to transmit its head flit.
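The three phases described above (selection matrix and conflict vector, port ordering, arbitration) can be sketched in software as follows. This is an illustrative, sequential sketch rather than the parallel and pipelined design discussed in [3]. The function name `coa_match` and the data layout are hypothetical, and it is assumed that arbitration for a selected output port considers only the requests of the selected candidate level.

```python
import random


def coa_match(candidates, rng=None):
    """Sketch of Candidate-Order Arbiter matching.

    candidates[i] lists, from level one downwards, the (output, priority)
    requests of input i; None marks a level without a candidate.
    Returns a dict mapping each matched input port to its output port.
    """
    rng = rng or random.Random()
    # Selection matrix stored sparsely: (level, output) -> {input: priority}.
    requests = {}
    for inp, cands in enumerate(candidates):
        for level, cand in enumerate(cands):
            if cand is not None:
                out, prio = cand
                requests.setdefault((level, out), {})[inp] = prio

    matching = {}
    while requests:
        # Conflict vector: number of non-null entries in every row.
        conflict = {row: len(reqs) for row, reqs in requests.items()}
        # Port ordering: lowest level first, then fewest conflicts, random ties.
        level, out = min(
            conflict, key=lambda row: (row[0], conflict[row], rng.random())
        )
        # Arbitration: the request with the highest priority wins the port.
        row_requests = requests[(level, out)]
        winner = max(row_requests, key=row_requests.get)
        matching[winner] = out
        # Drop every request involving the matched input or output port.
        remaining = {}
        for (lvl, o), reqs in requests.items():
            if o == out:
                continue
            kept = {i: p for i, p in reqs.items() if i != winner}
            if kept:
                remaining[(lvl, o)] = kept
        requests = remaining
    return matching
```

Because every matched input and output is removed before the next selection, the result is always conflict-free and uses at most one candidate per input link. Equal priorities within a row are resolved here by dictionary order, which is an arbitrary choice of this sketch.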
All the selected flits are then forwarded synchronously through the crossbar. At the same time, the link/switch scheduling algorithm starts a new execution. See [3] for more details.

In order to calibrate our approach to switch scheduling, we have analyzed it and compared it to a conventional symmetric crossbar arbiter called the Wave Front Arbiter (WFA) [7]. The WFA is based on the propagation of an arbitration wave across an array of arbitration cells, one cell per crosspoint. The wave front moves diagonally from the top left to the bottom right corner of the array. Each crosspoint on the wave front examines its request signal and gets a grant if there is no conflict. At a crosspoint, a grant is asserted if and only if there is a request and no crosspoint above it in the same column, nor to its left in the same row, has already received a grant. That is, an input is matched to an output port if no grant has previously been given for the same input or the same output.

In [7] it is shown that the WFA is able to achieve nearly the same performance as complex theoretical schemes (statically optimal and longest-queue-first arbitrations), which cannot be implemented. In addition, the analysis in [4] shows that the WFA performs significantly better than other scheduling algorithms such as DSA, W, and iSLIP under uniform and non-uniform traffic patterns. Moreover, the WFA beats other scheduling algorithms such as Parallel Iterative Matching (PIM) [] on hardware complexity. For these reasons, we have chosen the WFA for comparison with our switch scheduling proposal, the Candidate-Order Arbiter (COA).

5 Experimental Results

Simulations have been carried out with a single MMR router configuration, with one Network Interface Card (NIC) attached to every input link. Traffic sources inject their flits into buffers located in the corresponding NIC (see Figure 4). These buffers are considered to be infinite because the host main memory can also be used if the NIC buffers become full.
The MMR buffer size is limited to a few flits per virtual channel. This configuration yields accurate results regarding the delay introduced by the router, and the delay caused by the different virtual channels competing for the use of the physical link connecting to the MMR.

Figure 4. Conceptual representation of the NIC and MMR buffers. Components shown: sources, Network Interface Card with (infinite) flit buffers and MUX + LC, physical link, Multimedia Router with DEMUX, Virtual Channel Memory (small flit buffers) and MUX to the crossbar port.

As the buffer size in the MMR is limited, and flit loss needs to be avoided, credit-based flow control has been simulated between the NIC and the MMR. Note that links are quite short. Thus, the delay they introduce when transmitting flow control information is very short compared to the time needed to transmit a full flit. Also, note that flow control information is transmitted in a single phit (which takes a few nanoseconds), while a full flit requires many phits (which take several hundreds of nanoseconds). The physical link controller located in the NIC only considers for forwarding to the router the flits of virtual channels that have credits available, that is, the flits that have free space in their corresponding MMR buffer. This link controller forwards the flits belonging to the different connections in a demand-driven round-robin fashion, i.e., it performs round-robin among all the connections with both flits and credits available. Simulation results suggest that this simple scheme suffices for forwarding flits to the MMR while guaranteeing QoS. The reason is that when the scheduling algorithm guarantees QoS in the router, the NIC simply adapts flit forwarding to the needs of the router, thanks to the use of small buffers and flow control.

Simulations have been carried out on a 5$( router. Simulations consider 4-bit flits, and .4 Gbps 6 bit-wide links. The link/switch scheduling algorithm is implemented with four levels of candidates (see Section 3). Tests using CBR and VBR traffic have been carried out. All the connections are considered to be active throughout the whole simulation time. Their destination is chosen randomly among the output ports of the router. The results are presented in the following subsections.

5.1 Evaluation with CBR traffic

The first simulations have been carried out with a CBR workload. The applied load is composed of a random mix of connections with low, medium and high bandwidth requirements. These bandwidth requirements are 64 Kbps, .54 Mbps, and 55 Mbps, respectively.
Simulations have been run for 6 scheduling cycles, which is around 6 millions of router cycles. Measures of the QoS received by the connections are shown in Figure 5. The plots show the average flit latency, considering both the time the flit has been waiting in the network interface and the time to go through the switch. As we can see, both switch scheduling schemes, the COA and the WFA, offer similar performance for the low and medium bandwidth connections. On the other hand, results are quite different for the connections with high bandwidth requirements. Saturation is reached at around 7% of link bandwidth when the WFA scheme is used, whereas with the COA algorithm saturation does not occur until 83% of offered load has been reached. This effect is due to the fact that the WFA algorithm treats all the requests received for arbitration in a fair way. That is, the arbitration scheme does not take into account the bandwidth requirements of the connections. Then, even though the degree of crossbar matching is high, letting a request with low priority go through the router before another one of higher priority has a negative impact on the overall performance. Recall that the COA algorithm selects output ports first by the priority level of the requests. Hence the COA scheme allows a better bandwidth allocation of the output ports, which translates into a better quality of service for all the connections. These results must also be assessed by testing the algorithm with VBR traffic. Thus, in the next section we focus our study on VBR traffic models.

Figure 5. Average flit delay since generation for CBR traffic: (a) .64 Mbps connections, (b) .54 Mbps connections, (c) 55 Mbps connections. Each plot shows average delay (microsec) versus offered load (%).

5.2 Evaluation with VBR traffic

In this subsection we compare the performance of the arbitration schemes when the traffic generation rate varies over time for all the connections, and can cause saturation over short periods of time. The VBR traffic models used in our simulations are based on MPEG-2 video traces. This is a typical type of multimedia flow. The MPEG-2 video coding standard [9] encodes the video streams as a sequence of different frame types, I, P, and B, ordered with a predefined and repetitive pattern called GOP (Group Of Pictures). The GOP used is IBBPBBPBBPBBPBB. I frames encode independent frames, that is, I frames do not need any other information but themselves to be decoded. P frames need the previous I frame in the sequence to be decoded, because the data they hold is related to that of the I frame. Finally, B frames need information from both previous and following P or I frames to be decoded. The bandwidth needed for each type of frame is different. I frames are the most bandwidth consuming, because they carry more information, and B frames are the least bandwidth consuming. Figure 6 illustrates the traffic pattern of a typical MPEG-2 video sequence.

Table 1. MPEG-2 video sequence statistics: maximum, minimum and average image size (bits) for the video sequences Ayersroc, Hook, Martin, Flower Garden, Mobile Calendar, Table Tennis and Football.

Figure 6. Example of a typical MPEG-2 video sequence (Flower Garden sequence; Mbits/s versus time in milliseconds).

Every 33 milliseconds, a frame must be injected. A frame is composed of a number of flits. The number of flits that compose every frame has been extracted from real MPEG-2 video traces. Some data about their frame sizes are shown in Table 1. Two ways of injecting these flits into the NIC buffers have been considered:

Back-to-Back (BB) model. In this model, all the flits are transmitted at a peak bandwidth, common to all the connections. The peak bandwidth is such that it allows the injection of the largest frame among all the connections within 33 milliseconds. Transmission of each frame starts at a frame time boundary.
All the flits that compose the frame are injected at the selected peak rate, and then the source becomes idle until the next frame time boundary. The peak bandwidth in the experiments is 5 Mbps for all the connections. Figure 7-(a) depicts this model. IATp stands for the Inter-Arrival Time (IAT) related to the peak bandwidth.

Smooth-Rate (SR) model. In this model, the flits that compose a frame are transmitted with a different IAT for every frame, in such a way that the flits of a frame are evenly distributed within the frame time. The IAT has been computed as the ratio between 33 milliseconds and the number of flits that compose the frame. A graphical representation is shown in Figure 7-(b).

Simulations have been carried out until four complete GOPs (Groups Of Pictures) from every connection have been forwarded through the router. The connections sharing the same physical link have been randomly aligned, that is, they start at a random time within a GOP time.

First of all, average crossbar utilization is shown in Figure 8 for both VBR models (SR and BB). Results show that performance starts to degrade at around 75% of generated load when the WFA scheme is used. However, this effect does not occur when the router uses COA switch scheduling. In this case, the saturation point is reached at a generated load of around 85%.

Next, the Quality of Service (QoS) received by the connections must be assessed. As the considered VBR models are MPEG-2 video streams, the QoS metrics are related to the performance obtained by the application data units, i.e., the frames, rather than individual flits. First of all, the average frame delay since generation is considered. Frame delay has been computed as the delay suffered by the last flit of the frame, because in this way the measure is independent of the injection model used. Delays are measured since generation, thus accounting for the total time the frame spends in the network. Results are shown in Figure 9.
On the left side, results for the SR injection model are shown, while the plot on the right side corresponds to the BB injection model. Notice the logarithmic scale on the vertical axis. For the SR injection model, it can be seen that when using the COA algorithm, frame delays are quite low up to 78% of generated load. For the next workload point (8% load, approximately), frames experience a significant increase in their average delay, although saturation has not yet been reached. There are two reasons for this. First, there is more contention for the output ports within the crossbar, and the flits take longer to be scheduled. Second, and most important, as traffic is not uniform, the router enters saturation when I frames are being transmitted, causing greater delays. Later, as the rest of the frames are being transmitted, the scheduler is able to transmit all the flits with lower delays. For the WFA algorithm, the saturation point is reached at around 7% of generated load, which implies a great performance degradation compared to the COA algorithm. On the other hand, when the BB injection model is used, average frame delays before saturation are higher than the ones obtained with the SR injection model. In any case, saturation occurs at the same workload as with the SR injection model.

Figure 7. VBR injection models: (a) Back-to-Back (BB) injection model, with flits spaced IATp apart within a frame time (33 msecs); (b) Smooth-Rate (SR) injection model, with flits spaced IAT apart within a frame time (33 msecs).

Figure 8. Average crossbar utilization for VBR traffic, for the SR and BB injection models. Each plot shows utilization (%) versus generated load (%).

Another interesting measure of the QoS received by the multimedia connections is jitter, that is, the variation in the delay experienced by two adjacent flits belonging to the same connection. As we are dealing with MPEG-2 video connections, and as we did previously, we apply this parameter to the application data units, that is, the video frames. Although figures are not included in this paper, we have observed that average jitters are under 8 and microseconds for the SR and BB injection models, respectively. These are quite encouraging results, because the jitter allowed in MPEG-2 video transmission is around several milliseconds, that is, jitter must be low enough so that a person can see the video sequence smoothly, at a regular rate. Note that jitter values of up to several milliseconds can be absorbed at the destination by using jitter absorption techniques [5].
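The two injection models and the frame-level metrics used in this subsection can be sketched as follows. This is an illustration only: the function names are hypothetical, times are expressed in microseconds, the peak inter-arrival time is passed as a parameter instead of being derived from the peak bandwidth, and jitter is taken here as the mean absolute difference between the delays of adjacent frames, which is one possible reading of the definition above.

```python
FRAME_TIME_US = 33_000  # one frame is injected every 33 milliseconds


def bb_injection_times(frame_flits, peak_iat_us):
    """Back-to-Back: flits leave at the peak rate from each frame boundary."""
    return [
        [k * FRAME_TIME_US + i * peak_iat_us for i in range(n)]
        for k, n in enumerate(frame_flits)
    ]


def sr_injection_times(frame_flits):
    """Smooth-Rate: per-frame IAT = frame time / number of flits in the frame."""
    return [
        [k * FRAME_TIME_US + i * (FRAME_TIME_US / n) for i in range(n)]
        for k, n in enumerate(frame_flits)
    ]


def frame_delays(generation_times, delivery_times):
    """Frame delay = delay suffered by the last flit of each frame."""
    return [d[-1] - g[-1] for g, d in zip(generation_times, delivery_times)]


def average_jitter(delays):
    """Mean absolute variation between the delays of adjacent frames."""
    diffs = [abs(b - a) for a, b in zip(delays, delays[1:])]
    return sum(diffs) / len(diffs) if diffs else 0.0
```

Measuring the delay on the last flit of a frame makes the metric comparable across both models, since the last flit is generated early in the frame time under BB and late under SR.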
6 Conclusions and Future Work

We have described the design of the MultiMedia Router (MMR), which is intended to provide QoS guarantees to multimedia traffic in local area environments. The overall goal is to make a router as compact and small as possible, while still satisfying the QoS needs imposed by the multimedia flows. Key to the MMR's operation is the link and switch scheduling algorithm. Hence, in this paper particular attention has been paid to the crossbar arbitration scheme. Our proposal, the Candidate-Order Arbiter, makes two basic decisions: port ordering and arbitration. Both tasks take into account the priority of the connections in order to fulfill their QoS requirements.

The analysis has been carried out through simulations. Both CBR and VBR traffic types have been studied. For VBR traffic, the workload was composed of MPEG-2 video streams and two different injection models were tested. Simulation results show that no degradation in performance occurs until around 8% of generated load when using the COA algorithm, while the WFA algorithm is unable to reach this utilization level.

The conclusion is that, while there have been several proposals for switch schedulers, the difficulty lies in devising an algorithm that is both fast enough to deliver flits at high link speed and able to meet the QoS requirements of the connections. Nevertheless, we have shown that the COA algorithm is suitable for fulfilling the QoS requirements of all the ongoing connections up to high bandwidth utilization.

In order to assess the conclusions obtained, this study must be further extended to a network composed of several MMRs. Another related issue is the hardware cost of the algorithm. Now that the COA algorithm has been shown to perform better than symmetric crossbar arbiters, it is necessary to analyze its hardware complexity.

Figure 9. Average frame delay since generation for VBR traffic, for the SR and BB injection models. Each plot shows average frame delay (microsec, logarithmic scale) versus generated load (%).

References

[1] T.E. Anderson, et al., High Speed Switch Scheduling for Local Area Networks, Systems Research Center, April 1993.
[2] N. J. Boden, et al., Myrinet - A gigabit per second local area network, IEEE Micro, pp. 9 36, February 1995.
[3] M.B. Caminero, F.J. Quiles, J. Duato, D. Love, and S. Yalamanchili, Performance Evaluation of the Multimedia Router with MPEG-2 Video Traffic, Proceedings on Communications and Architectural Support for Network-based Parallel Computing (CANPC 99), January 1999.
[4] B. Caminero, C. Carrión, F. J. Quiles, J. Duato, and S. Yalamanchili, A Cost-effective Hardware Link Scheduling Algorithm for the Multimedia Router (MMR), IEEE International Conference on Networking (ICN ), July.
[5] A. Chien, J.H. Kim, Approaches to Quality of Service in High Performance Networks, Proceedings of the Workshop on Parallel Computer Routing and Communication, Lecture Notes in Computer Science, Springer-Verlag, pp.-9, June 1997.
[6] J. Duato, S. Yalamanchili, M.B. Caminero, D. Love, and F.J. Quiles, MMR: A high-performance multimedia router. Architecture and design trade-offs, Proceedings of the 5th Symposium on High Performance Computer Architecture (HPCA-5), pp. 3-39, January 1999.
[7] D. Garcia, D. Watson, ServerNet II, Proceedings of the Workshop on Parallel Computer Routing and Communication, pp. 9-36, June 1996.
[8] P. T. Gaughan and S. Yalamanchili, A family of fault-tolerant routing protocols for direct multiprocessor networks, IEEE Transactions on Parallel and Distributed Systems, vol. 6, no. 5, pp , May 1995.
[9] D. Pendery, J. Eunice, InfiniBand Architecture: Bridge Over Troubled Waters, Research Note, available from the web page:
[10] M. J. Karol, M. G. Hluchyj and S. P. Morgan, Input versus output queuing on a space division packet switch, IEEE Transactions on Communications, December 1987.
[11] M. G. H. Katevenis, et al., ATLAS I: A single-chip ATM switch for NOWs, Proceedings of the Workshop on Communications and Architectural Support for Network-based Parallel Computing, February 1997.
[12] J.H. Kim, Bandwidth and latency guarantees in low-cost, high-performance networks, Ph. D. Thesis, Department of Computer Sciences, University of Illinois at Urbana-Champaign, 1997.
[13] D. Love, S. Yalamanchili, J. Duato, M.B. Caminero, and F.J. Quiles, Switch Scheduling in the Multimedia Router (MMR), Proceedings of the International Parallel and Distributed Processing Symposium (IPDPS ), May.
[14] A. Mekkittikul, Scheduling Non-uniform Traffic in High Speed Packet Switches and Routers, Ph. D. Thesis, Stanford University, 1998.
[15] M. Perkins and P. Skelly, A hardware MPEG clock recovery experiment for variable bit rate video transmission, ATM Forum, ATM94-434, May 1994.
[16] P. Kermani and L. Kleinrock, Virtual Cut-through: A New Computer Communication Switching Technique, Computer Networks, Vol. 3, 1979.
[17] Y. Tamir, H. Chi, Symmetric Crossbar Arbiters for VLSI Communication Switches, IEEE Transactions on Parallel and Distributed Systems, vol 4 No., 1993.
[18] K.H. Yum, A. Vaidya, C.R. Das, A. Sivasubramanian, Investigating QoS Support for Traffic Mixes with the MediaWorm Router, Proceedings of the 6th Symposium on High Performance Computer Architecture (HPCA-6), January.
[19] Generic coding of moving pictures and associated audio, Recommendation H.262, Draft International Standard ISO/IEC 13818-2, March.
