A Low Complexity Scheduling Algorithm for a Crosspoint Buffered Switch with 100% Throughput


Yanming Shen, Shivendra S. Panwar, H. Jonathan Chao
Department of Electrical and Computer Engineering, Polytechnic University

Abstract-Crosspoint buffered switches are emerging as the focus of research in high-speed routers. They have simpler scheduling algorithms and achieve better performance than a bufferless crossbar switch. Crosspoint buffered switches have a buffer at each crosspoint. A cell is first delivered to a crosspoint buffer, and then transferred to the output port. With a speedup of two, a crosspoint buffered switch has previously been proved to provide 100% throughput. In this paper, we propose a 100% throughput scheduling algorithm without speedup, called SQUID. With this design, each input/output keeps track of the previously served virtual output queues (VOQs)/crosspoint buffers. We prove that SQUID, with a time complexity of $O(\log N)$, can achieve 100% throughput without any speedup. Our simulation results also show a delay performance comparable to output-queued switches. We also present a novel queuing model that models crosspoint buffered switches under uniform traffic.

I. INTRODUCTION

Internet traffic is growing rapidly, which places an increasing demand on high-capacity routers. Some real-time applications, e.g., IPTV, video-on-demand, and voice-over-IP services, have stringent delay requirements. This requires that the delays incurred in the routers be kept small. A crosspoint buffered switch is a good solution for high-performance router design, since it can provide good delay performance and reduce the complexity of scheduling algorithms at the same time. Indeed, the delay performance is approximately the same as that of an output-queued switch, the minimum theoretically possible. Output-queuing-like performance also simplifies traffic engineering for such switches.
With today's ASIC technology, crosspoint buffered switches can be implemented in a single chip. This makes crosspoint buffered switches an attractive solution compared to the traditional input-queued switch, because the crosspoint buffers allow for simpler scheduling algorithms and better delay performance. Another important application of crosspoint buffered switches is in high-performance computing, where they can interconnect thousands of processors [1]. A high-throughput, low-latency, scalable switching fabric can improve the performance of a supercomputer significantly. It is therefore important to design a high-throughput and low-delay scheduling algorithm for crosspoint buffered switches.

With a speedup of 2, the authors in [2] showed that a crosspoint buffered switch can provide 100% throughput. In [3], the results were extended to variable size packets. The author in [4] proved that the speedup requirement can be reduced to $2 - 1/N$. However, without speedup, previous throughput results are limited to uniform traffic loads. Under uniform traffic, it has been shown that longest-queue-first at the input port and round-robin at the output port (LQF-RR) guarantees 100% throughput [5]. In [6], the authors proved that a simple round-robin scheduler at both input and output ports can provide 100% throughput under uniform traffic. In [7], the authors proposed a distributed scheduling algorithm and derived a relationship between throughput and the size of crosspoint buffers. However, to achieve 100% throughput, it needed a buffer of infinite size. With state-of-the-art technology, the total amount of buffering that can be built on chip is limited. For a switch with a large number of ports, the buffer size associated with an input-output pair should be kept small.

In [8], Tassiulas studied randomized algorithms that achieve 100% throughput for an input-queued switch. The approach works as follows.
In each time slot, a random feasible solution to the maximum weighted matching problem is obtained. If the value of the new solution is higher than the value of the current solution, the latter is replaced. This approach guarantees 100% throughput provided that the probability that the new solution equals the maximum weight matching is strictly greater than zero. A de-randomized version of this algorithm was proposed by Giaccone et al. [9], where a Hamiltonian walk is applied instead of randomly generating a new schedule. However, such approaches introduce a large delay. Several approaches [9], [10] have been proposed for input-queued switches to reduce the delay. This technique can be applied to a crosspoint buffered switch to achieve 100% throughput as well. With buffers at crosspoints, we can reduce the complexity significantly, and at the same time achieve a much better delay performance than input-queued switches.

In [11], we presented a 100% throughput scheduling algorithm, SQUISH (Stable QUeue Input-output Scheduler with Hamiltonian walk). With SQUISH, a crosspoint buffered switch can achieve 100% throughput without speedup and with a finite crosspoint buffer size. In this paper, we propose another 100% throughput scheduler for crosspoint buffered switches, called SQUID (Stable Queuing/Implementable Design). We again assume a crosspoint buffered switch with finite crosspoint buffers and no speedup. We consider a stochastic packet arrival process in which the rate corresponds to an admissible, but unknown, traffic matrix. As discussed in Section III-C, compared with SQUISH, SQUID can reduce message exchange significantly. This makes SQUID an implementable and scalable design.

The rest of the paper is organized as follows. In Section II, we briefly describe the crosspoint buffered switch architecture. In Section III, we propose the scheduling algorithm and prove its stability. In Section IV, we present delay results derived from an analytical model of the switch under uniform traffic. In Section V, we present simulation results. Finally, Section VI concludes the paper.

Fig. 1: Crosspoint buffered switch.
Fig. 2: The scheduling phases for crosspoint buffered switch.
Fig. 3: Emulating an input-queued switch.

II. THE CROSSPOINT BUFFERED SWITCH MODEL

Figure 1 shows an $N \times N$ crosspoint buffered switch. We shall assume fixed size packet (cell) switching. Variable size packet switching can be implemented by introducing packet segmentation and reassembly. Each input maintains $N$ VOQs, one for each output. Let $Q_{ij}(n)$ denote the queue length of $VOQ_{ij}$ at time $n$, $n = 0, 1, 2, \dots$ Each crosspoint contains a finite buffer of size $B$. The buffer between input $i$ and output $j$ is denoted as $CB_{ij}$; $B_{ij}(n)$ is the buffer occupancy of $CB_{ij}$ at time $n$, with $B_{ij}(n) \le B$.

The key role of crosspoint buffers is to separate input contention from output contention. This results in a two-stage scheduling scheme. As shown in Figure 2, in the input scheduling phase, each input determines from which VOQ a cell should be transferred to the corresponding crosspoint buffer with available space. In the output scheduling phase, each output determines from which non-empty crosspoint buffer to serve a cell. When a crosspoint buffer is full, no more cells can be transferred to it.
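The two-stage scheme just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `run_slot`, the list-of-lists state layout, and the convention that a pick of `None` means "stay idle" are hypothetical and do not come from the paper.

```python
def run_slot(Q, CB, input_pick, output_pick, B=1):
    """Apply one time slot: input scheduling phase, then output phase.

    Q[i][j]        -- cells waiting in VOQ_ij
    CB[i][j]       -- cells held in crosspoint buffer CB_ij (at most B)
    input_pick[i]  -- output j whose VOQ input i tries to serve, or None
    output_pick[j] -- input i whose crosspoint buffer output j serves, or None
    Returns the list of (i, j) pairs whose cell left the switch.
    (All names here are illustrative assumptions.)
    """
    n = len(Q)
    # Input phase: each input moves at most one cell into a crosspoint
    # buffer, and only if that buffer still has space.
    for i in range(n):
        j = input_pick[i]
        if j is not None and Q[i][j] > 0 and CB[i][j] < B:
            Q[i][j] -= 1
            CB[i][j] += 1
    # Output phase: each output removes at most one cell from a
    # non-empty crosspoint buffer, using the state left by the input phase.
    departed = []
    for j in range(n):
        i = output_pick[j]
        if i is not None and CB[i][j] > 0:
            CB[i][j] -= 1
            departed.append((i, j))
    return departed
```

In this sketch an input whose chosen crosspoint buffer is full simply stays idle, which is the blocking behaviour the text describes for a full buffer.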
Note that if the crosspoint buffer size is unlimited, the crosspoint buffered switch is equivalent to an output-queued switch and input schedulers are not necessary, since packets can go directly to crosspoint buffers without buffering at inputs. For a practical single-chip implementation using current technology, however, the crosspoint buffers are constrained to a small size.

Let $A_{ij}(n)$ be the number of packets that have arrived at input $i$ destined for output $j$ up to time slot $n$. Assume the arrival process $\{A_{ij}(n),\ i, j = 1, \dots, N\}$ satisfies the strong law of large numbers (SLLN), i.e., with probability one,

$$\lim_{n \to \infty} \frac{A_{ij}(n)}{n} = \lambda_{ij}, \quad \forall i, j = 1, \dots, N. \qquad (1)$$

Definition 1: An arrival process is said to be admissible if

$$\sum_{i} \lambda_{ij} \le 1, \quad \sum_{j} \lambda_{ij} \le 1. \qquad (2)$$

Let $D_{ij}(n)$ be the number of departures from crosspoint buffer $CB_{ij}$ up to time slot $n$.

Definition 2: A switch operating under a matching algorithm is rate stable if, with probability one,

$$\lim_{n \to \infty} \frac{D_{ij}(n)}{n} = \lambda_{ij}, \quad \forall i, j = 1, \dots, N, \qquad (3)$$

for any arrival process satisfying condition (1).

Let $X_{ij}(n)$ be the total number of cells in $VOQ_{ij}$ and $CB_{ij}$ at time $n$, $X_{ij}(n) = Q_{ij}(n) + B_{ij}(n)$, and $X(n) = [X_{ij}(n)]$. Let $S(n) = [S^I(n); S^O(n)]$ be the schedule at time $n$. $S^I(n) = [S^I_{ij}(n)]$ is the input schedule and is subject to the following constraints:

$$\sum_{j} S^I_{ij}(n) \le 1, \quad S^I_{ij}(n) = 0 \ \text{if} \ B_{ij}(n) = B; \qquad (4)$$

$S^O(n) = [S^O_{ij}(n)]$ is the output schedule and is subject to the following constraints:

$$\sum_{i} S^O_{ij}(n) \le 1, \quad S^O_{ij}(n) = 0 \ \text{if} \ B_{ij}(n) = 0. \qquad (5)$$

The set of all possible schedules is denoted by $\Pi$. For each schedule $S \in \Pi$, we define the weight for output $j$ as $W_j = X_{ij}$ if $j$ serves crosspoint buffer $CB_{ij}$. The total weight $W_S(n)$ of a schedule is

$$W_S(n) = \sum_{j=1}^{N} W_j = \langle S^O(n), X(n) \rangle,$$

where for two matrices $A$ and $B$ of the same size, $\langle A, B \rangle = \sum_{ij} A_{ij} B_{ij}$. Note that although the weight is calculated with the output schedule only, the input schedule comes into the weight calculation implicitly.
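As a small numerical illustration of the admissibility condition (2) and of the schedule weight defined above, the following Python sketch may help. The function names, the list-of-lists matrices, and the rounding tolerance `tol` are assumptions made for this example only.

```python
def is_admissible(rates, tol=1e-12):
    """Check condition (2): every row sum and column sum is at most 1.

    rates[i][j] plays the role of lambda_ij; tol is an illustrative
    allowance for floating-point rounding.
    """
    n = len(rates)
    rows_ok = all(sum(rates[i]) <= 1 + tol for i in range(n))
    cols_ok = all(sum(rates[i][j] for i in range(n)) <= 1 + tol
                  for j in range(n))
    return rows_ok and cols_ok


def schedule_weight(S_out, X):
    """Total weight <S^O, X>: sum of X_ij over the pairs served by S^O.

    S_out[i][j] is 1 if output j serves crosspoint buffer CB_ij, else 0;
    X[i][j] = Q_ij + B_ij.
    """
    n = len(X)
    return sum(S_out[i][j] * X[i][j] for i in range(n) for j in range(n))
```

For example, with `X = [[3, 1], [2, 4]]` and an output schedule serving $CB_{11}$ and $CB_{22}$, the weight is the sum of the two diagonal entries.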
The input schedule enters implicitly because, as Figure 2 shows, the output schedule is performed after the input schedule. A valid output schedule is determined by the state of the crosspoint buffers, and this state is updated during the input scheduling phase. For crosspoint buffered switches, we have the following property as compared to input-queued switches:

Property 1: A crosspoint buffered switch can emulate any scheduling scheme applied to a bufferless input-queued switch.

As shown in Fig. 3, for an input-queued switch, at each time slot a matching is generated in which an input $i$ is matched to an output $j$, and a cell from input $i$ is delivered to output $j$. For a crosspoint buffered switch, a cell from input $i$ can also be delivered to output $j$. If the crosspoint buffer is empty, then input $i$ needs to send a cell to crosspoint buffer $CB_{ij}$, and then output $j$ removes that cell from the crosspoint buffer. If the crosspoint buffer is full, then output $j$ can remove a cell from the crosspoint buffer while input $i$ stays idle.

Based on this observation, for a crosspoint buffered switch, we define the Input-Queued Switch Maximum-weight-matching Emulation (IQSME) schedule $S^*$ as the solution of the following optimization problem:

$$\max_{S} \ \langle S(n), X(n) \rangle \quad \text{s.t.} \quad \sum_{i=1}^{N} S_{ij} \le 1, \quad \sum_{j=1}^{N} S_{ij} \le 1.$$

$S^*_{ij} = 1$ means (i) if $B_{ij} < B$, then input $i$ transfers a cell to crosspoint buffer $CB_{ij}$ if $VOQ_{ij}$ is not empty, and output $j$ serves a cell from crosspoint buffer $CB_{ij}$; (ii) if $B_{ij} = B$, then input $i$ stays idle and output $j$ serves a cell from crosspoint buffer $CB_{ij}$. $S^*$ is essentially an emulation of the maximum weight matching schedule in an input-queued switch with no crosspoint buffers, where if input $i$ sends a cell to crosspoint buffer $CB_{ij}$, then output $j$ must serve a cell from the crosspoint buffer $CB_{ij}$. Define the weight $W^*$ of $S^*$ as $W^*(n) = \langle S^*(n), X(n) \rangle$.

We now introduce the following lemma from [11], which will be used to prove the stability of our algorithm:

Lemma 1: For any admissible traffic satisfying (1), if the weight of a scheduling algorithm at each time slot is within a bounded constant value $C$ from the maximum weight, i.e.,

$$W_S(n) \ge W^*(n) - C,$$

then this algorithm is rate stable.

The proof of this lemma can be found in [11].

III. THE STABLE QUEUING/IMPLEMENTABLE DESIGN SCHEDULER FOR CROSSBAR BUFFERED SWITCHES

In this section, we present the Stable QUeuing Implementable Design (SQUID) scheduler for crossbar buffered switches.

A. Crosspoint buffer size of one

We first present the SQUID algorithm for crosspoint buffered switches with a crosspoint buffer size of one. The results for crosspoint buffer sizes greater than one will be discussed in Sec. III-B. With SQUID, each input keeps track of the VOQ which satisfies the following two conditions: (i) this VOQ was served in the last time slot; and (ii) the corresponding crosspoint buffer is not full, i.e., it was served in the last time slot. Let $P_i$ denote the queue length of such a VOQ for input $i$. If there is no such VOQ, then $P_i = 0$. Similarly, each output keeps track of the crosspoint buffer served in the previous time slot.

At each time slot, a Hamiltonian walk schedule [9] is generated. Let $h_i$ be the queue length of the VOQ corresponding to the Hamiltonian walk at input $i$. Then the scheduling algorithm is as follows:

1) Each input calculates the difference $d_i = P_i - h_i$, and sends this information to the scheduler.
2) The scheduler calculates the sum of all the differences. It generates a '1' if the sum is positive (or zero) and a '0' if the sum is negative. Then it sends this one-bit information back to all inputs/outputs.
3) Each input/output makes decisions based on this one-bit information, i.e., if a '1' is received, then it stays with the previously served VOQ/crosspoint buffer, and switches to the Hamiltonian walk otherwise.
4) For each input, if the selected VOQ cannot be served, either because it is empty or because the corresponding crosspoint buffer is full, then it serves an eligible VOQ in a round-robin order.
5) For each output, if the selected crosspoint buffer is empty, then it serves a non-empty crosspoint buffer in a round-robin order.

We now proceed to prove the stability of the SQUID algorithm. To this end, we first prove the following lemma.

Lemma 2: Let $S(n)$ be the schedule used at time $n$ and $H(n+1)$ be the schedule generated by the Hamiltonian walk at time $n+1$. If the schedule $S(n+1)$ at time slot $n+1$ satisfies

$$W_{S(n+1)}(n+1) \ge \max\left(W_{S(n)}(n+1),\ W_{H(n+1)}(n+1)\right) - C_1,$$

where $C_1$ is a constant, then this scheduling algorithm is rate stable.

Proof: From Lemma 1, to prove that the algorithm is rate stable, it suffices to show that

$$W_S(n) \ge W^*(n) - C, \qquad (6)$$

where $C$ is a bounded constant. Since there is at most one arrival or departure from each $X_{ij}$ in each time slot, we obtain that for any schedule $P$

$$W_P(n) \ge W_P(n+k) - kN. \qquad (7)$$

Consider any given time $T$, and let $S^*$ denote the IQSME at time $T$. By the property of the Hamiltonian walk, there exists an $n' \in [T - N!, T]$ such that the Hamiltonian walk at $n'$ is the IQSME at $T$, i.e., $H(n') = S^*$. Thus we have

$$W_{S(n')}(n') \ge W_{H(n')}(n') - C_1 = W_{S^*}(n') - C_1 \ge W_{S^*}(T) - N(T - n') - C_1, \qquad (8)$$

where the last inequality is from (7). Again, from the definition of the algorithm and (7),

$$W_{S(T)}(T) \ge W_{S(T-1)}(T) - C_1 \ge W_{S(T-1)}(T-1) - N - C_1.$$

Using this repeatedly, we obtain

$$W_{S(T)}(T) \ge W_{S(n')}(n') - N(T - n') - (T - n')C_1. \qquad (9)$$

Combining (8) and (9), we have

$$W_{S(T)}(T) \ge W_{S^*}(T) - 2N(T - n') - (T - n')C_1 - C_1.$$

Since $T - n' \le N!$, this implies

$$W_{S(T)}(T) \ge W_{S^*}(T) - 2N \cdot N! - (N! + 1)C_1,$$

and the above is true for any $T$. From Lemma 1, the scheduling algorithm is therefore rate stable.

Lemma 3: Let $S(n)$ and $S(n+1)$ be the schedules used by SQUID at time $n$ and $n+1$ respectively. Then

$$W_{S(n+1)}(n+1) \ge W_{S(n)}(n+1).$$

Proof: Case 1: $\sum_{i=1}^{N} d_i \ge 0$. From the SQUID algorithm, we know that if $\sum_{i=1}^{N} d_i \ge 0$, then an output will switch to a different crosspoint buffer only if the previously served crosspoint buffer is empty. When an output switches to a different crosspoint buffer because the selected crosspoint buffer is empty, the total weight will increase, or remain the same if all crosspoint buffers are empty. Therefore, we have

$$W_{S(n+1)}(n+1) \ge W_{S(n)}(n+1).$$

Case 2: $\sum_{i=1}^{N} d_i < 0$. If $\sum_{i=1}^{N} d_i < 0$, we have $\sum_i h_i > \sum_i P_i$, and inputs/outputs will switch to the Hamiltonian walk schedule. The weight of the schedule $S(n+1)$ at time $n+1$ satisfies the following inequality,

$$W_{S(n+1)}(n+1) \ge \sum_i h_i,$$

since both inputs and outputs will select non-empty VOQs/crosspoint buffers in a round-robin order if the selected VOQs/crosspoint buffers are empty. From the schedule $S(n)$ at time $n$, we can divide the outputs into two categories:

$J_1$: $j \in J_1$ if $j$ serves crosspoint buffer $CB_{ij}$ and input $i$ sends a cell to crosspoint buffer $CB_{ij}$;
$J_2$: $j \in J_2$ if $j$ serves crosspoint buffer $CB_{ij}$, but input $i$ does not send a cell to crosspoint buffer $CB_{ij}$.

When $S(n)$ is applied at time $n+1$, if $j \in J_1$, then $W_j = Q_{ij} = P_i$. If $j \in J_2$, then $W_j = 0$, since at the beginning of time slot $n+1$ the crosspoint buffer $CB_{ij}$ is empty, and input $i$ does not send a cell to crosspoint buffer $CB_{ij}$ in time slot $n+1$. Therefore, we have

$$W_{S(n)}(n+1) = \sum_i P_i.$$

From the definition of SQUID,

$$W_{S(n+1)}(n+1) \ge \sum_i h_i > \sum_i P_i = W_{S(n)}(n+1).$$

B. Crosspoint buffer size more than one

When the crosspoint buffer size is more than one, the only difference from the case of size one is how each input obtains $P_i$. Let $\mathcal{P}_i$ denote the set of VOQs for input $i$ for which (i) both the VOQ and the corresponding crosspoint buffer were served in the last time slot, or (ii) the corresponding crosspoint buffer was served in the last time slot and it is not empty in the current time slot. There may be multiple such VOQs for a particular input port. Let $P_i$ denote the sum of queue lengths for such VOQs for input $i$, $P_i = \sum_{j \in \mathcal{P}_i} Q_{ij}$. If there is no such VOQ, then $P_i = 0$.

Now we proceed to prove the stability of SQUID with a crosspoint buffer size greater than one. First, we prove the following lemma.

Lemma 4: Let $S(n)$ and $S(n+1)$ be the schedules used at time $n$ and $n+1$ respectively. Then

$$W_{S(n+1)}(n+1) \ge W_{S(n)}(n+1) - NB.$$

Proof: If $\sum_{i=1}^{N} d_i \ge 0$, as in the proof of Lemma 3, the total weight will not decrease. Therefore, we have

$$W_{S(n+1)}(n+1) \ge W_{S(n)}(n+1).$$

If $\sum_{i=1}^{N} d_i < 0$, then inputs/outputs will switch to the Hamiltonian walk schedule, and

$$\sum_i h_i > \sum_i P_i.$$

Let $j_i$ denote the corresponding input when output $j$ serves crosspoint buffer $CB_{j_i j}$ ($B_{j_i j} > 0$). If we apply the schedule $S(n)$ at time slot $n+1$, then

$$W_j = Q_{j_i j} + B_{j_i j} \le P_{j_i} + B_{j_i j} \le P_{j_i} + B. \qquad (10)$$

The total weight is

$$W_{S(n)}(n+1) = \sum_{j=1}^{N} W_j \le \sum_{j=1}^{N} P_{j_i} + NB. \qquad (11)$$

From the definition of SQUID,

$$W_{S(n+1)}(n+1) \ge \sum_i h_i > \sum_i P_i \ge W_{S(n)}(n+1) - NB,$$

where the last inequality follows from (11).

Theorem 1: The SQUID algorithm is rate stable for any admissible arrival process satisfying the strong law of large numbers.

The proof of this theorem is a straightforward extension of Lemmas 3 and 4, corresponding to the cases of $B = 1$ and $B > 1$, respectively.
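The per-slot decision of Section III-A (steps 1 to 5, for $B = 1$) can be sketched as follows. This is a hedged illustration, not the authors' implementation. Several details are assumptions made here: the lexicographic enumeration of permutations is only a stand-in for the Hamiltonian walk of [9] (it visits every matching once per $N!$ slots but is not the $O(1)$-per-step construction), the round-robin search simply starts at the originally selected index, and all function and variable names are hypothetical.

```python
from itertools import cycle, permutations


def hamiltonian_walk(n):
    """Stand-in walk (assumption): cycles through all n! matchings.
    H[i] is the output matched to input i."""
    return cycle(permutations(range(n)))


def tracked_lengths(Q, last_voq, cb_served_last):
    """P_i for B = 1: length of the VOQ served in the last slot whose
    crosspoint buffer was also served in the last slot; 0 otherwise."""
    P = []
    for i, j in enumerate(last_voq):
        ok = j is not None and cb_served_last[i][j]
        P.append(Q[i][j] if ok else 0)
    return P


def one_bit_decision(P, h):
    """Steps 1-2: return 1 if sum_i (P_i - h_i) >= 0, else 0."""
    return 1 if sum(p - x for p, x in zip(P, h)) >= 0 else 0


def round_robin(start, eligible):
    """First eligible index at or after start (cyclically), else None."""
    n = len(eligible)
    for k in range(n):
        idx = (start + k) % n
        if eligible[idx]:
            return idx
    return None


def squid_input_choice(Q, CB, last_voq, cb_served_last, H, B=1):
    """Steps 1-4: return (bit, VOQ chosen by each input or None)."""
    n = len(Q)
    P = tracked_lengths(Q, last_voq, cb_served_last)
    h = [Q[i][H[i]] for i in range(n)]
    bit = one_bit_decision(P, h)
    choice = []
    for i in range(n):
        j = last_voq[i] if bit == 1 else H[i]          # step 3
        eligible = [Q[i][k] > 0 and CB[i][k] < B for k in range(n)]
        if j is None or not eligible[j]:               # step 4 fallback
            j = round_robin(0 if j is None else j, eligible)
        choice.append(j)
    return bit, choice


def squid_output_choice(CB, last_cb, H, bit):
    """Steps 3 and 5, evaluated on CB after the input phase."""
    n = len(CB)
    H_inv = [0] * n
    for i, j in enumerate(H):
        H_inv[j] = i                                   # input matched to j
    choice = []
    for j in range(n):
        i = last_cb[j] if bit == 1 else H_inv[j]
        nonempty = [CB[k][j] > 0 for k in range(n)]
        if i is None or not nonempty[i]:               # step 5 fallback
            i = round_robin(0 if i is None else i, nonempty)
        choice.append(i)
    return choice
```

Only the sum of the differences $d_i$ crosses the scheduler in this sketch, and only the single bit comes back, which mirrors the message exchange the algorithm relies on.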

C. Implementation

In our design, each slot has two message exchanges: (1) each input sends the queue length difference information to the scheduler; and (2) the scheduler sends one bit of information to all inputs/outputs. Let $d_{\max}$ denote the maximum queue length difference among all input ports; then up to $\log d_{\max} + 1$ bits (one of them a sign bit) need to be sent to the scheduler from each input port. The number of bits sent by the scheduler to inputs/outputs is 1. Therefore, the total number of bits exchanged with each line card in each time slot is at most $2 + \log d_{\max}$. With SQUISH [11], each input needs to send VOQ lengths to output ports in order to pick the longest during the output scheduling phase. This implies that up to $N \log d_{\max}$ bits need to be sent by each input. We can see that SQUID reduces the communication overhead significantly.

The scheduler only needs to calculate the sum of $N$ variables, which can be implemented with a time complexity of $O(\log N)$. The Hamiltonian walk is deterministic and has a complexity of $O(1)$. The time complexity for each input/output storing the identity of the last served queue and implementing a round-robin schedule is also $O(1)$. Therefore, the total time complexity of the algorithm is $O(\log N)$.

When the delay between line cards and the scheduler is not negligible, as is typical for a large switch [12], we can apply pipelining to address this issue. With pipelining, the scheduler can take more than one time slot to collect the information and send out the scheduling decision. The pipelined SQUID will also provide 100% throughput, as in the case of SQUISH [11].

IV. DELAY ANALYSIS

In this section, we analyze the average delay in the crosspoint buffered switch. Another analytical model for a crosspoint buffered switch was presented in [13]. That model is used to obtain the saturation throughput under FIFO service without input buffering. Throughout the analysis, we consider queue states to be at the beginning of each time slot (before packet arrivals).
We assume the VOQs are evenly loaded with a rate of $\lambda/N$. We make the simplifying assumption that, at each input port, among all the eligible VOQs, the input scheduler randomly picks a VOQ to serve. This models scheduling algorithms that are stable under uniform traffic.

We first look at all the crosspoint buffers for a particular output. We consider the case when $B = 1$. Let $a$ denote the number of packets in these crosspoint buffers, and let $p$ be the arrival probability from the corresponding VOQ when a crosspoint buffer is empty. Then we have the state transition diagram for the crosspoint buffers as shown in Figure 4. For clarity, we only show transitions from the zeroth state to higher states (i.e., right transitions).

Fig. 4: Markov chain for crosspoint buffers.

Let $s_i$ denote the probability of being in state $i$. From flow conservation, the average arrival rate to these crosspoint buffers should be $\lambda$, therefore

$$\lambda = \sum_{i=0}^{N-1} (N - i)\, p\, s_i.$$

Then we have

$$p = \frac{\lambda}{N \sum_{i=0}^{N-1} s_i - \sum_{i=1}^{N-1} i\, s_i} = \frac{\lambda}{N - E(a)}.$$

Then we can solve for $p$ iteratively by picking an initial value for $p$ and solving the Markov chain to get $E(a)$, which generates the next value for $p$.

Next, let us look at a particular non-empty VOQ destined to this output. Let $q$ denote the probability that this non-empty VOQ is served in the input scheduling phase, which means that (i) its corresponding crosspoint buffer has space, and (ii) it is selected for service by the input scheduler. Then we have a Markov chain for this VOQ as shown in Figure 5.

Fig. 5: Markov chain for a VOQ.
The probability $P_1$ that this VOQ is non-empty is simply

$$P_1 = \frac{\lambda (1 - q)}{(N - \lambda)\, q}. \qquad (12)$$

We define $P_0$ to be the probability that a crosspoint buffer is empty, then

$$P_0 = \frac{N - E(a)}{N}. \qquad (13)$$

We also define the conditional probability $P_s$ that a VOQ is selected for service when it is not empty and its corresponding crosspoint buffer has space:

$$P_s = P(\text{VOQ selected} \mid \text{VOQ non-empty, crosspoint buffer empty}).$$

From these definitions, we can see that

$$q = P_0 P_s = \frac{N - E(a)}{N} P_s. \qquad (14)$$

Among the remaining VOQs, the expected number that are not empty and whose corresponding crosspoint buffers are empty (which means those VOQs are eligible for service) can be estimated by $(N - 1) P_1 P_0$. Then, using our assumption that eligible VOQs are randomly picked for service,

$$P_s = \frac{1}{1 + (N - 1) P_1 P_0} = \frac{1}{1 + \dfrac{\lambda (1 - q)(N - 1)(N - E(a))}{q (N - \lambda) N}}. \qquad (15)$$

By solving equations (14) and (15), we can get

$$q = \frac{N (1 - \lambda)(N - E(a))}{N^2 (1 - \lambda) + \lambda (N - 1) E(a)}. \qquad (16)$$
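As a numerical sanity check on this step, the sketch below evaluates a closed form for $q$ obtained by eliminating $P_s$ between (14) and (15), and then substitutes it back into those two equations. It assumes the equations read $P_1 = \lambda(1-q)/((N-\lambda)q)$, $P_0 = (N - E(a))/N$ and $P_s = 1/(1 + (N-1)P_1P_0)$; the function names and the sample values of $\lambda$, $N$ and $E(a)$ are illustrative only and are not taken from the paper's results.

```python
def voq_service_probability(lam, n, mean_cb):
    """q from eliminating P_s between (14) and (15).

    lam     -- normalized load lambda of one port, 0 < lam < 1
    n       -- switch size N
    mean_cb -- E(a), mean number of occupied crosspoint buffers
               for one output (taken here as a given input)
    """
    num = n * (1 - lam) * (n - mean_cb)
    den = n * n * (1 - lam) + lam * (n - 1) * mean_cb
    return num / den


def fixed_point_residual(q, lam, n, mean_cb):
    """q - P_0 * P_s, with P_1, P_0, P_s as assumed in the lead-in.
    A value of zero means q satisfies (14) and (15) together."""
    p0 = (n - mean_cb) / n
    p1 = lam * (1 - q) / ((n - lam) * q)
    ps = 1 / (1 + (n - 1) * p1 * p0)
    return q - p0 * ps


if __name__ == "__main__":
    # Illustrative values only.
    q = voq_service_probability(0.8, 32, 5.0)
    print(q, fixed_point_residual(q, 0.8, 32, 5.0))
```

In a full solution $E(a)$ would come from the crosspoint-buffer Markov chain; here it is simply passed in, so the sketch checks only the algebra linking $q$ to $E(a)$.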

Fig. 6: Average delay by simulation and Markov model.
Fig. 7: Average delay under uniform Bernoulli traffic, $N = 32$, crosspoint buffer size $B = 1$.

Since $E(a)$ is known from the solution of the Markov chain for the crosspoint buffers, we can then get the average delay by solving the VOQ Markov chain. Figure 6 shows the average delay performance obtained by simulations using the SQUID scheduler and by the model, respectively. We can see that when the loading is not extremely high, the model matches the simulation results very well. We also plot the average delay obtained from the model for $N = 128$. Under extreme loading ($\lambda = 0.99$), as the switch size increases, the model becomes more accurate. The Markov model can therefore be used to evaluate delays and buffer occupancies for large switches where simulations take too long.

Fig. 8: Average delay under hotspot traffic with different values of $w$, $N = 32$, $B = 1$.

V. SIMULATION

In this section, we study the delay performance of the SQUID scheduling algorithm via simulations. We compare the delay performance of SQUID with that of an output-queued switch, which has the best possible performance. The traffic patterns studied in our simulations are uniform and non-uniform traffic with Bernoulli and bursty arrivals. In our simulations, all inputs are equally loaded on a normalized scale $\lambda \in (0, 1)$, and we measure the fixed length packet (cell) delay.
In our switch model, we do not consider segmentation and re-assembly delays.

A. Uniform Bernoulli traffic

For uniform traffic, the destination of a new cell is uniformly distributed among all the output ports, i.e., $\lambda_{ij} = \lambda/N$. Figure 7 shows the delay performance under uniform Bernoulli traffic. We can see that SQUID has delay performance very close to that of an output-queued switch. Compared to SQUISH, if the load is not close to 1, the average delays are almost the same. However, unlike SQUISH, SQUID does not need to choose the longest queue to guarantee stability. Indeed, even a simple round-robin scheduler has a delay performance close to that of an output-queued switch [6]. However, if the traffic is nonuniform, the round-robin scheduler fails to achieve 100% throughput. We therefore study the delay performance of SQUID under nonuniform traffic next.

B. Nonuniform traffic

For nonuniform traffic, we first try hot-spot [6] traffic. With hot-spot traffic, for any input port, $\lambda_{ii} = w\lambda$ and $\lambda_{ij} = (1 - w)\lambda/(N - 1)$ for $i \ne j$. By changing the factor $w$, we can get different nonuniform traffic patterns. Figure 8 shows the delay performance for Bernoulli traffic with $w = 0.25$, $0.5$ and $0.75$. We can see that for different $w$, when the load is less than 0.9, the average delay of SQUID is very close to that of an output-queued switch. When $w$ becomes larger, which means the traffic is more unbalanced, the average delay of SQUID increases when the load is close to 1. The reason is that SQUID selects queues using a round-robin schedule rather than queue length, and thus it is not as efficient as SQUISH [11] under unbalanced traffic. Though the average delay is higher under extremely high load, SQUID can achieve 100% throughput. Note that under hotspot traffic, the round-robin scheduler has a throughput around 85% [6] and the SBF LBF [14] has a throughput around 87%.

Next, we test the delay performance of SQUID under Log-diagonal and Lin-diagonal [15] traffic patterns.
With Log-diagonal, arrival rates at the same input differ exponentially, i.e., $\lambda_{i(i+j)} = 2\lambda_{i(i+j+1)}$, where $0 \le j \le N - 2$. With Lin-diagonal, arrival rates at the same input differ linearly, i.e., $\lambda_{i(i+j)} - \lambda_{i(i+j+1)} = 2\lambda/(N(N + 1))$, where $0 \le j \le N - 2$. Figure 9 shows the delay performance for log-diagonal and lin-diagonal traffic. We can see that the average delay performance for both traffic patterns is close to that of an output-queued switch unless the switch is under an extremely high load. Log-diagonal traffic is more unbalanced than lin-diagonal traffic, and thus the average delay under log-diagonal traffic is higher than the average delay under lin-diagonal traffic.

Fig. 9: Average delay under lin-diagonal and log-diagonal traffic, $N = 32$, $B = 1$.

C. Switch size effect

Generally, for input-queued switches, the average delay increases linearly with the switch size. For output-queued switches, the delay is independent of the switch size. In this simulation, we test the delay performance with different switch sizes. Figure 10 shows the average delay with different switch sizes. For a given switch size, we simulated uniform Bernoulli and nonuniform Bernoulli ($w = 0.25$ and $0.75$) traffic. We can see that the average delays are almost the same for different switch sizes. Note that for a crosspoint buffered switch, even if the crosspoint buffer size is one, the total crosspoint buffer size increases when the switch size increases. For a particular output, the number of crosspoint buffers associated with it is $N$ (if the crosspoint buffer size is one). From simulations, we found that when the switch size increases, the average delay at the input ports decreases. On the other hand, the average delay at the output increases, since there are more crosspoint buffers. These appear to balance out, and therefore the total delay a packet experiences at both the input port and the crosspoint buffer does not increase with the switch size.

D. Buffer size effect

When the crosspoint buffer size is infinite, a crosspoint buffered switch becomes equivalent to an output-queued switch. Therefore, if we increase the crosspoint buffer size, the average delay will decrease, and eventually converge to the average delay of an output-queued switch. In this simulation, we test the average delay performance for hotspot traffic ($w = 0.5$) with different crosspoint buffer sizes. Figure 11 shows the average delay under hotspot traffic with different crosspoint buffer sizes. We can see that as the crosspoint buffer size increases, the average delay of SQUID gradually decreases, moving closer to that of an output-queued switch.

Fig. 11: Average delay under hotspot traffic with different crosspoint buffer sizes, $N = 32$, $w = 0.5$.

E. Bursty traffic

We have proved that SQUID can achieve 100% throughput for any admissible input traffic satisfying the strong law of large numbers, and the stability results are not limited to Bernoulli arrivals. In this section, we test the delay performance of SQUID under bursty traffic. For bursty traffic, all the cells in the same burst go to the same destination. The burst lengths are chosen independently from the following truncated Pareto distribution [16]:

$$P(\text{a burst has length } l) = \frac{c}{l^{\alpha}}, \quad l = 1, \dots, 10000,$$

where $\alpha$ is the Pareto parameter and $c$ is the normalization constant; we vary $\alpha$ to get different average burst lengths. In our simulations, we try two different values for $\alpha$, $\alpha = 1.9$ and $\alpha = 1.7$. When $\alpha = 1.9$, the average burst length is about 9. When $\alpha = 1.7$, the average burst length is about 24. Figure 12 shows the delay performance of SQUID under uniform bursty traffic as compared to an output-queued switch.

Fig. 12: Average delay under uniform bursty traffic with different burst parameters, $N = 32$, $B = 1$.

Generally, the average delay will increase as the average burst length increases (smaller $\alpha$).
But the delays in either case are close to that of an output-queued switch.
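The traffic patterns used in this section can be written down as rate matrices, and the mean of the truncated Pareto burst length can be computed directly. The sketch below is one possible reading, with assumptions: the column index $i + j$ of the diagonal patterns is taken modulo $N$, each row is normalized to sum to $\lambda$, and all function names are made up for this illustration.

```python
def uniform(n, lam):
    """lambda_ij = lam / N for every input-output pair."""
    return [[lam / n] * n for _ in range(n)]


def hotspot(n, lam, w):
    """lambda_ii = w*lam, lambda_ij = (1-w)*lam/(N-1) for i != j."""
    off = (1 - w) * lam / (n - 1)
    return [[w * lam if i == j else off for j in range(n)]
            for i in range(n)]


def log_diagonal(n, lam):
    """Each rate is twice the next one along the (wrapped) row;
    rows are scaled so that they sum to lam (assumption)."""
    total = 2 ** n - 1
    m = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for k in range(n):
            m[i][(i + k) % n] = lam * 2 ** (n - 1 - k) / total
    return m


def lin_diagonal(n, lam):
    """Consecutive rates along the (wrapped) row differ by
    2*lam / (N*(N+1)); rows then sum to lam."""
    step = 2 * lam / (n * (n + 1))
    m = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for k in range(n):
            m[i][(i + k) % n] = step * (n - k)
    return m


def mean_burst_length(alpha, max_len=10000):
    """Mean of P(l) = c / l**alpha on l = 1..max_len."""
    weights = [l ** -alpha for l in range(1, max_len + 1)]
    c = 1 / sum(weights)  # normalization constant
    return c * sum(l * w for l, w in enumerate(weights, start=1))


if __name__ == "__main__":
    print(mean_burst_length(1.9), mean_burst_length(1.7))
```

Because every pattern here is built by shifting the same row cyclically (or is symmetric), column sums equal row sums, so each matrix is admissible whenever $\lambda \le 1$.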

[Fig. 10: Average delay with different switch sizes under Bernoulli traffic, B = 1. Panels: (a) Uniform; (b) Hotspot, W = 0.25; (c) Hotspot, W = 0.75.]

VI. CONCLUSION

In this paper, we have proposed a Stable QUeuing Implementable Design (SQUID) scheduler for crosspoint buffered switches with a crosspoint buffer size as small as one and without speedup. The complexity of SQUID is O(log N). With SQUID, a crosspoint buffered switch can achieve 100% throughput under any admissible input traffic satisfying the strong law of large numbers. Our simulation studies indicate that for most realistic traffic scenarios the delay performance of SQUID is close to that of an ideal output-queued switch. Only under highly loaded non-uniform traffic is the delay significantly greater than that of an output-queued switch. Unlike what has been observed for input-queued switches, the average delay does not increase with the switch size. This means that SQUID has scalable delay performance and is suitable for a large-scale switch. This paper also presents a novel queuing model for crosspoint buffered switches under uniform load. This model can be used to estimate the performance of switches too large to simulate in a reasonable time.

With SQUID, we need a low complexity but centralized scheduler. Most previous research on crosspoint buffered switches focused on designing distributed scheduling algorithms, where inputs and outputs can make scheduling decisions independently. However, such algorithms cannot achieve 100% throughput. An interesting open problem is therefore to design a distributed scheduling algorithm that provides 100% throughput.


On Achieving Throughput in an Input-Queued Switch 858 IEEE/ACM TRANSACTIONS ON NETWORKING, VOL. 11, NO. 5, OCTOBER 2003 On Achieving Throughput in an Input-Queued Switch Saad Mneimneh and Kai-Yeung Siu Abstract We establish some lower bounds on the speedup

More information

Comparison of pre-backoff and post-backoff procedures for IEEE distributed coordination function

Comparison of pre-backoff and post-backoff procedures for IEEE distributed coordination function Comparison of pre-backoff and post-backoff procedures for IEEE 802.11 distributed coordination function Ping Zhong, Xuemin Hong, Xiaofang Wu, Jianghong Shi a), and Huihuang Chen School of Information Science

More information

Generic Architecture. EECS 122: Introduction to Computer Networks Switch and Router Architectures. Shared Memory (1 st Generation) Today s Lecture

Generic Architecture. EECS 122: Introduction to Computer Networks Switch and Router Architectures. Shared Memory (1 st Generation) Today s Lecture Generic Architecture EECS : Introduction to Computer Networks Switch and Router Architectures Computer Science Division Department of Electrical Engineering and Computer Sciences University of California,

More information

Lecture 5: Performance Analysis I

Lecture 5: Performance Analysis I CS 6323 : Modeling and Inference Lecture 5: Performance Analysis I Prof. Gregory Provan Department of Computer Science University College Cork Slides: Based on M. Yin (Performability Analysis) Overview

More information

Parallel Packet Copies for Multicast

Parallel Packet Copies for Multicast Do you really need multicast? At line rates? Less is More J Parallel Packet Copies for Multicast J.1 Introduction Multicasting is the process of simultaneously sending the same data to multiple destinations

More information

Scaling Internet Routers Using Optics Producing a 100TB/s Router. Ashley Green and Brad Rosen February 16, 2004

Scaling Internet Routers Using Optics Producing a 100TB/s Router. Ashley Green and Brad Rosen February 16, 2004 Scaling Internet Routers Using Optics Producing a 100TB/s Router Ashley Green and Brad Rosen February 16, 2004 Presentation Outline Motivation Avi s Black Box Black Box: Load Balance Switch Conclusion

More information

On Generalized Processor Sharing with Regulated Traffic for MPLS Traffic Engineering

On Generalized Processor Sharing with Regulated Traffic for MPLS Traffic Engineering On Generalized Processor Sharing with Regulated Traffic for MPLS Traffic Engineering Shivendra S. Panwar New York State Center for Advanced Technology in Telecommunications (CATT) Department of Electrical

More information

Buffered Packet Switch

Buffered Packet Switch Flow Control in a Multi-Plane Multi-Stage Buffered Packet Switch H. Jonathan Chao Department ofelectrical and Computer Engineering Polytechnic University, Brooklyn, NY 11201 chao@poly.edu Abstract- A large-capacity,

More information

Scheduling. Scheduling algorithms. Scheduling. Output buffered architecture. QoS scheduling algorithms. QoS-capable router

Scheduling. Scheduling algorithms. Scheduling. Output buffered architecture. QoS scheduling algorithms. QoS-capable router Scheduling algorithms Scheduling Andrea Bianco Telecommunication Network Group firstname.lastname@polito.it http://www.telematica.polito.it/ Scheduling: choose a packet to transmit over a link among all

More information

PERFORMANCE OF A PACKET SWITCH WITH SHARED BUFFER AND INPUT QUEUEING

PERFORMANCE OF A PACKET SWITCH WITH SHARED BUFFER AND INPUT QUEUEING TELERAFFIC AD DATA RAFFIC in a Period of Change, ITC-3 A. Jensen and V.B. Iversen (Editors) Elsevier Science Publishers B.V. (orth-holland) lac, 99 9 PERFORMACE OF A PACKET SWITCH WITH SHARED BUFFER AD

More information

The Design and Performance Analysis of QoS-Aware Edge-Router for High-Speed IP Optical Networks

The Design and Performance Analysis of QoS-Aware Edge-Router for High-Speed IP Optical Networks The Design and Performance Analysis of QoS-Aware Edge-Router for High-Speed IP Optical Networks E. Kozlovski, M. Düser, R. I. Killey, and P. Bayvel Department of and Electrical Engineering, University

More information

Crossbar - example. Crossbar. Crossbar. Combination: Time-space switching. Simple space-division switch Crosspoints can be turned on or off

Crossbar - example. Crossbar. Crossbar. Combination: Time-space switching. Simple space-division switch Crosspoints can be turned on or off Crossbar Crossbar - example Simple space-division switch Crosspoints can be turned on or off i n p u t s sessions: (,) (,) (,) (,) outputs Crossbar Advantages: simple to implement simple control flexible

More information

CSE 123A Computer Networks

CSE 123A Computer Networks CSE 123A Computer Networks Winter 2005 Lecture 8: IP Router Design Many portions courtesy Nick McKeown Overview Router basics Interconnection architecture Input Queuing Output Queuing Virtual output Queuing

More information

ECE 697J Advanced Topics in Computer Networks

ECE 697J Advanced Topics in Computer Networks ECE 697J Advanced Topics in Computer Networks Switching Fabrics 10/02/03 Tilman Wolf 1 Router Data Path Last class: Single CPU is not fast enough for processing packets Multiple advanced processors in

More information