A FAST ARBITRATION SCHEME FOR TERABIT PACKET SWITCHES


H. Jonathan Chao, Cheuk H. Lam, and Xiaolei Guo
Department of Electrical Engineering, Polytechnic University, NY, USA

Abstract

Input-output queued switches have been widely considered as the most feasible solution for large capacity packet switches and IP routers. The challenge is to develop a high speed and cost-effective arbitration scheme that maximizes the switch throughput and delay performance for supporting multimedia services with various quality-of-service (QoS) requirements. In this paper, we propose a ping-pong arbitration (PPA) scheme for output contention resolution in input-output queued switches. The basic idea is to divide the inputs into groups and apply arbitration recursively. Our recursive arbiter is hierarchically structured, consisting of multiple small-size arbiters at each layer. The arbitration time of an n-input switch is proportional to $\lceil \log_4 (n/2) \rceil$ when we group every two inputs or every two input groups at each layer. We present a 256 x 256 terabit crossbar multicast packet switch using the PPA. The design shows that our scheme can reduce the arbitration time of the 256 x 256 switch to 11 gate delays, demonstrating that arbitration is no longer the bottleneck limiting the switch capacity.

1 Introduction

Packet switching has been recognized as the key multiplexing and data transfer technique for future Broadband Integrated Services Digital Networks (B-ISDN) [1] and a basic means of high speed implementation of gigabit/terabit IP routers [2, 3] for the Next Generation Internet. To provide multimedia services with various quality-of-service (QoS) requirements, one of the challenging issues in designing a terabit switch is to develop a high speed and cost-effective output contention resolution scheme that maximizes the switch throughput and delay performance. A packet switch consists of input and output ports interconnected by a switch fabric.
The switch fabric can use shared-medium (e.g., bus), shared-memory, and space-division (e.g., crossbar) architectures [1]. The function of a packet switch is to transfer packets (actually cells) from the input ports to the appropriate output ports based on the addresses contained within the packet headers. Multiple packets from different input ports could be destined for the same output port at the same time; we call this output contention or output conflict. A switch arbitration or scheduling algorithm is therefore needed to choose among them the one that the output most prefers in that time slot, grant the corresponding input, and configure the switch fabric to transfer the packet. Reducing the arbitration time can significantly reduce the packet delay across a switch, thus enabling high speed implementation. This paper addresses this issue.

[Footnote: Asynchronous Transfer Mode (ATM) switching is a special case of packet switching with equally sized packets (53-byte) called cells.]

Consider a packet switch with n input/output ports. The switch can be classified as an output queued, input queued or input-output queued switch. Define the speedup (c) of the switch fabric as the ratio of the switch fabric bandwidth to the bandwidth of the input links. (Unless otherwise stated, we assume henceforth that every input/output link has the same capacity.) An output queued switch is one with c ≥ n. Since each output port can receive n incoming packets in a time slot, there is no output contention. The switch has desirably zero input queuing delay, without considering store-and-forward implementation. However, the well-known problem of an output queued switch is that the output port memory speed limits its ability to buffer all possible input packets. An input queued switch has no speedup (i.e., c = 1) and thus is much easier to implement.
However, it suffers the well-known problem of head-of-line (HOL) blocking [4], which could limit its maximum throughput to about 58% when it uses first-in-first-out (FIFO) queuing at each input port and operates under uniform traffic (i.e., the output address of each packet is independently and equally distributed among every output). Many techniques have been suggested to reduce the HOL blocking, for example, by considering the first K cells in the FIFO, where K > 1 [5]. The HOL blocking can be eliminated entirely by using virtual output queuing (VOQ) [6], where each input maintains a separate queue for each output. To achieve 100% throughput in an input-queued switch with VOQs, sophisticated arbitration is required to schedule packets between various inputs and outputs. It is simply an application of bipartite graph matching [7]: each output must be paired with at most one input that has a cell destined for that output, which is a complex procedure to implement in hardware. It has been shown that an input buffered switch with VOQs can provide asymptotic 100% throughput using a maximum matching algorithm [9]. However, the complexity of the best known maximum matching algorithm is $O(n^{2.5})$ [11], which is too high for high speed implementation. In practice, a number of maximal matching algorithms have been proposed, such as parallel iterative matching (PIM) [7], iterative round robin matching (iSLIP) [8], and dual round robin matching (DRRM) [10]. Their complexities are still too high.

[Footnote: In practice, variable length packets are usually broken into fixed-size cells (not necessarily 53 bytes) before being transmitted across the switch fabric; the cells are reassembled at the output of the switch [7].]

IEEE Global Telecommunications Conference - Globecom'99

An input-output queued switch uses a speedup of c > 1. A recent study [12] shows that it is possible to achieve 100% switch throughput with a moderate speedup of 2. Since each output port can receive up to c cells in a time slot (each input port can send up to c cells during the same time), the requirement on the number of input-output matches found in each arbitration cycle (c cycles in a time slot) may possibly be relaxed, enabling simpler arbitration schemes. On the other hand, the arbitration time is reduced c times, making the time constraint more stringent.

This motivates us to develop a ping-pong arbitration (PPA) scheme for output contention resolution in terabit packet switching. The basic idea is to divide the inputs into groups and apply arbitration recursively. The traditional arbiters handle all inputs together and the arbitration time is proportional to the number of inputs. As a result, the switch size or capacity is limited given a fixed amount of arbitration time.
Our recursive arbiter is hierarchically structured, consisting of multiple small-size arbiters at each layer. The arbitration time of an n-input switch is proportional to $\lceil \log_4 (n/2) \rceil$ when we group every two inputs or every two input groups at each layer. We present a 256 x 256 terabit crossbar multicast packet switch using the PPA. The design shows that our scheme can reduce the arbitration time of the 256 x 256 switch to 11 gate delays, demonstrating that arbitration is no longer the bottleneck limiting the switch capacity.

[Footnote: A maximum match is one that pairs the maximum number of inputs and outputs together; there is no other pairing that matches more inputs and outputs [7].]

[Footnote: A maximal match is one for which pairings cannot be trivially added; each node (i.e., input or output) is either matched or has no edge (i.e., connection path) to an unmatched node [7].]

The rest of this paper is organized as follows. Section 2 introduces the PPA and its performance study. Section 3 describes the implementation of the PPA. Section 4 shows a 256 x 256 terabit crossbar multicast packet switch using the PPA. Section 5 presents the conclusion.

2 Ping-Pong Arbitration (PPA)

2.1 Principles of Ping-Pong Arbitration

Consider an n-input packet switch. To resolve its output contention, a solution is to use an arbiter for each output to fairly select one among the incoming packets and send back a grant signal to the corresponding input. The arbitration procedure is as follows:

1. During every arbitration cycle, each input submits a one-bit request signal to each output (arbiter), indicating whether its packet, if any, is destined for the output.
2. Each output arbiter collects n request signals, among which one input with an active request is granted according to some priority order.
3. A grant signal is sent back to acknowledge the input.

This paper focuses on the second step, which arbitrates one input among n possible ones.
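The three steps above can be sketched for a single arbitration cycle as follows. This is a minimal illustration under stated assumptions, not the paper's circuit: the function names (`build_requests`, `fixed_priority_arbiter`, `arbitration_cycle`) are hypothetical, each input is assumed to hold at most one head-of-line packet, and the lowest-index-first rule merely stands in for "some priority order".

```python
from typing import Callable, List, Optional


def build_requests(hol_dest: List[Optional[int]], n: int) -> List[List[bool]]:
    """Step 1: requests[j][i] is True when input i's packet is destined for output j."""
    return [[hol_dest[i] == j for i in range(n)] for j in range(n)]


def fixed_priority_arbiter(requests: List[bool]) -> Optional[int]:
    """Step 2 (placeholder policy, an assumption): grant the lowest-numbered requester."""
    for i, requesting in enumerate(requests):
        if requesting:
            return i
    return None


def arbitration_cycle(
    hol_dest: List[Optional[int]],
    n: int,
    arbiter: Callable[[List[bool]], Optional[int]] = fixed_priority_arbiter,
) -> List[Optional[int]]:
    """Run steps 1-3; grants[i] is the output granted to input i, or None."""
    requests = build_requests(hol_dest, n)
    grants: List[Optional[int]] = [None] * n
    for j in range(n):                 # one arbiter per output
        winner = arbiter(requests[j])  # step 2: pick one requesting input
        if winner is not None:
            grants[winner] = j         # step 3: grant signal back to the input
    return grants


# Inputs 0 and 1 both want output 2; input 2 wants output 0; input 3 is idle.
print(arbitration_cycle([2, 2, 0, None], n=4))  # prints [2, None, 0, None]
```

Any other per-output policy can be passed as `arbiter`, which is the step the rest of the section concentrates on.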
A simple round robin scheme is generally adopted in an arbiter to ensure a fair arbitration among the inputs, as in iSLIP [8] and DRRM [10]. Imagine there is a token circulating among the inputs in a certain order. The input that is granted by the arbiter is said to grasp the token, which represents the grant signal. The arbiter is responsible for moving the token among the inputs that have request signals. The traditional arbiters handle all inputs together and the arbitration time is proportional to the number of inputs. As a result, the switch size or capacity is limited given a fixed amount of arbitration time.

Here we suggest dividing the inputs into groups. Each group has its own arbiter. The request information of each group is summarized as a group request signal. Further grouping can be applied recursively to all the group request signals at the current layer, forming a tree structure, as illustrated in Figure 1. Thus, an arbiter with n inputs can be constructed using multiple small-size arbiters (AR) at each layer. Different group sizes can be used.
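The recursive summarising of requests into group request signals can be sketched as an OR-reduction tree. This is only an illustration: the function name `group_request_layers` and the list-of-booleans representation are assumptions made here, and the number of inputs is assumed to be a power of the group size.

```python
from typing import List


def group_request_layers(requests: List[bool], group_size: int = 2) -> List[List[bool]]:
    """Return the request signals seen at each layer, input layer first.

    Each signal in a layer is the OR of one group of signals in the layer
    below it, so the last layer holds a single overall request signal.
    """
    layers = [list(requests)]
    current = layers[0]
    while len(current) > 1:
        current = [
            any(current[k:k + group_size])
            for k in range(0, len(current), group_size)
        ]
        layers.append(current)
    return layers


# Eight inputs, only input 1 requesting, groups of two.
for layer in group_request_layers([False, True] + [False] * 6):
    print(layer)
```

With groups of two, eight inputs give request layers of width 8, 4 and 2 feeding the three layers of small arbiters, plus one overall request signal at the top.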

Assume $n = 2^k$. Figure 1 depicts a k-layer complete binary tree with a group size of two when k = 4. AR2 represents a 2-input AR. An AR2 contains an internal feedback signal that indicates which input is favored. Once an input is granted in an arbitration cycle, the other input will be favored in the next cycle. In other words, the granted request is always chosen between left (input) and right alternately. That is why we call it ping-pong arbitration (PPA). This mechanism is maintained by producing an output flag signal that is fed back to the input; a register is required to forward this signal at the beginning of each arbitration cycle.

The first layer consists of $2^{k-1}$ arbiters, which we call leaf AR2s. The next k - 2 layers consist of arbiters called intermediate AR2s, $2^{k-i}$ of which are at layer i. Finally, the last layer consists of only one arbiter, called the root AR2.

Every AR2 has two request signals. An input request signal at layer i is the group request signal of $2^{i-1}$ inputs and can be produced by OR gates either directly or recursively. The grant signal from an AR2 has to be fed back to all the lower-layer AR2s related to the corresponding input. Therefore, in addition to the feedback flag signal, an AR2 adds an external grant signal that ANDs all grant signals at upper layers, indicating the arbitration results of upper layers. One important use of the external grant signal is to govern the local flag signal update. If the external grant signal is invalid, which indicates that these two input requests as a whole are not granted at some upper layer(s), then the flag should be kept unchanged in order to preserve the original preference.

The root AR2 needs no external grant signal. At each intermediate AR2, the local grant signals are sent out without any interference from its external grant signal so that one gate delay is saved. The external grant signal is used only for governing the flag signal update. At each leaf AR2, the local grant signals have to be combined with the grant signals from the upper layers; this is done at the final stage, which allows the local logical operations to be finished while waiting for the grant signals from upper layers and so minimizes the total arbitration time. When n = 4, for instance, the order in which the PPA serves the inputs is still round-robin, if each input always has a packet to send and there is no conflict between the input request signals. Below we show its performance by simulations.

2.2 Performance Issues

We simulate a 32 x 32 switch under uniform traffic (the output address of each cell is equally distributed among all outputs), or bursty traffic with a burst length of 10 cells. The bursty traffic can be used as a packet traffic model, with each burst representing a packet of multiple cells destined for the same output. The output address of each packet (burst) is also equally distributed among all outputs. The PPA is also used for request selection among the VOQs at each input port. We compare the PPA with FIFO+RR (FIFO for input queuing and RR for round-robin arbitration), Output Queuing, iSLIP, and DRRM. FIFO+RR and Output Queuing serve as benchmarks (lower and upper bounds in terms of performance).

In iSLIP [8], each VOQ in the input buffer can send a request to an output arbiter. In other words, each input can send up to n requests to n arbiters, one for each. After the grant arbitration, an input may receive multiple grants, and another round of arbitration is needed to guarantee that at most one cell is selected in each input port.
A cycle of iSLIP arbitration consists of five steps: (1) input ports send multiple requests to the output arbiters; (2) the output arbiters perform the grant arbitration; (3) the output arbiters send grants to input arbiters; (4) the input arbiters perform another arbitration to resolve possible multiple grants; and (5) the input arbiters send accept signals to output arbiters.

In DRRM [10], an input arbiter at each input selects a non-empty VOQ according to the round-robin service discipline. After the selection, each input port sends one request, if any, to an output arbiter. An output arbiter at each output receives up to n requests, chooses one of them based on the round-robin service discipline, and sends a grant to the winning input port. The DRRM has four steps in a cycle: (1) each input arbiter performs request selection; (2) the input arbiters send requests to the output arbiters; (3) each output arbiter performs grant arbitration; and (4) the output arbiters send grant signals to input arbiters.

Figure 2 shows the throughput and total average delay of the switch under various arbitration schemes, where a speedup of 1 or 2 is used. The PPA performs better than FIFO+RR but worse than iSLIP and DRRM when the speedup is 1; however, they all perform comparably when a speedup of 2 is used.

Fig. 2. Comparison of the PPA with FIFO+RR, Output Queuing, iSLIP and DRRM: switch throughput and total average delay. Panel (c): c = 2 under uniform traffic; panel (d): c = 2 under bursty traffic.

Recall that PIM [7] needs a random number generator for its decision process, which is difficult and expensive to implement at high speed. Both iSLIP and DRRM need to maintain a round-robin service list, which is also expensive to implement. The PPA, however, is simpler for high speed implementation. Since all arbitrations are done in parallel, the overall arbitration time of an n-input switch is proportional to $\lceil \log_4 (n/2) \rceil$ when we group every two inputs or every two input groups at each layer, as will be described in the following section.

3 Implementation of the PPA

Multiple small arbiters can be recursively grouped together to form a large and multi-layer arbiter, as illustrated in Figure 1.
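The behaviour of the ping-pong arbiter with a group size of two (Section 2.1) can be sketched in software as a binary tree of 2-input arbiters. This is a behavioural model under stated assumptions, not the gate-level design: the class name `PingPongArbiter` is hypothetical, all flags are assumed to start LOW (favouring the left request), a flag is assumed to stay unchanged when its AR2 sees no request, and the code walks only the granted path instead of evaluating every AR2 in parallel as the hardware does.

```python
from typing import List, Optional


class PingPongArbiter:
    """n-input arbiter modelled as a binary tree of 2-input arbiters (n a power of two)."""

    def __init__(self, n: int) -> None:
        assert n >= 2 and n & (n - 1) == 0, "n must be a power of two"
        self.n = n
        # flags[d][k] is the flag of the k-th AR2 at depth d (depth 0 is the root).
        # 0 favours the left request, 1 favours the right request.
        self.flags: List[List[int]] = []
        width = 1
        while width < n:
            self.flags.append([0] * width)
            width *= 2

    def arbitrate(self, requests: List[bool]) -> Optional[int]:
        """Return the index of the granted input, or None if nobody requests."""
        if not any(requests):
            return None
        lo, hi = 0, self.n  # current group of inputs is [lo, hi)
        node = 0
        for depth in range(len(self.flags)):
            mid = (lo + hi) // 2
            left = any(requests[lo:mid])    # group request of the left half
            right = any(requests[mid:hi])   # group request of the right half
            flag = self.flags[depth][node]
            go_right = right and (flag == 1 or not left)
            # This AR2 is on the granted path, so its external grant is valid
            # and its flag now favours the other side. AR2s off the path are
            # never visited, which leaves their flags unchanged.
            self.flags[depth][node] = 0 if go_right else 1
            node = 2 * node + (1 if go_right else 0)
            lo, hi = (mid, hi) if go_right else (lo, mid)
        return lo


arbiter = PingPongArbiter(4)
print([arbiter.arbitrate([True] * 4) for _ in range(8)])
# prints [0, 2, 1, 3, 0, 2, 1, 3]
```

Under these assumptions, four always-requesting inputs are each served exactly once every four cycles, which matches the round-robin property claimed for n = 4.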
Figure 3 depicts an n-input arbiter constructed by using p q-input arbiters (AR-q), from which the group request/grant signals are incorporated into a p-input arbiter (AR-p). Below we demonstrate constructing a 256-input arbiter starting from the basic units: 2-input arbiters.

Fig. 3. Hierarchy of recursive arbitration with n = pq inputs

3.1 2-input Arbiter (AR2)

Figure 4 shows a basic 2-input arbiter (AR2) and its logical circuits. The AR2 contains an internal feedback flag signal, denoted by $F_i$, that indicates which input is favored. [Footnote: When the flag is LOW, $R_0$ is favored; when the flag is HIGH, $R_1$ is favored.] When all $G_g$ inputs are 1, indicating that these two input requests ($R_0$ and $R_1$) as a whole are granted by all the upper layers, once an input is granted in an arbitration cycle, the other input will be favored in the next cycle, as shown by the truth table in Figure 4(a). This mechanism is maintained by producing an output flag signal, denoted by $F_o$, that is fed back to the input. Between $F_o$ and $F_i$ there is a D flip-flop, which functions as a register forwarding $F_o$ to $F_i$ at the beginning of each cell time slot. When at least one of the $G_g$ inputs is 0, indicating that the group request of $R_0$ and $R_1$ is not granted at some upper layer(s), $G_0 = G_1 = 0$ and $F_o = F_i$, i.e., the flag is kept unchanged in order to preserve the original preference. As shown in Figure 4(b), the local grant signals have to be ANDed with the grant signals from the upper layers to provide full information on whether the corresponding input is granted or not. The $G_g$ inputs are added at the final stage to allow other local logical operations to be finished, in order to minimize the total arbitration time.

Fig. 4. (a) A 2-input arbiter (AR2) and its truth table (b) its logical circuits

3.2 4-input Arbiter (AR4)

A 4-input arbiter module (AR4) has four request signals, four output grant signals, one outgoing group request and one incoming group grant signal. Figure 5(a) depicts our design of an AR4 constructed from three AR2s (two leaf AR2s and one intermediate AR2; all have the same circuitry), two 2-input OR gates and one 4-input OR gate. Each leaf AR2 handles a pair of inputs and generates the local grant signals while allowing two external grant signals coming from upper layers: one from the intermediate AR2 inside the AR4 and the other from outside the AR4. These two signals directly join the AND gates at the final stage inside each leaf AR2 to minimize the delay. Denote $R_{ij}$ and $G_{ij}$ as the group request signal and the group grant signal between input i and input j. The intermediate AR2 handles the group requests ($R_{01}$ and $R_{23}$) and generates the grant signals ($G_{01}$ and $G_{23}$) to each leaf AR2 respectively. It contains only one grant signal that is from the upper layer for controlling the flag signal.

Fig. 5. (a) A 4-input arbiter (AR4) and (b) a 16-input arbiter (AR16) constructed by five AR4s

3.3 16-input Arbiter (AR16)

As shown in Figure 5(b), an AR16 contains five AR4s in two layers: four at the lower layer handling the local input request signals and one at the higher layer handling the group request signals.

3.4 256-input Arbiter (AR256)

Figure 6 illustrates a 256-input arbiter (AR256) constructed from AR4s and its arbitration delay components. The path numbered from 1 to 11 shows the delay from when an input sends its request signal until it receives the grant signal. The first four gate delays (1-4) count the time for the input's request signal to pass through the four layers of AR4s and reach the root AR2, where one OR-gate delay is needed at each layer to generate the request signal [see Figure 5(a)]. The next three gate delays (5-7) count the time for the root AR2 to perform its arbitration [see Figure 4(b)]. The last four gate delays (8-11) count the time for the grant signals at upper layers to pass down to the corresponding input. The total arbitration time of an AR256 is then 11 gate delays. It thus follows that the arbitration time ($T_n$) of an n-input arbiter using such an implementation is

$$T_n = 2 \lceil \log_4 (n/2) \rceil + 3. \qquad (1)$$

Fig. 6. Decomposition of arbitration delay in an AR256

4 A Terabit Crossbar Packet Switch Using PPA

In this section, we present a terabit crossbar packet switch using the PPA. Our design adopts the pipelining technique, separating the arbitration circuits from the data routing circuits, to enable the next round of arbitration to be performed in parallel with the current round of data transmission.

4.1 Crosspoint Unit

A crosspoint, the basic unit in a crossbar switch, corresponds to an input and output pair. As shown in Figure 7, it conceptually consists of two parts: a data crosspoint (DXP) and a multicast request crosspoint (MXP). The output of a DXP is controlled by the grant signal. It is LOW by default and the crosspoint is in the CROSS state, in which the vertical data will get through. If the grant signal turns HIGH, then the crosspoint is toggled and the horizontal data will get through.

Fig. 7. The conceptual depiction of a crosspoint unit (Xunit)

Note that the horizontal data Dh is broadcast to all DXPs, while the MP signal is cascaded between MXPs to facilitate shifting in the MP. When a new MP is shifted into the switch chip, the MP bit is stored in each corresponding MXP. In addition, we have a bit at the head of the MP signal, denoted by NMP (New MP), to indicate whether the MP signal is a new MP and thus to decide whether the MP should be accepted at the switch chip. After each arbitration, we update the request signal (i.e., the MP bit) for the next round. Depending on the NMP bit arriving at the beginning of the next arbitration cycle, we decide whether to use the new MP or the old updated one.

4.2 SW16 - a 16 x 16 crossbar switching chip with AR16s

Figure 8 shows a chip layout for a 16 x 16 switch. The communications between an input port controller (IPC) and an SW16 chip are through the following 6 lines:

- 4-line data broadcasting from the input port to all crosspoints on the same row;
- 1-line Multicast Pattern (MP) with the NMP bit at the head of the MP indicating whether it is a new MP;
- 1-line acknowledgement (ACK) signal with 2 bits from the chip to the input.

Fig. 8. Layout of the SW16 chip (data bus in bold)

The number of incoming and outgoing signal pins in the chip is 6 x 2 x 16 = 192. The two-bit ACK signal, (ACK1, ACK0), is generated by the handshaking circuits (HSC) in the SW16 chip. The signal specifications are as follows:

(ACK1, ACK0)   IPC Action    Description
00             do nothing    don't send cell nor MP
01             load cell     winning the contention
10             load MP       all MP bits are zero
11             load both

The first bit (ACK1) is used for transmitting the MP and the second (ACK0) for transmitting the cell. When building a large-scale switch, multiple SW16 chips are interconnected in a two-dimensional array.
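The two-bit ACK encoding in the table above can be expressed as a small lookup on the input port controller side. This is only a sketch: the names `IpcAction` and `decode_ack` are hypothetical and not part of the SW16 design, and each ACK bit is assumed to arrive as the integer 0 or 1.

```python
from enum import Enum


class IpcAction(Enum):
    """Actions of the input port controller, keyed by the (ACK1, ACK0) bit pair."""
    DO_NOTHING = 0b00  # don't send cell nor MP
    LOAD_CELL = 0b01   # winning the contention
    LOAD_MP = 0b10     # all MP bits are zero
    LOAD_BOTH = 0b11   # load both the cell and the MP


def decode_ack(ack1: int, ack0: int) -> IpcAction:
    """Map the two ACK bits to an action; ACK1 is the first (MP) bit, ACK0 the cell bit."""
    return IpcAction((ack1 << 1) | ack0)


print(decode_ack(0, 1).name)  # prints LOAD_CELL
```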
Each IPC will receive multiple ACK signals, one from each SW16. The final decision of whether the HOL cell or the MP of the cell next to the HOL should be transmitted to the switch can be easily made by ORing the ACK0's or by ANDing the ACK1's from the SW16 chips on the same row.

4.3 A 256 x 256 switch with 1 Tb/s capacity

Consider building a 256 x 256 terabit multicast switch by using a speedup of two and the bit slicing technique. The chosen cell size is 64 bytes when calculating the time budget for arbitration. Each 64-byte cell is sliced into four 16-byte parts, handled in parallel by four switching planes. In each plane, 256 SW16 chips are arranged in a two-dimensional array, as shown in Figure 9. The input capacity per port in each plane is reduced to 5 Gb/s / 4 = 1.25 Gb/s. With a 4-bit wide bus for data signals, the switch operation rate is 1.25 Gb/s x 2/4 ≈ 622 Mb/s.

The layout of the 256 x 256 switch plane is shown in Figure 9. The switch consists of 16 x 16 = 256 SW16 chips. On top of these chips, we have 256 AR16s for higher-layer arbitrations. They can be grouped into chips and built separately as shown in Figure 9, or they can be distributed over all SW16 chips in the same column in order to minimize the number of chips. The total number of signal pins in each SW16 will then be increased by 16 x 2 = 32 to 224.

Fig. 9. A plane of the 1 Tb/s crossbar structured multicast switch

Data is identically broadcast from an input to all SW16 chips in the same row, while the multicast patterns to those SW16 chips are different. We introduce a SIC (switch interface circuit) between each IPC and a row of SW16 chips to handle the data broadcast while collecting and processing the ACK signals from the SW16 chips. The SICs can be either placed inside the switch plane or incorporated into the IPCs.

With four data lines, the transmission time for each cell is equal to 16 bytes / (4 bits/clock) = 32 clocks, which is the time budget for the arbitration and its pre- and post-processing. An arbitration cycle includes (1) shifting the multicast pattern; (2) arbitrating; and (3) feeding back acknowledgements. In our design, chips are assigned MP directly. It takes just 17 bit clocks (including the NMP) for the MP to be shifted into a chip. The arbitration time using the PPA is only 11 gate delays (see Figure 6) for the 256 x 256 switch, less than 5 ns using the current CMOS technology. The circuitry for generating acknowledgements is very simple. The total arbitration and feedback delay is a few clocks. Therefore, it takes about 22 clocks for one arbitration cycle, less than the 32 clocks required for transmitting a cell.

5 Conclusions

In this paper, we propose a fast ping-pong arbitration (PPA) scheme for output contention resolution in large capacity input-output queued switches, which aims at maximizing the switch throughput and delay performance for supporting multimedia services with various QoS requirements. The basic idea is to divide the inputs into groups and apply arbitration recursively. Our recursive arbiter is hierarchically structured, consisting of multiple small-size arbiters at each layer. The arbitration time of an n-input switch is proportional to $\lceil \log_4 (n/2) \rceil$ when we group every two inputs or every two input groups at each layer. We present a 256 x 256 terabit crossbar multicast packet switch using the PPA. The design shows that our scheme can reduce the arbitration time of the 256 x 256 switch to 11 gate delays, less than 5 ns using the current CMOS technology, demonstrating that arbitration is no longer the bottleneck limiting the switch capacity.

Acknowledgement

We would like to thank Dr. Jin-Soo Park for providing the simulation results.

References

[1] F. A. Tobagi, "Fast Packet Switch Architectures for Broadband Integrated Services Digital Networks," Proceedings of the IEEE, 78(1), January.
[2] S. Keshav and R. Sharma, "Issues and Trends in Router Design," IEEE Communications Magazine, May.
[3] V. P. Kumar, T. V. Lakshman and D. Stiliadis, "Beyond Best Effort: Router Architectures for the Differentiated Services of Tomorrow's Internet," IEEE Communications Magazine, May.
[4] M. Karol, M. Hluchyj, and S. Morgan, "Input versus output queueing on a space division switch," IEEE Trans. Comm., 35(12).
[5] M. Karol and M. Hluchyj, "Queueing in high-performance packet-switching," IEEE J. Select. Area in Comm., Vol. 6, December.
[6] Y. Tamir and H-C. Chi, "High performance multi-queue buffers for VLSI communication switches," Proc. of 15th Ann. Symp. on Comp. Arch., June.
[7] T. Anderson, S. Owicki, J. Saxe, and C. Thacker, "High speed switch scheduling for local area networks," ACM Trans. Computer Systems, November.
[8] N. McKeown, P. Varaiya, and J. Walrand, "Scheduling cells in an input-queued switch," IEE Electronics Letters, 29(25), December.
[9] N. McKeown, V. Anantharam, and J. Walrand, "Achieving 100% Throughput in an Input-queued Switch," Proc. IEEE INFOCOM.
[10] H. Jonathan Chao and J. S. Park, "Centralized contention resolution schemes for a large-capacity optical ATM switch," in Proc. IEEE ATM Workshop, Fairfax, VA, May.
[11] R. E. Tarjan, Data Structures and Network Algorithms, Bell Labs.
[12] R. Guerin and K. N. Sivarajan, "Delay and Throughput Performance of Speed-up Input-queuing Packet Switches," IBM Research Report RC 20892, June.

Scalable Schedulers for High-Performance Switches

Scalable Schedulers for High-Performance Switches Scalable Schedulers for High-Performance Switches Chuanjun Li and S Q Zheng Mei Yang Department of Computer Science Department of Computer Science University of Texas at Dallas Columbus State University

More information

Dynamic Scheduling Algorithm for input-queued crossbar switches

Dynamic Scheduling Algorithm for input-queued crossbar switches Dynamic Scheduling Algorithm for input-queued crossbar switches Mihir V. Shah, Mehul C. Patel, Dinesh J. Sharma, Ajay I. Trivedi Abstract Crossbars are main components of communication switches used to

More information

FIRM: A Class of Distributed Scheduling Algorithms for High-speed ATM Switches with Multiple Input Queues

FIRM: A Class of Distributed Scheduling Algorithms for High-speed ATM Switches with Multiple Input Queues FIRM: A Class of Distributed Scheduling Algorithms for High-speed ATM Switches with Multiple Input Queues D.N. Serpanos and P.I. Antoniadis Department of Computer Science University of Crete Knossos Avenue

More information

Fair Chance Round Robin Arbiter

Fair Chance Round Robin Arbiter Fair Chance Round Robin Arbiter Prateek Karanpuria B.Tech student, ECE branch Sir Padampat Singhania University Udaipur (Raj.), India ABSTRACT With the advancement of Network-on-chip (NoC), fast and fair

More information

Matching Schemes with Captured-Frame Eligibility for Input-Queued Packet Switches

Matching Schemes with Captured-Frame Eligibility for Input-Queued Packet Switches Matching Schemes with Captured-Frame Eligibility for -Queued Packet Switches Roberto Rojas-Cessa and Chuan-bi Lin Abstract Virtual output queues (VOQs) are widely used by input-queued (IQ) switches to

More information

Long Round-Trip Time Support with Shared-Memory Crosspoint Buffered Packet Switch

Long Round-Trip Time Support with Shared-Memory Crosspoint Buffered Packet Switch Long Round-Trip Time Support with Shared-Memory Crosspoint Buffered Packet Switch Ziqian Dong and Roberto Rojas-Cessa Department of Electrical and Computer Engineering New Jersey Institute of Technology

More information

IV. PACKET SWITCH ARCHITECTURES

IV. PACKET SWITCH ARCHITECTURES IV. PACKET SWITCH ARCHITECTURES (a) General Concept - as packet arrives at switch, destination (and possibly source) field in packet header is used as index into routing tables specifying next switch in

More information

Designing Efficient Benes and Banyan Based Input-Buffered ATM Switches

Designing Efficient Benes and Banyan Based Input-Buffered ATM Switches Designing Efficient Benes and Banyan Based Input-Buffered ATM Switches Rajendra V. Boppana Computer Science Division The Univ. of Texas at San Antonio San Antonio, TX 829- boppana@cs.utsa.edu C. S. Raghavendra

More information

Concurrent Round-Robin Dispatching Scheme in a Clos-Network Switch

Concurrent Round-Robin Dispatching Scheme in a Clos-Network Switch Concurrent Round-Robin Dispatching Scheme in a Clos-Network Switch Eiji Oki * Zhigang Jing Roberto Rojas-Cessa H. Jonathan Chao NTT Network Service Systems Laboratories Department of Electrical Engineering

More information

Shared-Memory Combined Input-Crosspoint Buffered Packet Switch for Differentiated Services

Shared-Memory Combined Input-Crosspoint Buffered Packet Switch for Differentiated Services Shared-Memory Combined -Crosspoint Buffered Packet Switch for Differentiated Services Ziqian Dong and Roberto Rojas-Cessa Department of Electrical and Computer Engineering New Jersey Institute of Technology

More information

PCRRD: A Pipeline-Based Concurrent Round-Robin Dispatching Scheme for Clos-Network Switches

PCRRD: A Pipeline-Based Concurrent Round-Robin Dispatching Scheme for Clos-Network Switches : A Pipeline-Based Concurrent Round-Robin Dispatching Scheme for Clos-Network Switches Eiji Oki, Roberto Rojas-Cessa, and H. Jonathan Chao Abstract This paper proposes a pipeline-based concurrent round-robin

More information

F cepted as an approach to achieve high switching efficiency

F cepted as an approach to achieve high switching efficiency The Dual Round Robin Matching Switch with Exhaustive Service Yihan Li, Shivendra Panwar, H. Jonathan Chao Abstract Virtual Output Queuing is widely used by fixed-length high-speed switches to overcome head-of-line

More information

K-Selector-Based Dispatching Algorithm for Clos-Network Switches

K-Selector-Based Dispatching Algorithm for Clos-Network Switches K-Selector-Based Dispatching Algorithm for Clos-Network Switches Mei Yang, Mayauna McCullough, Yingtao Jiang, and Jun Zheng Department of Electrical and Computer Engineering, University of Nevada Las Vegas,

More information

Design and Evaluation of a Parallel-Polled Virtual Output Queued Switch *

Design and Evaluation of a Parallel-Polled Virtual Output Queued Switch * Design and Evaluation of a Parallel-Polled Virtual Output Queued Switch * K. J. Christensen Department of Computer Science and Engineering University of South Florida Tampa, FL 3360 Abstract - Input-buffered

More information

Efficient Queuing Architecture for a Buffered Crossbar Switch

Efficient Queuing Architecture for a Buffered Crossbar Switch Proceedings of the 11th WSEAS International Conference on COMMUNICATIONS, Agios Nikolaos, Crete Island, Greece, July 26-28, 2007 95 Efficient Queuing Architecture for a Buffered Crossbar Switch MICHAEL

More information

Generic Architecture. EECS 122: Introduction to Computer Networks Switch and Router Architectures. Shared Memory (1 st Generation) Today s Lecture

Generic Architecture. EECS 122: Introduction to Computer Networks Switch and Router Architectures. Shared Memory (1 st Generation) Today s Lecture Generic Architecture EECS 122: Introduction to Computer Networks Switch and Router Architectures Computer Science Division Department of Electrical Engineering and Computer Sciences University of California,

More information

Design and Simulation of Router Using WWF Arbiter and Crossbar

Design and Simulation of Router Using WWF Arbiter and Crossbar Design and Simulation of Router Using WWF Arbiter and Crossbar M.Saravana Kumar, K.Rajasekar Electronics and Communication Engineering PSG College of Technology, Coimbatore, India Abstract - Packet scheduling

More information

EECS 122: Introduction to Computer Networks Switch and Router Architectures. Today s Lecture

EECS 122: Introduction to Computer Networks Switch and Router Architectures. Today s Lecture EECS 122: Introduction to Computer Networks Switch and Router Architectures Computer Science Division Department of Electrical Engineering and Computer Sciences University of California, Berkeley Berkeley,

More information


Scheduling. Scheduling algorithms. Scheduling. Output buffered architecture. QoS scheduling algorithms. QoS-capable router

Scheduling. Scheduling algorithms. Scheduling. Output buffered architecture. QoS scheduling algorithms. QoS-capable router Scheduling algorithms Scheduling Andrea Bianco Telecommunication Network Group firstname.lastname@polito.it http://www.telematica.polito.it/ Scheduling: choose a packet to transmit over a link among all

More information

Introduction to ATM Technology

Introduction to ATM Technology Introduction to ATM Technology ATM Switch Design Switching network (N x N) Switching network (N x N) SP CP SP CP Presentation Outline Generic Switch Architecture Specific examples Shared Buffer Switch

More information

Selective Request Round-Robin Scheduling for VOQ Packet Switch ArchitectureI

Selective Request Round-Robin Scheduling for VOQ Packet Switch ArchitectureI This full text paper was peer reviewed at the direction of IEEE Communications Society subject matter experts for publication in the IEEE ICC 2011 proceedings Selective Request Round-Robin Scheduling for

More information

Buffer Sizing in a Combined Input Output Queued (CIOQ) Switch

Buffer Sizing in a Combined Input Output Queued (CIOQ) Switch Buffer Sizing in a Combined Input Output Queued (CIOQ) Switch Neda Beheshti, Nick Mckeown Stanford University Abstract In all internet routers buffers are needed to hold packets during times of congestion.

More information

Integrated Scheduling and Buffer Management Scheme for Input Queued Switches under Extreme Traffic Conditions

Integrated Scheduling and Buffer Management Scheme for Input Queued Switches under Extreme Traffic Conditions Integrated Scheduling and Buffer Management Scheme for Input Queued Switches under Extreme Traffic Conditions Anuj Kumar, Rabi N. Mahapatra Texas A&M University, College Station, U.S.A Email: {anujk, rabi}@cs.tamu.edu

More information

Design of Optical Burst Switches based on Dual Shuffle-exchange Network and Deflection Routing

Design of Optical Burst Switches based on Dual Shuffle-exchange Network and Deflection Routing Design of Optical Burst Switches based on Dual Shuffle-exchange Network and Deflection Routing Man-Ting Choy Department of Information Engineering, The Chinese University of Hong Kong mtchoy1@ie.cuhk.edu.hk

More information

DESIGN OF EFFICIENT ROUTING ALGORITHM FOR CONGESTION CONTROL IN NOC

DESIGN OF EFFICIENT ROUTING ALGORITHM FOR CONGESTION CONTROL IN NOC DESIGN OF EFFICIENT ROUTING ALGORITHM FOR CONGESTION CONTROL IN NOC 1 Pawar Ruchira Pradeep M. E, E&TC Signal Processing, Dr. D Y Patil School of engineering, Ambi, Pune Email: 1 ruchira4391@gmail.com

More information

Router architectures: OQ and IQ switching

Router architectures: OQ and IQ switching Routers/switches architectures Andrea Bianco Telecommunication Network Group firstname.lastname@polito.it http://www.telematica.polito.it/ Computer Network Design - The Internet is a mesh of routers core

More information

Packet Switching Queuing Architecture: A Study

Packet Switching Queuing Architecture: A Study Packet Switching Queuing Architecture: A Study Shikhar Bahl 1, Rishabh Rai 2, Peeyush Chandra 3, Akash Garg 4 M.Tech, Department of ECE, Ajay Kumar Garg Engineering College, Ghaziabad, U.P., India 1,2,3

More information

BROADBAND PACKET SWITCHING TECHNOLOGIES

BROADBAND PACKET SWITCHING TECHNOLOGIES BROADBAND PACKET SWITCHING TECHNOLOGIES A Practical Guide to ATM Switches and IP Routers H. JONATHAN CHAO CHEUK H. LAM EIJI OKI A Wiley-Interscience Publication JOHN WILEY & SONS, INC. New York / Chichester

More information

Switching. An Engineering Approach to Computer Networking

Switching. An Engineering Approach to Computer Networking Switching An Engineering Approach to Computer Networking What is it all about? How do we move traffic from one part of the network to another? Connect end-systems to switches, and switches to each other

More information

Router/switch architectures. The Internet is a mesh of routers. The Internet is a mesh of routers. Pag. 1

Router/switch architectures. The Internet is a mesh of routers. The Internet is a mesh of routers. Pag. 1 Router/switch architectures Andrea Bianco Telecommunication Network Group firstname.lastname@polito.it http://www.telematica.polito.it/ Computer Networks Design and Management - The Internet is a mesh of

More information

Sample Routers and Switches. High Capacity Router Cisco CRS-1 up to 46 Tb/s thruput. Routers in a Network. Router Design

Sample Routers and Switches. High Capacity Router Cisco CRS-1 up to 46 Tb/s thruput. Routers in a Network. Router Design Router Design Routers in a Network Overview of Generic Router Architecture Input-d Switches (Routers) IP Look-up Algorithms Packet Classification Algorithms Sample Routers and Switches Cisco 46 Router up to

More information

BROADBAND AND HIGH SPEED NETWORKS

BROADBAND AND HIGH SPEED NETWORKS BROADBAND AND HIGH SPEED NETWORKS ATM SWITCHING ATM is a connection-oriented transport concept An end-to-end connection (virtual channel) established prior to transfer of cells Signaling used for connection

More information

CS 552 Computer Networks

CS 552 Computer Networks CS 552 Computer Networks IP forwarding Fall 00 Rich Martin (Slides from D. Culler and N. McKeown) Position Paper Goals: Practice writing to convince others Research an interesting topic related to networking.

More information

The Network Layer and Routers

The Network Layer and Routers The Network Layer and Routers Daniel Zappala CS 460 Computer Networking Brigham Young University 2/18 Network Layer deliver packets from sending host to receiving host must be on every host, router in

More information

ECE 697J Advanced Topics in Computer Networks

ECE 697J Advanced Topics in Computer Networks ECE 697J Advanced Topics in Computer Networks Switching Fabrics 10/02/03 Tilman Wolf 1 Router Data Path Last class: Single CPU is not fast enough for processing packets Multiple advanced processors in

More information

Topics for Today. Network Layer. Readings. Introduction Addressing Address Resolution. Sections 5.1,

Topics for Today. Network Layer. Readings. Introduction Addressing Address Resolution. Sections 5.1, Topics for Today Network Layer Introduction Addressing Address Resolution Readings Sections 5.1, 5.6.1-5.6.2 1 Network Layer: Introduction A network-wide concern! Transport layer Between two end hosts

More information

CSE 123A Computer Networks

CSE 123A Computer Networks CSE 123A Computer Networks Winter 2005 Lecture 8: IP Router Design Many portions courtesy Nick McKeown Overview Router basics Interconnection architecture Input Queuing Output Queuing Virtual output Queuing

More information

Matrix Unit Cell Scheduler (MUCS) for. Input-Buffered ATM Switches. Haoran Duan, John W. Lockwood, and Sung Mo Kang

Matrix Unit Cell Scheduler (MUCS) for. Input-Buffered ATM Switches. Haoran Duan, John W. Lockwood, and Sung Mo Kang Matrix Unit Cell Scheduler (MUCS) for Input-Buffered ATM Switches Haoran Duan, John W. Lockwood, and Sung Mo Kang University of Illinois at Urbana-Champaign Department of Electrical and Computer Engineering

More information

Providing Flow Based Performance Guarantees for Buffered Crossbar Switches

Providing Flow Based Performance Guarantees for Buffered Crossbar Switches Providing Flow Based Performance Guarantees for Buffered Crossbar Switches Deng Pan Dept. of Electrical & Computer Engineering Florida International University Miami, Florida 33174, USA pand@fiu.edu Yuanyuan

More information

6.2 Per-Flow Queueing and Flow Control

6.2 Per-Flow Queueing and Flow Control 6.2 Per-Flow Queueing and Flow Control Indiscriminate flow control causes local congestion (output contention) to adversely affect other, unrelated flows indiscriminate flow control causes phenomena analogous

More information

Designing of Efficient islip Arbiter using islip Scheduling Algorithm for NoC

Designing of Efficient islip Arbiter using islip Scheduling Algorithm for NoC International Journal of Scientific and Research Publications, Volume 3, Issue 12, December 2013 1 Designing of Efficient islip Arbiter using islip Scheduling Algorithm for NoC Deepali Mahobiya Department

More information

Throughput Analysis of Shared-Memory Crosspoint. Buffered Packet Switches

Throughput Analysis of Shared-Memory Crosspoint. Buffered Packet Switches Throughput Analysis of Shared-Memory Crosspoint Buffered Packet Switches Ziqian Dong and Roberto Rojas-Cessa Abstract This paper presents a theoretical throughput analysis of two buffered-crossbar switches,

More information

A Four-Terabit Single-Stage Packet Switch with Large. Round-Trip Time Support. F. Abel, C. Minkenberg, R. Luijten, M. Gusat, and I.

A Four-Terabit Single-Stage Packet Switch with Large. Round-Trip Time Support. F. Abel, C. Minkenberg, R. Luijten, M. Gusat, and I. A Four-Terabit Single-Stage Packet Switch with Large Round-Trip Time Support F. Abel, C. Minkenberg, R. Luijten, M. Gusat, and I. Iliadis IBM Research, Zurich Research Laboratory, CH-8803 Ruschlikon, Switzerland

More information

Topic 4a Router Operation and Scheduling. Ch4: Network Layer: The Data Plane. Computer Networking: A Top Down Approach

Topic 4a Router Operation and Scheduling. Ch4: Network Layer: The Data Plane. Computer Networking: A Top Down Approach Topic 4a Router Operation and Scheduling Ch4: Network Layer: The Data Plane Computer Networking: A Top Down Approach 7 th edition Jim Kurose, Keith Ross Pearson/Addison Wesley April 2016 4-1 Chapter 4:

More information

Ting Wu, Chi-Ying Tsui, Mounir Hamdi Hong Kong University of Science & Technology Hong Kong SAR, China

Ting Wu, Chi-Ying Tsui, Mounir Hamdi Hong Kong University of Science & Technology Hong Kong SAR, China CMOS Crossbar Ting Wu, Chi-Ying Tsui, Mounir Hamdi Hong Kong University of Science & Technology Hong Kong SAR, China OUTLINE Motivations Problems of Designing Large Crossbar Our Approach - Pipelined MUX

More information

A Partially Buffered Crossbar Packet Switching Architecture and its Scheduling

A Partially Buffered Crossbar Packet Switching Architecture and its Scheduling A Partially Buffered Crossbar Packet Switching Architecture and its Scheduling Lotfi Mhamdi Computer Engineering Laboratory TU Delft, The Netherlands lotfi@ce.et.tudelft.nl Abstract The crossbar fabric

More information

Lecture 16: Router Design

Lecture 16: Router Design Lecture 16: Router Design CSE 123: Computer Networks Alex C. Snoeren Example courtesy Mike Freedman Lecture 16 Overview End-to-end lookup and forwarding example Router internals Buffering Scheduling 2 Example:

More information

Themes. The Network 1. Energy in the DC: ~15% network? Energy by Technology

Themes. The Network 1. Energy in the DC: ~15% network? Energy by Technology Themes The Network 1 Low Power Computing David Andersen Carnegie Mellon University Last two classes: Saving power by running more slowly and sleeping more. This time: Network intro; saving power by architecting

More information

Introduction. Introduction. Router Architectures. Introduction. Recent advances in routing architecture including

Introduction. Introduction. Router Architectures. Introduction. Recent advances in routing architecture including Router Architectures By the end of this lecture, you should be able to. Explain the different generations of router architectures Describe the route lookup process Explain the operation of PATRICIA algorithm

More information

MULTICAST is an operation to transmit information from

MULTICAST is an operation to transmit information from IEEE TRANSACTIONS ON COMPUTERS, VOL. 54, NO. 10, OCTOBER 2005 1283 FIFO-Based Multicast Scheduling Algorithm for Virtual Output Queued Packet Switches Deng Pan, Student Member, IEEE, and Yuanyuan Yang,

More information

A Fast Switched Backplane for a Gigabit Switched Router

A Fast Switched Backplane for a Gigabit Switched Router The original version of this paper appears in Business Communication Review. WHITE PAPER: A Fast Switched Backplane for a Gigabit Switched Router by Nick McKeown (tel: 650/725-3641; fax: 650/725-6949;

More information

Title: Implementation of High Performance Buffer Manager for an Advanced Input-Queued Switch Fabric

Title: Implementation of High Performance Buffer Manager for an Advanced Input-Queued Switch Fabric Title: Implementation of High Performance Buffer Manager for an Advanced Input-Queued Switch Fabric Authors: Kyongju University, Hyohyundong, Kyongju, 0- KOREA +-4-0-3 Jung-Hee Lee IP Switching Team, ETRI,

More information

Int. J. Advanced Networking and Applications 1194 Volume: 03; Issue: 03; Pages: (2011)

Int. J. Advanced Networking and Applications 1194 Volume: 03; Issue: 03; Pages: (2011) Int J Advanced Networking and Applications 1194 ISA-Independent Scheduling Algorithm for Buffered Crossbar Switch Dr Kannan Balasubramanian Department of Computer Science, Mepco Schlenk Engineering College,

More information

Hierarchical Scheduling for DiffServ Classes

Hierarchical Scheduling for DiffServ Classes Hierarchical Scheduling for DiffServ Classes Mei Yang, Jianping Wang, Enyue Lu, S Q Zheng Department of Electrical and Computer Engineering, University of Nevada Las Vegas, Las Vegas, NV 8954 Department

More information

Crossbar - example. Crossbar. Crossbar. Combination: Time-space switching. Simple space-division switch Crosspoints can be turned on or off

Crossbar - example. Crossbar. Crossbar. Combination: Time-space switching. Simple space-division switch Crosspoints can be turned on or off Crossbar Crossbar - example Simple space-division switch Crosspoints can be turned on or off i n p u t s sessions: (,) (,) (,) (,) outputs Crossbar Advantages: simple to implement simple control flexible

More information

Network layer (addendum) Slides adapted from material by Nick McKeown and Kevin Lai

Network layer (addendum) Slides adapted from material by Nick McKeown and Kevin Lai Network layer (addendum) Slides adapted from material by Nick McKeown and Kevin Lai Routers.. A router consists - A set of input interfaces at which packets arrive - A set of output interfaces from which

More information

IMPLEMENTATION OF CONGESTION CONTROL MECHANISMS USING OPNET

IMPLEMENTATION OF CONGESTION CONTROL MECHANISMS USING OPNET Nazy Alborz IMPLEMENTATION OF CONGESTION CONTROL MECHANISMS USING OPNET TM Communication Networks Laboratory School of Engineering Science Simon Fraser University Road map Introduction to congestion control

More information

Network Model for Delay-Sensitive Traffic

Network Model for Delay-Sensitive Traffic Traffic Scheduling Network Model for Delay-Sensitive Traffic Source Switch Switch Destination Flow Shaper Policer (optional) Scheduler + optional shaper Policer (optional) Scheduler + optional shaper cfla.

More information

A Proposal for a High Speed Multicast Switch Fabric Design

A Proposal for a High Speed Multicast Switch Fabric Design A Proposal for a High Speed Multicast Switch Fabric Design Cheng Li, R.Venkatesan and H.M.Heys Faculty of Engineering and Applied Science Memorial University of Newfoundland St. John s, NF, Canada AB X

More information

Efficient Multicast Support in Buffered Crossbars using Networks on Chip

Efficient Multicast Support in Buffered Crossbars using Networks on Chip Efficient Multicast Support in Buffered Crossbars using Networks on Chip Iria Varela Senin Lotfi Mhamdi Kees Goossens, Computer Engineering, Delft University of Technology, Delft, The Netherlands NXP Semiconductors,

More information

CHAPTER 6 FPGA IMPLEMENTATION OF ARBITERS ALGORITHM FOR NETWORK-ON-CHIP

CHAPTER 6 FPGA IMPLEMENTATION OF ARBITERS ALGORITHM FOR NETWORK-ON-CHIP 133 CHAPTER 6 FPGA IMPLEMENTATION OF ARBITERS ALGORITHM FOR NETWORK-ON-CHIP 6.1 INTRODUCTION As the era of a billion transistors on a one chip approaches, a lot of Processing Elements (PEs) could be located

More information

EE 122: Router Design

EE 122: Router Design Routers EE 122: Router Design Kevin Lai September 25, 2002.. A router consists - A set of input interfaces at which packets arrive - A set of output interfaces from which packets depart - Some form of interconnect

More information

LS Example 5 3 C 5 A 1 D

LS Example 5 3 C 5 A 1 D Lecture 10 LS Example 5 2 B 3 C 5 1 A 1 D 2 3 1 1 E 2 F G Itrn M B Path C Path D Path E Path F Path G Path 1 {A} 2 A-B 5 A-C 1 A-D Inf. Inf. 1 A-G 2 {A,D} 2 A-B 4 A-D-C 1 A-D 2 A-D-E Inf. 1 A-G 3 {A,D,G}

More information

Parallel Read-Out High-Speed Input Buffer ATM Switch Architectures

Parallel Read-Out High-Speed Input Buffer ATM Switch Architectures Parallel Read-Out High-Speed Input Buffer ATM Switch Architectures 95 Chugo Fujihashi and Junji Ichikawa The prototype first in first out (FIFO) input buffer asynchronous transfer mode (ATM) switch is

More information

CSE 3214: Computer Network Protocols and Applications Network Layer

CSE 3214: Computer Network Protocols and Applications Network Layer CSE 3214: Computer Network Protocols and Applications Network Layer Dr. Peter Lian, Professor Department of Computer Science and Engineering York University Email: peterlian@cse.yorku.ca Office: 101C Lassonde

More information

Routers: Forwarding EECS 122: Lecture 13

Routers: Forwarding EECS 122: Lecture 13 Input Port Functions Routers: Forwarding EECS 122: Lecture 13 Department of Electrical Engineering and Computer Sciences University of California Berkeley Physical layer: bit-level reception Data link layer:

More information

Globecom. IEEE Conference and Exhibition. Copyright IEEE.

Globecom. IEEE Conference and Exhibition. Copyright IEEE. Title FTMS: an efficient multicast scheduling algorithm for feedbackbased two-stage switch Author(s) He, C; Hu, B; Yeung, LK Citation The 2012 IEEE Global Communications Conference (GLOBECOM 2012), Anaheim,

More information

Algorithm-Hardware Codesign of Fast Parallel Round-Robin Arbiters

Algorithm-Hardware Codesign of Fast Parallel Round-Robin Arbiters Algorithm-Hardware Codesign of Fast Parallel Round-Robin Arbiters S. Q. Zheng and Mei Yang Department of Computer Science University of Texas at Dallas, Richardson, TX 7583-688, USA Department of Electrical

More information

A distributed architecture of IP routers

A distributed architecture of IP routers A distributed architecture of IP routers Tasho Shukerski, Vladimir Lazarov, Ivan Kanev Abstract: The paper discusses the problems relevant to the design of IP (Internet Protocol) routers or Layer3 switches

More information

Adaptive Linear Prediction of Queues for Reduced Rate Scheduling in Optical Routers

Adaptive Linear Prediction of Queues for Reduced Rate Scheduling in Optical Routers Adaptive Linear Prediction of Queues for Reduced Rate Scheduling in Optical Routers Yang Jiao and Ritesh Madan EE 384Y Final Project Stanford University Abstract This paper describes a switching scheme

More information

CSC 401 Data and Computer Communications Networks

CSC 401 Data and Computer Communications Networks CSC 401 Data and Computer Communications Networks Network Layer Overview, Router Design, IP Sec 4.1. 4.2 and 4.3 Prof. Lina Battestilli Fall 2017 Chapter 4: Network Layer, Data Plane chapter goals: understand

More information

Scaling routers: Where do we go from here?

Scaling routers: Where do we go from here? Scaling routers: Where do we go from here? HPSR, Kobe, Japan May 28 th, 2002 Nick McKeown Professor of Electrical Engineering and Computer Science, Stanford University nickm@stanford.edu www.stanford.edu/~nickm

More information

Issue Logic for a 600-MHz Out-of-Order Execution Microprocessor

Issue Logic for a 600-MHz Out-of-Order Execution Microprocessor IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 33, NO. 5, MAY 1998 707 Issue Logic for a 600-MHz Out-of-Order Execution Microprocessor James A. Farrell and Timothy C. Fischer Abstract The logic and circuits

More information

A distributed memory management for high speed switch fabrics

A distributed memory management for high speed switch fabrics A distributed memory management for high speed switch fabrics MEYSAM ROODI +,ALI MOHAMMAD ZAREH BIDOKI +, NASSER YAZDANI +, HOSSAIN KALANTARI +,HADI KHANI ++ and ASGHAR TAJODDIN + + Router Lab, ECE Department,

More information

Design of a Weighted Fair Queueing Cell Scheduler for ATM Networks

Design of a Weighted Fair Queueing Cell Scheduler for ATM Networks Design of a Weighted Fair Queueing Cell Scheduler for ATM Networks Yuhua Chen Jonathan S. Turner Department of Electrical Engineering Department of Computer Science Washington University Washington University

More information

ATM Switches. Switching Technology S ATM switches

ATM Switches. Switching Technology S ATM switches ATM Switches Switching Technology S38.65 http://www.netlab.hut.fi/opetus/s3865 9 - ATM switches General of ATM switching Structure of an ATM switch Example switch implementations Knockout switch Abacus

More information

Chapter 4 Network Layer: The Data Plane

Chapter 4 Network Layer: The Data Plane Chapter 4 Network Layer: The Data Plane A note on the use of these Powerpoint slides: We re making these slides freely available to all (faculty, students, readers). They re in PowerPoint form so you see

More information

Stop-and-Go Service Using Hierarchical Round Robin

Stop-and-Go Service Using Hierarchical Round Robin Stop-and-Go Service Using Hierarchical Round Robin S. Keshav AT&T Bell Laboratories 600 Mountain Avenue, Murray Hill, NJ 07974, USA keshav@research.att.com Abstract The Stop-and-Go service discipline allows

More information

Evaluation of NOC Using Tightly Coupled Router Architecture

Evaluation of NOC Using Tightly Coupled Router Architecture IOSR Journal of Computer Engineering (IOSR-JCE) e-issn: 2278-0661,p-ISSN: 2278-8727, Volume 18, Issue 1, Ver. II (Jan Feb. 2016), PP 01-05 www.iosrjournals.org Evaluation of NOC Using Tightly Coupled Router

More information

Switching/Flow Control Overview. Interconnection Networks: Flow Control and Microarchitecture. Packets. Switching.

Switching/Flow Control Overview. Interconnection Networks: Flow Control and Microarchitecture. Packets. Switching. Switching/Flow Control Overview Interconnection Networks: Flow Control and Microarchitecture Topology: determines connectivity of network Routing: determines paths through network Flow Control: determine

More information

Introduction. Router Architectures. Introduction. Introduction. Recent advances in routing architecture including

Introduction. Router Architectures. Introduction. Introduction. Recent advances in routing architecture including Introduction Router Architectures Recent advances in routing architecture including specialized hardware switching fabrics efficient and faster lookup algorithms have created routers that are capable of

More information

CSC 4900 Computer Networks: Network Layer

CSC 4900 Computer Networks: Network Layer CSC 4900 Computer Networks: Network Layer Professor Henry Carter Fall 2017 Villanova University Department of Computing Sciences Review What is AIMD? When do we use it? What is the steady state profile

More information

Performance of Multihop Communications Using Logical Topologies on Optical Torus Networks

Performance of Multihop Communications Using Logical Topologies on Optical Torus Networks Performance of Multihop Communications Using Logical Topologies on Optical Torus Networks X. Yuan, R. Melhem and R. Gupta Department of Computer Science University of Pittsburgh Pittsburgh, PA 156 fxyuan,

More information

Analysis of Power Consumption on Switch Fabrics in Network Routers

Analysis of Power Consumption on Switch Fabrics in Network Routers Analysis of Power Consumption on Switch Fabrics in Network Routers Terry Tao Ye Computer Systems Lab Stanford University taoye@stanford.edu Luca Benini DEIS University of Bologna lbenini@deis.unibo.it

More information

ECE 551 System on Chip Design

ECE 551 System on Chip Design ECE 551 System on Chip Design Introducing Bus Communications Garrett S. Rose Fall 2018 Emerging Applications Requirements Data Flow vs. Processing µp µp Mem Bus DRAMC Core 2 Core N Main Bus µp Core 1 SoCs

More information

Routers: Forwarding EECS 122: Lecture 13

Routers: Forwarding EECS 122: Lecture 13 Routers: Forwarding EECS 122: Lecture 13 Department of Electrical Engineering and Computer Sciences University of California Berkeley Router Architecture Overview Two key router functions: run routing algorithms/protocol

More information

Challenges for Future Interconnection Networks Hot Interconnects Panel August 24, Dennis Abts Sr. Principal Engineer

Challenges for Future Interconnection Networks Hot Interconnects Panel August 24, Dennis Abts Sr. Principal Engineer Challenges for Future Interconnection Networks Hot Interconnects Panel August 24, 2006 Sr. Principal Engineer Panel Questions How do we build scalable networks that balance power, reliability and performance

More information

4. Time-Space Switching and the family of Input Queueing Architectures

4. Time-Space Switching and the family of Input Queueing Architectures 4. Time-Space Switching and the family of Input Queueing Architectures 4.1 Intro: Time-Space Sw.; Input Q ing w/o VOQ s 4.2 Input Queueing with VOQ s: Crossbar Scheduling 4.3 Adv.Topics: Pipelined, Packet,

More information

Buffered Packet Switch

Buffered Packet Switch Flow Control in a Multi-Plane Multi-Stage Buffered Packet Switch H. Jonathan Chao Department of Electrical and Computer Engineering Polytechnic University, Brooklyn, NY 11201 chao@poly.edu Abstract- A large-capacity,

More information

PARALLEL ALGORITHMS FOR IP SWITCHERS/ROUTERS

PARALLEL ALGORITHMS FOR IP SWITCHERS/ROUTERS THE UNIVERSITY OF NAIROBI DEPARTMENT OF ELECTRICAL AND INFORMATION ENGINEERING FINAL YEAR PROJECT. PROJECT NO. 60 PARALLEL ALGORITHMS FOR IP SWITCHERS/ROUTERS OMARI JAPHETH N. F17/2157/2004 SUPERVISOR:

More information

Maximum Weight Matching Dispatching Scheme in Buffered Clos-network Packet Switches

Maximum Weight Matching Dispatching Scheme in Buffered Clos-network Packet Switches Maximum Weight Matching Dispatching Scheme in Buffered Clos-network Packet Switches Roberto Rojas-Cessa, Member, IEEE, Eiji Oki, Member, IEEE, and H. Jonathan Chao, Fellow, IEEE Abstract The scalability

More information

Medical cloud platform for efficient flow control technology Fan Xincan 1, a

Medical cloud platform for efficient flow control technology Fan Xincan 1, a Medical cloud platform for efficient flow control technology Fan Xincan 1, a 1 school of computer engineering, Shenzhen PolyTechnic, Shenzhen 518055,China a horse_fxc@163.com Keywords:Medical cloud platform;

More information

Quality of Service (QoS)

Quality of Service (QoS) Quality of Service (QoS) The Internet was originally designed for best-effort service without guarantee of predictable performance. Best-effort service is often sufficient for a traffic that is not sensitive

More information

High-Performance IP Service Node with Layer 4 to 7 Packet Processing Features

High-Performance IP Service Node with Layer 4 to 7 Packet Processing Features UDC 621.395.31:681.3 High-Performance IP Service Node with Layer 4 to 7 Packet Processing Features VTsuneo Katsuyama VAkira Hakata VMasafumi Katoh VAkira Takeyama (Manuscript received February 27, 2001)

More information

Integration of Look-Ahead Multicast and Unicast Scheduling for Input-Queued Cell Switches

Integration of Look-Ahead Multicast and Unicast Scheduling for Input-Queued Cell Switches 22 3th International Conference on High Performance Switching and Routing Integration of Look-Ahead Multicast and Unicast Scheduling for Input-Queued Cell Switches Hao Yu, Member,, Sarah Ruepp, Member,, Michael

More information

TOC: Switching & Forwarding

TOC: Switching & Forwarding Walrand Lecture TOC: Switching & Forwarding Lecture Switching & Forwarding EECS University of California Berkeley Why? Switching Techniques Switch Characteristics Switch Examples Switch Architectures Summary

More information

Chapter 4: network layer. Network service model. Two key network-layer functions. Network layer. Input port functions. Router architecture overview

Chapter 4: network layer. Network service model. Two key network-layer functions. Network layer. Input port functions. Router architecture overview Chapter 4: chapter goals: understand principles behind services service models forwarding versus routing how a router works generalized forwarding instantiation, implementation in the Internet 4- Network

More information

CMPE 150/L : Introduction to Computer Networks. Chen Qian Computer Engineering UCSC Baskin Engineering Lecture 11

CMPE 150/L : Introduction to Computer Networks. Chen Qian Computer Engineering UCSC Baskin Engineering Lecture 11 CMPE 150/L : Introduction to Computer Networks Chen Qian Computer Engineering UCSC Baskin Engineering Lecture 11 1 Midterm exam Midterm this Thursday Close book but one-side 8.5"x11" note is allowed (must

More information

On Scheduling Unicast and Multicast Traffic in High Speed Routers

On Scheduling Unicast and Multicast Traffic in High Speed Routers On Scheduling Unicast and Multicast Traffic in High Speed Routers Kwan-Wu Chin School of Electrical, Computer and Telecommunications Engineering University of Wollongong kwanwu@uow.edu.au Abstract Researchers

More information