Dynamic Scheduling Algorithm for input-queued crossbar switches

Similar documents
Scalable Schedulers for High-Performance Switches

Matching Schemes with Captured-Frame Eligibility for Input-Queued Packet Switches

Long Round-Trip Time Support with Shared-Memory Crosspoint Buffered Packet Switch

Integrated Scheduling and Buffer Management Scheme for Input Queued Switches under Extreme Traffic Conditions

FIRM: A Class of Distributed Scheduling Algorithms for High-speed ATM Switches with Multiple Input Queues

Selective Request Round-Robin Scheduling for VOQ Packet Switch ArchitectureI

Design and Evaluation of a Parallel-Polled Virtual Output Queued Switch *

Efficient Queuing Architecture for a Buffered Crossbar Switch

Shared-Memory Combined Input-Crosspoint Buffered Packet Switch for Differentiated Services

On Scheduling Unicast and Multicast Traffic in High Speed Routers

Designing Efficient Benes and Banyan Based Input-Buffered ATM Switches

Router architectures: OQ and IQ switching

Router/switch architectures. The Internet is a mesh of routers. The Internet is a mesh of routers. Pag. 1

Throughput Analysis of Shared-Memory Crosspoint. Buffered Packet Switches

Shared-Memory Combined Input-Crosspoint Buffered Packet Switch for Differentiated Services

Implementation and Evaluation of Large Interconnection Routers for Future Many-Core Networks on Chip

A New Integrated Unicast/Multicast Scheduler for Input-Queued Switches

MULTICAST is an operation to transmit information from

Design and Simulation of Router Using WWF Arbiter and Crossbar

F cepted as an approach to achieve high switching efficiency

K-Selector-Based Dispatching Algorithm for Clos-Network Switches

A Partially Buffered Crossbar Packet Switching Architecture and its Scheduling

Fair Chance Round Robin Arbiter

PCRRD: A Pipeline-Based Concurrent Round-Robin Dispatching Scheme for Clos-Network Switches

[29] Newman, P.; Lyon, T.; and Minshall, G.; Flow Labelled IP: A Connectionless Approach to ATM, Proceedings of IEEE INFOCOM 96, San Francisco,

166 The International Arab Journal of Information Technology, Vol. 3, o. 2, April 2006 Modified Round Robin algorithms and discuss their relative meri

DESIGN OF EFFICIENT ROUTING ALGORITHM FOR CONGESTION CONTROL IN NOC

Efficient Multicast Support in Buffered Crossbars using Networks on Chip

The GLIMPS Terabit Packet Switching Engine

Network Model for Delay-Sensitive Traffic

Int. J. Advanced Networking and Applications 1194 Volume: 03; Issue: 03; Pages: (2011)

IV. PACKET SWITCH ARCHITECTURES

Matrix Unit Cell Scheduler (MUCS) for. Input-Buered ATM Switches. Haoran Duan, John W. Lockwood, and Sung Mo Kang

Packet Switching Queuing Architecture: A Study

Doubling Memory Bandwidth for Network Buffers

Concurrent Round-Robin Dispatching Scheme in a Clos-Network Switch

An Enhanced Dynamic Packet Buffer Management

Generic Architecture. EECS 122: Introduction to Computer Networks Switch and Router Architectures. Shared Memory (1 st Generation) Today s Lecture

On Achieving Throughput in an Input-Queued Switch

A Novel Feedback-based Two-stage Switch Architecture

A FAST ARBITRATION SCHEME FOR TERABIT PACKET SWITCHES

The Arbitration Problem

EECS 122: Introduction to Computer Networks Switch and Router Architectures. Today s Lecture

Switching Using Parallel Input Output Queued Switches With No Speedup

Introduction to ATM Technology

Providing Flow Based Performance Guarantees for Buffered Crossbar Switches

Crossbar - example. Crossbar. Crossbar. Combination: Time-space switching. Simple space-division switch Crosspoints can be turned on or off

High-Speed Cell-Level Path Allocation in a Three-Stage ATM Switch.

LS Example 5 3 C 5 A 1 D

Quality of Service (QoS)

Switch Architecture for Efficient Transfer of High-Volume Data in Distributed Computing Environment

BROADBAND AND HIGH SPEED NETWORKS

CS 552 Computer Networks

Integration of Look-Ahead Multicast and Unicast Scheduling for Input-Queued Cell Switches

EP2210 Scheduling. Lecture material:

Design and Evaluation of the Combined Input and Crossbar Queued (CICQ) Switch. Kenji Yoshigoe

A Fast Switched Backplane for a Gigabit Switched Router

Routing, Routers, Switching Fabrics

Performance Comparison of OTDM and OBS Scheduling for Agile All-Photonic Network

Hierarchical Scheduling for DiffServ Classes

A Proposal for a High Speed Multicast Switch Fabric Design

Using Traffic Models in Switch Scheduling

Switching/Flow Control Overview. Interconnection Networks: Flow Control and Microarchitecture. Packets. Switching.

Design of a Tile-based High-Radix Switch with High Throughput

DISA: A Robust Scheduling Algorithm for Scalable Crosspoint-Based Switch Fabrics

HARDWARE IMPLEMENTATION OF A HIGH-SPEED SYMMETRIC CROSSBAR SWITCH

Lecture 16: Router Design

Switch Fabric on a Reconfigurable Chip using an Accelerated Packet Placement Architecture

Comparative Study of blocking mechanisms for Packet Switched Omega Networks

Priority-aware Scheduling for Packet Switched Optical Networks in Datacenter

Designing of Efficient islip Arbiter using islip Scheduling Algorithm for NoC

Fixed-Length Switching vs. Variable-Length Switching in Input-Queued IP Switches

We are IntechOpen, the world s leading publisher of Open Access books Built by scientists, for scientists. International authors and editors

CHAPTER 6 FPGA IMPLEMENTATION OF ARBITERS ALGORITHM FOR NETWORK-ON-CHIP

AREA-EFFICIENT DESIGN OF SCHEDULER FOR ROUTING NODE OF NETWORK-ON-CHIP

Investigating Switch Scheduling Algorithms to Support QoS in the Multimedia Router

A Modification to RED AQM for CIOQ Switches

Scheduling Algorithms for Input-Queued Cell Switches. Nicholas William McKeown

Unit 2 Packet Switching Networks - II

Maximum Weight Matching Dispatching Scheme in Buffered Clos-network Packet Switches

A Pipelined Memory Management Algorithm for Distributed Shared Memory Switches

A Scalable Frame-based Multi-Crosspoint Packet Switching Architecture

High-performance switching based on buffered crossbar fabrics

Performance Modeling of Queuing Techniques for Enhance QoS Support for Uniform and Exponential VoIP & Video based Traffic Distribution

TOC: Switching & Forwarding

TOC: Switching & Forwarding

Parallel Packet Switching using Multiplexors with Virtual Input Queues

Asynchronous vs Synchronous Input-Queued Switches

Multicast Scheduling in WDM Switching Networks

Buffer Sizing in a Combined Input Output Queued (CIOQ) Switch

1 Architectures of Internet Switches and Routers

The Design and Performance Analysis of QoS-Aware Edge-Router for High-Speed IP Optical Networks

A General Purpose Queue Architecture for an ATM Switch

THE ASYNCHRONOUS transfer mode (ATM) has been

Buffered Crossbar based Parallel Packet Switch

Resource Sharing for QoS in Agile All Photonic Networks

A Dynamic NOC Arbitration Technique using Combination of VCT and XY Routing

PACKET HANDLING SCHEDULING IN MULTIPLE ROUTING CONFIGURATIONS FOR FAST IP NETWORK RECOVERY

Exact and Approximate Analytical Modeling of an FLBM-Based All-Optical Packet Switch

Buffered Packet Switch

Transcription:

Dynamic Scheduling Algorithm for input-queued crossbar switches Mihir V. Shah, Mehul C. Patel, Dinesh J. Sharma, Ajay I. Trivedi Abstract Crossbars are main components of communication switches used to construct interconnection networks. Scheduling algorithm controls contention in switch architecture. Several scheduling algorithms were proposed for input-queued crossbar switch architectures. This paper suggests a Dynamic Scheduling Algorithm (). This algorithm changes the priority rotation dynamically based on two parameters: queue occupancy and quality of service of input and output. efficiently utilizes the buffers and at the same time gives good service to the selected inputs and outputs. The simulation results show that saves loss of the cells due to buffer overflow and thus increases the throughput by 2% to 4% compared to its counter part. reduces the latency for prescribed Quality of Service class input output and increases the average latency. Index Terms : Crossbar switches, QoS (Quality of Service), Scheduler, VOQ (Virtual output Queuing) I. INTRODUCTION Crossbars (NxN) switches play a major role in the design of high speed interconnection networks. Architecture of a single-stage, non-blocking switch fabric has three basic parts as shown in figure 2. 1. Queue Structure Input queuing is increasingly used for high bandwidth switches and routers. If an input-queued switch employs a single FIFO queue at each input, as shown in figure 1 then Head of Line blocking (HOL) problem limits the throughput to 58.6% [3]. In figure 1 HOL blocking occurs when a cell at the head of queue waiting for a busy output, blocks a cell behind it that is destined to an idle output. To solve the HOL problem,voq is used. [2], [6], [8]. Figure 1. An input buffered single FIFO queue switch. Each input maintains N FIFO VOQs, one for each output. Q ij denotes the VOQ at input i containing cells destined to output j as shown in figure 2. 2. Crosspoint Switch (Switch Fabric) A crossbar switch can transfer cells between multiple ports simultaneously by operating multiple cross points. 3. Scheduler It is a clever device selecting configurations for the switch fabric to transfer cells from inputs to outputs. Scheduler manages cell transfer and solves contentions within the switching fabric. Switch throughput can be increased from 58.6% to % with a centralized scheduling algorithm [6]. INPUT QUEUES Manuscript received February 23, 27. Mihir V Shah is with EC department, L.D.College of Engineering, Ahmedabad 315,INDIA. He is also a Ph.D Student at Electrical Department, Faculty of technology and Engineering, kalabhavan, M.S. University, Vadodara, INDIA (mihirec@gmail.com). Mehul C. Patel is with EC department, L.D.College of Engineering, Ahmedabad 315, INDIA (mehulpatel2222@gmail.com). Dinesh J. Sharma is with L&T Limited, Vadodara, INDIA. (dinesh_sharma39@yahoo.co.in ). Ajay I. Trivedi is with Electrical Engineering Department,Faculty of Technology and Engineering, Kalabhavan, M.S. University of Baroda, Vadodara, INDIA. (aitrivra@ieee.org) Figure 2. NxN crossbar switch with VOQ structure

In this implementation, it is assumed that the fabric handles fixed size cells but variable sized cells can also be routed by using some sort of cell dissembler and assembler. Each input port competes for multiple output ports but can communicate with only one at a time. Likewise, each output port vies for multiple input ports but can communicate with only one at a time. Hence, scheduling process is symmetric with respect to inputs and outputs. The scheduling problem for NxN crossbar switch with VOQ is defined as follows: for each input port i, there are n request lines going to the scheduler: R i,1 ; R i,2 ; ; ; R i,n. R i,j = 1 indicates that there is a request from input port i to switch a cell to output port j. Correspondingly, there are n grant lines produced by the scheduler: G i,1 ; G i,2 ; ; ; ; G i,n. G i,j = 1 means that the scheduler has granted the request from input port i to switch a cell to output port j depending on the scheduling algorithm. To ensure that each input delivers at most one cell into the crossbar fabric, and that each output receives at most one cell, the conditions Σ n j =1 G i,j 1; for 1 i n and Σ n i =1 G i,j 1; for 1 j n must hold, respectively. [4] The paper presents the simulation of 4x4, 8x8, 16x16, and 32x32 crossbar switches with Round Robin Matching (),, Rectilinear Propagation Architecture(), Diagonal Propagation Architecture(), Modified Diagonal Propagation Architecture( m-) and Dynamic Scheduling Architecture(), scheduling algorithms. Paper also presents the design and implementation of 4x4 and 8x8 crossbar switches with, m- and scheduling algorithms. Rest of the paper first discusses related work in section 2, and then describes algorithm in detail in section 3, simulations in section 4, and implementations in section 5. Section 6 describes conclusion. Section 7 presents our references. II. RELATED WORK A. architecture Diagonal Propagation Architecture was design by Hurt, A. May, X. Zhu, and B. Lin as shown in figure 3.[4] The N cross points of any NxN crossbar switch fabric are guaranteed not to conflict since they are all on different rows and columns. If the scheduling process begins with all N cross points lined upon a diagonal, then granting one of them cannot disable granting others because they are independent of each other. As shown in figure 3 for 4x4 crossbar switch case, there are 4 such diagonal groups: G1: (1, 1); (4, 2); (3, 3); (2, 4) G2: (2, 1); (1, 2); (4, 3); (3, 4) G3: (3, 1); (2, 2); (1, 3); (4, 4) G4: (4, 1); (3, 2); (2, 3); (1, 4) At a time, N cross points are switched in. So, arbitration cycle of will finish in nearly NT time units. In there are N possible priorities. Normal rotates the priority in round robin fashion. Figure 3 architecture. B. Modified architecture (m-) Stage It is a modification of in which instead of rotating the priority order in round robin fashion, m- rotates it depending upon queue occupancy of the diagonal group. Suppose buffer size in each VOQ is k and if buffer size of any cell reaches k then that cell enters into red-zone. During each time slot, numbers of cells whose buffers are in the red-zone for each diagonal group are calculated and priority is given to the group with maximum number of cells having red-zone buffers. In case if diagonal groups having equal number of red-zone cells are found, then priority vectors will rotate as usual in round robin fashion. Thus, the most urgent diagonal group which is likely to loose the packet is routed through the switch first, and so on. This way, scheme for priority order selection has a propensity towards saving cell loss in case of buffer overflows. Intuitively, benefit of the scheme will be puffed-up in cases of unevenly distributed and bursty traffics. Hence, throughput of the crossbar switch will be increased. This gain comes from the very basic fact that the hoard of every cell augments throughput. [9] III. DYNAMIC SCHEDULING ALGORITHM This paper proposes a Dynamic Scheduling Algorithm,. The priority order is varied dynamically with respect to following corresponding parameters: 1. Queue occupancy ( buffer size of each VOQ) 2. Quality of Service (QoS)

socialistic approach. This dual behavior (normally capitalistic and after buffer is full socialistic) of makes it more dynamic and robust to different traffic models. IV. SIMULATION RESULTS,,,, m- and scheduling algorithms are simulated for 4x4, 8x8, 16x16, and 32x32 crossbar switches with four different traffic models ( A,B,C and D) using MATLAB 7.. For each traffic model, algorithms were simulated for time slots and results were taken by averaging the outcomes for times. A. With i.i.d. Bernoulli arrivals and uniformly distributed destinations. B. With i.i.d. Bernoulli arrivals and non-uniformly distributed destinations. The effect of burstiness on using an on-off arrival process is illustrated in traffic pattern C and D. C. With bursty arrivals and uniformly distributed destinations. D. With bursty arrivals and non-uniformly distributed destinations. Figure 4. Scheduler Architecture Figure 4 depicts the architecture of scheduler. QoS of a cross-point is obtained by summing QoS of corresponding input (Q in )and output(q out ) lines. Buffer status signal of a VOQ indicates the present percentage occupancy of the VOQ in terms of number of packets. It is multiplied by a variable factor (say K). This feature imparts dual behavior to the. There are two cases; viz. BS < % and BS = %, for which K may assume one of the values given in table-i. Table I: Values of K Case K Behavior BS < % < 2 Strong capitalistic = 2 Fair > 2 Socialistic BS = % > 2 Socialistic and Cautious For simulation and analysis purposes, it is assumed that K is 2 when BS was less than % and is 4 when BS was equal to %. That is, when queues are not overflowing, QoS and queue occupancy enjoy equal weightage (k=2). Thus, decides priority in a dynamic order determined by QoS and queue occupancy. When any of the queues is fully occupied, i.e. it has reached into red zone [9] and is likely to loose packets, then; weightage of queue occupancy for that particular queue is doubled (k=4) compared to weightage of QoS. That is, takes up a cautious step to avoid packet loss and to decrease packet latency by discouraging QoS and select 75 Uniformly Distributed m- Figure 5. Throughput increments (%) in 4x4 switch As shown in figure 5, throughput of as a function of offered load increases by 1% to 4% compared to and m- for traffic model A. For high QoS input/output throughput efficiency reaches at nearly 99% as shown in figure 8. As shown in figure 6, average latency of as a function of offered load increases by to 22 timeslots compared to and m-, but average latency of high QOS input/output drastically reduces as shown in figure 7.

45 4 35 3 25 2 15 Non-uniformly Distributed m- 75 With Bursty Arrivals Uniformly Distributed PIM m- 5 3 25 2 15 5 Figure 6. Average latency in timeslots in 4x4 switch Uniformly Distributed m- High Qos Input-Output() 1 2 3 4 5 6 7 8 Buffer Size Figure 9 Throughput efficiency v/s buffer size in 4x4 switch 16 14 12 8 6 4 With Bursty Arrivals Non-uniformly Distributed m- Figure 7 Average latency in timeslots in 4x4 switch (High QoS input/output) 2 1 2 3 4 5 6 7 8 Buffer Size Figure. Average latency in timeslots in 4x4 switch Uniformly Distributed 75 m- High Qos Input-Output () Figure 8 Throughput efficiency v/s offered load for in 4x4 switch(high QoS input/output ) Uniformly Distributed 6 5 m- 4 Figure 11 Throughput v/s offered load in 8x8 switch.

As shown in figure 9, throughput of as a function of buffer size increases by 1% to 3% compared to and m- for traffic model C. As shown in figure, average latency of as a function of buffer size increases with buffer size. As shown in figure 11, throughput of as a function of offered load increases by 1% to 3% compared to and m- for traffic model A for 8x8 switch. 18 16 14 12 With Bursty Arrivals Non-uniformly Distributed 8 6 4 m- 2 1 2 3 4 5 6 7 8 Buffer Size Figure 12. Average latency v/s buffer size in 8x8 switch As shown in figure 12, average latency of as a function of buffer size increases by 5 to 8 timeslots compared to and m- in 8x8 switch. 6 5 4 3 2 Bernoulli arrivals uniformly distributed m- Figure 14. Average latency in timeslots in 16x16 switch As shown in figure 14, average latency of as a function of offered load increases by 15 to 35 timeslots compared to and m- in 16x16 switch. V. IMPLEMENTATION 4x4 and 8x8 ATM crossbar switches with, m- and algorithms are implemented using ALTERA S QUARTUS II tool in EP2k15EFC33-2x device. Based on the implementations, Table II and III present the area analyses of, m- and. Table II: / m- area analysis for 4x4 ATM switch m- Logic cells 6299 6487 67 Bernoulli arrivals uniformly distributed destinations 75 m- Figure 13. Throughput v/s offered load in 16x16 switch As shown in figure 13, throughput of as a function of offered load increases by 2% to 5% compared to and m- for traffic model A for 16x16 switch. Table III: / m area analysis for 8x8 ATM switch m- Logic cells 2491 24498 25558 VI. CONCLUSION A new dynamic scheduling algorithm, solving the symmetric scheduling problem with VOQ-based input queued crossbar switches was introduced and results from computer simulations and VHDL implementation were shown to prove the effectiveness of the algorithm in terms of rise in throughput efficiency by 2% to 5% at the cost of latency. It can be seen that the benefits of are more pronounced with high QoS inputs outputs.

VII. REFERENCES [1] F. A. Tobagi, Fast cell switch architectures for broadband integrated services digital networks, Proc. of the IEEE, vol. 78, January 19, pp. 133-178. [2] T. E. Anderson, S. S. Owicki, J. B. Saxe, and C. P. Thacker, High speed switch scheduling for local area networks, ACM Transactions on Computer Systems, pp. 319-352, November 1993. [3] M. J. Karol, M. G. Hluchyj and S. P. Morgan, Input vs. output queueing on a space-division cell switch, IEEE Transaction on Communications, Vol. 35, No. 12, pp. 1347-1356, 1987. [4] J. Hurt, A. May, X. Zhu, and B. Lin, Design and implementation of high-speed symmetric crossbar schedulers, Proc. IEEE International Conference on Communications (ICC 99), Vancouver, Canada, June 1999, pp. 253-258. [5] A. Mekkittikul and N. McKeown, A practical scheduling algorithm to achieve % throughput in input-queued switches, Proc. IEEE INFOCOM 1998, vol. 2, Apr. 1998, San Francisco, pp. 792-799. [6] N. McKeown, V. Anamtharam, and J. Warland, Achieving % throughput in an input-queued switch, Proc. INFOCOM 96, San Francisco, March 1996, pp. 296-32. [7] H. S. Chi and Y. Tamir, Decomposed arbiters for large crossbars with multi-queue input buffers Proc. of International Conference on Computer Design, Cambridge, Massachusetts, October 1991, pp. 233-238. [8] Y. Tamir and H.C. Chi, Symmetric crossbar arbiters for VLSI communication switches, IEEE Transactions on Parallel and Distributed Systems, vol. 4, no. 1, pp. 13-27, January 1993. [9] Shah M.V., Sharma D.J, Trivedi A.I., Modified Algorithm for High Speed Symmetric Crossbar Switch proceeding of the international Conference on next generation networks cp25.1-cp25.5,february 26. [] Etamar Elhanany A comparative view of the GLIMPS Sheduling algorithms teracross, October 22. [11] Etamar Elhanany, O. Beeri The Glimpse terabit switching engine, February 22.