QUEUEING ANALYSIS OF BUFFERED SWITCHING NETWORKS. JONATHAN S. TURNER fellow, ieee. a class of systems that cannot be analyzed directly using

Similar documents
Accurate modelling of the queueing behaviour of shared bu!er ATM switches

Analytical Modeling of Routing Algorithms in. Virtual Cut-Through Networks. Real-Time Computing Laboratory. Electrical Engineering & Computer Science

Simulation of an ATM{FDDI Gateway. Milind M. Buddhikot Sanjay Kapoor Gurudatta M. Parulkar

MAINTAINING HIGH THROUGHPUT. Jonathan S. Turner. Washington University. be used to make routing and call acceptance decisions

n = 2 n = 1 µ λ n = 0

FB(9,3) Figure 1(a). A 4-by-4 Benes network. Figure 1(b). An FB(4, 2) network. Figure 2. An FB(27, 3) network

is developed which describe the mean values of various system parameters. These equations have circular dependencies and must be solved iteratively. T

A Path Decomposition Approach for Computing Blocking Probabilities in Wavelength-Routing Networks

Numerical Evaluation of Hierarchical QoS Routing. Sungjoon Ahn, Gayathri Chittiappa, A. Udaya Shankar. Computer Science Department and UMIACS

Abstract. statistical trac models, motivating a renewed interest in nonblocking networks.

Networks. Wu-chang Fengy Dilip D. Kandlurz Debanjan Sahaz Kang G. Shiny. Ann Arbor, MI Yorktown Heights, NY 10598

Laxmi N. Bhuyan, Ravi R. Iyer, Tahsin Askar, Ashwini K. Nanda and Mohan Kumar. Abstract

Algorithms Implementing Distributed Shared Memory. Michael Stumm and Songnian Zhou. University of Toronto. Toronto, Canada M5S 1A4

An accurate performance model of shared buffer ATM switches under hot spot traffic

A New Theory of Deadlock-Free Adaptive. Routing in Wormhole Networks. Jose Duato. Abstract

Availability of Coding Based Replication Schemes. Gagan Agrawal. University of Maryland. College Park, MD 20742

MDP Routing in ATM Networks. Using the Virtual Path Concept 1. Department of Computer Science Department of Computer Science

Fixed-Length Packets versus Variable-Length Packets. in Fast Packet Switching Networks. Andrew Shaw. 3 March Abstract

Performance Comparison Between AAL1, AAL2 and AAL5

perform well on paths including satellite links. It is important to verify how the two ATM data services perform on satellite links. TCP is the most p

PE PE PE. Network Interface. Processor Pipeline + Register Windows. Thread Management & Scheduling. N e t w o r k P o r t s.

Worst-case running time for RANDOMIZED-SELECT

Preliminary results from an agent-based adaptation of friendship games

IV. PACKET SWITCH ARCHITECTURES

AN OPTIMAL NONBLOCKING MULTICAST. Jonathan S. Turner. Washington University, St. Louis

/$10.00 (c) 1998 IEEE

Uncontrollable. High Priority. Users. Multiplexer. Server. Low Priority. Controllable. Users. Queue

THROUGHPUT IN THE DQDB NETWORK y. Shun Yan Cheung. Emory University, Atlanta, GA 30322, U.S.A. made the request.

A Delayed Vacation Model of an M/G/1 Queue with Setup. Time and its Application to SVCC-based ATM Networks

IBM Almaden Research Center, at regular intervals to deliver smooth playback of video streams. A video-on-demand

Implementations of Dijkstra's Algorithm. Based on Multi-Level Buckets. November Abstract

Extensions to RTP to support Mobile Networking: Brown, Singh 2 within the cell. In our proposed architecture [3], we add a third level to this hierarc

Space Priority Trac. Rajarshi Roy and Shivendra S. Panwar y. for Advanced Technology in Telecommunications, Polytechnic. 6 Metrotech Center

Network. Department of Statistics. University of California, Berkeley. January, Abstract

Cluster quality 15. Running time 0.7. Distance between estimated and true means Running time [s]

ECE 697J Advanced Topics in Computer Networks

Reinforcement Learning Scheme. for Network Routing. Michael Littman*, Justin Boyan. School of Computer Science. Pittsburgh, PA

An Agent-Based Adaptation of Friendship Games: Observations on Network Topologies

Improving TCP Throughput over. Two-Way Asymmetric Links: Analysis and Solutions. Lampros Kalampoukas, Anujan Varma. and.

A Comparison of Allocation Policies in Wavelength Routing Networks*

Chapter. CAC & Routing Strategies

Ecient Precomputation of Quality-of-Service Routes. Anees Shaikhy, Jennifer Rexfordz, and Kang G. Shiny

ANALYSIS OF THE CORRELATION BETWEEN PACKET LOSS AND NETWORK DELAY AND THEIR IMPACT IN THE PERFORMANCE OF SURGICAL TRAINING APPLICATIONS

Dynamics of an Explicit Rate Allocation. Algorithm for Available Bit-Rate (ABR) Service in ATM Networks. Lampros Kalampoukas, Anujan Varma.

The Bounded Edge Coloring Problem and Offline Crossbar Scheduling

Matrix Unit Cell Scheduler (MUCS) for. Input-Buered ATM Switches. Haoran Duan, John W. Lockwood, and Sung Mo Kang

On Checkpoint Latency. Nitin H. Vaidya. In the past, a large number of researchers have analyzed. the checkpointing and rollback recovery scheme

Introduction. Introduction. Router Architectures. Introduction. Recent advances in routing architecture including

Consistent Logical Checkpointing. Nitin H. Vaidya. Texas A&M University. Phone: Fax:

Delay Analysis of Fair Queueing Algorithms with the. Stochastic Comparison Approach. Nihal Pekergin

Performance Evaluation of Two New Disk Scheduling Algorithms. for Real-Time Systems. Department of Computer & Information Science

London SW7 2BZ. in the number of processors due to unfortunate allocation of the. home and ownership of cache lines. We present a modied coherency

/99/$10.00 (c) 1999 IEEE

cell rate bandwidth exploited by ABR and UBR CBR and VBR time

Buffered Fixed Routing: A Routing Protocol for Real-Time Transport in Grid Networks

Real-time communication scheduling in a multicomputer video. server. A. L. Narasimha Reddy Eli Upfal. 214 Zachry 650 Harry Road.

However, m pq is just an approximation of M pq. As it was pointed out by Lin [2], more precise approximation can be obtained by exact integration of t

0 Source. Destination. Destination. 16 Destination Destination 3. (c) (b) (a)

Submitted for TAU97 Abstract Many attempts have been made to combine some form of retiming with combinational

III Data Structures. Dynamic sets

\Classical" RSVP and IP over ATM. Steven Berson. April 10, Abstract

Eect of fan-out on the Performance of a. Single-message cancellation scheme. Atul Prakash (Contact Author) Gwo-baw Wu. Seema Jetli

Performance Analysis of WLANs Under Sporadic Traffic

Limits on Interconnection Network Performance. Anant Agarwal. Massachusetts Institute of Technology. Cambridge, MA Abstract

Blocking vs. Non-blocking Communication under. MPI on a Master-Worker Problem. Institut fur Physik. TU Chemnitz. D Chemnitz.

Design and Performance Evaluation of a New Spatial Reuse FireWire Protocol. Master s thesis defense by Vijay Chandramohan

Optimal Topology for Distributed Shared-Memory. Multiprocessors: Hypercubes Again? Jose Duato and M.P. Malumbres

Stability in ATM Networks. network.

In the International Journal of Parallel Programming, vol.28, no. 1, Enhanced Co-Scheduling: A Software Pipelining Method

Analysis of Slotted Multi-Access Techniques for Wireless Sensor Networks

Netsim: A Network Performance Simulator. University of Richmond. Abstract

STATE UNIVERSITY OF NEW YORK AT STONY BROOK. CEASTECHNICAL REPe. Multiclass Information Based Deflection Strategies for the Manhattan Street Network

BMVC 1996 doi: /c.10.41

ITERATIVE COLLISION RESOLUTION IN WIRELESS NETWORKS

A taxonomy of race. D. P. Helmbold, C. E. McDowell. September 28, University of California, Santa Cruz. Santa Cruz, CA

Achieving Distributed Buffering in Multi-path Routing using Fair Allocation

(Preliminary Version 2 ) Jai-Hoon Kim Nitin H. Vaidya. Department of Computer Science. Texas A&M University. College Station, TX

Comparative Study of blocking mechanisms for Packet Switched Omega Networks

Unit 2 Packet Switching Networks - II

Hyperplane Ranking in. Simple Genetic Algorithms. D. Whitley, K. Mathias, and L. Pyeatt. Department of Computer Science. Colorado State University

Using the Imprecise-Computation Technique for Congestion. Control on a Real-Time Trac Switching Element

Dynamic Multi-Path Communication for Video Trac. Hao-hua Chu, Klara Nahrstedt. Department of Computer Science. University of Illinois

Transport protocols are of practical. login, le transfer, and remote procedure. calls. will operate on and therefore are generally

Thrashing in Real Address Caches due to Memory Management. Arup Mukherjee, Murthy Devarakonda, and Dinkar Sitaram. IBM Research Division

Outline. Application examples

On the Use of Multicast Delivery to Provide. a Scalable and Interactive Video-on-Demand Service. Kevin C. Almeroth. Mostafa H.

Performance of Multihop Communications Using Logical Topologies on Optical Torus Networks

On The Complexity of Virtual Topology Design for Multicasting in WDM Trees with Tap-and-Continue and Multicast-Capable Switches

Minimum Energy Paths for Reliable Communication in Multi-hop. Wireless Networks CS-TR December Introduction

Deterministic Best Eort and Planning Based. Protocols for Real-time Communication in Multiple. Access Networks. Samphel Norden S.

Department of. Computer Science. Remapping Subpartitions of. Hyperspace Using Iterative. Genetic Search. Keith Mathias and Darrell Whitley

TCP recvr ABR recvr. TCP xmitter ABR source

2 J. Karvo et al. / Blocking of dynamic multicast connections Figure 1. Point to point (top) vs. point to multipoint, or multicast connections (bottom

Analysis of One-Way Reservation Algorithms

Determining the Number of CPUs for Query Processing

Simulation for ATM Networks. G. Kesidis, A. Singh, D. Cheung and W.W. Kwok. E.& C.E. Dept, University of Waterloo. Waterloo, ON, Canada, N2L 3G1

Analytical models for parallel programs have been successful at providing simple qualitative

Recoverable Mobile Environments: Design and. Trade-o Analysis. Dhiraj K. Pradhan P. Krishna Nitin H. Vaidya. College Station, TX

Efficient Queuing Architecture for a Buffered Crossbar Switch

SCHEDULING REAL-TIME MESSAGES IN PACKET-SWITCHED NETWORKS IAN RAMSAY PHILP. B.S., University of North Carolina at Chapel Hill, 1988

Transcription:

QUEUEING ANALYSIS OF BUFFERED SWITCHING NETWORKS JONATHAN S. TURNER fellow, ieee Abstract This paper provides a method for analyzing the queueing behavior of switching networks constructed from switches that employ shared buering or parallel bypass input buering. It extends the queueing models rst introduced by Jenq and later generalized by Szymanski and Shaikh to handle these classes of networks. Our analysis explicitly models the state of an entire switch and infers information about the distribution of packets associated with particular inputs or outputs when needed. Earlier analyses of networks constructed from switches using input buering attempt to infer the state of a switch from the states of individual buers and cannot be directly applied to the networks of interest here. Index Terms packet switching networks, ATM networks, queueing analysis I. Introduction In a widely cited paper [4], Jenq describes a method for analyzing the queueing behavior of binary banyan networks with a single buer at each switch input. The method, while not yielding closed form solutions, does permit the ecient computation of the delay and throughput characteristics of a switch. A key element of the analysis is the inference of the state of a single switch from the state of its two buers, based on the assumption that the states of the two buers are independent. This independence assumption is not valid but does not yield gross inaccuracies in the systems that Jenq studied. Recently, Szymanski and Shaikh [6] have extended Jenq's method to switching systems constructed from switches with an arbitrary number of inputs and an arbitrary number of buer slots. They have also applied it to systems with dierent buering techniques. While these extensions are useful, it turns out that for many specic choices of system parameters, the independence assumption mentioned above leads to signicant inaccuracies. We extend the previous work to cover switching systems in which the buer slots in a switch are shared among all the inputs and outputs, rather than being dedicated to either particular inputs or particular outputs. Such systems require an analysis which explicitly models the state of the entire switch rather than the states of individual input or output queues. We can also apply our 0 This work was supported by the National Science Foundation (grant DCI 8600947), Bellcore, BNR, DEC, Italtel SIT, NEC, NTT and SynOptics. Figure 1: Recursive Denition of a Delta Network method to systems using parallel bypass input buering, a class of systems that cannot be analyzed directly using the previous methods. Our technique can also be applied to the systems studied previously and for some system congurations yields signicantly more accurate results. In section 2, we review the previous results for switching systems with input buering, in order to motivate the key issues involved in their analysis. In section 3, we show how to analyze a switching system with shared buering and present a variety of performance curves characterizing such systems. In section 4, we show how our methods can be extended to switching systems with input buering, including systems supporting bypass queueing. Finally, in section 5, we provide numerical comparisons of the dierent buering techniques, describe our computational experience and suggest some possible extensions to our work. II. Analysis of Networks with Input Buffering Figure 1 shows the recursive construction of a delta network D n d with n inputs and outputs, constructed from d-port switches. Such networks provide a single path between any inputs and outputs, and have log d n stages of switching. (We use the term network here to describe the system as a whole and switch to describe the components from which the network is constructed.) The delta network is topologically equivalent to such networks as the banyan and omega networks. The results we describe here are equally applicable to any of these networks. Delta networks are often constructed from switches that contain buering for a small number of packets, with ow control between successive switches to ensure that the buers do not overow. Figure 2 shows the structure of atypical switch in which each switch input has a buer with a capacity of packets. Typically these systems are operated in a time-slotted fashion, with xed length packets progressing from stage to stage in a synchronous fashion. Consequently, we can think of the system as operating in two phases. In the rst phase, ow control information passes through the network from right to left. In the second phase, packets ow from left to right, in accordance with the ow control information. A switch input will allow its predecessor to send it a packet if it has an empty buer slot currently or if one of the packets in its buer will leave during the 1

2 ieee transactions on communications Figure 2: Switch with Fifo Input Buer second phase of the current cycle. This is called global ow control, since the ow control decision at a switch potentially depends on all of its successors in the network. Local ow control is also possible in this form, a switch input allows its predecessor to send a packet only if its buer has an empty slot. While local ow control doesn't make as eective use of a switch's buers, it is more straightforward to implement, particularly in high speed systems where the propagation time required for global ow control can lead to unacceptable overheads. Note also, that several packets in a switch may contend for the same output, but only one will be allowed to proceed during a given cycle. We assume (as is usual) that in such a case, one of the contending packets is selected at random. One way to analyze the queueing behavior of a buered delta network is to explicitly model the state of a single input buer by a discrete time birth-death process and then model the state of an entire switch by assuming that the states of its various input buers are independent. A Bernoulli arrival process is assumed and packets are independently assigned random destination addresses upon entry to the system. This analytical technique is described in [6]. We briey review it here for completeness. Let i () be the steady state probability that an input buer in stage i of the network (stages are numbered from left to right starting with 1) contains exactly packets, where 0. Leta i be the probability that a packet is available to enter a stage i buer and let q i be the probability that the rst packet (assuming there is a packet) in a stage i buer can leave during a given cycle. With these denitions, the transition probabilities for the stage i buer are as shown below. (Here, a i =1; a i and q i =1; q i we use the overline throughout to indicate the \complement" of the given probability.) The reasoning is straightforward. If the queue contains packets where 0 < <, then the probability that during the next cycle the queue contains + 1 packets is ust the probability that a new packet is available to enter the queue and the packet at the head of the queue does not leave this is a i q i, assuming that arrivals and departures are independent of one another. Similarly, the probability that during the next cycle the queue contains ;1 packets is ust the probabilitythatno new packet is available to enter the queue and the packet at the head of the queue does leave that is, a i q i. If we knew a i and q i then, we could easily compute the state probabilities i (). The trouble of course is that a i and q i depend on the state probabilities of the buers in the neighboring switches. This leads to an iterative computational method in which we assign arbitrary initial values to the state probabilities, then compute a i and q i for all i, use these values together with the balance equations for the Markov chain to compute new state probabilities, and so forth. We calculate a i using the following equation a i =1; (1 ; i;1 (0)=d) d The reasoning is that a packet is available to enter a particular input buer of a stage i switch if at least one of the d buers in the predecessor is non-empty and has a rst packet for the particular stage i switch ofinterest. Note that the states of the predecessor's d buers are assumed to be independent. Dene b i to be the probability that a successor of a stage i switch can accept a packet. Then, 1 ; i+1 ()q b i = i+1 for global ow control i+1 () for local ow control and q i = = = d;1 =0 b i i (0) b i +1 d =1 d ; 1 d ( i (0)=d) (1 ; i (0)=d) (d;1); ( i (0)=d) (1 ; i (0)=d) d; b i i (0) (1 ; (1 ; i(0)=d) d )=b i a i+1 = i (0) The rst equality above is based on the observation that the rst packet in a stage i buer can leave if the successor it is destined for can accept it and it wins any contention that may occur between it and the other input buers in the same switch. There are d ; 1 other input buers that might contend with it, the probability thatany one does contend is i (0)=d, and the probability that the given input buer wins, when it has to contend with others is 1=( + 1). In realistic systems, each input to the network is supplied with a buer that is typically much larger than those in the switches. We can model such a buer using the Markov chain shown below, where 0 is the number of buer slots, b 0 is the probability that a stage 1 switch can accept a packet oered to it (computed according to the equation for b i given above) and is the oered load, that is the probability that a packet is available to enter the buer. Finally, we note that a 1 is computed not according to the general equation given above but is equal to the probability that the input buer is nonempty also, we assume that the output of the network can always accept a packet meaning that b k = 1, where k =log d n. Given the above quantities, we can easily obtain the common performance metrics of interest. The carried load, for example, is the probability that a buer in the last stage is non-empty and given that it is non-empty, that it is able to transmit a packet that is, k (0)q k.the average delay through the network can be calculated by summing the average delays at each stage. The average

turner: queueing analysis of buffered switching networks 3 Figure 3: Throughput of Networks with Fifo Input Buering (S&S Analysis) delay at stage i is calculated using Little's Law. For the network using global grants we calculate the delay using 1 a i (1 ; i (B))q i B =0 i () In this expression, the quantity in the denominator of the initial fraction is the average arrival rate at stage i and the summation is the average queue length. For local grants, we ust substitute a i i (B) for the expression in the denominator. The delay in the input buer can be calculated in a similar fashion. The performance curves shown in Figure 3 were computed with this method. The leftmost and center pairs of plots show the maximum obtainable throughput as a function of network size for networks comprising switches of dierent sizes and varying amounts of buering. The rightmost plots show the eect of varying the amount of buering for switches with 256 inputs. The curves on the left show the throughput in the case of local ow control and those on the right are for global ow control. It's interesting to note that the networks constructed from larger switches have lower throughput when n is large. This appears to be caused by two mechanisms. First, because the networks constructed from larger switches have fewer stages for a given value of n,theyhave less buering overall. Secondly, the head-of-line blocking that occurs in these networks has a greater eect on the networks made up of large switches, since a blocked packet can aect packets with a wider range of destination addresses in this case. Figure 4: Switch with Shared Buering III. Analysis of Networks with Shared Buffering It's well known that switching networks in which buers are shared among the inputs can yield better performance than those in which buers are dedicated either to inputs or outputs. Figure 4 shows a switch inwhich packets arriving at any ofdinputs are placed in available buer slots from a pool containing B slots. Packets are routed from the shared buer to the appropriate outputs. An implementation of such a switch would require a db crossbar to distribute arriving packets to buers and a separate B d crossbar to route packets from buers to outputs. As in the input buered switch, one can use either local or global ow control, but we analyze only the case of local ow control. There are two additional possibilities for implementing local ow control which we refer to as the grant and acknowledgement methods. In the grant method of ow control, a switch withxempty buer slots, grants permission to send a packet to min fx dg of its upstream neighbors at the start of an operation cycle of the switch. If x<d,we assume that x predecessors are chosen at random. Notice that in this method, a switch supplies grants to upstream neighbors without knowing which of them has packets to send. This can result in sending a grant to a neighbor that doesn't have a packet, while a neighbor that does have apacket may not receive agrant. The acknowledgement methodofow control remedies this fault by allowing all predecessors

4 ieee transactions on communications with packets to send them. The receiving switch stores as many as it can in its buer and acknowledges their receipt by means of a control signal. Unacknowledged packets are retransmitted during a subsequent cycle. The acknowledgement method requires that the predecessors hold a copy ofapacket pending an acknowledgement, but allows better buer utilization overall. We rst analyze a network using the grant methodof ow control. We model each switch asab + 1 state Markov chain. We let i (s) be the steady state probability that a stage i switch contains exactly s packets and we let i (s 1 s 2 ) be the probability that a stage i switch contains s 2 packets in the current cycle given that it contained s 1 packets during the previous cycle. Let p i ( s) be the probability that packets enter a stage i switch that has s packets in its buer and let q i ( s) be the probability that packets leave astagei switch that has s packets in its buer. Then i (s 1 s 2 )= p i (h s 1 )q i (h;(s 2 ;s 1 ) s 1 ) 0 s 2;s 1hd s 2 B;s 1 (1) Let a i be the probability thatanygiven predecessor of a stage i switch has a packet for it. Then if we let m =minfd B ; sg, m p i ( s) = a i (1 ; a i) m; (2) a i = 0B i;1 () 1 ; (1 ; 1=d) (3) Let b i be the probability that a successor of a stage i switch provides a grant and let Y d (r s) be the probability that a switch that contains s packets, contains packets for exactly r distinct outputs. Then r q i ( s) = Y d (r s) b i (1 ; b i) r; (4) b i = rd s 0hB;d 0hd;1 i+1 (h)+ i+1 (B ; h)h=d (5) Y is easily calculated, assuming all distributions of s packets to the d outputs are equally likely. This is ust a classical distribution problem. For the purposes of calculation, the following recurrence is all we require. Y d (r s)= 8 >< >: 1 s = r =0 0 (s >0^r =0)_ s<r r d Y d(r s; 1) + d;(r;1) Y d d (r ; 1 s; 1) 0 <r s Note that Y d (r s) is independent of the stage of the switch in the network. For computational purposes, it is most convenient to merely precompute a table with the values of Y required the above recurrence is ideal for this purpose. As in the earlier analysis, we compute performance parameters by assuming a set of initial values for i (), then use these and the equations given above to compute i (s 1 s 2 ). These, together with the balance equations for the Markov chain are used to obtain new values of i () and then we iterate until we obtain convergence. While convergence is not guaranteed, our experience has shown convergence to be fairly rapid except when the oered load is approximately equal to the network's maximum throughput when the oered load is below this critical point, convergence is obtained in fewer than 100 iterations, above the critical point convergence typically requires several hundred iterations and in the vicinity of the critical point, it may require several thousand iterations. Notice that the calculation of Y d (r s) given above relies on the assumption that the addresses of the packets stored within a switch's buer are independent. This is not in fact the case. While it is true that the addresses of packets arriving at a switch are independent (given the input trac assumptions), buered packets are correlated as a result of having contended for outputs. The correlations are strongest when d is small and B large. We can now easily obtain the performance metrics of interest. The carried load is given by d B =0 s=0 q k ( s) k (s) The average delay at stage i is given by P B s s=0 i(s) P B p s=0 i( s) i (s) P d =0 In this expression, the quantity inthenumerator is the average queue length in the stage i buer and the denominator is the average arrival rate. Total delay is obtained by summing the per stage delays. Most of the above analysis carries over to networks that use the acknowledgement method of ow control. The only changes required are in the equations for p i ( s) and b i. In particular, we have p i ( s) = and b i = 8 >< >: ; d a 0 B ; s< i (1 ; a i) d; B ; s> d a h i (1 ; a i ) d;h B ; s h hd 0hB;d 2 4 0rh;1 hrd;1 i+1 (h)+ 1hd;1 i+1 (B ; h) d ; 1 a r r i+1(1 ; a i+1 ) (d;1);r + 3 h d ; 1 a r r +1 r i+1(1 ; a i+1 ) (d;1);r 5 (6) (7)

turner: queueing analysis of buffered switching networks 5 Figure 5: Carried Load Curves for Networks with Shared Buering (solid: analysis, dashed: simulation) Figure 5 shows curves of oered load vs. carried load for networks with 256 inputs and outputs and varying switch and buer dimensions. In the plots = B=d is the number of buer slots per switch input, the solid lines are the analytical results, while the dashed lines are simulation results. We note that for shared buer networks, large switches usually perform ust slightly better than small ones with the same values of. The advantage of the acknowledgement method of ow controlismost pronounced when the number of buer slots is limited, although one would expect a greater benet in the presence of unbalanced trac patterns. The analysis is optimistic in the sense that it predicts higher carried loads than the simulation. This is typical of such analytical techniques. Notice that the analytical results are most accurate when the switch size is largest and the buering is smallest. Haifeng Bi [1] has traced the source of the discrepancy to the independence assumption mentioned above. Because the analysis neglects the correlations among the destination addresses for packets buered in a given switch, it overestimates the number of distinct outputs for which packets are present in a given state. This in turn, leads to an overestimate of the number ofpackets leaving a switch inagiven state. As an experiment, Bi ran modied simulations in which correlations among packets in a switch were systematically eliminated by randomly reassigning their addresses at the start of each simulation cycle. The simulation results obtained in this way were virtually identical with the analytical results, meaning that the crucial direction for further renement of the analytical models lies in capturing the eects of correlations among packets. The simulation results given above, are taken from an extensive simulation study described in reference [1]. The simulation results consistently reveal the same characteristics mentioned above for all the queueing models we study that is, the analysis overestimates the maximum carried load and its accuracy is best for networks comprising large switches with limited buering. The simulation and analysis do rank the dierent buering techniques consistently making it possible to compare dierent buering techniques qualitatively using the analytical methods. We include no further simulation results here. Interested readers can nd further details in [1]. Figure 6 shows curves ofaverage delay. The curves that become constant for large load give the delay through the network itself. The curves that rise steeply for large loads include the delay through the input buer in addition to the network delay. We note that for oered loads below the maximum capacity of a given network the total delay is generally between 1 and 2 times the number of stages in the network, yielding an advantage for networks with large switches. We also note that the maximum network delay for a given conguration is generally smallest for = 1:5 or 2. Figure 7 gives maximum throughput curves for networks with shared buering of varying size and buer capacities. IV. Improved Analysis of Networks with Input Buffering We now return to the study of networks comprising switches using input buering. In addition to switches that use fo buers, we areinterested in switches that use bypass buering to avoid the head-of-line blocking eects that limit the performance of systems with fo buer-

6 ieee transactions on communications Figure 6: Delay Curves for Networks with Shared Buering ing. Two types of bypass buering are possible. In serial bypass, the rst packets in a switch's input buers rst contend for outputs, then the losing input buers that contain a second packet are allowed to contend a second time, those that lose in the second round and have a third packet are allowed to contend a third time, and so forth. In parallel bypass, all packets in a switch contend in a single round with the winners proceeding to the outputs. This allows more than one packet from a given input to proceed during a single cycle, allowing potentially higher performance, although of course each output can carry at most one packet per cycle. In high speed systems, parallel bypass is actually somewhat easier to implement, as one does not have the overhead of multiple contention rounds. For this reason and because it is more straightforward to analyze, we concentrate here on parallel bypass. The analysis of a network with parallel bypass input buering is similar to that for a network with shared buers using the grant method of ow control. In particular, we need only alter the equations for b i and p i ( s). As previously, B is the total number of buers in a switch and = B=d. Let d ( s) be the probability thata given input buer has packets given that the switch as a whole contains s. Then, b i = 0sB i+1 (s)(1 ; d ( s)) (8) Next, let W d (r s) be the probability that exactly r input buers are not full given that a switch contains exactly s packets. Then, p i ( s) = W r d (r s) a i (1 ; a i) r; (9) rd and W are easily computed, assuming that when the switch contains s packets, all distributions of those packets among the input buers are equally likely. This assumption is not really correct, as it neglects correlations among packets in a given switch, resulting from prior contention for outputs. Bi [1] presents simulation results quantifying the discrepancy caused by this assumption the results are similar to those cited above. Let z d (s) be the number of ways to distribute s distinct obects (packets) among d distinct containers (input buers), under the restriction that each container may contain at most obects. Also, let x d (r s) be the number of ways to distribute s obects among d containers of capacity, so that a particular container receives exactly r obects. Then d (r s)=x d (r s)=z d (s) Similarly, ifw d (r s)isthenumber of distributions that leave exactly r containers with fewer than obects, then W d (r s)=w d (r s)=z d (s) We compute z, x and w as follows, z d (s) = 8 >< >: 1 s =0 0 s>d s z (s ; i) d;1 i 0i s 0 <s d

turner: queueing analysis of buffered switching networks 7 x d (r s) = s r w d (r s) = d r z d;1 Figure 7: Throughput Curves for Networks with Shared Buering (s ; r) s (d ; r) zr ;1 (s ; (d ; r))z d;r ((d ; r)) Using these equations, it is straightforward to compute tables containing the requisite values of and W. Wenow return to the case of an input buered network with fo buers. Most of the analysis for bypass input buering carries over to this case. The two equations requiring modication are those for a i and q i ( s). Let (r s) be the probability that exactly r input buers contain at least one packet, given that the switch contains s packets. Then, assuming that when a switch contains s packets, all distributions of the packets among the outputs are equally likely, and that the destination addresses of all packets are independent, Y d a i = q i ( s) = 0sB 0rd s hd s rh i;1 (s) Y d (r s)(1 ; (1 ; 1=d)r ) (10) Y d (h s) r Y d (r h) b i (1 ; b i) r; (11) (Once again, the assumption made above neglects correlations among packets, caused by prior contention.) If we let y d (r s) be the number of ways to distribute s obects among d containers so that exactly r containers receive one or more obects, then Y d (r s)=y d (r s)=z d (s) and y d (r s) is computed using the recurrence y d (r s) = y d;1 (r s)+ 1i s;(r;1) s y d;1 (r ; 1 s; i) i when 0 <r s d and r d y d (r s) = 1 when r = s = 0 and y d (r s) = 0 when s<ror d<ror d < s or r =0<s. Figure 8 gives curves of maximum throughput for networks comprising switches with both fo and parallel bypass input buering, of varying size and buer capacity. We note that bypass buering gives a very substantial improvement over fo buering and that larger buers yield a greater improvement in the case of bypass buering. It's also worthwhile to note the dierences between the curves in the top row of Figure 8 to the corresponding curves in the top row of Figure 3 that were obtained using Szymanski and Shaikh's analysis. The prior analysis is consistently more optimistic than the method described here. However in many cases the improvement obtained with the new method is slight. V. Conclusions Figure 9 lists the numbers of the equations used to compute the key quantities for each of the four buering

8 ieee transactions on communications Figure 8: Throughput Curves for Networks with Input Buering methods analyzed here. Figure 10 compares the maximum throughput obtained with the various buering methods and networks of varying size, switch dimension and buer capacity. We show curves for shared buering using both the grant and acknowledgement methods of ow control. We show curves for input buering using local ow control, with bypass queueing and fo queueing. We note that shared buer switches oer clearly superior performance for a given amount of buering, but bypass input buering performs impressively as well. Fifo input buering, performs rather poorly in comparison to the other methods, but may be acceptable in certain applications. Interestingly, variation in switch size yields only small changes in maximum throughput for networks with the same values of, but the reduction in the number of stages obtained with larger switches yields a signicant economy in implementation, as well as lower delays. We note that the acknowledgement method of ow control yields only modest improvements over the grant method when we have uniform random trac with Bernoulli arrivals. This appears surprising until one realizes that in the presence of heavy uniform random trac, all the predecessors of a given switch are likely to have one or more packets to send it at any one time. We would expect a greater dierence in the face of non-uniform bursty traf- c, since in this case, there can be substantial dierences in the instantaneous trac from the upstream neighbors. Our computational experience with the method described here is quite favorable. In collecting the data for all the curves shown in this paper, we computed approximately 900 data points and used a total of 40 hours of cpu time on a Sun Sparcstation I, with 16 Mbytes of memory, yielding an average of about 2.67 minutes per Shared Buering Input Buering grant ack bypass fo i 1 1 1 1 p i 2 6 9 9 a i 3 3 3 10 q i 4 4 4 11 b i 5 7 8 8 Figure 9: Equations Used in Various Queueing Models data point. The program required to compute the results for input fo buering consists of about 8 pages of C++ code. The memory requirements for this program are under 3 Mbytes when dimensioned for networks with up to 12 stages, switches with up to 32 inputs and a total of up to 100 buer slots. The other programs are a little smaller in both code and memory usage. The switch dimension and buer capacity both have a strong inuence on the running time and memory requirements. We have computed results for switches with d = 32 and = 3, but this is about as far as one can reasonably push the method with typical workstations. Fortunately, this covers the cases of greatest interest, as larger switches are dicult to implement. Also, the relative insensitivity of the results on switch dimension allows one to extrapolate to networks of larger switches with a good deal of condence. Since the running time is relatively insensitive to the total size of the network, it is far superior to simulation when modeling networks with hundreds or thousands of inputs. We note that, the analysis also supports computation of delay distributions, and packet loss rates in

turner: queueing analysis of buffered switching networks 9 addition to average delay and throughput. As mentioned above, Haifeng Bi [1] has made a careful evaluation of the analytical techniques described here by comparing them with simulation and has both quantied the discrepancies and identied the crucial contributing factors. Bi also studied networks comprising switches with output buering and made a systematic comparison of output buering with input buering. His work demonstrates that the dierence commonly noted between input buering and output buering is less signicant than commonly assumed. What most authors overlook is that in a switch with output buering, the internal crossbar or bus required to provide access to the outputs requires greater capacity than the crossbar required in a switch using fo input buering. In conventional output buering, each buer can receive uptod packets per cycle, whereas in fo input buering, each buer can transmit only one packet per cycle. If one compares conventional output buering to bypass input buering, where each input buer can transmit multiple packets in a given cycle, the dierence between the two disappears. Bi has compared generalized forms of input and output buering. In his study, there is a parameter r that limits the number of packets that can be transmitted from an input buer in a given cycle or received at an output buer. His results show that there is very little dierence between input and output buering when they operate under the same restrictions with respect to crossbar access interestingly in fact, input buering enoys a slight advantage due to \boundary eects" at the rst and last stages of the network. An interesting extension of this work would be to modify our methods to allow modeling of systems with global ow control. This appears dicult and may beofonly academic interest, but nonetheless it would be interesting to compare with the earlier analyses. Extension of our models to networks with uneven trac distribution would also be worthwhile. We expect this could be done following the pattern established in [5]. Networks that perform distribution and/or packet replication are of substantial practical interest currently [7]. The extension of our models to cover distribution networks appears straightforward the case of replication is more challenging but may prove tractable. Finally, these techniques are directly applicable to the study of several switching system architectures that are under development for atm networks [2, 3, 7]. A detailed comparison of these systems using the tools we have developed could have an important practical impact on the development of emerging networks. Acknowledgements. My thanks to Hiafeng Bi for providing the simulation results reported here and for identifying several errors in an earlier version of the paper. Also, I thank Ankira Arutaki and an anonymous reviewer for their careful reading and incisive comments. References [1] Bi, Haifeng. \Queuing Analysis of Buered Packet Switching Networks," MS thesis, Washington University Computer Science Department, 8/90. [2] Coudreuse, J. P. and M. Servel. \Prelude: An Asynchronous Time-Division Switched Network," International Communications Conference, 1987. [3] De Prycker, M., and J. Bauwens. \A Switching Exchange for an Asynchronous Time Division Based Network," International Communications Conference, 1987. [4] Jenq, Yih-Chyun. \Performance Analysis of a Packet Switch Based on a Single-Buered Banyan Network," IEEE Journal on Selected Areas in Communications, vol. SAC-1, no. 6, 12/83, 1014{1021. [5] Kim, Hyong Sok and Alberto Leon-Garcia. \Performance of Buered Banyan Networks under Nonuniform Trac Patterns," Proceedings of Infocom 88, 4/88. [6] Szymanski, Ted and Salman Shaikh. \Markov Chain Analysis of Packet-Switched Banyans with Arbitrary Switch Sizes, Queue Sizes, Link Multiplicities and Speedups," Proceedings of Infocom 89, 4/89. [7] Turner, Jonathan S. \Design of a Broadcast Packet Network," IEEE Transactions on Communications, June 1988.

10 ieee transactions on communications Figure 10: Comparison of Buering Methods