DQDB Networks with and without Bandwidth Balancing


Ellen L. Hahne
Abhijit K. Choudhury
Nicholas F. Maxemchuk

AT&T Bell Laboratories
Murray Hill, NJ 07974

ABSTRACT

This paper explains why long Distributed Queue Dual Bus (DQDB) networks without bandwidth balancing can have fairness problems when several nodes are performing large file transfers. The problems arise because the network control information is subject to propagation delays that are much longer than the transmission time of a data segment. Bandwidth balancing is then presented as a simple solution. By constraining each node to take only a certain fraction of the transmission opportunities offered to it by the basic DQDB protocol, bandwidth balancing gradually achieves a fair allocation of bandwidth among simultaneous file transfers. We also propose two ways to extend this procedure effectively to multi-priority traffic.

This paper appears in IEEE Transactions on Communications, Vol. 40, No. 7, July 1992, pp. 1192-1204.

1. INTRODUCTION

The Distributed Queue Dual Bus (DQDB) [1], [2] is a metropolitan area network that has recently been standardized by the IEEE [3]. The dual-bus topology is identical to that used in Fasnet [4] and is depicted in Figure 1. The two buses support unidirectional communications in opposite directions. Nodes are connected to both buses and communicate by selecting the proper bus. In both DQDB and Fasnet a special unit at the head-end of each bus generates slots; however, the protocols for acquiring slots differ significantly. Fasnet uses a protocol similar to that of a token ring, where each station is given an opportunity to transmit in turn. DQDB resembles a slotted ring with free access, where stations transmit in every empty slot if they have data. Both token access and free access have performance drawbacks, so Fasnet and DQDB use the channel in the direction opposite to the one on which they are sending data to derive performance improvements over the earlier networks.

In a token-passing network, a significant fraction of the bandwidth can be wasted as the token circulates among a small number of active stations. Therefore Fasnet includes several techniques for using the reverse channel to reduce the token circulation time and to let stations use slots that would otherwise have been wasted. In a slotted ring network with free access, one station may take all the slots and prevent the others from transmitting. This fairness problem is exacerbated when the ring topology is replaced by a bus, because the station closest to the bus head-end always has first access to slots. Therefore DQDB uses the reverse channel to reserve slots for stations that are further from the head-end, as explained in Section 2. The aim of DQDB's reservations is to improve access fairness without the bandwidth wastage of a circulating token. Moreover, by associating priority levels with these reservations, DQDB can offer multi-priority service.

Unfortunately, the DQDB reservation process is imperfect. The network span (up to 50 km), the transmission rate (assumed to be 150 Mbps in this paper), and the slot size (53 bytes) of DQDB allow many slots to be in transit between the nodes. Therefore the nodes can have inconsistent views of the reservation process. If this happens, and if the access protocol is too efficient and tries never to waste a slot, then bandwidth can be divided unevenly among nodes simultaneously performing long file transfers. Since the network does not provide the same throughput to all of the nodes, in that sense it is unfair. This is the main problem to be addressed in this paper. In Section 3 we offer a novel analysis of a closed queueing network model for this scenario. (Other DQDB fairness studies appear in [5]-[31].)

In Section 5 we present a simple enhancement to the basic DQDB protocol, called "bandwidth balancing", that equalizes the throughput. (Other recent fairness proposals can be found in [28]-[37].) Bandwidth balancing intentionally wastes a small amount of bus bandwidth in order to facilitate coordination among the nodes currently using that bus, but it divides the remaining bandwidth equally among those nodes. The key idea (adapted from Jaffe [38]) is that the maximum permissible nodal throughput rate is proportional to the unused bus capacity; each node can determine this unused capacity by observing the volume of busy slots and reservations. The throughput is equalized gradually over an interval several times longer than the propagation delay between competing nodes. Bandwidth balancing is easy to implement: each node permits itself to use only a certain fraction of the transmission opportunities offered to it by the basic DQDB protocol. If the traffic is all of one priority, then bandwidth balancing requires no additional control information to be transmitted, and it requires only one additional counter (per bus) in each node. Bandwidth balancing has been incorporated into the DQDB standard as a required feature that is enabled by default; the ability to disable this feature is also required.

In the standard, bandwidth balancing conforms to the priority structure of the basic DQDB protocol, which is explained in Section 2 below. In particular, a node is aware of the priority levels of incoming reservations, but not the priority levels of the data in busy slots. This asymmetry means that different nodes have different information about the traffic on a bus, making it difficult to control multi-priority traffic. The version of bandwidth balancing specified in the standard guarantees equal allocations of bus bandwidth to nodes with traffic of the lowest priority level. A node with higher-priority traffic is guaranteed at least as much bandwidth as a lowest-priority node, but no further guarantees are possible. Furthermore, when there are nodes with different priorities, the throughputs achieved by nodes can depend on their relative positions on the bus [27], [39].

In Sections 6 and 7 of this paper we propose two better ways to extend the uni-priority bandwidth balancing procedure of Section 5 to multi-priority traffic. (Additional proposals appear in [40] and [41].) The first step is to correct the asymmetry of the priority information, either by adding priority information about the data in busy slots, or by removing priority information from the reservations. The former method we call the "global" approach, because the priority information for all traffic is available to all nodes by reading the data and reservation channels. The latter method we call the "local" approach, because a node is only aware of the priority of its own locally generated data, and the node does not disseminate this information over the network. Section 6 presents a version of bandwidth balancing based on local priority information. (Similar "local" versions of bandwidth balancing have been proposed by Damodaram [42], Spratt [43], and Hahne and Maxemchuk [44], [45].) Section 7 presents a version of bandwidth balancing based on global priority information. (A cruder version of this scheme appears in [46].)

Both multi-priority schemes presented in this paper produce bandwidth allocations that are independent of the nodes' relative positions on the bus. Moreover, both schemes are fair; i.e., they allocate equal bandwidth shares to all nodes active at the same priority level. The schemes differ in the way they allocate bandwidth across the various priority levels. Either scheme could easily be included in a future version of the standard, and nodes satisfying the old standard could share the same network with the new nodes, provided that the old nodes only generate traffic of the lowest priority level.

2. DQDB WITHOUT BANDWIDTH BALANCING

DQDB allows some network capacity to be set aside for synchronous services such as voice, but we will assume that no such traffic is present. DQDB supports asynchronous traffic of several priority levels, which for convenience we will number from 1 (least important) through P (most important). (Footnote 1: The DQDB standard calls for three priority levels, i.e., P = 3, but it labels them differently: 0, 1, 2.)

DQDB uses the dual-bus topology depicted in Figure 1. The two buses support unidirectional communications in opposite directions. Nodes are connected to both buses and communicate by selecting the proper bus. The transmission format is slotted, and each bus is used to reserve slots on the other bus in order to make the access fairer. Each slot contains one request bit for each priority level and a single busy bit. The busy bit indicates whether another node has already inserted a segment of data into the slot. The request bits on one bus are used to notify nodes with prior access to the data slots on the other bus that a node is waiting. When a node wants to transmit a segment on a bus, it waits for an empty request bit of the appropriate priority on the opposite bus and sets it, and it waits for an empty slot on the desired bus to transmit the data.

The IEEE 802.6 Working Group is currently considering optional procedures for erasing the data from a slot once it passes the destination, so that the slot can be reused [47], [48], [49]. However, this paper assumes that once a data segment has been written into a slot, that data is never erased or overwritten. This paper only discusses how data segments are written onto a bus, since reading data from the bus is straightforward. The operation for data transmission in both directions is identical. Therefore, for the remainder of this paper, the operation in only one direction is described. One bus will be considered the data bus. Slots on this bus contain a busy bit and a payload of one segment of data. These slots are transmitted from upstream nodes to downstream nodes. The other bus is considered the request bus. Each slot on this bus contains one request bit for each priority level, and slots are transmitted from downstream nodes to upstream nodes.

Figure 2 shows how a DQDB node operates with bandwidth balancing disabled. We model each node as composed of P sections, one to manage the writing of requests and data for each priority level. We assume that the sections act like separate nodes: each section has its own attachments to the buses, and the data bus passes through the sections in order, starting with priority 1, while the request bus passes through the sections in the opposite order, starting with priority P. While this layout does not correspond to the actual physical implementation of DQDB (Footnote 2), for the purposes of this paper the two are functionally equivalent (Footnote 3).

(Footnote 2: In an actual DQDB node, the sections of the various priority levels all read the request bus at the same place, before any of them has a chance to write. Write-conflicts on the request bus are impossible, though, because the priority-p request bit can only be set by the priority-p node section. There is some internal linkage among the sections: whenever a section generates a request, it notifies all lower-priority sections within the node as well as writing the request onto the bus. Similarly, all sections of the node read the data bus at the same place, before any of them can write. Nevertheless, because of the internal linkage just described, the protocol can guarantee that two node sections will not try to write data into the same empty slot.)

(Footnote 3: Figures 2, 4, 8, and 10 and associated text are intended to be functional descriptions. The physical implementation of these ideas will not be discussed in this paper.)

In Figure 2 the details of the priority-p section are shown. This section has a Local FIFO Queue to store priority-p data segments generated by local users while these segments wait for the Data Inserter (DI) to find the appropriate empty slots for them on the data bus. The Data Inserter operates on one local data segment at a time; once the Local FIFO Queue forwards a segment to the Data Inserter, the Local FIFO Queue may not forward another segment until the Data Inserter has written the current segment onto the data bus.

When the Data Inserter takes a segment from the Local FIFO Queue, it first orders the Request Inserter (RI) to send a priority-p request on the request bus. Then the Data Inserter determines the appropriate empty slot for the local segment by inserting the segment into the Data Inserter's Transmit Queue (TQ). All the other elements of this queue are requests of priority p or greater from downstream nodes. (The Data Inserter ignores all requests of priority less than p.) The Transmit Queue orders its elements according to their priority level, with elements of equal priority ordered by the times they arrived at the Data Inserter. The Data Inserter serves its Transmit Queue whenever an empty slot comes in on the data bus. If the element at the head of the queue is a request, then the Data Inserter lets the empty slot pass. If the head element is the local data segment, then the busy bit is set and the segment is transmitted in that slot.

The Transmit Queue is implemented with two counters, called the Request Counter and the Countdown Counter. When there is no local data segment in the queue, the Request Counter keeps track of the number of unserved reservations from downstream nodes in the Transmit Queue. When the Data Inserter accepts a local data segment, the Request Counter value is moved to the Countdown Counter, which counts the number of reservations that are ahead of the local data segment in the Transmit Queue, and the Request Counter is then used to count reservations behind the local data segment. The Request Inserter sends one reservation of priority p for each data segment taken by the Data Inserter from the Local FIFO Queue. Since the incoming priority-p request bits may have been set already by downstream nodes, the Request Inserter sometimes needs to queue the internally generated reservations until vacant request bits arrive. Thus it is possible for a data segment to be transmitted before its reservation is sent.
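
To make the two-counter mechanism concrete, here is a minimal sketch in Python (ours, not taken from the paper or the standard) of the Transmit Queue logic for a single node section; it ignores priority ordering within the queue, and the class and method names are invented for illustration:

    class TransmitQueue:
        """Sketch of DQDB's two-counter Transmit Queue (Distributed Queueing)."""

        def __init__(self):
            self.request_count = 0    # Request Counter: unserved downstream reservations
            self.countdown = 0        # Countdown Counter: reservations ahead of our segment
            self.has_local_segment = False

        def on_request_bit(self):
            # A set request bit (priority >= p) passes by on the request bus.
            self.request_count += 1

        def accept_local_segment(self):
            # The Local FIFO Queue forwards one segment; the Request Inserter
            # sends the matching reservation separately.
            self.countdown = self.request_count   # reservations ahead of us
            self.request_count = 0                # now counts reservations behind us
            self.has_local_segment = True

        def on_empty_slot(self):
            # An empty slot arrives on the data bus; return True if we fill it.
            if self.has_local_segment:
                if self.countdown == 0:
                    self.has_local_segment = False
                    return True               # set the busy bit, write the segment
                self.countdown -= 1           # let the slot pass for a reservation
                return False
            if self.request_count > 0:
                self.request_count -= 1       # let the slot pass downstream
            return False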

Perfect operation of the DQDB protocol without bandwidth balancing would occur if the system had no propagation or processing delays, and if it included an idealized reservation channel with no slotting, queueing, or transmission delays. Under these conditions [50]:

- Slots are never wasted.
- The priority mechanism is absolute (i.e., a data segment can only be transmitted when there are no higher-priority segments waiting anywhere in the network).
- Nodes with traffic at the current highest priority level are served one-segment-per-node in round-robin fashion.

However, if the propagation delay between nodes is much longer than the transmission time of a data segment, then performance deteriorates. This is the subject of the next section.

3. THROUGHPUT FAIRNESS OF DQDB WITHOUT BANDWIDTH BALANCING

When the network propagation delay is larger than a slot transmission time, the DQDB access protocol without bandwidth balancing is unfair, in the sense that nodes simultaneously performing large file transfers can obtain different throughputs. The severity of the problem depends upon the propagation delay, the network utilization, and the lengths of the messages submitted to the network. In this section, we assume that users often submit messages consisting of a great many segments. This model (suggested to us by Manoel Rodrigues) seems to be increasingly appropriate as diskless workstations abound, because large files are typically transferred between a workstation and its file server. Network "overloads" are caused by as few as two users simultaneously performing file transfers and hence could be quite typical. We model an overloaded DQDB system as a closed network of queues and study its fairness through a novel approximate analysis. We will examine scenarios similar to those explored in Wong's study [10] of an earlier version of DQDB.

Consider two nodes that are transmitting very long messages of the same priority. Call the upstream node 1 and the downstream node 2. Ideally, each node should obtain half the bandwidth of the data channel, but this rarely happens. Suppose that the propagation delay between the nodes equals D slot transmission times, where D is an integer. Let Δ be the difference in the starting times of the two nodes, i.e., the time when node 2 wants to begin transmission minus the time when node 1 is ready; Δ is measured in slot times and is assumed to be an integer. Once both nodes are active, node 1 leaves slots idle only in response to requests from node 2. Therefore, once node 2 begins to receive segments transmitted by node 1, the only idle slots node 2 receives are in response to its earlier requests.

Each idle slot received by node 2 results in a segment being transmitted, a new segment being queued, and a new reservation being transmitted. Therefore, the number X of requests plus idle slots circulating between the two nodes is fixed. (Some of these requests may be stored in node 1's Transmit Queue.) Let us call these conserved entities permits. This quantity X determines the throughput of the downstream node. Unfortunately, X depends strongly on D and Δ (Footnote 4):

X = 1 + D − c(Δ)    (3.1)

where c is a function that clips its argument to the range [−D, D].

(Footnote 4: The analysis depends on some detailed timing assumptions. We have assumed that the bus synchronization is such that a node reads a busy bit on the data channel immediately after reading a request bit on the reservation channel. We also assume that there are no processing delays in the nodes, and that a node is permitted to insert a new data segment into the Transmit Queue and send the corresponding reservation as soon as it begins to transmit the previous data segment.)

To clarify this claim, let us explain the system behavior for extreme values of Δ. If the first segment from node 1 has already been received at node 2 by the time node 2 becomes active, i.e., if Δ ≥ D, then node 2 inserts one data segment in its Transmit Queue and transmits one reservation. The segment will not be transmitted until the reservation is received by node 1 and an idle slot is returned. In this instance, there is one permit in the network. At the other extreme, consider Δ ≤ −D. Initially, only node 2 is active. It inserts its first segment in its Transmit Queue and sends its first reservation upstream. The first segment is transmitted immediately in the first slot. Then the second segment is queued, the second reservation sent, and the second segment is transmitted in the second slot, etc. The request channel is already carrying D requests when node 1 begins transmission, and in the D time slots that it takes for node 1's first segment to reach node 2, node 2 injects another D requests, so that X = 1 + 2D.

Now we will show the relationship between X and the nodal throughputs. Recall that permits can be stored in the request channel, in the data channel, and in the Transmit Queue of the upstream node. This Transmit Queue also includes a single data segment from node 1. When the second file transfer begins, there is a transient phase in which the Transmit Queue length moves to a steady-state average value Q. More precisely, we should distinguish between Q(1), the average queue length observed by a data segment from node 1 just after it has been inserted into node 1's Transmit Queue, and Q(2), the average queue length observed by a request (permit) from node 2 just after it has been inserted into node 1's Transmit Queue. The difference between these two views of the queue will be explained shortly.

The network's steady-state behavior can be determined approximately by simultaneously solving the following equations involving X, Q(1), Q(2), the nodal throughput rates r(1) and r(2), and the average round-trip delay T experienced by a permit. (Throughput rates are measured in segments per slot time, and round-trip delays are measured in slot times.)

r(1) + r(2) = 1    (3.2)
r(1) = 1 / Q(1)    (3.3)
r(2) = X / T    (3.4)
T = 2D + Q(2) ≈ 2D + Q(1)    (3.5)

Before solving the equations above, let us discuss the approximation in the last equation. The difference between Q(1) and Q(2) is most pronounced when the inter-node distance D is large and only one permit circulates. In this case, the queue length observed by the permit is always two (itself plus node 1's data segment), so Q(2) = 2. The queue length observed by the data segment, however, is usually one (itself) and occasionally two (itself plus the permit), so Q(1) ≈ 1. Even though Q(1) and Q(2) differ by almost a factor of two in this example, recall that D is large, so that the approximation shown above for the round-trip delay T is still justified.

Solving (3.1)-(3.5) for the steady-state throughput rates yields:

r(1) ≈ 2 / [2 − D − c(Δ) + sqrt((D − c(Δ) + 2)² + 4·D·c(Δ))]
r(2) = 1 − r(1)

Note that if the nodes are very close together (D = 0) or if they become active at the same time (Δ = 0), then each node gets half the bandwidth. However, if D is very large and the downstream node starts much later, its predicted throughput rate is only about 1/(2D). Node 1 is also penalized for starting late (though not as severely as node 2): the worst-case upstream rate is roughly 1/sqrt(2D). The predicted throughputs match simulated values very well. Figure 3 compares our approximate analysis with simulation results for an inter-node distance of D = 50 slots (≈ 29 km). The analysis can easily be generalized to multiple nodes, provided that these nodes are clustered at only two distinct bus locations.
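
The closed-form solution is easy to evaluate numerically. The following short script (our illustration; the function name is invented) solves (3.1)-(3.5) for the three regimes just discussed:

    import math

    def dqdb_throughputs(D, delta):
        """Approximate steady-state rates (r1, r2) of the upstream and
        downstream nodes, from equations (3.1)-(3.5)."""
        c = max(-D, min(D, delta))   # c() clips delta to [-D, D]
        r1 = 2.0 / (2 - D - c + math.sqrt((D - c + 2) ** 2 + 4 * D * c))
        return r1, 1.0 - r1

    print(dqdb_throughputs(50, 0))     # simultaneous start: (0.5, 0.5)
    print(dqdb_throughputs(50, 100))   # downstream starts late: r2 is about 1/(2D)
    print(dqdb_throughputs(50, -100))  # upstream starts late: r1 is about 1/sqrt(2D)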

The analysis and simulation studies in this section show perfect fairness when the propagation delay is negligible. However, moderate unfairness can be demonstrated even for D = 0 if the timing assumptions in the previous footnote are changed [51], [52]. Hence bandwidth balancing may prove useful for fairness enhancement even on short networks.

4. DEFINITIONS

This section gives various background assumptions and definitions to be used in the remainder of this paper. Recall that we are focusing on data transmission over one bus only. (Of course, the other bus is needed to carry requests for the use of the primary bus.) The term parcel will be used to denote the traffic originating at one node at one priority level for transmission over one bus. All traffic rates will be measured in data segments per slot time. We will usually assume that the traffic demand of each node n has some fixed rate ρ(n). This offered load may be stochastic, as long as it has a well-defined average rate. The offered load of the traffic parcel of priority level p at node n will be denoted ρ_p(n). It is possible that not all this offered load can be carried. The actual long-term average throughput of node n will be denoted by r(n), and that of its parcel p by r_p(n). The unused bus capacity will be denoted by U:

U = 1 − Σ_m r(m) = 1 − Σ_m Σ_q r_q(m)

while U_p+ will be the bus capacity left over by parcels of priority p and greater:

U_p+ = 1 − Σ_m Σ_{q ≥ p} r_q(m)

Of course, any individual node n at any instant t will not have direct knowledge of the long-term average rates defined above. All the node can see is: B(n,t), the rate of busy slots coming into node n at time t from nodes upstream; R(n,t), the rate of requests coming into node n at time t from nodes downstream; and S(n,t), the rate at which node n serves its own data segments at time t. In one of our proposed protocols, the node can break these observations down by priority level p, in which case they are denoted B_p(n,t), R_p(n,t), and S_p(n,t). These observations can be used to determine U(n,t), the bus capacity unallocated by node n at time t. By "unallocated", we mean the capacity that is neither used by nodes upstream of n, nor requested by nodes downstream of n, nor taken by node n itself:

U(n,t) = 1 − B(n,t) − R(n,t) − S(n,t) = 1 − Σ_q B_q(n,t) − Σ_q R_q(n,t) − Σ_q S_q(n,t)

If node n can observe priority levels, then it can also measure U_p+(n,t), the bus capacity not allocated by node n at time t to parcels of priority p or greater:

U_p+(n,t) = 1 − Σ_{q ≥ p} B_q(n,t) − Σ_{q ≥ p} R_q(n,t) − Σ_{q ≥ p} S_q(n,t)

All the access control protocols described in this paper have a parameter M called the bandwidth balancing modulus (Footnote 5: In the DQDB standard the bandwidth balancing modulus is tunable, with a default value of 8.); in some schemes the modulus is different for each priority level p and is denoted M_p. For convenience we assume that the bandwidth balancing moduli are integers, though rational numbers could also be used.

Finally, let us define an alternative queueing discipline called Deference Scheduling for the Data Inserter's Transmit Queue. With Deference Scheduling, the Transmit Queue for the node section of priority p still holds at most one local data segment at a time, still accepts no requests of priority less than p, and still accepts all requests of priority p or greater. The difference is that all requests (of priority p or greater) are served before the local data segment, even those priority-p requests that entered the Transmit Queue after the local data segment. The local data segment is served only when there are no requests (of priority p or greater) in the Transmit Queue. Deference Scheduling uses a Request Counter but needs no Countdown Counter. By itself, Deference Scheduling makes little sense because it offers no guarantee that the local data segment in the Transmit Queue will ever be served. This discipline does make sense, however, when used in conjunction with bandwidth balancing, as explained in the next section. To contrast with Deference Scheduling, we will call the two-counter discipline of Section 2 Distributed Queueing.
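
For concreteness, here is the Deference Scheduling counterpart of the earlier Transmit Queue sketch (again our own illustration, not the standard's specification): a single Request Counter, and the local segment is served only when that counter is zero.

    class DeferringTransmitQueue:
        """Sketch of Deference Scheduling: all requests of priority >= p are
        served before the local segment, whatever their arrival order."""

        def __init__(self):
            self.request_count = 0
            self.has_local_segment = False

        def on_request_bit(self):
            self.request_count += 1

        def accept_local_segment(self):
            self.has_local_segment = True     # no Countdown Counter needed

        def on_empty_slot(self):
            if self.request_count > 0:
                self.request_count -= 1       # defer to every pending request
                return False
            if self.has_local_segment:
                self.has_local_segment = False
                return True                   # transmit only when no requests wait
            return False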

5. BANDWIDTH BALANCING FOR UNI-PRIORITY TRAFFIC

We contend that the unfairness problem discussed in Section 3 arises because the DQDB protocol pushes the system too hard. If it attempts to use every slot on the bus, DQDB can inadvertently lock the network into unfair configurations for the duration of a sustained overload. In this section, we introduce the concept of bandwidth balancing: the protocol of Section 2 is followed, except that a node takes only a fraction of the slots that are not reserved or busy. This rate control mechanism lets the system relax a bit so it can work as intended. In this section, we focus on traffic of one priority level only. In Sections 6 and 7, we propose ways to extend bandwidth balancing effectively to multi-priority traffic.

Section 5.1 presents our definition of fairness and the bandwidth balancing concept. In Section 5.2, the implementation of uni-priority bandwidth balancing is described. It requires only one extra counter, and either Distributed Queueing or Deference Scheduling may be used. The performance of bandwidth balancing is investigated in Section 5.3 through analysis and simulation. There we show the existence of a trade-off between bus utilization and the rate of convergence to a fair operating point. In this paper, we consider only throughput performance during overloads involving two or three active nodes. Other simulation studies of bandwidth balancing using more nodes, different traffic models, and a variety of performance measures appear in [24]-[27].

5.1 Concept

When the bus is overloaded, we want to divide its bandwidth fairly. Our ideal of throughput fairness is that nodes with sufficient traffic to warrant rate control should all obtain the same throughput, called the control rate. As discussed in Section 3, the performance of the DQDB protocol without bandwidth balancing can diverge from this ideal when the bus is long. One obvious solution is a centralized approach where nodes inform some controller node of their offered loads; the controller then computes the control rate and disseminates it. We will present an alternative way to do this, one that requires no controller node, no offered load measurements, and no explicit communication of the control rate. Our method intentionally wastes a small amount of bus bandwidth, but evenly divides the remaining bandwidth among the nodes. The key idea is that the control rate is implicitly communicated through the idle bus capacity; since each node can determine this quantity by observing the passing busy bits from upstream and request bits from downstream, control coordination across the system can be achieved.

This idea has also been suggested for congestion control in wide-area mesh-topology networks [38], but the problem there is more complex, since flow control must also be coordinated along multi-hop paths; in our case, the implementation is quite simple, adding very little to the complexity of DQDB. More specifically, each node limits its throughput to some multiple M of the unused bus capacity; nodes with less demand than this may have all the bandwidth they desire:

r(n) = min{ ρ(n), M·U } = min{ ρ(n), M·[1 − Σ_m r(m)] }    (5.1)

This scheme is fair in the sense that all rate-controlled nodes get the same bandwidth. Given the offered loads ρ(n) and the bandwidth balancing modulus M, equation (5.1) can be solved for the carried loads r(n). If there are N rate-controlled nodes, then the throughput of each is:

r(n) = M·(1 − S) / (1 + M·N)    (5.2)

and the total bus utilization is (S + M·N) / (1 + M·N), where S is the utilization due to the nodes that are not rate-controlled. (It takes some trial and error to determine which nodes are rate-controlled.) The worst-case bandwidth wastage is 1/(1 + M), which occurs when only one node is active. For example, if there are three nodes whose average offered loads are 0.24, 0.40, and 0.50 segments per slot time and if M = 9, then only the last two nodes are rate-controlled, the carried loads are 0.24, 0.36, and 0.36, respectively, and the wasted bandwidth is 0.04. One desirable feature of this scheme is that it automatically adapts to changes in network load.
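
The trial and error mentioned above is a small fixed-point search. A sketch (ours; the helper name is invented) that reproduces the numerical example:

    def bwb_carried_loads(offered, M):
        """Solve equation (5.1): guess which nodes are rate-controlled,
        apply equation (5.2), and repeat until the guess is consistent."""
        controlled = set(range(len(offered)))            # first guess: everyone
        while True:
            S = sum(offered[n] for n in range(len(offered)) if n not in controlled)
            rate = M * (1 - S) / (1 + M * len(controlled))   # equation (5.2)
            new_guess = {n for n in range(len(offered)) if offered[n] > rate}
            if new_guess == controlled:
                return [min(load, rate) for load in offered]
            controlled = new_guess

    print(bwb_carried_loads([0.24, 0.40, 0.50], M=9))    # [0.24, 0.36, 0.36]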

5.2 Implementation

In order to implement uni-priority bandwidth balancing, the slot header need only contain the busy bit and a single request bit. In theory, a node can determine the bus utilization by summing the rate of busies on one bus, the rate of requests on the other bus, and the node's own transmission rate. In the long run, this sum should be the same at every node (though the individual components will differ from node to node). In other words, each node n has enough information available to implement equation (5.1). Fortunately, it is not necessary for the node to measure the bus utilization rate over some lengthy interval. As the analysis and simulation of Section 5.3 will show, it is sufficient for node n to respond to arriving busy bits and request bits in such a way that:

S(n,t) ≤ M·U(n,t) = M·[1 − B(n,t) − R(n,t) − S(n,t)]    (5.3)

or, equivalently:

S(n,t) ≤ [M / (1 + M)]·[1 − B(n,t) − R(n,t)]    (5.4)

In other words, the node takes only a fraction M/(1 + M) of the slots that are not reserved or busy at any point in time.

One simple way to implement (5.3) and (5.4) is to add a Bandwidth Balancing Counter (BC) to the Data Inserter, as shown in Figure 4. The Bandwidth Balancing Counter counts local data segments transmitted on the bus. After M segments have been transmitted, the Bandwidth Balancing Counter resets itself to zero and generates a signal that the Data Inserter treats exactly like a request from a downstream node. This artificial request causes the Data Inserter to let a slot go unallocated. (The Request Inserter is not aware of this signal; hence the node does not send any extra requests upstream corresponding to the extra idle slots it sends downstream.) The Data Inserter may use either Distributed Queueing or Deference Scheduling in serving its Transmit Queue. (Since bandwidth balancing by all nodes ensures some spare system capacity, the local data segment in the Transmit Queue will eventually be served, regardless of the queue's scheduling discipline.)

The advantages of Deference Scheduling are that it uses one counter (rather than two) and that it is easier to analyze, as we shall show shortly. However, we prefer Distributed Queueing for the following reasons: (1) While we will show that both versions of bandwidth balancing have the same throughput performance under sustained overload, the delay performance under moderate load is frequently better with Distributed Queueing [24]. (2) Many DQDB networks will have no significant fairness problems (e.g., if the buses are short, or if the transmission rate is low, or if the application is point-to-point rather than multi-access). In these cases, one would want to disable bandwidth balancing (because it wastes some bandwidth) and use the DQDB protocol as described in Section 2, which works only with Distributed Queueing. It is convenient to build one Data Inserter that can be used with or without bandwidth balancing, and this would have to be the Distributed Queueing version.
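
In terms of the Distributed Queueing sketch given in Section 2, the Bandwidth Balancing Counter is a small wrapper (again our illustration, not the standard's text):

    class BalancedTransmitQueue(TransmitQueue):
        """Transmit Queue plus Bandwidth Balancing Counter (BC)."""

        def __init__(self, M):
            super().__init__()
            self.M = M
            self.bc = 0                       # counts local segments sent

        def on_empty_slot(self):
            transmitted = super().on_empty_slot()
            if transmitted:
                self.bc += 1
                if self.bc == self.M:         # after M segments: reset, and
                    self.bc = 0               # act as if a downstream request
                    self.on_request_bit()     # arrived, so one slot passes unused
            return transmitted                # (no real request is sent upstream)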

5.3 Performance

5.3.1 Analysis

This section analyzes the transient behavior of bandwidth balancing. In preparation, we first present our modeling assumptions and a useful bound on the value of a node's Request Counter. Suppose the propagation delays between nodes are all integer numbers of slot transmission times. Let the propagation delay from the most upstream node to the most downstream node be D_MAX slot times. Assume the bus synchronization is such that a node reads a busy bit on the data channel immediately after reading a request bit on the reservation channel. Let the nodes use Deference Scheduling. To simplify the counting, imagine that Deference Scheduling serves a node's Transmit Queue as follows: first all genuine requests (i.e., those from downstream nodes) are served in order of arrival, then the artificial request from the node's Bandwidth Balancing Counter is served, then the node's local data segment is served. This viewpoint yields the following two upper bounds on the time when a genuine request r is served at a node n: (i) the time when r is served at the node immediately upstream from n, plus the one-way node-to-node propagation delay; (ii) the arrival time of r at n, plus the round-trip propagation delay between n and the network's upstream end. (Bound (i) can be proved by induction on the requests. Bound (ii) follows from (i) and induction on the nodes.) Bound (ii) shows that the number of genuine requests in a node's Transmit Queue can be at most 2·D_MAX + 1. Adding in one possible artificial request from the Bandwidth Balancing Counter bounds the Request Counter value at 2·D_MAX + 2.

We will now offer an approximate analysis of bandwidth balancing during simultaneous file transfers by two nodes separated by a propagation delay of D slots. Call the upstream node 1 and the downstream node 2. We will show that the bandwidth balancing scheme converges to the steady-state throughputs given by equation (5.2), independent of the initial conditions created by the previous history of the system. We will also determine the rate of convergence. Although the analysis assumes Deference Scheduling, simulations will show that the Distributed Queueing implementation also achieves the desired steady-state throughputs.

First we show that the Request Counters of both active nodes drain rapidly. Suppose that both file transfers have started, and that all other nodes have been inactive for at least D_MAX slot times, so that the effects of these other nodes have disappeared from the buses (though not necessarily from the Request Counters). In every M + 1 slot times, nodes 1 and 2 each transmit at most M data segments and at most M requests; i.e., node 1 leaves at least one idle data slot and node 2 leaves at least one vacant request bit. Each of these holes gives the other node a chance to decrement its Request Counter. Since the Request Counter values started at 2·D_MAX + 2 or less, they will drain to zero within (2·D_MAX + 2)·(M + 1) slot times, and thereafter they will never increase above one.

Now we can show how the throughput rates converge to a fair allocation. Assume that at time 0, the Request Counters have already drained. Also assume that a fraction f_B of the D busy bits in transit between nodes 1 and 2 on the data bus and a fraction f_R of the D request bits in transit between the nodes are set. For convenience, define:

α = M / (1 + M)

Each node transmits in a fraction α of the idle slots available to it for its own data transmission. Consequently, in the first D slot times, node 1 will transmit in α·(1 − f_R)·D slots and node 2 will transmit in α·(1 − f_B)·D slots. In the next D slot times, node 1 transmits in α·[1 − α·(1 − f_B)]·D slots, while node 2 transmits in α·[1 − α·(1 − f_R)]·D slots. The throughput of a node over half a round-trip time depends on the other node's throughput in the previous half round-trip time. (This analysis is approximate; in the interval D a node actually acquires an integer number of slots.)

Let γ(1,k) and γ(2,k) be the fraction of the bandwidth acquired by nodes 1 and 2, respectively, during slots kD to (k+1)D, where k = 0, 1, 2, .... The analyses for the two nodes are similar, and we shall concentrate on the bandwidth acquired by node 1. Consider the sequence γ(1,k) to be composed of two subsequences: a subsequence of even terms γ_e(1,m) = γ(1,2m) and a subsequence of odd terms γ_o(1,m) = γ(1,2m+1), for m = 0, 1, 2, .... Both subsequences γ_e(1,m) and γ_o(1,m) satisfy the same difference equation, for m = 1, 2, 3, ...:

γ_e(1,m) = α·(1 − α) + α²·γ_e(1,m−1)
γ_o(1,m) = α·(1 − α) + α²·γ_o(1,m−1)

but they have different initial conditions:

γ_e(1,0) = α·(1 − f_R)
γ_o(1,0) = α·[1 − α·(1 − f_B)]

The throughput of node 1 over half round-trip times can be found by separate Z-transform analyses of the even and odd subsequences:

γ(1,k) = α/(1+α) − [f_R − α/(1+α)]·α^(k+1),  k even
γ(1,k) = α/(1+α) − [α/(1+α) − f_B]·α^(k+1),  k odd    (5.5)

Similarly, the throughput of node 2 over half round-trip times is given by:

γ(2,k) = α/(1+α) − [f_B − α/(1+α)]·α^(k+1),  k even
γ(2,k) = α/(1+α) − [α/(1+α) − f_R]·α^(k+1),  k odd    (5.6)

We can use the model developed above to analyze various possible scenarios in the simultaneous transfer of two files, some of which are listed below.

- Both nodes turn on at the same time: f_B = f_R = 0.
- The upstream node turns on at least half a round-trip time before the downstream node: f_B = α, f_R = 0.
- The downstream node turns on at least half a round-trip time before the upstream node: f_B = 0, f_R = α.

The approximate throughput expressions are found to match simulation results reasonably well. Figure 5 compares the analysis with simulation results for the case where the two active nodes are separated by 38 slots (≈ 22 km), the upstream node starts transmitting at least half a round-trip time before the downstream node, and α = 0.9. The plotted throughputs are measured over successive full round-trip times (i.e., successive 76-slot intervals). Simulation results are shown for both Deference Scheduling and Distributed Queueing.
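
Equations (5.5) and (5.6) can be checked mechanically against the difference equation. A short script (our illustration) iterates γ(1,k) and compares it with the closed form, for the Figure 5 scenario (upstream node first, so f_B = α and f_R = 0):

    def gamma1(k, alpha, fB, fR):
        """Closed form (5.5) for node 1's bandwidth share in interval k."""
        ss = alpha / (1 + alpha)                  # steady-state share M/(1+2M)
        dev = (fR - ss) if k % 2 == 0 else (ss - fB)
        return ss - dev * alpha ** (k + 1)

    alpha, fB, fR = 0.9, 0.9, 0.0
    g = [alpha * (1 - fR), alpha * (1 - alpha * (1 - fB))]     # initial conditions
    for k in range(2, 30):
        g.append(alpha * (1 - alpha) + alpha ** 2 * g[k - 2])  # difference equation
    assert all(abs(g[k] - gamma1(k, alpha, fB, fR)) < 1e-12 for k in range(30))
    print(gamma1(29, alpha, fB, fR), alpha / (1 + alpha))      # converging to 9/19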

Let us make a few remarks on equations (5.5) and (5.6). Note that in steady state the nodal throughputs are each α/(1+α) = M/(1+2M), and the amount of system bandwidth wasted is (1−α)/(1+α) = 1/(1+2M), in accord with equation (5.2). For example, if α = 0.9, then 5.3 percent of the bandwidth is wasted. Note, moreover, that the steady-state nodal throughputs are independent of the initial conditions f_B and f_R, in marked contrast to the behavior of DQDB without bandwidth balancing, shown in Figure 3. Finally note that, while the exact transient depends on f_B and f_R, the rate of convergence depends only on α. For example, if α = 0.9, then the error (i.e., the unfairness) in each nodal throughput γ(n,k) shrinks by a factor of 0.9^22 ≈ 0.1 every 22·D slot times. In other words, each nodal throughput moves 90 percent of the way to its steady-state value every 11 round-trip times. A lower α results in faster convergence but more bandwidth wastage. The effect of different values of α on the convergence rate and on the steady-state throughputs is shown in Figure 6, for the same scenario as Figure 5.

5.3.2 Simulation

Figure 7 depicts simultaneous file transfers by three nodes, with 28 slots (≈ 16 km) between successive nodes. The plot shows the average nodal throughputs measured over successive 112-slot intervals. Bandwidth balancing is used, with M = 9. The system starts in an idle state. The most upstream node comes up first and immediately achieves a throughput of 9/10, in accord with equation (5.2). The most downstream node turns on next and contends with the upstream node for an equal share of the bandwidth, viz., 9/19. The middle node turns on next, and the system again adjusts so that all three nodes achieve the throughput of 9/28 predicted by (5.2). The most downstream node and then the middle node complete their file transfers, and in each case the system adjusts rapidly and redistributes the available bandwidth equally. Note that the amount of wasted bandwidth decreases as the number of active nodes increases. (The simulation for Figure 7 used Distributed Queueing; the simulation was also performed with Deference Scheduling, and the results were virtually indistinguishable.)

An interesting feature of bandwidth balancing is that nodes whose offered load is less than the control rate are not rate-controlled. The remaining bandwidth is distributed equally among the rate-controlled nodes, in accord with equation (5.2). The simulation results in Table 1 show the distribution of bandwidth among three active nodes, when the upstream and downstream nodes are involved in long file transfers and the middle node is a low-rate user, with either Poisson or periodic segment arrivals. The results are the same whether Distributed Queueing or Deference Scheduling is used.

6. BANDWIDTH BALANCING USING LOCAL PRIORITY INFORMATION

6.1 Concept

Now let us introduce multi-priority traffic. As before, our bandwidth balancing procedure will guarantee that there is some unused bus capacity and ask each parcel to limit its throughput to some multiple of that spare capacity. Now, however, the proportionality factor will depend on the priority level of the parcel. Specifically, the parcel of priority p is asked to limit its throughput to a multiple M_p of the spare bus capacity; parcels with less demand than this may have all the bandwidth they desire:

r_p(n) = min{ ρ_p(n), M_p·U } = min{ ρ_p(n), M_p·[1 − Σ_m Σ_q r_q(m)] }    (6.1)

Note that every active parcel in the network gets some bandwidth. This scheme is fair in the sense that all rate-controlled parcels of the same priority level get the same bandwidth. Parcels of different priority levels are offered bandwidth in proportion to their bandwidth balancing moduli M_p. Given the offered loads ρ_p(n) and the bandwidth balancing moduli M_p, equation (6.1) can be solved for the carried loads r_p(n). In the special case where all N_p parcels of priority level p have heavy demand, the solution has an especially simple form:

r_p(n) = M_p / (1 + Σ_q M_q·N_q)    (6.2)

Suppose, for example, that there are three priority levels and M_1 = 2, M_2 = 4, and M_3 = 8. If there is one active parcel of each priority, then the parcels' throughput rates are 2/15, 4/15, and 8/15, and the unused bandwidth is 1/15 of the bus capacity.
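
A one-line check of equation (6.2) for this example (our illustration):

    def local_bwb_rates(moduli, counts):
        """Equation (6.2): per-parcel rates when all N_p parcels are overloaded."""
        denom = 1 + sum(M * N for M, N in zip(moduli, counts))
        return [M / denom for M in moduli]

    print(local_bwb_rates([2, 4, 8], [1, 1, 1]))   # [2/15, 4/15, 8/15]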

6.2 Implementation

For this "local" version of bandwidth balancing, the slot header need only contain the busy bit and a single request bit. In order to implement equation (6.1), the node should respond to arriving busy bits and request bits in such a way that:

S_p(n,t) ≤ M_p·U(n,t)    (6.3)

The most straightforward way to implement (6.3) is to construct a separate section for each priority level p, similar to Figure 2, and then add a Bandwidth Balancing Counter with modulus M_p to that section. A more compact implementation is shown in Figure 8. Here the node has only one section with one Data Inserter and one Request Inserter to manage data of all priority levels, but a separate Local FIFO Queue for each priority is required. The Data Inserter may serve its Transmit Queue using either Distributed Queueing or Deference Scheduling. A Gate controls the movement of local data segments from the Local FIFO Queues to the Data Inserter. A local data segment must be authorized (as explained below) before it may pass through the Gate, and the Data Inserter may only accept and process one authorized segment at a time. Whenever the Data Inserter observes an unallocated slot, it authorizes M_p local data segments for each priority level p. (If fewer than M_p segments of priority p are available, then all these available segments are authorized and the extra authorizations expire.) The order in which authorized segments pass through the Gate is unimportant, as long as FIFO order is preserved among segments of the same priority level. When all authorized segments of all priority levels have been transmitted, the Data Inserter is temporarily prevented from processing any more local data. Because the other nodes are following the same discipline, however, the Data Inserter will eventually detect an unallocated slot and create more authorizations.
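
The authorization rule can be stated compactly; the sketch below (ours, with invented names) shows what happens each time the Data Inserter sees an unallocated slot:

    def authorize_on_unallocated_slot(fifo_lengths, moduli):
        """For each priority p, authorize up to M_p waiting segments; extra
        authorizations expire, so they never accumulate beyond the queue."""
        return {p: min(fifo_lengths[p], moduli[p]) for p in moduli}

    # e.g. moduli {1: 2, 2: 4, 3: 8} and queue lengths {1: 10, 2: 1, 3: 0}
    print(authorize_on_unallocated_slot({1: 10, 2: 1, 3: 0}, {1: 2, 2: 4, 3: 8}))
    # -> {1: 2, 2: 1, 3: 0}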

6.3 Performance

Figure 9 shows simulation results for the bandwidth balancing scheme with local priority information, using the compact implementation described above and using Distributed Queueing. As in the simulation of Figure 7, the bus is shared by three nodes spaced apart by 28 slots (≈ 16 km), for a round-trip delay of 112 slot times. Figure 9 shows the nodal throughputs over successive round-trip times. There are three priority levels of traffic, and their bandwidth balancing moduli are 2, 4, and 8. First the node farthest upstream begins transmitting a long message of medium priority. As predicted by equation (6.2), this node acquires 4/5 of the bus bandwidth. Later the downstream node gets a high-priority message to transmit, and after several round-trip times it achieves a throughput rate of 8/13, while the medium-priority parcel is cut back to 4/13, again as predicted by (6.2). Finally, the middle node becomes active at low priority, and the nodal throughputs shift to 8/15, 4/15, and 2/15, in accord with (6.2).

7. BANDWIDTH BALANCING USING GLOBAL PRIORITY INFORMATION

7.1 Concept

Now we assume that every node can determine the bus utilization due to traffic of each priority level. Each parcel is asked to limit its throughput to some multiple M of the spare bus capacity not used by parcels of equal or greater priority; parcels with less demand than this may have all the bandwidth they desire:

r_p(n) = min{ ρ_p(n), M·U_p+ } = min{ ρ_p(n), M·[1 − Σ_m Σ_{q ≥ p} r_q(m)] }    (7.1)

This scheme is fair in the sense that all rate-controlled parcels of the same priority level get the same bandwidth. Allocation of bandwidth across the various priority levels is as follows. First, the entire bus capacity is bandwidth-balanced over the highest-priority parcels, as though the lower-priority parcels did not exist. Bandwidth balancing ensures that some bus capacity will be left unused by the highest-priority parcels. This unused bandwidth is then bandwidth-balanced over the second-highest-priority parcels. The bandwidth left over after the two highest priorities have been processed is then bandwidth-balanced over the third-highest-priority parcels, etc. We emphasize that with this scheme, in contrast to the scheme of Section 6, the throughput attained by a parcel of a given priority is independent of the presence of lower-priority parcels anywhere in the network.

Given the offered loads ρ_p(n) and the bandwidth balancing modulus M, equation (7.1) can be solved for the carried loads r_p(n). In the special case where all N_p parcels of priority level p have heavy demand, the solution has a simple form:

r_p(n) = M / Π_{q ≥ p} (1 + M·N_q)    (7.2)

For example, if M = 4 and there are three active parcels of three different priorities, then the parcels' throughput rates are 4/5, 4/25, and 4/125, and the unused bandwidth is 1/125 of the bus capacity.

7.2 Implementation

For this "global" version of bandwidth balancing, the slot header must contain the busy bit, an indication of the priority level of the data segment in a busy slot (Footnote 6: In the current DQDB standard, the Access Control Field of the slot header does not include the data priority level. However, the field has enough spare bits that this information could be added in a future version of the standard.), and one request bit for each priority level. By reading these fields, each node can determine the priority level of all traffic on the bus (i.e., there is "global" priority information). In order to implement equation (7.1), node n should respond to arriving busy and request information in such a way that:

S_p(n,t) ≤ M·U_p+(n,t)    (7.3)

As shown in Figure 10, the node needs a separate section to manage data for each priority level. Each section has its own Data Inserter, Request Inserter, Local FIFO Queue, and Gate. Each Data Inserter may serve its Transmit Queue using either Distributed Queueing or Deference Scheduling. Inequality (7.3) can be implemented by the node section of priority p as follows. Only authorized segments may pass through the Gate, from the Local FIFO Queue into the Data Inserter. The Data Inserter is still restricted to processing only one (authorized) local data segment at a time. Whenever the Data Inserter observes a slot that is not allocated to traffic of priority p or higher, it authorizes up to M additional local data segments that were not previously authorized. (If fewer than M unauthorized segments of priority p are available, then all these available segments are authorized and the extra authorizations expire.) Note that there are two circumstances under which the Data Inserter observes such a slot: (a) the slot is already busy with a segment of priority less than p when it arrives at the Data Inserter, or (b) the slot arrives empty and finds the Transmit Queue inside the Data Inserter also empty, holding no local data segment and holding no requests from downstream nodes.
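
Finally, equation (7.2) can be checked with the same per-level recursion used to describe the allocation in Section 7.1 (our illustration):

    def global_bwb_rates(M, counts):
        """Equation (7.2): parcel rates, counts listed from highest priority
        (first) to lowest (last), all parcels heavily loaded."""
        rates, capacity = [], 1.0
        for N in counts:
            r = M * capacity / (1 + M * N)   # balance leftover capacity at this level
            rates.append(r)
            capacity -= N * r                # capacity passed down to lower levels
        return rates

    print(global_bwb_rates(4, [1, 1, 1]))    # [4/5, 4/25, 4/125]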