Implementation Based Evaluation of a Full-Fledged Multi-Hop TDMA-MAC for WiFi Mesh Networks

Implementation Based Evaluation of a Full-Fledged Multi-Hop TDMA-MAC for WiFi Mesh Networks Vishal Sevani, Bhaskaran Raman, Piyush Joshi Department of CSE, IIT Bombay, INDIA {vsevani,br}@cse.iitb.ac.in, piyush28joshi@gmail.com ABSTRACT Wireless mesh networks in general, and WiFi mesh networks in particular, offer a cost-effective option to provide broadband connectivity in sparse regions. Effective support for real-time as well as high throughput applications requires a TDMA-based approach. However, multi-hop TDMA implementations in wireless have been few and far-between, and for good reasons. These present significant issues in terms of time synchronization, TDMA schedule dissemination, multi-channel support, routing integration, spatial reuse, etc. And achieving these efficiently, in the face of wireless channel losses presents a formidable challenge. In this work, we present an implementation of LiT MAC, a full-fledged multi-hop TDMA MAC, on commodity WiFi platforms. We undertake extensive evaluations: micro-benchmarks as well as application level performance, using outdoor as well as indoor testbeds. We also present an integration of LiT MAC with various routing metrics, and a routing stability study of recently proposed routing metrics (ROMA, SLIQ). Our results show that we can achieve µs granularity time synchronization across several hops, and TDMA slot size as small as 2ms. These imply low control overheads. Experiments over several days, on our 9-node outdoor testbed shows that LiT MAC s soft-state based approach is effective in robust operation even in the presence of significant external interference. 1. INTRODUCTION It is well known that in the context of wireless mesh networks, a random access based MAC protocol such as CSMA/CA cannot provide QoS for applications such as real time voice, video streaming, etc. [26]. A TDMAbased approach is necessary for good throughput as well as low delay/jitter. However a look at the history of wireless data standards reveals that more often than not, multi-hop TDMA has only been part of the standard on paper, without any implementation based evaluation. A classic example is that of Bluetooth, with scatternets never seeing the light of day. Even with the more recent WiMAX standard, we are not aware of any implementation based evaluation of the multi-hop mesh mode. A notable exception to this pattern is the case of 82.15.4 mesh networks [18]. However, this is intended only for low data rate sensing applications. What makes multi-hop TDMA in wireless tough? The first is the issue of fine-grained time synchronization across multiple hops. This must be achieved and maintained in the face of wireless packet losses. Second, to achieve high performance, it must provide support for spatial reuse as well as multiple channels. Intuitively, this can most effectively be done with a centralized approach [14]. Third, nodes in the network must agree upon a schedule of transmission. In the context of centrally determined time-slot schedules, this translates to schedule dissemination, from the central node to other nodes. This too must be done in face of wireless losses, as well as dynamic arrival and departure of network flows. And finally, the TDMA MAC must integrate easily with a routing mechanism. It is this set of formidable challenges that have made most wireless data standards shy away from taking multihop TDMA to practice. Even recent research based efforts have only been piece-meal in nature. For instance, the multi-hop TDMA implementations in [15, 12] deal primarily with time synchronization, and have been evaluated only in relatively interference free (i.e. using 82.11a) indoor settings. And while Overlay MAC [22] considers a distributed mechanism for nodes to agree upon a schedule, it works under the restricted assumption of single-channel operation, and a 2-hop interference model. In our recent prior work [14], we have designed LiT MAC, a full-fledged multi-hop TDMA mechanism, which addresses all the above-listed challenges. [14] reports an 82.15.4-based implementation of LiT MAC and its evaluation. The contributions of this paper are threefold. (1) First, we implement the various features of LiT MAC on commodity WiFi hardware, using modifications to the open-source MADWIFI driver. The use of commodity hardware is of practical significance since it allows us to ride on the low-cost benefits of these platforms. (2) Next, we evaluate the implementation along various dimensions: synchronization error, time granularity of scheduling, achievable throughput for UDP 1

& TCP flows, jitter for CBR flows, spatial reuse, etc. We have used indoor as well as outdoor testbeds for such evaluation. (3) Third, we have integrated a routing mechanism with the LiT MAC and performed a routing stability evaluation on the outdoor testbed. The stability analysis includes three routing metrics: the popular ETX metric [8], and the more recent ROMA [1] & SLIQ [7] metrics. Such evaluation is of significance since ultimately application QoS depends not only on the MAC performance, but also on the stability of routes during packet flow. Our highlight results are as follows. (A) By making use of hardware packet timestamping, we attain time synchronization accuracy with error of only about 1µs even at 7 hops. This implies a low guard time overhead. (B) Using only Linux s software timers, we can achieve a slot size as small as 2ms, while CPU interrupt overheads prevent the use of smaller slot sizes. But a slot size of 2-5ms is often small enough to achieve low delay/jitter in practice. (C) The integration of LiT MAC with the SLIQ routing metric works well; SLIQ s stability is much better than that of ROMA, which in turn is more stable than ETX. We believe that our implementation and evaluation is of practical significance to the mesh networking community. The rest of the paper is organized as follows. The next section (Sec. 2) presents related work. Sec. 3 briefly describes LiT MAC, to setup the context for a description of its implementation over commodity hardware, in Sec. 4. Subsequently, Sec. 5 presents microbenchmarks and parametric evaluations of the implementation, while Sec. 6 describes overall evaluations of our implementation. Sec. 8 concludes the paper with a few points of discussion. 2. RELATED WORK We now compare our work with prior related work. Single-hop TDMA: Single hop wireless TDMA systems are aplenty. Significant such systems closely related to this work are as follows. Our methodology of modifying the MADWIFI open source driver to build a TDMA WiFi system on commodity hardware is similar to [24, 16, 25]. Apart from the primary difference that these are evaluated for single hop settings while we focus on multi-hop settings, there are other differences too. For instance, [24] makes use of out of band ethernet based time synchronization, which is not applicable for mesh networks. [16] does not evaluate the time synchronization accuracy of the proposed mechanism, but its extension [25] reports a time synchronization accuracy of about 25µs. However, the slot size used by [25] is in the range of 2-6 ms, with a guard time of 4-12 ms. In comparison, we use slot sizes of 2-1 ms, with a guard time of 1µs. Multi-hop TDMA: Table 1 compares prior multihop TDMA MAC implementations with our work succinctly; the details are as below. Overlay MAC [22] also modifies the MADWIFI driver to implement a TDMA MAC. It proposes a distributed mechanism for nodes to arrive at an agreement on slot usage. However, unlike our work, [22] has the significant restriction that all nodes should use the same channel in all time slots. Also, [22] reports evaluation only in an interference free, indoor, 82.11a testbed. [15] outlines the design of a TDMA MAC protocol based on in-band time synchronization and takes into account hardware bottlenecks such as clock drift, processing delay, etc for calculating the guard interval, slot duration, etc. It also takes into account wireless errors for calculating the synchronization period, i.e. time interval after which the network needs to be re-synchronized. It outlines the implementation of the protocol on the proprietary WiLD MAC platform with time synchronization accuracy of the order of microseconds, and slot granularity of the order of milliseconds. [12] proposes a pairwise synchronization algorithm similar to LiT MAC, and takes into account propagation delay to calculate clock difference between two nodes. It makes use of a guard band to account for processing delays encountered while packet transmission, and carries out measurements to empirically tune the value of the guard band. Based on the measurements carried out on an indoor testbed, it reports synchronization accuracy of the order of microseconds at three hops. While [15] & [12] are closest to our work in terms of µs granularity of synchronization, the differences are many. [15, 12] have primarily focused on time synchronization, while we have also considered issues such as schedule dissemination (and the corresponding overhead), integration with a routing protocol, and also an implementation on commodity hardware. We have also evaluated the multi-hop TDMA MAC on interference prone 82.11b/g networks, while [15, 12] have only considered relatively interference free 82.11a settings. A further difference is that while the testbed used in [15, 12] are indoor, our testbed is outdoor; this is significant since after all mesh networks are intended for outdoor deployment. [17, 2, 6] outline TDMA-based MAC protocols for long-distance networks. Apart from the difference that these use loose time synchronization of the order of milliseconds, the primary difference is that these are applicable only for long-distance networks with highly directional antennas, while our work considers a generic mesh network with directional or omni-directional antennas. TSMP [18] describes an 82.15.4-based multi-hop TDMA MAC implementation. 82.15.4 not only has very different modulation scheme from 82.11, but also has much lower data rate (25Kbps max), and is intended for low data rate sensing applications. We consider more 2

Synchronization Schedule Routing Spatial Reuse Accuracy Dissemination Integration Integration Evaluation Overlay MAC [22] order of ms distributed No 1-chnl only 6 node, indoor, 82.11a Wildnet[17] a order of ms No No No 3 node, outdoor, 82.11b 2P [2] b loose No No restricted 3 node, outdoor, 82.11b TDM-MAC [15] order of µs No No No 8 node, indoor Soft-MAC [12] order of µs No No No 4 node, indoor TSMP [18] order of µs Yes Yes No 24-44 node, indoor, 82.15.4 LiT MAC on WiFi 9 node, outdoor, 82.11g order of µs Yes Yes Yes (our work) 15 node, indoor, 82.11g Table 1: Comparison with prior TDMA multi-hop implementations a The TDMA throughput evaluation in [17] was only in 3-node, 2-hop settings, while their testbed itself had 6 nodes. b 2P has no strict time sync requirement; and its spatial reuse is restricted to directional links in a bipartite network. traditional applications such as bulk data transfer, audio/video, etc. which require high data rates. Routing stability studies: Our work includes an integration of LiT MAC with routing metrics and a routing stability study. [19, 9] are among earlier works to study routing instability due to link metric variations. [19] studies the extent of instability with the use of ETT metric in mesh networks. [9] on the other hand studies how do different factors such as link metric variations, channel conditions, time of the day, background traffic, etc affect routing stability. However, both these studies have carried out measurements for CSMA/CA based mesh network where the protocol behaviour and background traffic impacts the link metric computation. Our work is in context of TDMA mesh networks wherein links metric probes are transmitted in dedicated time slots. Further, our study includes the recently proposed ROMA [1] and SLIQ [7] metrics which offer much better stability compared to the ETX metric considered in [19, 9] (see Sec. 6.2 for quantitative differences). [7] proposes the use of SLIQ metric to provide improved stability. However [7] presents only trace based analysis, of traces from a CSMA/CA based network. In our work, we have integrated SLIQ with the LiT MAC, and have quantified SLIQ s benefits in terms of routing stability, in a realistic outdoor testbed. 3. LIT MAC: PROTOCOL DESIGN We now give a brief description of the salient features of LiT MAC; specifically, time synchronization, schedule dissemination, and how these work robustly in the presence of wireless errors. This sets up the context for later sections. Time synchronization & schedule dissemination To facilitate time synchronization and TDMA schedule dissemination, LiT MAC follows a frame based mechanism, wherein a slot is a unit of allocation, and a frame consists of three different types of slots: control, contention and data slots (see Fig. 1). The use of these three types of slots is as follows. Control slots: These are used for time synchronization as well as to convey resource allocation i.e. to disseminate the TDMA schedule. For the transmission of control packets, the network topology is arranged as a tree rooted at the root node. For time synhronization, LiT MAC makes use of a pairwise synchronization mechanism wherein a given node is time synchronized with its parent in the control tree. The control packet consists of following three types of information, time synchronization information, the control schedule, and the data schedule. The control schedule is nothing but the allocation of control slots to each node in the network. The control slots for all the nodes need not be in a single frame: different nodes can be allotted slots in different frames; Fig. 1 shows such an example where the control schedule spans two successive frames. Contention slots: In the contention slots the nodes transmit packets probabilistically. These slots are used to facilitate asynchronous events such as a new node joining the network, resource reservation for a new flow, or a routing metric update. A new node joins the network by sending node-join request packets toward the root node, using the contention slots. Likewise, slot allocation for a new flow is accomplished by making use of flow request packets. The node which wants to initiate a flow transmits flow request packets in the contention slot, which then propagates up the control tree towards the root node. Depending on the availability the root node allocates slots to the new flow, which is communicated to the nodes via the data schedule contained in the control packets. To facilitate dynamic routing, LiT MAC makes use of topology update packets that can be used by the nodes to convey routing metric information to the root node. Data slots: Data slots are used for the actual transfer of the data. As mentioned above, a node is communicated about its data slots via the data schedule contained in the control packets. As with control slots, the data schedule may span across multiple frames. Figure 1: LiT MAC: An example slot allocation Fig. 1 shows a possible control schedule for an ex- 3

ample topology and a data schedule for a bi-directional data flow between nodes 3 and 2. In this example, the control schedule as well as the data schedule span two adjacent frames. Handling wireless losses Any realistic deployment will face wireless losses due to fading and interference. To handle wireless losses, LiT MAC makes extensive use of soft-state mechanisms. For instance, soft-state based periodic transmission of the control & data schedules ensures that all nodes eventually follow the same schedule. Likewise nodes transmit topology update packets periodically to the root node to inform that they are alive. Soft-state is used to setup, maintain, and terminate flows too. That is, a flow-request is repeatedly sent until the corresponding slot allocation is given (or a timeout is reached): so the loss of flow-request packets are handled elegantly. Even after flow setup, flow-renewal packets are periodically sent, failing which the root node terminates a flow (i.e. deallocates the corresponding data slots). Although such use of soft-state mechanism enables LiT MAC to elegantly handle wireless losses and maintain the network in consistent state, it has a cost. The overheads of the frame structure and the soft-state mechanism are quantified in our performance evaluation (Sec. 6). We now move on to the description of the LiT MAC implementation. 4. IMPLEMENTATION OF LIT MAC Our goal is to implement a full-fledged multi-hop TDMA MAC on commodity hardware, to take advantage of the low-cost benefits of the same. This involves resolving some important issues, as we describe below. 4.1 TDMA Mechanism There are two important components of the TDMA mechanism 1. time synchronization i.e. how accurately the clock across the different nodes is synchronized with each other, and 2. how accurately we can control packet transmission times, so that nodes transmit only in their allotted data slot. We describe the implementation details for both of these below. 4.1.1 Time synchronization Time Synchronization Mechanism: We make use of the following characteristics of the WiFi hardware for time synchronizing the nodes. As per 82.11 standard the beacon packets contain the exact timestamp at which these packets get transmitted. Likewise at the receive side the hardware records the time at which the packet is received. We make use of this transmit and receive timestamps to calculate the difference in the clock between the two nodes. We term this difference as offset. Since LiT MAC makes use of pairwise synchronization along the control tree, every node propagates its offset with respect to the root node down the control tree, and thereby every node is synchronized with the root node. Discrepancy in WiFi hardware: The hardware timestamp in beacon packets, mentioned above, corresponds to the time at which 25th byte of the packet is transmitted on air. We observed that even when we set the transmit rate to more 1Mbps, the WiFi NIC computes the time assuming that the packet transmission rate is 1Mbps. So for control packets that are transmitted at data rate more than 1Mbps, we accounted for this discrepancy by recomputing the timestamp at the receiver appropriately by taking into account the packet transmit rate. 4.1.2 Controlling packet transmissions The mechanism that we follow for packet transmission is that, at the start of the transmission slot of a node the software driver inserts the number of packets that can be transmitted in that slot into the hardware queue of the WiFi NIC, which then transmits the packets. Now to accurately control the packet transmission time, there are two requirements, 1. The packet transmission event should be triggered exactly at the start of the slot, and 2. Once the packet is put in hardware queue, it should be transmitted immediately without waiting for the medium to be free. We describe the implementation details for these two requirements below. Use of timers: To trigger packet transmission event, we need to make use of timer. We have option of using either the 1ms granularity software timer provided by linux kernel or the hardware timer provided by WiFi commodity hardware. However, for hardware timer we observed that under heavy load (where load implies WiFi packet reception) the timer accuracy is very low [11]. So we make use of 1ms software timer, which is quite accurate, for our purpose. However, for making use of software timer we need to resolve the following issue. The linux software timer is of 1ms granularity, whereas the hardware clock of the WiFi NIC is of 1µs granularity. So it is likely that millisecond boundary of the software timer is not aligned with the slot boundary of the TDMA i.e. the software timer can trigger at-most 1ms before or after the start of the TDMA slot. The event wherein the timer triggers before the slot start can be handled easily by doing busy wait till the start of the slot as shown in Fig. 2(a). However the event when the timer triggers after the slot start is undesirable as it leads to inefficiency, as for the time since the slot start and triggering of the timer no packets are transmitted. So in case the timer triggers after the start of the slot, we descrease the timer trigger interval by 1ms so that for the next as well as subsequent TDMA slots the timer will trigger before the start of the slot as shown in Fig. 2(b). Discrepancy in WiFi hardware: Another issue that our measurements revealed is that there is drift 4

Figure 2: Triggering packet transmission event between the clock of the software timer and WiFi NIC clock. We observed that the software timer clock typically drifts behind the WiFi NIC clock by 1µs in 4ms. So the time difference between the software timer trigger and slot start (busy wait period in Fig. 2(a)) slowly increases. However our scheme ensures that this difference does not increase beyond 1ms. This is since we use WiFi NIC clock for time synchronizing the nodes and also to identify TDMA slot boundaries. Software timer is used only to trigger packet transmission event. Now when the difference between software timer trigger and slot start becomes more than 1ms, we trigger next packet transmission event after interval of slot size with respect to the millisecond boundary of software timer clock just before the slot start, and not the current software timer trigger time as shown in Fig. 2(c). This ensures that for the next software timer trigger the difference with respect to TDMA slot start again decreases to a small value as shown in Fig. 2(c). CCA disable and guard time: Once the packet is inserted into the hardware queue by the driver, it is essential that the the packet is transmitted on-air immediately. However, the WiFi hardware carries out clear channel assesment (CCA) and transmits the packet only when the medium is free. So in presence of interference the packet transmission can get delayed. So the CCA needs to be disabled [?]. Apart from disabling CCA, another factor that needs to be accounted for is the software and hardware delay in transmitting the packet, clock drift between the nodes, synchronization error, etc. Our prior work [11] has carried out measurements to determine the guard time and observed that a value of 1µs is sufficient to account for these factors. 4.2 Error recovery To account for packet loss present in realistic setting, it is essential to incorporate error recovery mechanism. For error recovery we make use of automatic request for retransmission (ARQ). Now the ARQ scheme used by 82.11 standard is per packet acknowledgement with immediate retransmission of the lost packet. For our TDMA MAC protocol also similar scheme can be followed. However, we observed that the retransmission of the lost packet is carried out in hardware. As we do not have access to the firmware code in hardware, we have implemented our TDMA protocol by making changes in the driver code. To implement the similar scheme in software in the driver, we require that the driver be notified immediately about the lost packet (i.e. the packet for which acknowledgement is not received). But we observed that for the hardware we use, this notification is received at the driver, after about 2-3 µs. Waiting for such long time to retransmit the lost packet can be inefficient. So, instead the scheme that we follow is that we transmit n distinct packets in a given slot, where n is the number of the packets that can be transmitted in that slot. Of these n packets, if any of the packets is lost then we retransmit it in the next allotted data slot for the node. Assuming that the next allotted data slot for the node is after maximum notification delay (i.e. about 3 µs), the notification would have been received by the time the next data slot starts and the node can retransmit the lost packet if needed. Our use of per-packet acknowledgement with retransmission of lost packet in next data slot is different from bulk acknowledgement and adaptive Forward Error Correction (FEC) based error recovery schemes employed by [17]. We give a quantitative comparision of our scheme vis-a-vis the schemes employed in [17] in Sec.??. 4.3 Routing and spatial reuse Another benefit of centralized design of LiT MAC is that it facilitates easier integration of generic routing as well as techniques such as spatial reuse. We have incorporated changes in the LiT MAC to incorporate routing and spatial reuse which we describe below. Routing: There are two aspects as regards to incorporate routing mechanism in the MAC protocol. Firstly the computation of the routing metric and secondly since we follow centralized design, the routing metric has to be conveyed to the root node. The routing metric is typically computed by measuring packet delivery ratio over a given link. For metric computation, it is essential to answer the following questions. 1. What packets to use to compute the metric? 2. Should separate probe packets be sent or existing traffic be used? 3. If the existing traffic need be used then packets in which slots should be used?. 5

We cannot use the packets sent in contention slot as they have low periodicity and may get lost due to collisions. Also, we cannot use data slots as these are allotted only to nodes that have data traffic to send. Explicitly alloting data slots to nodes for sending probe packets can be inefficient. Control packets, on the other hand are broadcasted periodically by each node. So by making use of control packets we can facilitate routing metric computation at no additional overhead. For periodically conveying the routing metric information to the central root node, we make use of the topology update packets sent in the contention slots. Spatial reuse: Spatial reuse means alloting same data slots to the nodes that do not affect each others transmissions. For incorporating spatial reuse we make use of the Signal to Interference (SIR) ratio based technique outlined in [21]. SIR technique requires that received signal strength (RSS) profile, i.e. RSS of all the neighbours for each of the node in the network. Since in LiT MAC, the central root node carries out the scheduling it is essential that the RSS profile be constructed at the root node. The methodology that we follow for contructing the RSS profile is as follows, every node records the RSS of the control packet that it receives from its neighbours. It then includes this RSS information of its neighbours in its control packet. Assuming a bi-directional link, the control packet of the child node will be received by the parent node. The parent node collects this RSS information contained in the control packet of its children and includes this information also in its control packet along with RSS information of its own neighbours. Likewise the RSS information of the entire network reaches the root node which then constructs RSS profile. We carry out evaluation for routing and spatial reuse in Sec.??. In the next section we evaluate how different parameters of the TDMA MAC protocol affect the performance [?]. 5. PARAMETRIC EVALUATION In this section, we study the effect of different parameters such as slot size, number of per-hop retransmissions of lost packet, etc on the performance of the TDMA MAC protocol. These measurements serve two purpose, firstly they provide an insight into what should be these parameter values for application-specific requirement and secondly they enable us to verify the performance of our implementation methodology for TDMA mechanism and error recovery outlined above. The hardware that we have used consists of single board computers, Mikrotik RB 433AH/411AH/411AR [1]. The wireless cards that we have used are Ubiquiti XR2 [2] and Mikrotik R52/R52H [1]. Some of the nodes run Openwrt Kamikaze v8.9, while other nodes run Openwrt Backfire v1.3 [3]. We have modified Madwifi driver code v.9.4 revision 3314 [4] to implement the LiT MAC protocol. Time Synchronization Accuracy: The time at which the packet is received is recorded accurately by the hardware [13]. We make use of this fact To measure the time synchronization accuracy over a mesh networks and carry out the following experiment. Figure 3: Experimental set-up to measure time synchronization error We enforce a 8 node 7 hop linear control tree of LiT MAC with a laptop kept in the vicinity as shown in Fig. 3. The laptop transmits measurement beacons every 1ms. As every LiT MAC node is synchronized with the root node, it records the sequence number and the global time (time at the root node) at which the measurement beacon is received. We now measure synchronization error for a given node as global time recorded by the root node minus the global time recorded by the given node for a given measurement beacon. We run the experiment for an hour and take the average synchronization error across the measurement beacon packets received in that duration. We also have continous backlogged UDP traffic running on the nodes to verify if synchronization accuracy is affected under the presence of heavy load. Avg Synchronization Error(us) 1 8 6 4-2 2-4 -1-8 -6-12 -14-16 -18-2 1 2 3 4 5 6 7 8 Hops Figure 4: Time synchronization error Accordingly Fig. 4 shows the average synchronization error in µs, as function of number of hops from the root node. The error bars in Fig. 4 give standard deviation. As can be seen the synchronization error increases as the hop count from root node increases, but the error is still about -1 µs even at 7 hops. Negative sign indicates that the global time as recorded by non-root nodes is more than that by the root node. The standard deviation of synchronization error, as can be seen from Fig. 4, is also about 4-6 µs except for the node at second hop for which the standard deviation is about 1 µs. Thus, these results indicate that using commodity hardware, it is possible to provide synchronization accuracy of the order of microseconds in mesh networks 6

Effectiveness of disabling CCA: It is essential to quantify how accurately can the packet transmission time be controlled with CCA disabled for which we carried out the following experiment. In a two node topology with a root node and a first hop node, the root node transmits backlogged UDP traffic at the data rate of 6M to the first hop node, which measures the inter-arrival time of the packets received. We carry out these experiments for the cases of CCA enabled and CCA disabled. For the case of CCA disabled we carry out experiments both in absence as well as presence of external interference. For producing external interference, we make a laptop connect to an WiFi access point in the vicinity of the node and carry out continous data transfer. We measure inter-arrival time for ten thousand packets from the root node. % of packets 1 8 6 4 2 CDF of inter pkt gap 1 15 2 25 inter pkt gap (us) (a) CCA enabled % of packets 5 4 3 2 1 without ext traffic with ext traffic 78 8 82 84 86 88 9 94 inter-arrival time (us) (b) CCA disabled Figure 5: Inter-arrival packet time Accordingly Fig. 5 shows the CDF of inter-arrival time for the case of CCA disabled and frequency plot of inter-arrival time for the case of CCA enabled, both in absence as well as presence of external traffic. As can be seen from Fig. 5(a), only about 5% of the packets have inter-arrival time less than 9 µs, whereas about 4% of the packets have inter-arrival time in the range 9-14 µs. While for the case of CCA disabled inter-arrival time is in the range 8 to 9 µs for both the cases of with and without external traffic. Though for the case of CCA disabled also we observed that the maximum inter-arrival time was about 2 µs, but the inter-arrival time was more than 9µs for only a tiny fraction (less than.1%) of the total of ten thousand packets. Now the expected inter-arrival time is, = SIFS + ack transmission time + DIFS = 1 + 38.6 + 3 = 78.6µs As can be seen from Fig. 5, the measured inter-arrival time is only about 5-1 µs more than the expected time for most of the packets. Thus, these results indicate that with CCA disabled it is possible to precisely control packet transmission times over WiFi commodity hardware. 5.1 Effect of slotsize To study the effect of slot size on different QoS paraemeters such as throughput, jitter, and delay, we carried out measurements across 4 hops, for a 5 node, 4 hop linear topology in our lab set-up. The values for different TDMA parameters that we used are as follows, number of control slots = 5, number of contention slots = 2, and number of data slots = 96. As there is no interference in 82.11a in our lab, we carried out experiments on channel 149 at the data rate of 6M of 82.11a. Figure 6: Slot allocation schemes To study if the slot allocation strategy has any effect on the above mentioned parameters, along with slot size, we carried out the experiment for two different slot allocation schemes shown in Fig. 6. As can be seen from Fig. 6 the data schedule consists of five data slots as there are five nodes in the network. For the first slot allocation scheme, root node is allotted first data slot, 1 st hop node second data slot and so on. While for the second scheme slot ordering is reversed, i.e. now 4 th hop node is allotted first data slot, 3 rd hop node second data slot and so on. For both the slot allocation schemes, data transfer is carried out between root node and 4 th hop node and we measure uni-directional UDP/TCP throughput and jitter by varying the slot size from 3ms to 3ms. We use iperf tool v2..4 [5] to carry out the measurements. Each experiment duration is 3 seconds and the packet size with MAC layer overheads is 1586 bytes for slot sizes of 3ms or more, while the packet size is 52 and 112 bytes for slot sizes of 1 and 2ms, as the slot sizes of 1 and 2ms cannot accomodate 1586 byte packet at data rate of 6M. We take average across the 5 experiments to report the results for the two slot allocation schemes. Slot allocation scheme 1: Fig. 7 shows the throughput and jitter for the first slot allocation scheme for different slot sizes. Fig. 7 also shows the expected UDP throughput which is computed taking into account the LiT MAC header as well as 82.11 header that we attach over the LiT MAC header to enable per packet acknowledgement. Throughput (Kbps) 1 8 6 4 2 UDP expected UDP measured TCP measured 1 2 3 6 1 15 2 3 Slot size (ms) (a) throughput Jitter (ms) 25 2 15 1 5 jitter 1 2 3 6 1 15 2 3 Slot size (ms) (b) jitter Figure 7: Effect of slot size on throughput and jitter As can be seen from Fig. 7, the UDP throughput increases as the slot size increases. This is since we make use of guard time of 1µs between the slots, as the slot size increases the overhead of guard time decreases. But the overhead of guard time for 3ms slot size is about 3%, 7

while for slot size of 3ms it is about.3%, however the increase in throughput from 3ms (721.6Kbps) to 3ms (936Kbps) is about 3%. Now the packet transmission time for 1586 byte packet, including acknowledgement, is about 2224µs. As for 3ms slot, only one packet is transmitted, the idle time is (29-2224) 676µs, which leads to inefficiency of about 22%, whereas for 3ms 13 packets are transmitted leading to inefficiency of only about 3% and thereby the throughput for 3ms slot is much higher. But more importantly, the throughput measurement results show that the measured UDP throughput is close to the expected value, thereby implying that the implementation mechanism for TDMA outlined in Sec. 4 is working properly. The TCP throughput also increases with increase in slot size, but the increase in TCP throughput is slightly less than UDP throughput. This is since for TCP we transmit both the TCP data packet as well as the acknowledgement in the same data slot. So for a particular slot size the overhead of TCP acknowledgement transmission can cause fewer number of TCP data packets to be transmitted in that slot than the number of UDP packets leading to lesser TCP throughput. For jitter as can be seen from the Fig. 7, the jitter value is quite small for slot size upto 3ms, whereas for other slot sizes the jitter varies from about 15ms to 2ms. This is since for slot size upto 3ms slot as only one packet is transmitted, there is fixed delay equal to schedule length i.e. 5 slots between receiving any two consecutive packet leading to low jitter. But consider the case of 6ms, wherein two packets are transmitted in a slot. Now the delay between receiving the two consecutive packets that are transmitted in the same slot is very small equal to inter-packet arrival time of 9µs as measured in Sec. 4. But the delay between the second packet of a slot and the first packet of the next slot is again equal to schedule length. Thus as the variation in delay between receiving the consecutive packets is more for slots that transmit more than one packet in a slot the jitter is also more. Thus these results indicate that to keep jitter value small, the slot size should be small such that only one packet is transmitted. Slot size of 1ms: As can be seen from Fig. 7 the measured throughput for slot size of 1ms is about 2 Kbps less than the expected throughput with a packet loss of 4%. Ji tter value, for 1ms, is also slightly more than that for 2 and 3ms. Unexpectedly, we observed a packet loss of about 4% even when we had retransmission of lost packet enabled or disabled. To ascertain the reason, we carried out measurements for time synchronization error and error in triggering packet transmission event, but we observed that error for these was similar to that as observed for slot size of 2ms or more. On more detailed investigation we obsered that the packets were being dropped at the MAC driver level as the receive interrupts for the packet were being missed. Now since the packet loss is similar for the case of packe ts enabled or disabled, it is likely that the CPU acts as a bottleneck for processing of the received packets, since the timer trigger frequency for the slot size of 1ms is more. To verify we carried out measurements at data rate of 24M, with slot size as 1ms, by transmitting only packet of 1586 bytes and 3 packets of 35 bytes in a slot. We observed that for the later case the packet loss was about three times (12-15%) compred to the former case (4-5%). This indeed indicates that, at 1ms slot s ize, the CPU is not able to handle both the timer trigger events and packet reception events leading to packet loss. We discuss the implication of this result from point of view of TDMA MAC implementation on commodity hardware later in Section 7. Slot allocation scheme 2: We do not show the results here for second slot allocation scheme, but we observed that the results for throughput and jitter are similar as for first slot allocation scheme. However, the packet delay values are certainly different for the two slot allocation schemes. For eg. for the 1 st slot allocation scheme the packet delay between root node and 4 th hop node is 4 times the slot size, while for the second allocation scheme the packet delay is atleast 4 times the data schedule length, where data schedule length in this case is equal to 5 slots. As the two slot allocation schemes that we have carried out measurements for are reverse of each other, this indicates that the UDP/TCP throughput and jitter values likely do not depend on the slot allocation scheme used. 5.2 Number of per-hop retransmissions It is also essential to understand what effect does varying the number of lost packet retransmissions have on UDP/TCP throughput and jitter. For this we carry out experiment, as before, for a 5 node 4 hop linear topology. We keep the values of all the TDMA MAC parameters same as for the experiment above, except the slot size that we take as 1ms. We deliberately induce a loss of 2% for each of the links and measure the throughput between the root and the 4 th hop node by varying the maximum number of retransmissions. Throughput (Kbps) (Expected UDP Throughput without packet drop - 867.7Kbps) 7 TCP UDP 6 5 4 3 2 1 1 3 5 6 8 Max number of retransmissions (a) throughput Packet Loss (%) 1 1 1.1.1.1.1 measured pkt loss expected pkt loss 1 3 5 6 8 Max number of retransmissions (b) packet loss Figure 8: Effect of maximum number of retransmissions on throughput and packet loss Accordingly Fig. 8 shows the measured UDP/TCP 8

throughput as well as measured and expected packet loss by varying the maximum number of retransmissions from to 8. As can be seen, the TCP throughput increases significantly as the maximum number of retransmissions is increased, while as is expected the UDP throughput value does not change much by increasing the number of retransmissions. Also the measured packet loss is quite close to the expected value (note that y axis for the packet loss plot uses log scale), indicating that the implementation of the per-hop acknowledgement scheme is working properly. We do not show the results here, but as is intuitive we observed that jitter values do not change much by increasing the number of retransmissions. Thus these results suggest that choosing the higher number of retransmissions results in higher TCP throughput alongwith lower packet loss. We also carried out measurements to evaluate how much performance improvement we can get with our error recovery scheme as the error rate increases. For this we carried out experiments to measure the UDP/TCP throughput and jitter by varying error rate for the links, in 5 node 4 hop topology, from to 3%. We kept the number of retransmissions to 8. Throughput (Kbps) 1 8 6 4 2 TCP_no_retransmission TCP_with_retransmission UDP_no_retransmission UDP_with_retransmission 5 1 2 3 Error per link (%) (a) throughput Jitter (ms) 5 4 3 2 1 no_retransmission with_retransmission 5 1 2 3 Error per link (%) (b) jitter Figure 9: Throughput and jitter for different error rates per link Accordingly Fig. 9 shows the UDP/TCP throughput as well as jitter for different error rates per links, for the both cases of with and without retransmissions. As can be seen the benefit of incorporating retransmissions is particularly apparent in case of TCP throughput wherein without retransmissions TCP throughput is practically for error rate of 5% or more, whereas with retransmissions even at 3% error rate the TCP throughput is slightly more than half of the TCP throughput at %. Jitter values, also, with retransmissions are slightly less than that without retransmissions. Thus these results indicate that with our error recovery scheme, even at 3% error, we get significant improvement in performance than without incoporating any error recovery mechanism. Comparison with Wildnet: [17] employs bulk acknowledgement and adaptive FEC based error recovery schemes. In bulk acknowledgement scheme, at the end of a given data slot, the receiver transmits a single acknowledgement packet for all the data packets received in a particular slot. This bulk acknowledgement can be piggybacked with data packets of the receiver. But if the receiver does not have packets to send, dedicated slot needs to be triggered at the end of data slot to transmit bulk acknowledgement. Now as our measurements show the minimum slot size that can be reliably used is 2ms, which can incur high overhead compared with out per-packet acknowledgement scheme which incurs an overhead of only about 48 µs (SIFS + ack tx time) at 6M data rate. FEC based scheme results in lower end to end delay compared with bulk acknowledgement scheme, but incurs high throughput overhead. For e.g. to reduce the packet loss from 1% to 1% incurs about 3% overhead, while to reduce the packet loss from 3% to 1% it incurs 1% throughput overhead. For our scheme, as can be seen from Fig. 9 to reduce the packet loss from 1% to % the throughput overhead is about 24%, while to reduce the packet loss from 3% to % the overhead is only about 4%. 6. PERFORMANCE EVALUATION We now study the QoS performance of LiT MAC in a realistic setting of an outdoor testbed and evaluate the efficiency and control overhead for QoS provisioning in 15 node indoor testbed. The hardware and software specifications for the experimental set-up is same as that for our experiments in previous section except that for our outdoor testbed, we use 15dBi omni-directional antenna installed on building rooftops, whereas for the indoor we use 2dBi omni-directional internal antennas connected to the nodes. Fig. 1 shows the outdoor testbed. Figure 1: Nine node outdoor testbed 6.1 LiT MAC Evaluation Throughput evaluation: We carried out measurements, spread across 7 days, to study how effectively LiT MAC can fulfill QoS requirements of the application in a realistic setting of the outdoor testbed. The experiment that we carried out was that we started three simultaneous uni-directional CBR flows of 4 Kbps, for a duration of one minute, across the following three paths, KR-MB-Old GH (2hop), KR-GG-SOM-New GH(3hop- 1) and KR-H13-H12-H14 (3hop-2) and for each of these flows we measured average throughput, packet loss and jitter. This experiment was then repeated after interval 9

of five minutes. Likewise we carried out the measurements with these experiments running continously, after interval of five minutes each, for a period of 7 days. Slot duration 5ms Num. control slots 2 Num. contention slots 1 Num. data slots 33 Num. retransmissions 8 2+1 Control overhead 2+1+33 = 8.33% Table 2: TDMA MAC parameter values The values of the TDMA MAC parameters that we used are given in Table 2. Now, the performance of the TDMA MAC protocol depends largely on the scheduling algorithm used. But the focus of the current work is to evaluate the protocol performance irrespective of the scheduling algorithm. So we use the following flow dependant slot allocation scheme. Since all the three flows start at the root node (i.e. node KR), the root node is allotted first three slots in the data schedule. Then all the first hop nodes are allotted one slot each in the data schedule, then second hop nodes and so on. Now the schedule length (i.e. number of slots after which the schedule is repeated) for this slot allocation scheme is 11 slots, so the end to end delay incurred by each flow is atmost only 11 x 5 = 55ms. % of expts 1 9 8 7 6 5 4 3 2 1 2hop 3hop-1 3hop-2 5 1 15 2 25 3 35 4 Throughput (Kbps) (a) Throughput CDF % of expts 1 9 8 7 6 5 4 3 % of expts 1 9 8 7 6 5 4 3 2 2hop 1 3hop-1 3hop-2.1.2.3.4.5.6.7.8.9 1 Packet loss (%) (b) Packet loss CDF 2 2hop 1 3hop-1 3hop-2 5 1 15 2 25 3 Jitter (ms) (c) Jitter CDF Figure 11: QoS performance for CBR traffic Fig. 11 shows the CDF of throughput, packet loss and jitter for the 1 minute experiments carried out for the period of 7 days. Now, based on the TDMA MAC parameter values, the expected throughput for each of the flow is 392 Kbps. As can be seen from Fig. 11 for the flow 3hop-1, about 9% of the expriments, while for the other two flows about 75% of the experiments have throughput over 3 Kbps. The packet loss for over 95% of the experiments is less than.1% for the flows 3hop1-1 and 3hop-2, while for the flow 2hop packets loss is less 1% for about 9% of the experiments. The jitter for about 9% of the experiments for the flows 2hop and 3hop-1 while for about 8% of the experiments for the flow 3hop-2, is less than 1ms. Even though the CDF of packet loss for the flow 2hop is more, the jitter is less than the flow 3hop-2. This is since we observed that the packet loss for the experiments of the flow 2hop was typically less than the that for the flow 3hop-2 leading to lower jitter. However the flow 2hop experienced bursty packet loss especially for the experiments during the day time when the external interference was generally more leading to higher aggregate packet loss. Throughput (Kbps) 4 35 3 25 2 15 1 5 2hop 3hop-1 3hop-2 1 2 3 4 5 Time (hours) (a) Day time (1-6pm) Throughput (Kbps) 4 35 3 25 2 15 1 5 2hop 3hop-1 3hop-2 1 2 3 4 5 Time (hours) (b) Night time (1-6am) Figure 12: Throughput variation across time of day To ascertain the reason for slightly less throughput for the flows 2hop and 3hop-2 (as can be seen in Fig. 11), we analyzed the measurements at various times of the day. Typically during day time the presence of external WiFi interference is more in the campus. So we observed that for the experiments carried out during day time the throughput for two flows was slightly less compared with the flow 3hop-1, as can be seen from Fig. 12 which shows the throughput for the experiments carried out during day and night time. As against this for the experiments carried out during night time, the throughput for the three flows is more similar as the presece of external is less. This explains why the aggregate throughput for the flows 2hop and 3hop-2 is slightly less compared with the flow 3hop-1. Thus these experiments further ascertain that the implementation of the LiT MAC over commodity hardware performs reliably in the realistic setting wherein there is presence of wireless and can efficiently fulfill the QoS requirement of the application. Apart from this we observed that a node typically experienced one or two system reboots in a day, but these system reboots did not have any effect on the performance. Comparison with CSMA: We now compare the performance of the LiT MAC with 82.11 CSMA/CA. For this we carry out uni-directional UDP/TCP and bidirectional UDP transfer across the three paths 2hop, 3hop-1 and 3hop-2, mentioned above, at 82.11g data rate of 6M at channel 8. We keep the values of different LiT MAC parameters as shown in Table 2, except the slot duration which we change to 3ms. Accordingly Fig. 13 shows the throughput values for the three flows. The error bars in the graph represent the standard deviation. Now the expected throughput for the TDMA for unidirectional UDP is 326 Kbps, whereas for bi-directional UDP it is 163 Kbps for each flow. As can be seen from 1