QoS Evaluation of Sender-Based Loss-Recovery Techniques for VoIP

Teck-Kuen Chua and David C. Pheanis, Arizona State University

Abstract

Voice over Internet Protocol (VoIP) is a technology that transports voice data packets across packet-switched networks using the Internet Protocol (IP). Losing packets in the network is inevitable, and losing voice packets degrades audio quality. Designers can use many loss-recovery techniques to mitigate the undesired effects of packet loss. Some of these techniques use sender-based procedures, and others use receiver-based procedures. We examine several well-known sender-based loss-recovery techniques and evaluate the feasibility and effectiveness of each one in real-time interactive VoIP applications. We analyze the bandwidth requirements, buffering delays, and perceptual sound qualities of these techniques. We study the effectiveness of these approaches under various packet-loss conditions, and we also compare the effectiveness of these techniques against a speech codec that has a high degree of packet-loss robustness.

Voice over Internet Protocol (VoIP) is a telephony technology that commonly uses the real-time transport protocol (RTP) to transport voice packets over a packet-switched network. RTP runs on top of the user datagram protocol (UDP), and UDP is an unreliable delivery protocol. When routers are overloaded, they drop packets. Therefore, some voice packets inevitably disappear in the packet-switched network. Furthermore, in real-time communication a voice packet that arrives at the receiving endpoint too late is useless and is equivalent to a lost packet. When we lose voice packets, the quality of the audio at the receiving endpoint degrades because the receiving endpoint does not have voice data for regenerating the lost segment of the audio. Researchers have proposed many techniques for improving quality of service (QoS) in the face of packet loss.
Some of these techniques employ receiver-based packet-loss concealment (PLC) approaches. Many audio encoder/decoders use PLC algorithms to synthesize audio when packets of audio data are missing. There are also sender-based loss-recovery techniques, whereby the sender assumes an active role to help the receiver recover lost data or improve QoS when packet loss occurs. Commonly, sender-based techniques are independent of receiver-based techniques, so designers can employ both types of loss-recovery methods simultaneously. The choice of any particular audio encoder/decoder (i.e., codec) with PLC transparently provides the degree of receiver-based packet-loss recovery that the selected codec offers. Designers normally consider packet-loss resilience along with several other factors when choosing codecs. Sender-based packet-loss recovery can supplement or even entirely replace receiver-based packet-loss concealment, so designers must also consider sender-based packet-loss recovery as part of the overall design for improving QoS.

Most sender-based loss-recovery mechanisms work by retransmitting data or by transmitting additional data. These approaches consume additional resources such as network bandwidth and CPU capacity. The increase in the consumption of network bandwidth puts more load on the network and can potentially result in the loss of more packets, perhaps ironically exacerbating the very problem we are trying to solve. Sender-based loss-recovery techniques also typically add end-to-end delay to the media stream. Generally, humans cannot even perceive a one-way delay of less than 100 ms, and most users can tolerate a one-way delay of up to 250 ms. If the one-way delay exceeds 250 ms, however, the delay can result in a serious talker-overlap effect that is intolerable for most users [1].
Therefore, we must consider added delays as well as other factors such as bandwidth consumption when evaluating the feasibility and effectiveness of loss-recovery techniques.

In this article we examine the degrees of audio quality that popular sender-based techniques achieve for VoIP communications under various packet-loss conditions. We briefly describe several sender-based loss-recovery techniques, and we analyze the bandwidth requirements and the buffering delays of these approaches. Then we evaluate the perceptual sound quality of selected techniques that are suitable for real-time conversation-type IP communications. We study the effectiveness of each approach under various packet-loss environments, and we also compare the effectiveness of these techniques against an audio codec that has superior packet-loss robustness.

Audio Codecs

The audio codecs that we use in this study are the Internet Low Bitrate Codec (iLBC) and the International Telecommunication Union (ITU-T) standard G.729A CS-ACELP codec. We chose iLBC because its design specifically provides a high degree of receiver-based packet-loss robustness for communications in a VoIP environment. We chose G.729A for its ability to deliver speech with near toll-quality audio at a very low bit rate. The low bandwidth requirement of G.729A compensates for the additional bandwidth utilization of the sender-based loss-recovery techniques, thus making the loss-recovery techniques more effective.

(0890-8044/06/$20.00 © 2006 IEEE. IEEE Network, November/December 2006.)

[Figure 1. Two-frame interleaving: the packets carry frames (1, 3), (2, 4), (5, 7), and (6, 8), and the receiver restores the order 1 through 8.]

[Figure 2. Four-frame interleaving: the packets carry frames (1, 5, 9, 13), (2, 6, 10, 14), (3, 7, 11, 15), and (4, 8, 12, 16), and the receiver restores the order 1 through 16.]

Internet Low Bitrate Codec (iLBC)

iLBC is a speech codec designed for narrow-band speech with an 8 kHz sampling rate. Although iLBC is a narrow-band speech codec, it uses the full 4 kHz frequency band. It uses a block-independent linear predictive coding (LPC) algorithm. iLBC is a history-independent codec in which previously lost packets do not degrade the audio quality of any future packets [2]. Its controlled response to packet losses is similar to pulse code modulation (PCM) with packet-loss concealment (PLC), the approach of the ITU-T standard G.711. iLBC supports two basic frame lengths, 30 and 20 ms. In the mode with 30 ms frames, iLBC produces 400 bits of encoded data per block of 30 ms of raw audio samples, so this mode requires a payload bit rate of 13.33 kb/s. With the 20 ms block length, iLBC generates 304 bits per block, resulting in a payload bit rate of 15.2 kb/s. Tests have shown that the 20 ms mode demonstrates higher packet-loss robustness than the 30 ms mode [3], but the 20 ms mode also requires more bandwidth than the 30 ms mode.

ITU-T Standard G.729A Codec

The ITU-T standard G.729A codec is an audio speech codec that uses the conjugate-structure algebraic code-excited linear-prediction (CS-ACELP) algorithm [4, 5]. CS-ACELP codecs are important because they deliver speech with near toll quality at very low bit rates. They achieve this performance by using a linear-prediction filter that models the properties of the human vocal tract, so they work well with speech but not with music or tones.
G.729A is the reduced-complexity version of the ITU-T standard G.729 codec. Like iLBC, G.729A is a narrowband speech codec (300 to 3400 Hz). Like other CELP codecs, G.729A is very successful at providing good audio quality with very low bit rates. Since CELP codecs exploit the interdependencies that exist between consecutive segments of speech, CELP codecs are heavily dependent on the history of previous speech data. Studies show that a lost packet in CELP encoding does, indeed, distort the audio for several frames after the lost packet [2]. G.729 supports a frame length of 10 ms and provides payload bit rates of 6.4 kb/s (Annex D), 8 kb/s (original, Annex A, and Annex C), and 11.8 kb/s (Annex E). G.729 Annex B provides a silence-compression scheme that can potentially reduce the payload bit rate.

Loss-Recovery Techniques

When trying to combat the effects of packet loss in VoIP systems, designers typically do not use sender-based loss-recovery techniques. One of the main reasons for avoiding sender-based loss-recovery techniques is that all of the existing approaches require additional network bandwidth. As the utilization of the data bandwidth increases, the data network becomes more congested and can discard more packets. Consequently, sender-based loss-recovery techniques, which are supposed to improve QoS when packet loss occurs, can actually degrade the audio quality by causing more lost packets. We cannot overstress the importance of using an approach that does not inordinately increase bandwidth consumption. We now consider several delivery techniques.

Plain Delivery

Plain delivery is more prevalent than any other delivery technique in VoIP solutions. Plain delivery does not provide any sender-based effort to improve audio quality when packet loss occurs. We include this approach to provide a baseline for purposes of comparison. This technique simply packages each block of encoded audio data into an IP packet and transmits the packet.
For example, if we use a 20 ms frame length, we package 38 bytes of iLBC encoded data into an IP packet for transmission. In the case of G.729A, we package 20 bytes of encoded audio data for a 20 ms period into an IP packet for transmission. In either case, we transmit 50 packets per second. If we transmit the packets via Ethernet, the header overhead is 78 bytes per IP packet (12 bytes for the interpacket idle time, 26 bytes for Ethernet overhead including CRC and preamble, 20 bytes for IPv4, 8 bytes for UDP, and 12 bytes for RTP). Consequently, we consume 46.4 kb/s for iLBC and 39.2 kb/s for G.729A. Plain delivery introduces 20 ms of delay into the audio stream.

Interleaving

Interleaving [6] is not really a loss-recovery technique, since this approach is not able to recover any lost data. However, this method attempts to reduce the degradation of perceptual audio quality by distributing lost data into several small gaps instead of having one large gap of lost data. Many researchers believe that listeners can mentally patch over a loss more easily if we disperse the loss into several small parts. Since this technique does not transmit additional information, it requires the same bandwidth that plain delivery uses. Interleaving is feasible only if we transmit multiple frames of audio in each IP packet. If we packaged two frames of audio into one IP packet, we would transmit packets of interleaved audio frames, as Fig. 1 illustrates. We could further scatter lost frames by interleaving more frames into each IP packet. Figure 2 illustrates interleaving with four audio frames per IP packet. Increased interleaving provides improved dispersal of lost audio frames, but increased interleaving also introduces a larger delay into the media stream. Larger delays can cause serious problems such as talker overlap, so interleaving improves some aspects of quality while degrading others.
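The frame ordering of Figs. 1 and 2 can be sketched as a small block interleaver. This is a minimal illustration and not the authors' implementation: the function names `interleave` and `deinterleave` are hypothetical, and the sketch assumes that a packet of `depth` frames draws its frames from a block of `depth * depth` consecutive frames, which is the pattern the two figures show.

```python
def interleave(frames, depth):
    """Pack frames into packets of `depth` frames each (assumed block layout).

    Within each block of depth * depth frames, packet `row` carries frames
    row, row + depth, row + 2 * depth, ... of that block.
    """
    block = depth * depth
    if len(frames) % block != 0:
        raise ValueError("frame count must be a multiple of depth * depth")
    packets = []
    for start in range(0, len(frames), block):
        for row in range(depth):
            packets.append(
                [frames[start + row + col * depth] for col in range(depth)]
            )
    return packets


def deinterleave(packets, depth):
    """Restore the original frame order.

    A lost packet is passed as None, and every frame it carried comes back
    as None, so one lost packet becomes `depth` separate one-frame gaps.
    """
    if len(packets) % depth != 0:
        raise ValueError("packet count must be a multiple of depth")
    frames = []
    for start in range(0, len(packets), depth):
        group = packets[start:start + depth]
        for col in range(depth):
            for row in range(depth):
                packet = group[row]
                frames.append(None if packet is None else packet[col])
    return frames
```

With `depth = 4`, losing the second packet of a block removes frames 2, 6, 10, and 14 rather than four adjacent frames. The receiver must hold a whole block before it can play the frames in order, which is where the added delay comes from.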
Forward Error Correction

Forward error correction (FEC) is a sender-based technique for mitigating the undesired effects of packet loss [7]. FEC works by transmitting redundant packets for error correction. There are many different variants of the FEC technique. We consider the Reed-Solomon encoding scheme and the parity-encoding scheme as examples of FEC.

Reed-Solomon encoding [8] is a well-known block-based error-correction mechanism. Reed-Solomon codes enjoy widespread use in storage applications and digital communications ranging from compact disc (CD) and DVD to wireless communication, satellite communication, digital television, and high-speed modems. The Reed-Solomon encoding scheme works by generating parity bits and sending the parity bits along with the data values. If data values are missing or corrupt, the Reed-Solomon decoder can reconstruct the original data by using the redundant information from the parity bits. The measure of the redundancy in the block determines the error-correcting ability of the Reed-Solomon code. Reed-Solomon coding can be computationally quite expensive to implement on a general-purpose microprocessor, especially in a real-time software implementation such as VoIP. Therefore, we do not select Reed-Solomon encoding in our study of perceptual sound quality.

Parity encoding uses a simpler algorithm that transmits redundant packets for error correction. For every n packets of audio data, this approach transmits an error-correcting packet that contains the exclusive-or of the previous n packets of audio data. If the network loses one of the n audio packets, we can reconstruct the lost packet from the other (n - 1) audio packets and the error-correction packet. This technique fails completely, of course, if we lose more than one packet out of a group of (n + 1) packets. With n = 5, as an example, this approach increases bandwidth requirements by 20 percent. With 20 ms frames, we would need 55.68 kb/s for iLBC and 47.04 kb/s for G.729A, including header overhead. However, with n = 5, this technique introduces too much delay into the audio stream to be a feasible approach for real-time VoIP communications.

[Figure 3. Parity-coding piggyback FEC with n = 2: the packets carry A, B, C with (A xor B), D, and E with (C xor D).]
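The exclusive-or recovery step of parity encoding can be sketched in a few lines. This is an illustrative sketch only: the helper names `xor_bytes`, `make_parity`, and `recover_group` are hypothetical, and the sketch assumes that all frames in a group have the same length, as they do for a fixed-rate codec.

```python
def xor_bytes(a, b):
    """Byte-wise exclusive-or of two equal-length byte strings."""
    if len(a) != len(b):
        raise ValueError("frames must have equal length")
    return bytes(x ^ y for x, y in zip(a, b))


def make_parity(frames):
    """Build the error-correcting payload for a group of n audio frames."""
    parity = bytes(len(frames[0]))  # all-zero starting value
    for frame in frames:
        parity = xor_bytes(parity, frame)
    return parity


def recover_group(received, parity):
    """Repair at most one missing frame (marked None) in a group.

    Raises ValueError when the group is beyond repair, that is, when more
    than one packet of the (n + 1)-packet group is missing.
    """
    missing = [i for i, frame in enumerate(received) if frame is None]
    if not missing:
        return list(received)
    if len(missing) > 1 or parity is None:
        raise ValueError("parity coding can repair only one loss per group")
    rebuilt = parity
    for frame in received:
        if frame is not None:
            rebuilt = xor_bytes(rebuilt, frame)
    repaired = list(received)
    repaired[missing[0]] = rebuilt
    return repaired
```

The receiver cannot call `recover_group` until the rest of the group and the parity payload have arrived, so the repair delay grows with n.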
If we lose a packet, we need to wait for the other audio packets in the group and the error-correction packet to arrive before we can reconstruct the missing audio packet. If n = 5 with 20 ms of audio data per packet, this approach introduces at least 100 ms of delay into the audio stream. We can reduce the delay and increase the effectiveness of this technique by choosing a smaller n, but a smaller n results in more bandwidth consumption. If we use n = 2, for example, the delay shrinks to a minimum of 40 ms while the required bandwidth rises to 69.6 kb/s for iLBC and 58.8 kb/s for G.729A, an increase of 50 percent over the bandwidth requirements for plain delivery.

We can use a small value of n and still be reasonably bandwidth efficient by employing a technique called piggybacking FEC (pfec) [9]. Using pfec, we transmit the error-correction information along with the subsequent audio data packet, as Fig. 3 illustrates. This approach eliminates the packet overhead for the correction frame by including the correction frame in the same packet that contains the next audio frame. If we choose n = 2 with 20 ms frames, pfec requires 54 kb/s for iLBC (16 percent more than plain delivery for iLBC) and 43.2 kb/s for G.729A (10 percent more than plain delivery for G.729A). This configuration of the pfec approach introduces 60 ms of delay into the media stream.

Redundant Data Transmission

Redundant data transmission (RDT) works by transmitting audio data more than once [10]. This technique includes previously transmitted audio data along with new audio data in a single IP packet. Every IP packet contains both redundant audio data and new audio data. In the event of a lost packet, we can still recover the lost audio data from another IP packet that contains the lost data as redundant data. The amount of redundant data that we include in each IP packet can vary to provide differing degrees of effectiveness.
RDT is essentially a simple variation of pfec with n = 1. In our study, we package a 20 ms block of previously transmitted audio data along with a 20 ms block of new audio data. Therefore, we transmit every 20 ms block of encoded data twice. With 20 ms frames, for example, we transmit frames 1 and 2 in one packet, then transmit frames 2 and 3 in the next packet, send frames 3 and 4 in the following packet, and so on. If we lose a single packet, we can still recover all of the lost information. If we lose two or more consecutive packets, however, we actually lose audio data. This particular approach adds 40 ms of delay to the audio stream, and it uses 61.6 kb/s for iLBC (33 percent more than plain delivery for iLBC) and 47.2 kb/s for G.729A (20 percent more than plain delivery for G.729A).

A variant of this technique uses one encoding scheme for the original data and a different encoding scheme for the redundant data. Usually, the encoding scheme for the redundant data has a lower bit rate to reduce the amount of additional bandwidth that the approach requires. The lower bit-rate encoding scheme normally has lower quality than the primary encoding scheme. Jiang et al. [9] conducted a study to compare pfec and low bit-rate redundancy (LBR) and showed that pfec is more effective than LBR in terms of delivering voice quality in the face of packet loss.

Duplicate Packet

This loss-recovery technique attempts to improve packet-loss robustness by transmitting duplicate packets. This approach is similar to RDT in that it transmits audio data more than once. However, this technique transmits the redundant data in separate IP packets and thereby increases bandwidth consumption by requiring additional header overhead. Transmitting duplicate packets is very expensive in terms of bandwidth. It requires double the bandwidth of the plain-delivery method, 92.8 kb/s for iLBC and 78.4 kb/s for G.729A.
In other words, we would need to double the bandwidth of our data network or reduce the number of supported channels by half to use this technique in place of the plain-delivery technique. As a result, this approach is too costly to be a practical solution, even though it does have the benefit that RTP supports it without change. Note that RTP also supports RDT without change, and RDT requires considerably less bandwidth than the duplicate-packet technique.

The duplicate-packet approach introduces at least 20 ms of delay if we package 20 ms of audio data into an IP packet. Proponents of this method prefer to transmit the duplicate packet at a much later time rather than transmitting it immediately after the original packet, however. A router drops packets when it is overloaded, so a router is likely to drop packets that arrive during a busy time. A router that drops the original packet is therefore likely to drop the duplicate packet as well if we transmit the duplicate packet immediately after the original packet. If we delay the transmission of the duplicate packet, we reduce the chance of losing both packets, but we must add this delay to the overall delay that this approach introduces into the audio stream.
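The bandwidth figures quoted for each delivery technique follow from one formula: (payload bytes + header bytes) × 8 bits × packets per second. The sketch below reproduces them under the article's stated assumptions of 20 ms frames (50 packets per second) and 78 bytes of per-packet overhead on Ethernet. The function names are hypothetical, chosen only for this illustration.

```python
# Per-packet overhead on Ethernet, in bytes:
# interpacket idle + Ethernet (CRC, preamble) + IPv4 + UDP + RTP = 78
HEADER_BYTES = 12 + 26 + 20 + 8 + 12
PACKETS_PER_SECOND = 50  # one packet per 20 ms frame


def kbps(payload_bytes, packets_per_second=PACKETS_PER_SECOND):
    """On-the-wire bandwidth in kb/s for a given payload size and packet rate."""
    return (payload_bytes + HEADER_BYTES) * 8 * packets_per_second / 1000


def plain(frame_bytes):
    return kbps(frame_bytes)


def parity_fec(frame_bytes, n):
    # One extra full packet for every n audio packets.
    return kbps(frame_bytes, PACKETS_PER_SECOND * (n + 1) / n)


def piggyback_fec(frame_bytes, n):
    # The parity frame rides in an audio packet, so on average each packet
    # carries an extra frame_bytes / n of payload and no extra header.
    return kbps(frame_bytes + frame_bytes / n)


def rdt(frame_bytes):
    # Every packet carries the new frame plus the previous frame.
    return kbps(2 * frame_bytes)


def duplicate(frame_bytes):
    # Every packet, headers included, goes out twice.
    return 2 * plain(frame_bytes)


if __name__ == "__main__":
    for name, frame_bytes in (("iLBC", 38), ("G.729A", 20)):
        print(name,
              round(plain(frame_bytes), 2),
              round(parity_fec(frame_bytes, 5), 2),
              round(parity_fec(frame_bytes, 2), 2),
              round(piggyback_fec(frame_bytes, 2), 2),
              round(rdt(frame_bytes), 2),
              round(duplicate(frame_bytes), 2))
```

Because the 78-byte overhead dominates the small voice payloads, techniques that add payload to existing packets (pfec and RDT) cost far less than techniques that add packets (separate parity packets and duplicate packets).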

Retransmission

Retransmission provides loss recovery by resending lost data upon request by the recipient. The receiving endpoint implements a loss-detection algorithm to detect lost packets. Once the receiver categorizes a missing packet as a lost packet, the receiving endpoint sends a retransmit request to the sender. Upon receiving the resend request, the sender retransmits the lost packet. Some loss-detection algorithms also try to detect a lost retransmission packet. H.P. Sze et al. [11] proposed an effective retransmission technique that combines the gap-detection and timeout-detection mechanisms.

The retransmission technique has variable additional bandwidth requirements; it consumes more bandwidth when there are more lost packets. A number of researchers have studied and recommended various retransmission techniques that are more bandwidth efficient than simple retransmission. Nonnenmacher et al. [12] proposed an approach that combines FEC and retransmission by using parity FEC packets to repair multiple losses with a single retransmission, thus achieving substantial bandwidth savings.

Retransmission occurs only at the explicit request of the recipient, so retransmission requires a round trip that inherently induces a large end-to-end delay. Consequently, retransmission is not suitable for interactive real-time communications such as VoIP. A conversation-type real-time communication system requires an end-to-end delay of less than 250 ms [1]. Otherwise, the system would introduce the unpleasant effect of serious talker overlap that degrades the overall quality of the session.

Table 1. 10 percent random loss.

Consecutive lost packets   Plain loss %   RDT loss %   pfec loss %
1                          8.1            0.0          0.405
2                          1.62           0.81         1.256
3                          0.243          0.162        0.207
4                          0.032          0.024        0.029
5                          0.004          0.003        0.004
6                          0.000          0.000        0.000
Total                      10.0           1.0          1.9

Selecting Methods for Study

All of the sender-based loss-recovery techniques that we have discussed are relatively simple to implement, both for the sender and for the receiver, so implementation difficulty is not an issue. In fact, RTP already supports plain delivery, RDT, and duplicate packets with no changes at all. As RFC 2198 [13] illustrates, RTP can additionally support interleaving and pfec with minor extensions to the receiver in order to accommodate payload modifications.

For our evaluations of audio quality, we select only the sender-based loss-recovery techniques that are suitable for real-time interactive VoIP communications. Throughout the study, we measure the perceptual audio quality of iLBC with the plain technique, G.729A with the plain technique, G.729A with the RDT technique, G.729A with the pfec technique using n = 2, and G.729A with the two-frame interleaving technique. The duplicate-packet approach is very expensive in bandwidth, so it is not practical or cost effective in real-world VoIP products. The retransmission technique potentially introduces large delays and would cause unacceptable talker overlap in real-time interactive IP communications. Consequently, the retransmission method is not a viable solution for our target applications. Nonetheless, the retransmission approach can be an effective loss-recovery solution in non-interactive applications that do not have tight latency bounds.

Packet-Loss Characteristics

Many of the sender-based loss-recovery techniques have levels of effectiveness that depend heavily on the characteristics of packet loss in the network. We therefore analyze the performance of the sender-based loss-recovery techniques against three different packet-loss profiles. We examine the recovery techniques with random loss, with burst loss, and, most importantly, with a loss characteristic that we modeled based on statistics that we collected from real IP networks.

Random Loss

With random loss we simply lose packets randomly.
If packet loss were entirely independent from one instant to the next, we would experience random loss. Random loss is a condition that is very favorable for many of the sender-based loss-recovery techniques because multiple consecutive lost packets are relatively rare with random loss. The probability of losing k consecutive packets in a random-loss scenario drops sharply as k increases.

Consider the RDT technique, which suffers no audio loss at all unless we lose two or more consecutive packets. Even in the consecutive-loss situation, the actual information that we lose with RDT is much less than what we lose with the plain-delivery technique because RDT compensates for part of the loss. Similarly, the approach of sending duplicate packets typically endures no loss at all unless we lose multiple consecutive packets. The pfec technique fails completely and experiences a total loss when we drop more than one packet among a group of (n + 1) packets. However, the likelihood of such complete failure with the pfec technique is low in the random-loss environment.

The interleaving technique always has the same amount of loss as the plain technique regardless of the loss characteristic. However, multiple consecutive losses widen the loss gap that this technique seeks to overcome. Therefore, the interleaving technique is most effective at achieving its goal of dispersing the loss gaps when multiple consecutive losses are rare, as they are in the random-loss condition. Network loss characteristics do not affect the effectiveness of the retransmission technique, since the receiver can request retransmission whenever it detects a lost packet.

Table 1 and Table 2 show the actual losses of information in a random-loss environment for plain delivery, for RDT, and for pfec with n = 2.
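The recovery rules described above can be made concrete by counting, for a given pattern of lost packets, how many audio frames each technique fails to deliver. The sketch below is illustrative only. The function name `frames_lost` is hypothetical, and the sketch rests on three labeled assumptions: one audio frame per packet, RDT with one redundant frame per packet, and pfec with n = 2 laid out as in Fig. 3 (the parity for frames A and B travels in the packet that carries C). It also assumes that nothing arrives after the end of the trace.

```python
def frames_lost(lost):
    """Return (plain, rdt, pfec) counts of unrecovered frames.

    `lost` is a sequence of booleans, one per packet, True when the
    network dropped that packet. Packets beyond the end of the trace
    are treated as not delivered (an assumption of this sketch).
    """
    n = len(lost)

    def delivered(i):
        return i < n and not lost[i]

    # Plain delivery: every lost packet is a lost frame.
    plain = sum(1 for flag in lost if flag)

    # RDT: frame i also travels in packet i + 1 as redundant data.
    rdt = sum(1 for i in range(n) if lost[i] and not delivered(i + 1))

    # pfec with n = 2: frames are paired (0, 1), (2, 3), ... and the parity
    # of each pair rides in the first packet of the following pair.
    pfec = 0
    for i in range(n):
        if not lost[i]:
            continue
        pair_start = i - (i % 2)
        partner = pair_start + 1 if i == pair_start else pair_start
        parity_carrier = pair_start + 2
        if not (delivered(partner) and delivered(parity_carrier)):
            pfec += 1
    return plain, rdt, pfec
```

For example, `frames_lost([False, True, True, False, False])` returns `(2, 1, 1)`: RDT recovers the final packet of the two-packet burst, and pfec recovers the second lost frame because both its partner and its parity carrier arrive.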
The tables indicate that the actual losses of information for the pfec technique with n = 2 are significantly lower than the corresponding losses for the plain-delivery approach, and the tables show that the RDT technique does even better than the pfec method.

We can analytically compute the percentage of lost packets for various burst lengths in a random-loss environment for plain delivery, RDT, and pfec. For plain delivery, the proportion of lost packets to total packets for a loss of k consecutive packets is $k\,p^{k}(1-p)^{2}$, where k is the burst length and p is the probability of a single packet loss. For the RDT technique, the proportion of lost packets to total packets is, somewhat similarly, $(k-1)\,p^{k}(1-p)^{2}$, where the $(k-1)$ factor replaces the $k$ factor of plain delivery because of the loss recovery that RDT provides.

The situation for pfec is a bit more complex. In the case of k = 1, the proportion of lost packets for pfec with n = 2 is $[(k-1+p)\,p^{k}(1-p)^{2}]/2$, or simply $[p^{k+1}(1-p)^{2}]/2$. When k > 1, the proportion of lost packets for pfec with n = 2 is $[k+(k-1)+p]\,p^{k}(1-p)^{2}/2$. The formulas in this paragraph and the previous paragraph produce the values given in Table 1 and Table 2.

Table 2. 20 percent random loss.

Consecutive lost packets   Plain loss %   RDT loss %   pfec loss %
1                          12.8           0.0          1.28
2                          5.12           2.56         4.096
3                          1.536          1.024        1.331
4                          0.410          0.307        0.369
5                          0.102          0.082        0.094
6                          0.025          0.020        0.023
7                          0.007          0.006        0.006
Total                      20.0           4.0          7.2

Burst Loss

With burst loss we assume that a condition that causes the loss of a packet persists for some period of time and therefore causes us to lose one or more subsequent consecutive packets. The RDT technique is not as effective in an environment with burst loss as it is in the random-loss scenario. Although the RDT technique still reduces the experienced loss relative to the total loss, the effectiveness of the RDT technique diminishes quickly as the number of consecutive lost packets increases. RDT always compensates for the final lost packet in any burst of lost packets, so the portion of the loss that RDT eliminates for a burst loss of k consecutive packets is 1/k. Figure 4 graphically illustrates how much of the total loss RDT eliminates as a function of the number of packets in each burst of lost packets.

[Figure 4. Burst loss (percent loss that RDT eliminates): loss that RDT eliminates (%) versus number of consecutive lost packets.]

When the network becomes congested and loses multiple consecutive voice packets, pfec is very likely to lose more than one packet among (n + 1) packets. In this circumstance, pfec fails completely and is unable to rectify the loss at all.
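The random-loss expressions given earlier in this section are easy to evaluate numerically, which is a convenient check on Table 1 and Table 2. The sketch below simply transcribes those expressions; the function names are hypothetical and the results are expressed in percent.

```python
def plain_loss(k, p):
    """Proportion of packets lost in bursts of length k with plain delivery."""
    return k * p**k * (1 - p)**2


def rdt_loss(k, p):
    """Same proportion for RDT, which recovers one packet of each burst."""
    return (k - 1) * p**k * (1 - p)**2


def pfec_loss(k, p):
    """Same proportion for pfec with n = 2."""
    if k == 1:
        return p**(k + 1) * (1 - p)**2 / 2
    return (k + (k - 1) + p) * p**k * (1 - p)**2 / 2


def loss_table(p, max_k=6):
    """Rows of (k, plain %, RDT %, pfec %) for burst lengths 1..max_k."""
    return [
        (k, 100 * plain_loss(k, p), 100 * rdt_loss(k, p), 100 * pfec_loss(k, p))
        for k in range(1, max_k + 1)
    ]


def totals(p, max_k=200):
    """Column totals in percent, summed far enough for the series to converge."""
    rows = loss_table(p, max_k)
    return tuple(sum(row[col] for row in rows) for col in (1, 2, 3))


if __name__ == "__main__":
    for k, plain_pct, rdt_pct, pfec_pct in loss_table(0.10):
        print(f"{k}  {plain_pct:7.3f}  {rdt_pct:7.3f}  {pfec_pct:7.3f}")
```

Summing each column over all burst lengths gives p, p², and p²(2 − p) respectively, which matches the table totals of 10.0, 1.0, and 1.9 percent for p = 0.1 and 20.0, 4.0, and 7.2 percent for p = 0.2.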
Likewise, the duplicate-packet approach suffers in a burst-loss environment because the network is likely to drop both the original packet and the duplicate packet that the sender transmitted soon after the original voice packet. However, a large transmission delay with the duplicate-packet method can help recover the corresponding lost data. As we discussed above, loss characteristics do not influence the effectiveness of the retransmission technique. Similarly, the interleaving technique still achieves its goal of dispersing a long loss into several smaller losses in a burst-loss environment. Figure 5 illustrates a four-frame interleaving scenario with three consecutive lost packets.

[Figure 5. Four-frame interleaving with burst loss.]

Real-Network Loss

Accurately modeling the loss characteristics of a real IP network is a difficult task. Every IP network is unique and has its own distinctive characteristics. Additionally, the behavior of any particular IP network changes, often significantly, from day to day and from time to time within a day. One segment of a network could be highly congested while another segment of the same network could be idle.

The loss statistics that we have collected from real IP networks for our study reveal the volatile characteristics of the networks. We collected information over a period of several months from a corporate LAN with a mix of VoIP and TCP/IP traffic, and we collected similar information by testing over the Internet. We injected a stream of VoIP packets at intervals of 20 ms and recorded packet losses in that stream so we could model the pattern of packet losses for a VoIP conversation in a real network. Our real-network loss statistics interestingly show that transmissions from endpoint A to endpoint B can suffer substantial losses while simultaneous transmissions from endpoint B to endpoint A have no loss at all!
In addition, two virtually identical sessions transmitting from endpoint A to endpoint B during the same time period on the same network can produce significantly different loss statistics.

Seung-Hwa Chung et al. conducted a study [14] showing that even an underutilized network can lose packets. Their study shows that a network loses packets during a microburst of traffic congestion that lasts for a short period of time. Even a DiffServ network can lose packets, so packet-loss recovery can improve QoS even in a DiffServ environment. Of course, packet-loss recovery improves QoS only when packet loss occurs, so the benefit of packet-loss recovery is proportional to the loss rate of the network. The real-network loss statistics that we have collected are consistent with the results obtained by Chung et al.

Since a real network loses packets during periods of traffic congestion, a real network may have higher rates of multiple-packet loss than we see with a random-loss scenario. Our measurements of real networks confirmed this speculation and allowed us to quantify the burst-loss behavior of a real network. The behavior of a real network is at least somewhat similar to the behavior that we produced for the random-loss condition, though we consistently observed that over half of all packet-loss occurrences in the real IP networks that we studied were single-packet losses.
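A burst-length profile of this kind can be computed from any recorded loss trace and set beside the profile that independent random loss would produce. The sketch below is an illustration and not the authors' measurement tool: the function names are hypothetical, the trace is assumed to be a sequence of booleans with one entry per injected packet, and the random-loss comparison uses the standard result that, with independent loss probability p, a loss occurrence is a burst of exactly k packets with probability p^(k-1)(1 - p).

```python
from collections import Counter
from itertools import groupby


def burst_histogram(lost):
    """Share of loss occurrences by burst length in a recorded trace.

    `lost` holds one boolean per packet, True when the packet was lost.
    Returns {burst_length: fraction of all loss occurrences}.
    """
    runs = Counter(
        len(list(group)) for is_lost, group in groupby(lost) if is_lost
    )
    total = sum(runs.values())
    if total == 0:
        return {}
    return {length: runs[length] / total for length in sorted(runs)}


def random_loss_share(p, k):
    """Share of loss occurrences that are bursts of exactly k packets
    when each packet is lost independently with probability p."""
    return p ** (k - 1) * (1 - p)
```

For instance, `random_loss_share(0.4, 1)` is 0.6, so a random-loss channel with a 40 percent loss rate still produces mostly single-packet losses.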

[Figure 6. Network loss statistics: loss occurrences (%) versus number of consecutive lost packets for random loss at 33%, random loss at 40%, network loss A, and network loss B.]

Figure 6 illustrates the burst distributions of packet loss for two real networks versus the burst-loss distributions for two random-loss networks. We can easily observe that the pattern of multiple-packet loss for a real network is not dramatically different from the pattern of multiple-packet loss in the random-loss situation. However, our statistics for real networks reveal that a real network does typically have more burst loss and less single-packet loss than we observe in a random-loss scenario, and we take that fact into account when we simulate a real network. (The real-network loss characteristics are quantified later in Table 4.)

The overall loss rates of 33 and 40 percent for the random-loss networks in Fig. 6 are considerably higher than the overall loss rates for the real networks in the figure. The real networks had overall loss rates of only about 1 percent during the extended periods for which we measured them, but the average overall loss rate for a real network does not correspond to the average overall loss rate for a random-loss network. Real networks exhibit long periods with little or no loss and occasional short periods of congestion and high loss. The periods of congestion dominate the burst-loss profile and cause a real network with an overall loss rate of only about 1 percent to match the burst-loss profile for a random-loss network with an overall loss rate that is much higher. In effect, a random-loss network provides a close approximation for a real network during a period of network congestion.

[Figure 7. MOS ratings with random loss: MOS rating versus random packet loss (%) for iLBC (20 ms), G.729A (plain), G.729A (RDT), G.729A (pfec), and G.729A (interleave).]

Audio Quality
We use the ITU-T standard P.862 Perceptual Evaluation of Speech Quality (PESQ) algorithm to analyze audio quality. We present the audio-quality results in terms of the mean opinion score (MOS), which ranges from 1 (worst) to 5 (best). With the PESQ algorithm, however, a rating of 5 is not attainable: the highest MOS rating we can obtain is 4.5, which results when we compare an audio clip to itself without any distortion.

The PESQ algorithm is nearly ideal for our study because it measures distortion between an original (transmitted) audio clip and another (received) audio clip. We send the original audio clip through the codec under study (ilbc or G.729A), and we send the resulting packets through our packet-loss simulator, which simulates random loss, burst loss, or real-network loss. Whenever we simulate packet loss, our simulator drops the same packet(s) for all of the loss-recovery methods to provide fair comparisons. Our packet-loss simulator also applies the loss-recovery technique under study, and we send the resulting data through the codec to get the output audio clip that the receiver hears. We then use PESQ to measure the distortion between the original audio clip and the resulting audio clip. This approach gives us a clear measure of the distortion that is due to packet loss under various combinations of codecs, loss characteristics, and loss-recovery techniques.

Our use of the PESQ algorithm does not account for delay, and we do need to consider delays because the delays vary among the loss-recovery techniques. We have a base delay of 20 ms for plain delivery, a delay of 40 ms (i.e., an increase of 20 ms) for RDT or two-way interleaving, and a delay of 60 ms (i.e., an increase of 40 ms) for pfec with n = 2. To evaluate the impacts of these differing delays, we use the ITU-T G.107 E-model, an analytical model that evaluates the conversational quality of a telephony system.
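As a rough illustration of how an E-model calculation isolates the effect of delay, the sketch below evaluates only the pure-delay impairment term (Idd) of ITU-T G.107 and converts the resulting rating factor R to a MOS estimate. It assumes an echo-free connection, a baseline of R = 93.2 (the value that G.107 yields with its default parameters), and no other impairments; the function names are hypothetical, and this is not the calculation procedure used in the study.

```python
import math


def delay_impairment(ta_ms):
    """Pure-delay impairment Idd of ITU-T G.107 for absolute delay Ta (ms)."""
    if ta_ms <= 100.0:
        return 0.0  # no delay impairment at or below 100 ms
    x = math.log(ta_ms / 100.0) / math.log(2.0)
    return 25.0 * (
        (1 + x ** 6) ** (1 / 6)
        - 3 * (1 + (x / 3) ** 6) ** (1 / 6)
        + 2
    )


def r_to_mos(r):
    """Map the E-model rating factor R to an estimated MOS (G.107 Annex B)."""
    if r <= 0:
        return 1.0
    if r >= 100:
        return 4.5
    return 1 + 0.035 * r + r * (r - 60) * (100 - r) * 7e-6


def mos_for_delay(ta_ms, r_base=93.2):
    """Estimated MOS when delay is the only impairment (assumed baseline R)."""
    return r_to_mos(r_base - delay_impairment(ta_ms))


if __name__ == "__main__":
    for total_delay in (20, 60, 100, 200, 250, 270, 290):
        print(total_delay, round(mos_for_delay(total_delay), 3))
```

Comparing `mos_for_delay(d)` with `mos_for_delay(d + 20)` or `mos_for_delay(d + 40)` for a given base delay `d` shows how much of a quality difference between techniques is attributable to delay alone under these assumptions.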
The E-model includes many items (e.g., room noise, echo, and circuit noise) that are independent of both packet loss and delay, but we isolate the effect of delay and thereby determine the quality differences that are due to the delay variations of the loss-recovery techniques. When the total delay does not exceed 100 ms, the E-model indicates that there is no degradation at all due to the delay. This case is the relevant case for most VoIP systems, since designers try to make the delays low enough to eliminate or at least minimize the effects of delays. The delay degradations remain insignificant until the overall delay reaches a level of about 200 ms, and the degradation grows as the delay approaches the talker-overlap threshold of 250 ms. The worst case occurs when an added delay of 20 or 40 ms above the base delay pushes the total delay beyond the talker-overlap threshold, in which case the MOS rating degrades by approximately 0.1 for an added delay of 20 ms or by 0.2 for an added delay of 40 ms. This worst-case situation is not important for practical applications, though, because any system that is close to the talker-overlap threshold is already a marginal system for VoIP.

Random Loss

We have used our random-loss simulator to model a random-loss environment with specified degrees of loss, and our findings are illustrated in Fig. 7. ilbc indeed attains its goal of providing good packet-loss robustness. ilbc consistently achieves a higher MOS rating than G.729A over a series of tests. The two-frame interleaving technique, however, does not demonstrate any effectiveness in improving the MOS rating. The pfec technique with n = 2 consistently improves the audio quality of G.729A. The pfec approach is capable of improving the audio quality of G.729A to the extent of increasing the G.729A MOS rating above the MOS rating for ilbc. The RDT technique consistently achieves the highest MOS ratings in all of the random-loss tests. The RDT approach is so effective that it allows G.729A to achieve significantly higher MOS ratings than either ilbc or any of the other loss-recovery techniques.

Burst Loss

In order to study the effects of bursts of lost packets on audio quality, we programmed our simulator to simulate burst loss by discarding lost packets in clusters of specified sizes. For example, when testing with 10 percent single-packet loss, we randomly drop a single packet 10 percent of the time so as to provide a single-packet-loss profile. Similarly, when testing with 10 percent two-packet burst loss, we randomly drop two consecutive packets 5 percent of the time so as to achieve a burst loss of two packets, and other cases are analogous. Figure 8 shows the MOS ratings with 10 percent packet loss at various burst sizes, and Fig. 9 shows the MOS ratings with 20 percent packet loss at various burst sizes.

[Figure 8. MOS ratings with 10 percent burst loss. Plot of MOS rating versus number of consecutive lost packets for ilbc (20 ms), G.729A (plain), G.729A (RDT), G.729A (pfec), and G.729A (interleave).]

[Figure 9. MOS ratings with 20 percent burst loss. Plot of MOS rating versus number of consecutive lost packets for ilbc (20 ms), G.729A (plain), G.729A (RDT), G.729A (pfec), and G.729A (interleave).]

Our study indicates that ilbc frequently achieves a higher MOS rating than G.729A in various consecutive-loss tests. However, our test results do not show any correlation between the MOS ratings and the number of consecutive lost packets for either ilbc or G.729A. The interleaving approach clearly does not show any improvement in the MOS rating of G.729A in a burst-loss condition. The pfec technique with n = 2 often fails completely when losing two or more packets, so the pfec approach does not show much improvement in the MOS ratings in our burst-loss tests.
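The burst-loss procedure described above can be sketched as follows. To reach an overall loss rate p using bursts of exactly k packets, the sketch places p·N/k bursts at random positions in a stream of N packets and keeps at least one delivered packet between bursts so that every burst has the requested size. Generating the pattern once from a fixed seed and reusing it for every technique mirrors the rule that the same packets are dropped for all methods. The function name, its parameters, and the placement strategy are illustrative assumptions rather than a description of the simulator used in the study.

```python
import random


def burst_loss_pattern(num_packets, loss_rate, burst_size, seed=0):
    """Return a list of booleans (True = packet dropped).

    The pattern contains round(num_packets * loss_rate / burst_size)
    bursts of exactly burst_size consecutive losses, separated by at
    least one delivered packet.
    """
    rng = random.Random(seed)
    num_bursts = round(num_packets * loss_rate / burst_size)
    delivered = num_packets - num_bursts * burst_size
    if num_bursts > delivered + 1:
        raise ValueError("loss rate too high to keep bursts separated")

    # Each burst occupies one of the gaps before, between, or after the
    # delivered packets; distinct gaps guarantee that bursts never merge.
    burst_slots = set(rng.sample(range(delivered + 1), num_bursts))

    pattern = []
    for slot in range(delivered + 1):
        if slot in burst_slots:
            pattern.extend([True] * burst_size)
        if slot < delivered:
            pattern.append(False)
    return pattern


if __name__ == "__main__":
    # Two-packet bursts at a 10 percent overall loss rate: 50 bursts
    # per 1000 packets, i.e., a burst starts at 5 percent of positions.
    pattern = burst_loss_pattern(1000, 0.10, 2, seed=42)
    print(sum(pattern), "of", len(pattern), "packets dropped")
```

Because the pattern depends only on its arguments, calling the function again with the same seed reproduces the identical set of dropped packets for each loss-recovery technique under test.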
The RDT technique significantly improves the MOS rating of G.729A with a low number of consecutive lost packets. As the number of consecutive lost packets increases, RDT becomes less effective at improving the audio quality. Nonetheless, RDT still demonstrates its effectiveness most of the time, even with a high number of consecutive lost packets. Just as it was in a random-loss environment, the RDT approach is still the most effective technique in a burst-loss environment.

Real-Network Loss

Since the loss characteristics of a real IP network are not radically different from the characteristics that we see with random loss, the MOS ratings shown in Fig. 10 for a real network demonstrate a pattern that is somewhat similar to the pattern in Fig. 7 for random loss. The real IP network has a significant percentage of single lost packets, and single-packet loss allows the RDT and pfec techniques to deliver substantial improvements in audio quality. Even when a burst with a high number of consecutive lost packets does occur, our study shows that RDT still improves the audio quality. On the other hand, pfec is significantly less effective at improving audio quality when experiencing a burst of consecutive lost packets. Again, the interleaving approach does not seem to be effective at improving the MOS rating for G.729A.

[Figure 10. MOS ratings with network loss. Plot of MOS rating versus network packet loss (%) for ilbc (20 ms), G.729A (plain), G.729A (RDT), G.729A (pfec), and G.729A (interleave).]

The RDT technique consistently allows G.729A to achieve the highest MOS ratings in our tests with real-network losses. The RDT technique is capable of improving the audio quality of G.729A beyond that of ilbc in our simulated real-network loss environment. The pfec approach, in contrast, is somewhat less effective.
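To illustrate why a parity scheme with n = 2 copes with isolated losses but not with bursts, here is a minimal sketch of XOR-parity recovery over a block of two frames. It assumes that each pair of frames is protected by a single XOR parity word, which is an assumption made for illustration and not a detail stated in this section; the function names and the 20-byte frame size are likewise hypothetical.

```python
def xor_bytes(a, b):
    """Bytewise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def recover_pair(frame_a, frame_b, parity):
    """Try to repair a two-frame block protected by one XOR parity word.

    Each argument is a bytes object, or None if that piece was lost.
    Returns (frame_a, frame_b); a frame stays None if it cannot be rebuilt.
    """
    if frame_a is not None and frame_b is not None:
        return frame_a, frame_b          # nothing to repair
    missing = [frame_a, frame_b, parity].count(None)
    if missing > 1:
        return frame_a, frame_b          # two or more pieces lost: no repair
    if frame_a is None:
        return xor_bytes(frame_b, parity), frame_b
    return frame_a, xor_bytes(frame_a, parity)


if __name__ == "__main__":
    a = bytes(range(20))                 # stand-ins for two coded frames
    b = bytes(range(20, 40))
    parity = xor_bytes(a, b)
    print(recover_pair(None, b, parity) == (a, b))   # single loss repaired
    print(recover_pair(None, None, parity))          # burst of two: no repair
```

With one parity word per block, any single missing piece can be rebuilt, but losing two of the three pieces leaves nothing to repair from, which is consistent with the weak burst-loss results reported for pfec.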
Although the pfec technique does improve the MOS rating of G.729A in the real-network loss condition, pfec consistently fails to achieve the level of improvement that we see with RDT.

The network statistics that we collected over a period of several months gave us more than 3 packet-loss profiles for real IP networks, so we have run simulations with many different loss profiles. Table 3 contains the results for a simulation with the average real-network loss profile. We can adjust for delay differences in the worst case when the delays approach the talker-overlap threshold by subtracting 0.1 from the MOS of both interleaving and RDT and by subtracting 0.2 from the MOS of pfec. However, a system that approaches the talker-overlap threshold would be marginal and impractical for VoIP, so the numbers in Table 3 are representative for typical cases that we actually see in practice.

Table 3. Network loss with average loss profile.

Consecutive lost packets   Packets lost   Loss %    % of occurrences
1                          290            21.854    83.333
2                          64             4.823     9.195
3                          36             2.713     3.448
4                          16             1.206     1.149
5                          25             1.884     1.437
6                          30             2.261     1.437
7                          0              0.000     0.000
Total                      461            34.740    100.000

MOS of ilbc = 2.575
MOS of G.729A = 2.378
MOS of G.729A with Interleave = 2.487
MOS of G.729A with pfec (n = 2) = 2.733
MOS of G.729A with RDT = 3.18

In addition to running simulations with the average packet-loss profile for real networks, we have also run many simulations with other profiles to cover the full range of behaviors that we recorded for real networks. Some of these other profiles were much less friendly to RDT than the average real-network profile; they had fewer single-packet losses and higher frequencies of multiple lost packets. Even in these less RDT-friendly scenarios, our tests consistently showed that G.729A with RDT achieves a higher MOS rating than other approaches. Table 4 shows the averages, standard deviations, and 95 percent confidence intervals for the loss-rate statistics that we collected from real networks. These values illustrate the range of behaviors that we observed for real networks and examined with our loss-recovery simulations.

Table 4. Statistics for real-network losses.

Consecutive lost packets   Average % of occurrences   Standard deviation   95% conf. interval
1                          83.3                       15.1                 ±4.1
2                          10.0                       8.4                  ±2.3
3                          3.3                        4.2                  ±1.1
4                          1.7                        2.7                  ±0.7
5                          0.7                        1.6                  ±0.4
6                          0.6                        1.3                  ±0.4
7                          0.3                        0.7                  ±0.2

Discussion of Results

Among the loss-recovery approaches that we analyzed, RDT achieves the best results in every one of our tests. RDT saves a substantial amount of bandwidth by piggybacking the redundant data with the next segment of new audio samples. Using RDT with G.729A, we achieve significantly better QoS than we get from ilbc with the plain technique, and the added expense of RDT over ilbc is a mere 0.8 kb/s (800 b/s) plus an increase of just 20 ms in the delay. Practical applications have delays that are well below the talker-overlap threshold, and the E-model shows that in this environment the 20 ms difference in the delay has a negligible impact, if any.
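The piggybacking idea can be sketched as follows: each packet carries its own frame plus a copy of the previous frame, so a frame is lost only when both its own packet and the following packet are lost. The sketch assumes a single level of redundancy in the spirit of RFC 2198 [13]; the packet layout and the function names are hypothetical and are not taken from the implementation evaluated here.

```python
def packetize_with_redundancy(frames):
    """Build packets that carry frame i plus a redundant copy of frame i-1."""
    packets = []
    previous = None
    for seq, frame in enumerate(frames):
        packets.append({"seq": seq, "primary": frame, "redundant": previous})
        previous = frame
    return packets


def receive(packets, lost, num_frames):
    """Reassemble frames from the packets that were not lost.

    lost[i] is True when packet i was dropped. A missing frame is filled
    in from the redundant copy carried by the next packet, if it arrived.
    Frames that cannot be recovered remain None.
    """
    frames = [None] * num_frames
    for packet, is_lost in zip(packets, lost):
        if is_lost:
            continue
        seq = packet["seq"]
        frames[seq] = packet["primary"]
        if seq > 0 and frames[seq - 1] is None:
            frames[seq - 1] = packet["redundant"]
    return frames


if __name__ == "__main__":
    frames = [f"frame{i}".encode() for i in range(6)]
    lost = [False, True, False, True, True, False]
    received = receive(packetize_with_redundancy(frames), lost, len(frames))
    # The single loss (packet 1) is fully repaired; in the two-packet
    # burst (packets 3 and 4), only the last frame of the burst is repaired.
    print([frame is not None for frame in received])
```

The one-packet wait for the redundant copy is what accounts for the extra packet interval of delay relative to plain delivery.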
Although pfec with n = 2 requires less bandwidth than RDT, pfec becomes increasingly less effective than RDT as the loss rate increases. As the loss rate of a data network grows, the incidence of losing two or more consecutive packets increases. Hence, pfec becomes more likely to completely fail to recover any lost data in the face of increasing network loss. Additionally, pfec introduces more delay into the audio stream than RDT does, but this difference is not significant in practical applications.

Despite the bandwidth efficiency of the interleaving technique, interleaving does not provide consistent improvement. In some tests, interleaving produces worse results than the plain method. Our finding on the ineffectiveness of the interleaving technique is an eye opener, since previous researchers have reported that interleaving can bring significant QoS improvement without additional bandwidth cost [6, 7].

Table 5. Summary.

Technique            Bandwidth (kb/s)   Delay (ms)   Quality rating
RDT                  47.2               40           A
ilbc (plain)         46.4               20           B
pfec, n = 2          43.2               60           B
G.729A (plain)       39.2               20           C
Two-way interleave   39.2               40           C
FEC, n = 2           58.8               40
Duplicate            78.4               20
Retransmission       Varies             Large

Table 5 summarizes the results of our study for the various loss-recovery techniques that we examined. The quality ratings in the table are subjective interpretations of the objective results of our tests. The quality rating for a loss-recovery technique indicates how well the technique compensates for packet loss. Note that the bandwidth requirements for the two best techniques are nearly the same, so the comparison between RDT with G.729A and ilbc with plain delivery is a fair comparison. The other approaches require less bandwidth, but they suffer disproportionate penalties in terms of quality. Since the three best techniques consume approximately 10 to 20 percent more bandwidth than plain delivery with G.729A does, we must consider the potential costs of increased bandwidth.
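The relative bandwidth cost of each technique follows directly from the bandwidth column of Table 5. The short sketch below computes the increase over plain G.729A delivery; the dictionary simply restates the table, and the function name is an illustrative choice.

```python
# Bandwidth figures (kb/s) restated from Table 5.
BANDWIDTH_KBPS = {
    "RDT": 47.2,
    "ilbc (plain)": 46.4,
    "pfec, n = 2": 43.2,
    "G.729A (plain)": 39.2,
    "Two-way interleave": 39.2,
    "FEC, n = 2": 58.8,
    "Duplicate": 78.4,
}


def overhead_percent(technique, baseline="G.729A (plain)"):
    """Bandwidth increase of a technique over the baseline, in percent."""
    base = BANDWIDTH_KBPS[baseline]
    return 100.0 * (BANDWIDTH_KBPS[technique] - base) / base


if __name__ == "__main__":
    for name in ("pfec, n = 2", "ilbc (plain)", "RDT"):
        print(f"{name}: {overhead_percent(name):.1f}% over plain G.729A")
```

For the three best techniques the increase works out to roughly 10 percent for pfec, 18 percent for ilbc, and 20 percent for RDT, while full duplication doubles the bandwidth.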
An increase in bandwidth can, in a heavily loaded network, cause a rise in the rate of lost packets and thereby counteract the very benefits that we seek to gain by using a loss-recovery system. For this reason, we do not recommend using loss-recovery techniques that increase bandwidth requirements in a network that is already overloaded. In most VoIP applications, however, the network is not overloaded and can easily tolerate the modest bandwidth increase that RDT requires. Implementing VoIP on an overloaded network is a fruitless task, since the resulting audio quality is intolerable, not just because of lost packets but also because of excessive delay, significant jitter, and other problems. Consequently, people typically do not even try to implement VoIP on overloaded networks, so networks that carry VoIP are rarely overloaded and can easily accommodate RDT. Remember that even a lightly loaded network still suffers packet loss because of microbursts of activity, and our study demonstrates the effectiveness of RDT in the face of these microbursts. If a network has enough bandwidth to accommodate the modestly increased bandwidth requirements of RDT or ilbc, we recommend using RDT to improve voice quality.

Conclusion

Our analysis shows that not every sender-based loss-recovery technique is suitable for real-time interactive VoIP communications. Although previous researchers have reported that interleaving can reduce the degradation of perceptual audio quality, our study shows that the interleaving technique does not achieve any consistent or significant improvement in an environment of lost packets. The pfec technique with n = 2 is effective for improving the audio quality in random-loss and network-loss environments. However, pfec frequently fails to improve the audio quality at all when losses of multiple consecutive packets occur. Among the techniques that are practical for our target applications, RDT is the most effective approach for improving audio quality under various loss conditions. The RDT solution is so effective that using it with G.729A provides significantly better packet-loss robustness than any other loss-recovery method that we examined. RDT with G.729A outperforms even ilbc.

References

[1] P. T. Brady, "Effects of Transmission Delay on Conversational Behavior on Echo-Free Telephone Circuits," Bell Sys. Tech. J., vol. 50, Jan. 1971, pp. 115-34.
[2] S. V. Andersen et al., "ilbc - A Linear Predictive Coder with Robustness to Packet Losses," IEEE Wksp. Speech Coding 2002, Tsukuba, Ibaraki, Japan, Oct. 2002.
[3] Global IP Sound, "ilbc-designed for the Future," Global IP Sound, white paper, Oct. 2004, www.globalipsound.com/solutions/solutions_codecs.php
[4] ITU-T Recommendation G.729, "Coding of Speech at 8 kb/s Using Conjugate-Structure Algebraic-Code-Excited Linear-Prediction (CS-ACELP)," Mar. 1996.
[5] ITU-T Recommendation G.729 Annex A, "Reduced Complexity 8 kb/s CS-ACELP Speech Codec," Nov. 1996.
[6] C. Perkins, O. Hodson, and V. Hardman, "A Survey of Packet Loss Recovery Techniques for Streaming Audio," IEEE Network, vol. 12, no. 5, Sept./Oct. 1998, pp. 40-48.
[7] J. F. Kurose and K. W. Ross, Computer Networking: A Top-Down Approach Featuring the Internet, 3rd ed., Addison Wesley, 2004.
[8] I. S. Reed and G. Solomon, "Polynomial Codes over Certain Finite Fields," The SIAM J. Applied Mathematics, vol. 8, no. 2, June 1960, pp. 300-04.
[9] W. Jiang and H. Schulzrinne, "Comparison and Optimization of Packet Loss Repair Methods on VoIP Perceived Quality under Bursty Loss," Proc. 12th Int'l. Wksp. Network and Operating Systems Support for Digital Audio and Video, NOSSDAV 2002, Miami, FL, May 2002, pp. 73-81.
[10] T. K. Chua and D. C. Pheanis, "Comparative Analysis of Audio Coders and Packet-Loss Recovery," Proc. Commun. Systems and Applications, CSA 2005, Banff, Alberta, Canada, July 2005, pp. 176-81.
[11] H. P. Sze, S. C. Liew, and Y. B. Lee, "A Packet-Loss-Recovery Scheme for Continuous-Media Streaming Over the Internet," IEEE Commun. Letters, vol. 5, no. 3, Mar. 2001, pp. 116-18.
[12] J. Nonnenmacher, E. Biersack, and D. Towsley, "Parity-Based Loss Recovery for Reliable Multicast Transmission," Proc. ACM Special Interest Group on Data Communications, SIGCOMM 1997, Cannes, France, Sept. 1997, pp. 289-300.
[13] IETF RFC-2198, "RTP Payload for Redundant Audio Data," Sept. 1997.
[14] S.-H. Chung et al., "Analysis of Bursty Packet Loss Characteristics on Underutilized Links Using SNMP," IEEE/IFIP Wksp. End-to-End Monitoring Techniques and Services, E2EMON 2004, San Diego, CA, Oct. 2004.

Biographies

TECK-KUEN CHUA (TeckKuen.Chua@gmail.com) earned a Ph.D. degree in computer science and engineering at Arizona State University in 2006. He is a senior software design engineer at Inter-Tel, Inc., where he works with advanced VoIP technologies, embedded systems, and real-time applications for digital signal processors.

DAVID C. PHEANIS (David.Pheanis@ASU.edu) is Professor Emeritus of Computer Science and Engineering at Arizona State University and is the Principal of Western Microsystems. He works with embedded systems and real-time applications of microcontrollers. He earned a Ph.D. degree at Arizona State University in 1974.