Receiver-based adaptation mechanisms for real-time media delivery

Prof. Dr.-Ing. Eckehard Steinbach
Institute of Communication Networks, Media Technology Group
Technische Universität München
Steinbach@ei.tum.de
http://www.lkn.ei.tum.de/~steinb

Outline
- Low delay media streaming over the Internet
  - State-of-the-art
  - Adaptive media playout
  - Receiver-driven rate scalability
- Low delay voice transmission over the Internet
  - Transmission with packet path diversity
  - Adaptive playout scheduling for one path
  - Adaptive two-stream playout scheduling

Video Streaming over the Internet
- Figure: a video server connected through the Internet to clients (labels: workstation, DSL, 56K modem, GPRS, PC)
- Best-effort network: variable throughput, variable delay, variable loss rate
- Typical initial delay: 5-10 seconds (receiver buffer + retransmissions)

State-of-the-Art
- Figure: throughput versus time t (with target rate) and receiver buffer level versus time t (with target buffer level, prebuffering delay, and start of playout)
- Buffer underflow: frame freeze and audio interruption

Idea: Adaptive Playout
- Figure: throughput versus time t (with target rate) and receiver buffer level versus time t (with target level, initial delay, and start of playout)
- Reduced playout speed: buffer underflow avoided

State-of-the-Art
- Requirement: low initial latency, hence a small buffer
- Video demo at 5% packet loss: state-of-the-art versus result of this work
- State-of-the-art: low initial delay, but low robustness against variations in transmission quality (3x frame freeze)
- Result of this work: low initial delay and high robustness

Modification of Playout Speed
- Video: adaptation of display rate
- Audio and speech: stretching based on the time-domain interpolation algorithm WSOLA [Verhelst et al., 1993, Liang 2001]
- Figure labels: pitch period, template; original packet segments 0, 1, 2, 3, 4; output packet segments 0/1, 1/2, 2/3, 3, 4

Limits of Speech Stretching
- Audio demo: original versus stretched (s=1.3)
- Stretching by more than 25% is annoying

Demo: Low Mean Delay
- Identical transmission quality and identical mean delay in both cases
- Constant playout speed: 3 x buffer underflow
- Adaptive media playout (max stretching s=1.25, max compressing f=0.75): 1 x buffer underflow

Streaming Media System Model
- Figure: system model
- λ_s: mean number of packets arriving per frame period

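The demo settings above (max stretching s=1.25, max compressing f=0.75) suggest a simple buffer-driven rule for the display time of each frame. The following is a minimal sketch, not the controller used in the demo: the threshold policy (stretch below the target level, compress above it) and the names `frame_period_scale` and `display_time_ms` are assumptions made for illustration.

```python
def frame_period_scale(buffer_level, target_level, s_max=1.25, f_min=0.75):
    """Factor by which the nominal frame period is multiplied.

    Assumed policy: play slower (factor s_max) while the receiver buffer
    is below the target level, play faster (factor f_min) while it is
    above, and play at normal speed when it is exactly on target.
    """
    if buffer_level < 0 or target_level <= 0:
        raise ValueError("buffer_level must be >= 0 and target_level > 0")
    if buffer_level < target_level:
        return s_max   # stretch: drains the buffer more slowly
    if buffer_level > target_level:
        return f_min   # compress: reduces the accumulated delay
    return 1.0


def display_time_ms(nominal_ms, buffer_level, target_level):
    """Display duration of the next frame in milliseconds."""
    return nominal_ms * frame_period_scale(buffer_level, target_level)
```

With a nominal frame period of 40 ms, a buffer below target gives a 50 ms display time and a buffer above target gives 30 ms.
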
Reduction in Mean Delay
- Channel parameters: λ_G = 1.333, λ_B = 0, T_G = 28.5 sec, T_B = 1.5 sec, T_RTT = 220 ms
- Figure: curves for (s=1.00, f=1.00), (s=1.25, f=0.75), and (s=1.50, f=0.50)
- 25-35% reduction in mean latency

Receiver-driven rate scalability
- Figure: server and channel (mean throughput), with labels 25 kbps, 20 kbps, 50 kbps, 95 kbps, 55 kbps, 85 kbps, 100 kbps
- Operating points: (29.2 dB, 53.3 kbps) and (33.1 dB, 93.6*0.9 = 84.2 kbps)

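The operating point 93.6*0.9 = 84.2 kbps reflects that a stream played at 0.9 times normal speed drains the buffer at only 0.9 times its encoded bit rate. A minimal sketch of how a receiver could exploit this is given below. The selection rule, the function names, and the limit `s_max=1.25` are assumptions for illustration; the candidate rates are the server bit rates listed on the Flexibility Gain slide.

```python
SERVER_RATES_KBPS = (100, 80, 64, 50, 40, 32, 25)


def consumed_rate_kbps(stream_rate_kbps, period_scale):
    """Rate at which playout drains the buffer when every frame
    period is multiplied by period_scale (> 1 means slower playout)."""
    return stream_rate_kbps / period_scale


def select_stream(throughput_kbps, rates=SERVER_RATES_KBPS, s_max=1.25):
    """Pick the highest stream rate that slowed-down playout can sustain.

    Returns (rate, period_scale), or None if even the lowest rate needs
    more stretching than s_max allows.
    """
    if throughput_kbps <= 0:
        raise ValueError("throughput must be positive")
    feasible = [r for r in rates if r <= s_max * throughput_kbps]
    if not feasible:
        return None
    rate = max(feasible)
    # Never speed up here: if the channel is faster than the stream,
    # play at normal speed and let the buffer fill.
    scale = max(1.0, rate / throughput_kbps)
    return rate, scale
```

For example, `select_stream(20)` returns `(25, 1.25)`: the 25 kbps stream is sustainable over a 20 kbps channel if playout is stretched by the maximum factor.
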
Flexibility Gain
- Figure: PSNR (26 dB to 34 dB) versus mean throughput (10 to 100 kbps) for constant playout speed and scaled playout speed
- Server bit rates: 100, 80, 64, 50, 40, 32, and 25 kbps

Outline (continued)
- Low delay voice transmission over the Internet
  - Transmission with packet path diversity
  - Adaptive playout scheduling for one path
  - Adaptive two-stream playout scheduling

Voice over IP (VoIP)
- VoIP is rapidly growing: 900% from 1998 to 1999, 5000% from 1999 to 2004
  - 1998: 310 million minutes
  - 1999: 2.7 billion minutes
  - 2004: 135 billion minutes
- [Source: IEEE Spectrum, May 2000]

Requirements of VoIP
- Figure: packets 1-8 on the sender, receiver, and playout time lines, with packetization, receiver buffer time, and a missed deadline marked
- Small end-to-end delay for conversational services (< 150 ms)
- Delay variations (jitter) have to be smoothed using a receiver buffer
- Late packets are lost; there is no time for retransmissions
- A small residual packet loss rate is acceptable
- Trade-off between end-to-end delay and late loss rate

Packet Path Diversity
- Idea: set up multiple connections along different paths
  - Improved congestion resiliency
  - Lower combined latency
  - Better loss characteristics
- Realization in the current Internet, e.g., through a relay server
- Figure: media traffic from S to D over path 1 (default path) and path 2 (via relay), each path sharing links with cross traffic

Multiple Description Coding for VoIP
- Stream 1: even samples at 8 bit, odd samples at 2 bit
- Stream 2: odd samples at 8 bit, even samples at 2 bit
- Figure: packets i, i+1, i+2, i+3 of both streams, each carrying even (E) and odd (O) samples

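The two-description layout above can be illustrated with a short sketch. It assumes unsigned 8-bit input samples and a plain uniform 2-bit quantizer (keep the top two bits, reconstruct at the bin centre); the actual quantizer used in the talk is not specified, and the function names are hypothetical.

```python
def encode_descriptions(samples):
    """Split 8-bit samples (0..255) into two descriptions.

    Description 1 keeps even-indexed samples at 8 bit and odd-indexed
    samples at 2 bit; description 2 does the opposite.
    """
    d1 = [x if n % 2 == 0 else x >> 6 for n, x in enumerate(samples)]
    d2 = [x >> 6 if n % 2 == 0 else x for n, x in enumerate(samples)]
    return d1, d2


def _dequantize_coarse(q):
    """Map a 2-bit index back to the centre of its 64-wide bin."""
    return (q << 6) + 32


def decode(d1, d2):
    """Reconstruct samples from whichever descriptions arrived.

    Pass None for a lost description. Returns None if both are lost.
    """
    if d1 is None and d2 is None:
        return None
    if d1 is not None and d2 is not None:
        # Both arrived: take the fine (8-bit) copy of every sample.
        return [a if n % 2 == 0 else b for n, (a, b) in enumerate(zip(d1, d2))]
    # Only one arrived: fine copy for one parity, coarse for the other.
    fine_parity = 0 if d1 is not None else 1
    received = d1 if d1 is not None else d2
    return [x if n % 2 == fine_parity else _dequantize_coarse(x)
            for n, x in enumerate(received)]
```

If both descriptions arrive, the original samples are recovered exactly; if one is lost, every second sample is only available at 2-bit resolution.
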
Adaptive Playout Scheduling
- Figure: packets 1-6 sent and received on path 1, shown with constant playout and with adaptive playout (stretching and compressing)
- If past delay values indicate congestion: delay playout of the next packet(s) by stretching the speech signal
- If past delay values are small: advance playout of the next packet(s) by compressing the speech signal

Constant Playout
- Trade-off between packet loss and delay
- Figure: packet delay trace with a constant playout deadline; late loss marked

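The trade-off for a constant playout deadline can be made concrete by counting, for a recorded delay trace, how many packets would miss a given deadline. This is a minimal sketch; the function names and the example delay values are illustrative and not taken from the measurements in the talk.

```python
def late_loss_rate(delays_ms, deadline_ms):
    """Fraction of packets whose delay exceeds the playout deadline."""
    if not delays_ms:
        raise ValueError("delay trace is empty")
    late = sum(1 for d in delays_ms if d > deadline_ms)
    return late / len(delays_ms)


def tradeoff(delays_ms, deadlines_ms):
    """Late loss rate for each candidate constant deadline."""
    return [(d, late_loss_rate(delays_ms, d)) for d in deadlines_ms]


if __name__ == "__main__":
    trace = [40, 45, 50, 60, 80, 120, 55, 48]  # hypothetical delays in ms
    for deadline, loss in tradeoff(trace, [60, 100, 150]):
        print(f"deadline {deadline} ms -> late loss {loss:.3f}")
```

Raising the deadline lowers the late loss rate but adds the same amount of delay to every packet, which is the trade-off named on the slide.
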
Adaptive Playout
- Adaptation to delay variation (jitter)
- Figure: packet delay trace with an adaptive playout deadline

Figure: late loss versus mean delay for adaptive playout and constant playout

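One way to turn past delay values into an adaptive deadline is to pick an order statistic of a sliding window of recent delays and then stretch or compress the current packet so that the next one is played at the new deadline. The sketch below follows that idea under stated assumptions: the window handling, the target loss parameter, the 20 ms packet length, and the scaling limits 1.25 and 0.75 (borrowed from the streaming demo) are illustrative choices, not the settings used in the talk.

```python
def playout_delay(history_ms, target_loss):
    """Smallest delay from the history that keeps the fraction of
    late packets in that history at or below target_loss."""
    if not history_ms:
        raise ValueError("history is empty")
    if not 0.0 <= target_loss < 1.0:
        raise ValueError("target_loss must be in [0, 1)")
    ordered = sorted(history_ms)
    allowed_late = min(int(target_loss * len(ordered)), len(ordered) - 1)
    return ordered[len(ordered) - 1 - allowed_late]


def scale_factor(current_delay_ms, new_delay_ms, packet_ms=20.0,
                 s_max=1.25, f_min=0.75):
    """Time-scale factor for the current packet.

    Playing the packet for packet_ms + (new - current) shifts the next
    packet to the new playout delay; the result is clamped to the
    allowed stretching and compressing range.
    """
    wanted = (packet_ms + new_delay_ms - current_delay_ms) / packet_ms
    return min(s_max, max(f_min, wanted))
```

If the required shift exceeds what one packet can absorb within the clamp, the remaining adjustment has to be spread over the following packets.
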
Packet Path Diversity for Low Delay VoIP
- Figure: packets 1-6 sent on path 1 and path 2, received on both paths, and played out
- Packet path diversity reduces effective delay jitter and therefore the late loss rate

Internet Experiment
- Figure: source, destination, and relay hosts connected through the networks Exodus Comm., BBN Planet, and Qwest; link delay labels (5 ms), (45 ms), (5 ms), (40 ms), (5 ms)
- Hosts: Stanford (192.84.16.176), MIT (18.184.0.50, destination), Harvard (140.247.62.110)
- Explicit path selection using relay server [Apostolopoulos 01]
- UDP packets with payload of 240 bytes

Delay Measurement
- Figure: delay in ms versus packet number, with the maximum delay marked

Late Loss versus Delay
- Figure: packet loss rate in % (0 to 30) versus delay in ms (40 to 180) for path 1 only, path 2 only, and paths 1 and 2

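The gain of using both paths can be estimated from two delay traces by treating a packet as usable as soon as its first description arrives. The following sketch assumes per-packet delays in milliseconds, with `None` marking a packet lost on that path; the function names and the example values are hypothetical.

```python
def effective_delays(path1_ms, path2_ms):
    """Per-packet delay of the first description to arrive.

    An entry is None only if the packet was lost on both paths.
    """
    combined = []
    for d1, d2 in zip(path1_ms, path2_ms):
        arrived = [d for d in (d1, d2) if d is not None]
        combined.append(min(arrived) if arrived else None)
    return combined


def loss_rate(delays_ms, deadline_ms):
    """Fraction of packets that are lost or arrive after the deadline."""
    if not delays_ms:
        raise ValueError("delay trace is empty")
    lost = sum(1 for d in delays_ms if d is None or d > deadline_ms)
    return lost / len(delays_ms)


if __name__ == "__main__":
    path1 = [50, 120, 55, None]   # hypothetical trace, path 1
    path2 = [90, 60, 130, 70]     # hypothetical trace, path 2
    both = effective_delays(path1, path2)
    for name, trace in (("path 1 only", path1),
                        ("path 2 only", path2),
                        ("paths 1 and 2", both)):
        print(name, loss_rate(trace, deadline_ms=80))
```

Because delay spikes on the two paths rarely coincide in this toy trace, the combined loss rate at a given deadline is lower than on either path alone.
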
Adaptive Two-stream Playout Scheduling
- Combination of packet path diversity and adaptive playout
- Minimization of a Lagrangian cost function of the playout deadline $d_{play}^{i}$:

$$
\begin{aligned}
C_i &= d_{play}^{i} + \lambda_1 \Pr\{\text{both descriptions lost} \mid d_{play}^{i}\} + \lambda_2 \Pr\{\text{only one description lost} \mid d_{play}^{i}\} \\
    &= d_{play}^{i} + \lambda_1 \hat{p}_1 \hat{p}_2 + \lambda_2 \left( \hat{p}_1 (1 - \hat{p}_2) + \hat{p}_2 (1 - \hat{p}_1) \right)
\end{aligned}
$$

- Figure: histogram of past delay values (x-axis: delay) with the playout deadline $d_{play}^{i}$ marked; it yields the estimate $\hat{p}_1$ of the loss probability

Measured Packet Delay Trace
- Figure: delay in ms versus packet number

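The cost function C = d_play + λ1 p̂1 p̂2 + λ2 (p̂1(1 - p̂2) + p̂2(1 - p̂1)) can be minimized by a direct search over candidate deadlines. The sketch below estimates each p̂ as the fraction of past delays on that path exceeding the deadline, and restricts the candidates to delay values that actually occur in the two histories. Both choices, the function names, and the example numbers are assumptions for illustration.

```python
def late_prob(history_ms, deadline_ms):
    """Empirical probability that a packet misses the deadline."""
    return sum(1 for d in history_ms if d > deadline_ms) / len(history_ms)


def cost(deadline_ms, hist1, hist2, lam1, lam2):
    """Lagrangian cost: delay plus weighted loss probabilities."""
    p1 = late_prob(hist1, deadline_ms)
    p2 = late_prob(hist2, deadline_ms)
    both_lost = p1 * p2
    one_lost = p1 * (1 - p2) + p2 * (1 - p1)
    return deadline_ms + lam1 * both_lost + lam2 * one_lost


def best_deadline(hist1, hist2, lam1, lam2):
    """Candidate deadline with the lowest cost (smallest one on ties)."""
    if not hist1 or not hist2:
        raise ValueError("both delay histories must be non-empty")
    candidates = sorted(set(hist1) | set(hist2))
    return min(candidates, key=lambda d: cost(d, hist1, hist2, lam1, lam2))


if __name__ == "__main__":
    hist1 = [40, 50, 60, 100]   # hypothetical past delays, path 1 (ms)
    hist2 = [45, 55, 65, 70]    # hypothetical past delays, path 2 (ms)
    d = best_deadline(hist1, hist2, lam1=200, lam2=20)
    print(d, cost(d, hist1, hist2, 200, 20))
```

With λ1 much larger than λ2, losing both descriptions is penalized far more than losing one, so the search tolerates the 100 ms outlier on path 1 as long as path 2 delivers in time.
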
Adaptive Two-stream Playout Scheduling
- Figure: delay in ms versus packet number

Comparison: Single-path Transmission with FEC
- Figure: packets protected with FEC
  - Stream sent: 1 1 2 2 3 3 4
  - Stream received with packet loss: 1 1 2 3 3 4
  - Stream reconstructed: 1 2 3 4
- FEC adds redundancy by sending one or more copies of the source signal in the following packet(s) [Bolot 96]
- FEC-protected single stream, set up for a fair comparison:
  - Primary copy quantized at fine resolution (8-bit)
  - Secondary copy quantized at coarser resolution (2-bit)
  - Same data rate as transmission with packet path diversity
  - Same adaptive playout scheduling technique

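The FEC scheme described above, in which each packet also carries a copy of the previous frame, can be sketched as follows. The packet layout (a dict with `seq`, `primary`, and `secondary`) is a hypothetical structure chosen for illustration, and the coarser quantization of the secondary copy is only indicated by a label rather than performed.

```python
def packetize(frames):
    """Packet n carries frame n plus a redundant copy of frame n-1."""
    packets = []
    previous = None
    for seq, frame in enumerate(frames):
        packets.append({"seq": seq, "primary": frame, "secondary": previous})
        previous = frame
    return packets


def reconstruct(received, n_frames):
    """Rebuild the frame sequence from the packets that arrived.

    Each entry is ("fine", frame), ("coarse", frame), or None if the
    frame could not be recovered.
    """
    out = [None] * n_frames
    for p in received:
        out[p["seq"]] = ("fine", p["primary"])
    for p in received:
        k = p["seq"] - 1
        # Use the redundant copy only where the primary copy is missing.
        if k >= 0 and out[k] is None and p["secondary"] is not None:
            out[k] = ("coarse", p["secondary"])
    return out
```

A single lost packet is recovered at coarse quality from its successor, whereas two consecutive losses leave the first of the two frames unrecoverable.
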
Results
- Figure: packet loss rate in % versus delay (ms); annotation: >45%

Demo
- Audio samples: original, with packet path diversity, one-path (same data rate)
- Mean end-to-end delay: 84 ms
- Error concealment: packet repetition

Summary
- IMS: adaptive media playout reduces initial and mean delay for Internet media streaming
- IMS: adaptive media playout leads to receiver-driven rate scalability
- VoIP: adaptive media playout allows playout latency adaptation within talkspurts
- VoIP: transmission with packet path diversity improves network QoS
- VoIP: both techniques can be combined and lead to significant improvements

References
- E. Steinbach, "Adaptive Abspieltechniken für Internet Media Streaming," FKT, issue 1-2, pp. 22-25, 2003.
- M. Kalman, B. Girod, and E. Steinbach, "Adaptive Playout for Real-time Media Streaming," International Symposium on Circuits and Systems, ISCAS 2002, Scottsdale, Arizona, May 2002.
- Y. J. Liang, E. Steinbach, and B. Girod, "Real-time Voice Communication over the Internet Using Packet Path Diversity," Proc. ACM Multimedia 2001, pp. 431-440, Ottawa, Canada, Sept./Oct. 2001.
- E. Steinbach, N. Färber, and B. Girod, "Adaptive Playout for Low-Latency Video Streaming," Proc. International Conference on Image Processing, ICIP-2001, pp. 962-965, Thessaloniki, Greece, Oct. 2001.
- Y. J. Liang, N. Färber, and B. Girod, "Adaptive playout scheduling using timescale modification in packet voice communications," Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP-2001, vol. 3, pp. 1445-1448, Salt Lake City, UT, May 2001.

PESQ Results
- Perceptual Evaluation of Speech Quality (ITU-T Rec. P.862, Feb. 2001)
- PESQ can be used for end-to-end quality assessment
- Scores range from -0.5 to 4.5, but PESQ usually produces MOS-like scores between 1.0 and 4.5

Speech and Audio Scaling
- Speech scaling demo: original, stretched (s=1.3), compressed (f=0.7)
- Audio scaling demo: original, stretched (s=1.3), compressed (f=0.7)