Future Buffer based Adaptation for VBR Video Streaming over HTTP

Tuan Vu 1, Hung T. Le 2, Duc V. Nguyen 2, Nam Pham Ngoc 1, Truong Cong Thang 2
1 Hanoi University of Science and Technology, Hanoi, Vietnam
2 The University of Aizu, Aizuwakamatsu, Japan

Abstract
HTTP streaming has become a cost-effective means of video delivery. To enable adaptivity to networks and terminals, a provider should generate multiple representations of an original video as well as the related signaling metadata. So far, most previous studies have focused on the case of CBR (constant bitrate) video. In this paper, we propose a novel adaptation method for VBR (variable bitrate) video streaming. Based on a trellis representation to estimate future buffer levels, the proposed method can provide smooth video quality while avoiding buffer underflows. The experimental results show that our approach performs effectively under drastic changes of both connection throughput and video bitrate.

Keywords
HTTP streaming, VBR, adaptivity

I. INTRODUCTION
Thanks to the abundance of web platforms and broadband connections, hypertext transfer protocol (HTTP) streaming has become a cost-effective means of multimedia delivery [1][2]. Besides, due to the heterogeneity of today's communication networks, adaptivity is the most important requirement for any streaming client [2]. In particular, the transmission control protocol (TCP), the layer underlying HTTP, is notorious for its throughput fluctuations [3]. Moreover, when a video is encoded in variable bitrate (VBR) mode, its bitrate may also vary widely according to the characteristics of the video content [4]. So, the mismatch between throughput and video bitrate is a big challenge for adaptive streaming.
In order to enable adaptivity to networks and terminal capabilities, an HTTP streaming provider should generate multiple alternatives (or versions) of an original video as well as the signaling metadata that describes the characteristics of the alternatives, such as bitrate and resolution [1]. Based on the metadata and the status of the terminal and networks, the client decides which media parts to download and when. Generally, adaptation methods for HTTP streaming can be divided into two types: throughput-based methods and buffer-based methods [2]. Throughput-based methods decide the version based on the estimated throughput only. Therefore, the client reacts quickly to network variations, which results in fluctuations of video quality. Meanwhile, buffer-based methods usually decide the version based on some predefined buffer thresholds. Essentially, the client maintains a version while the buffer level stays in a certain range. However, if the buffer level drops drastically, sudden changes of versions occur. Moreover, it is difficult to choose values for the empirical thresholds.

So far, most previous studies have focused on the case of CBR (constant bitrate) video [2]. Our previous work in [5] is the first research on HTTP streaming for VBR video, where both throughput and video bitrate are dynamically estimated. This approach can provide a very stable buffer and a CBR-like service from VBR videos. In [6], a buffer-based adaptation method is proposed, where the buffer is divided into multiple ranges. Based on a partial-linear trend prediction of the buffer level and the buffer thresholds, different actions are applied when the buffer level stays in different ranges. If the estimated change of buffer level is low, the client maintains the current version for the next segment. However, this method causes sudden version changes if the actual buffer level drops drastically.
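The two families of methods described above can be contrasted with a minimal Python sketch. The function names, the 5% default safety margin, and the single low/high range rule are illustrative assumptions; they are not the specific algorithms of [2], [5], or [6]. Version indexes are 0-based here.

```python
from typing import Sequence


def throughput_based(bitrates: Sequence[float], est_throughput: float,
                     margin: float = 0.05) -> int:
    """Pick the highest version whose bitrate fits under the estimated
    throughput reduced by a safety margin (assumed rule).

    `bitrates` must be sorted in ascending order.
    """
    budget = est_throughput * (1.0 - margin)
    choice = 0  # fall back to the lowest version
    for k, rate in enumerate(bitrates):
        if rate <= budget:
            choice = k
    return choice


def buffer_based(current: int, buffer_s: float, low_s: float, high_s: float,
                 num_versions: int) -> int:
    """Keep the current version while the buffer stays in [low_s, high_s];
    step down below the range and step up above it (assumed rule)."""
    if buffer_s < low_s:
        return max(current - 1, 0)
    if buffer_s > high_s:
        return min(current + 1, num_versions - 1)
    return current
```

The first rule follows every throughput change, which explains the quality fluctuations; the second holds a version as long as the buffer stays in range, so it only reacts once the buffer has already moved.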
In this paper, we propose a novel buffer-based adaptation method that can provide smooth video quality while still avoiding playback interruptions. The proposed method develops a trellis, first introduced in our previous work [2][7], to represent the possible changes of versions and the corresponding buffer levels in the near future. Based on the bitrate estimation of [5], the client can estimate the buffer levels for the next few segments. A heuristic method is then presented to build a path of versions for these segments. The experimental results show that our proposed method can avoid large version changes while still guaranteeing that the buffer never underflows.

The paper is organized as follows. In Section II, we first describe the adaptation problem together with bitrate and buffer level estimation. After that, Section III presents the details of our adaptation method. In Section IV, we present an evaluation of the proposed method and two reference methods. Finally, conclusions are given in Section V.

II. ADAPTATION PROBLEM
A. Adaptation Overview
In this part, we present the adaptation problem, highlighting the relationship between throughput, bitrate, and buffer level. Some of the important notations and their definitions are shown in Table I. We first have the following definitions (Fig. 1) [2][7]:
- Arrival curve: represents the accumulated data size received by the client at a given time instant.
- Playout curve: represents the accumulated data size consumed by the player at a given time instant.

978-1-4673-7478-1/15/$31.00 2015
TABLE I. NOTATIONS AND DEFINITIONS

Symbol       Definition
$T_i$        The actual throughput of segment i
$T^e_i$      The estimated throughput for segment i
$R_i$        The bitrate of segment i
$B_{cur}$    The current buffer level
$B_i$        The buffer level after receiving segment i
$B^e_i$      The estimated buffer level for segment i
$I_i$        Index of the selected version for segment i
$I^T_i$      The highest version whose bitrate is lower than the estimated throughput
$R^k_i$      The bitrate of version k of segment i
$\tau$       The segment duration

Though the video is encoded in VBR mode, we suppose that the bitrate within each short segment is constant. So, the playout curve can be represented by linear sections whose slopes are the bitrates of the media segments. Due to the fluctuations of instant throughput, the arrival curve actually contains non-linear sections. However, for simplicity, we just consider the points right after a media segment has been received. The arrival curve is then composed of linear sections connecting these points.

Figure 1: Illustration of arrival curve and playout curve

Suppose that the client starts receiving data at a given time and, after an initial delay, starts consuming data from the receiving buffer. Essentially, the horizontal distance between the arrival curve and the playout curve is the buffer level of the client. If the average arrival rate (or throughput) is equal to the playout rate, the buffer will be stable.

In the following, we assume that for segment i the server has a set of K versions, corresponding to K bitrates $\mathcal{R}_i = \{R^k_i,\ 0 < k \le K\}$. Note that in VBR video streaming the bitrate of a version changes from segment to segment, i.e. in general $R^k_i \neq R^k_l$ for $i \neq l$. Suppose that, after completely downloading the current segment i of version $I_i$, the client decides the versions $I_{i+1}, \dots, I_{i+N}$ for the next N segments, corresponding to a sequence of bitrates $R_{i+1}, R_{i+2}, \dots, R_{i+N}$. Let us use the term path P to denote a sequence of versions:

$$P = \{I_i, I_{i+1}, I_{i+2}, \dots, I_{i+N}\}. \quad (1)$$

The decision problem can then be represented as a trellis containing all possible paths for a number of segments. If the current buffer level is large enough, a good path can be found to meet different criteria.
However, if the current buffer level is very low, the client should aggressively switch to lower versions to avoid buffer underflows.

B. Bitrate and Buffer Level Estimation
It can be seen that if the instant bitrates of the versions can be estimated, the trellis representation can be used to estimate the buffer levels for the next segments. Based on the estimated buffer levels in the near future, the client can adjust the playout curve in order to avoid buffer underflows. For this purpose, we adopt the bitrate estimation method in [5]. Basically, bitrate estimation is divided into intra-stream estimation and inter-stream estimation. The former means predicting the bitrates of segments within a version, while the latter means predicting the bitrates of segments across different versions.

Given a bitrate $R_i$, the time to download the current segment i with throughput $T_i$ is $\tau R_i / T_i$. After fully receiving that segment, the buffer has one more segment (or $\tau$ seconds of media). Therefore, the change of buffer level can be approximated by:

$$\Delta B_i = \tau \left(1 - \frac{R_i}{T_i}\right). \quad (2)$$

In the near future, suppose that the throughput remains stable and close to the estimated throughput: $T_{i+j} \approx T^e_i$. So, we obtain the following general equation:

$$\Delta B^e_{i+j} = \tau \left(1 - \frac{R_{i+j}}{T^e_i}\right). \quad (3)$$

As a consequence, the estimated buffer level of a next segment is computed as follows:

$$B^e_{i+j} = \begin{cases} B_i, & j = 0 \\ B^e_{i+j-1} + \Delta B^e_{i+j}, & j > 0. \end{cases} \quad (4)$$

If the client changes step by step from version $I_i$ to version k, the change of buffer level can be estimated using Eq. (3):

$$\Delta B^{c} = \sum_{j=1}^{n_c} \tau \left(1 - \frac{R^{k_j}_{i+j}}{T^e_i}\right), \quad (5)$$

where $k_j$ is the version selected for segment $i+j$ on the step-by-step path from $I_i$ to k, and $n_c = |I_i - k|$ is the number of changed versions. On the other hand, if the client maintains a version, e.g. version k, for the next $n_s$ segments, the estimated change of buffer level is computed using Eq. (3) as follows:

$$\Delta B^{s} = \sum_{j=1}^{n_s} \tau \left(1 - \frac{R^{k}_{i+j}}{T^e_i}\right). \quad (6)$$

III. PROPOSED ADAPTATION METHOD
A. Overview
In this section, we will propose an adaptation method that tries to maintain smooth video quality and a stable buffer
throughout a streaming session. Our proposed method is divided into two cases: the down-case (when the current segment bitrate is higher than the estimated throughput) and the up-case (the opposite situation). In both cases, the client has to avoid buffer underflows while responding to the mismatch between actual throughput and segment bitrate. Because the buffer level should approach the maximum buffer level (or buffer size $B_{max}$) in the near future, the proposed method decides the appropriate moment at which the client needs to switch to another version. To avoid buffer underflows, we select a path P along which all estimated buffer levels are higher than a predefined minimum buffer threshold $B_{min}$ and the buffer level reaches at least a target level at the end of the path. In particular, if the actual throughput changes during the interval of the current (decided) path and causes a significant buffer variation, it is necessary to rebuild the path. In our work, if the mismatch between the estimated buffer level and the real buffer level is larger than one segment duration, i.e. $|B^e_i - B_i| > \tau$, we rebuild the path P at that point. Note that, when path P is rebuilt, the client can go into either the down-case or the up-case. For convenience, we use Figs. 2 and 3 to illustrate the version change in time. The general procedure is given as follows.

Input: the current path P; the current point (segment index and buffer level)
Output: the version of the next segment
If the buffer mismatch is at most $\tau$ and the $n_s$ stable segments of the current path are not yet used up, then: take the next version of the current path P
Else: rebuild path P by the down-case or the up-case

B. Down-case (when $R_i > T^e_i$)
In this case, the challenge is to reduce the video quality smoothly and to prevent buffer underflows. The basic idea of our heuristic is that the client decides a path P with $n_s$ stable (i.e. same version) segments to be maintained, followed by $n_c$ segments of gradual quality change (Fig. 2). If $n_s$ is 0, the client switches the quality down by one version and again looks for a new path P with a positive value of $n_s$ for maintaining the current quality.
So the role of $n_c$ is just to compute the value of $n_s$. If the client reduces the video quality step by step from the current version $I_i$ to the lowest one, the resulting change of buffer level can be estimated by Eq. (5):

$$\Delta B^{c} = \sum_{j=1}^{n_c} \tau \left(1 - \frac{R^{I_i - j}_{i+j}}{T^e_i}\right), \quad (7)$$

where $n_c$ is the number of version steps from $I_i$ down to the lowest version. Yet, if the estimated buffer level at the end of the path P is greater than the target level, the client can maintain a stable video quality for some of the next segments. So, the number of stable segments (from the start point to the stable point in Fig. 2) is computed as the highest $n_s$ for which every estimated buffer level along the path stays above the minimum threshold and the estimated buffer level at the end of the path reaches the target level. (8)

Figure 2: Possibilities of version changes in down-case

If there is no $n_s$ satisfying (8), the client decreases the video quality to avoid significant drops of the buffer level. If the current buffer level is less than the minimum threshold, the client switches down immediately to the highest version whose bitrate is lower than the estimated throughput (defined in Table I). However, if the current buffer level is higher than the minimum threshold, the client switches down to the target version step by step. At each lower version (e.g. version k), we compute the change of buffer level by using Eq. (7) with version k as the starting version. The value of $n_s$ is then recomputed using Eq. (8) with this change, to check whether this version can be maintained. Note that $n_s$ should be less than $B_{max}/\tau$, which is the maximum number of segments in the buffer. So, the client selects the path that can maintain the highest possible version while the buffer is well protected.

C. Up-case (when $R_i \le T^e_i$)
In this case, the objective is to find the appropriate moment to increase the video quality by deciding $n_s$ stable (i.e. same version) segments. It is also expected that the client buffer level reaches the target level right after the client has received the last segment of path P. Similar to the above case, if the client decides to increase the video quality step by step from the current version $I_i$ to the target version, the change of buffer level is computed by using Eq. (5):

$$\Delta B^{c} = \sum_{j=1}^{n_c} \tau \left(1 - \frac{R^{I_i + j}_{i+j}}{T^e_i}\right), \quad (9)$$

where $n_c$ is the number of version steps from $I_i$ up to the target version. If the estimated buffer level at the end of the path falls below the target level, the client should not change the video quality until the buffer has accumulated enough to support an increase of one version.
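The buffer estimates that drive both cases can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the per-segment change follows the form tau * (1 - bitrate / estimated throughput) described in Section II.B, the names `estimate_buffer_levels` and `max_stable_segments` are hypothetical, and the stability test (every estimated level must stay above a minimum threshold) is a simplified stand-in for conditions (8) and (10).

```python
from typing import List, Sequence


def estimate_buffer_levels(buffer_s: float, bitrates: Sequence[float],
                           est_throughput: float, tau: float) -> List[float]:
    """Estimated buffer level (in seconds) after each future segment."""
    levels = []
    level = buffer_s
    for rate in bitrates:
        # Each segment adds tau seconds of media and costs
        # tau * rate / est_throughput seconds of download time.
        level += tau * (1.0 - rate / est_throughput)
        levels.append(level)
    return levels


def max_stable_segments(buffer_s: float, bitrates: Sequence[float],
                        est_throughput: float, tau: float,
                        min_buffer_s: float) -> int:
    """Largest n_s such that keeping the same version for n_s segments
    keeps every estimated buffer level above min_buffer_s (assumed test)."""
    n_s = 0
    for level in estimate_buffer_levels(buffer_s, bitrates,
                                        est_throughput, tau):
        if level <= min_buffer_s:
            break
        n_s += 1
    return n_s
```

For example, with a 20 s buffer, 2 s segments, and a version whose segments run at twice the estimated throughput, each segment drains 2 s of buffer, so only 4 segments can be kept strictly above a 10 s threshold.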
The number of stable segments is again the highest $n_s$ for which every estimated buffer level along the path stays above the minimum threshold and the estimated buffer level at the end of the path reaches the target level. (10)

Otherwise, if the estimated buffer level is already sufficient, the client can increase the video quality immediately. In VBR streaming, if we observe a positive spike in the bitrate of the same version lasting for a few segments, we do not want to switch to a higher version, since we would be immediately forced to switch back. Therefore, each time the client increases to a higher version k (corresponding to each up-point in Fig. 3), we compute the change of buffer level by using Eq. (9) with version k as the starting version. Then, $n_s$ is recomputed using Eq. (10) with this change, to check if this higher
version can be maintained. In the up-case, note that $n_s$ should be greater than 10 segments, an empirical value chosen to keep the quality stable. After that, the client selects the path that can maintain the highest possible version while guaranteeing a safe buffer. The proposed method also uses a delay mechanism to keep the buffer level within the buffer size when the current buffer level comes close to it.

Figure 3: Possibilities of version changes in up-case

IV. EXPERIMENTS
A. Experiment Settings
In this section, we carry out experiments to evaluate our method in comparison with two other methods: the instant throughput-based method [5] and the buffer-based method proposed in [6]. For simplicity, these methods are called the Aggressive method and the Threshold-based method. The test video, which is taken from the Tokyo Olympics sequence [10], is 400 seconds long. All versions are encoded with the main profile of AVC (Advanced Video Coding) [11], and all media segments have the same duration of 2 seconds. We encode the VBR video versions with different QP values, namely 10, 16, 22, 24, 28, 34, 38, and 42. Fig. 5 shows the segment bitrates of all versions. The average bitrate of each version as well as the version indexes and QPs are listed in Table II.

In all experiments, the buffer size is equal to 15 segment durations (i.e. 30 s). We implement the Threshold-based method with four buffer thresholds of (10s, 20s, 25s, 30s). The safety margin used with the throughput estimate is 5% for the Aggressive method. Meanwhile, our method is deployed with a minimum buffer threshold of 10 s and the throughput estimation method of [8].

TABLE II. THE INDEX, QP, AND AVERAGE BITRATE OF VERSIONS

Index   QP   Average bitrate (kbps)
1       42   85.47
2       38   143.17
3       34   238.86
4       28   512.14
5       24   857.47
6       22   1129.02
7       16   2470.58
8       10   4923.99

Figure 4: The test-bed organization for the experiments (web server, IP network, DummyNet)

Fig. 4 depicts our test-bed organization used for the experiments in this paper.
On the server side, Apache HTTP Server version 2.2.22 is installed on Ubuntu 12.04 LTS. For keep-alive connections, the server's Timeout is set to 100s and MaxRequest to 0 (i.e. unlimited). The client is implemented in Java and runs on a Windows 7 Professional notebook with a 2.7 GHz Core i7-2620M CPU and 4GB RAM. Our test-bed uses the DummyNet tool [9], installed at the client, to simulate network characteristics with RTT = 40ms. The packet loss is set to 0%, assuming that the fluctuations due to packet loss are already included in the bandwidth trace. In particular, this avoids having different throughput curves in different runs.

Figure 5: Segment bitrates of different versions of test video

Figure 6: Bandwidth trace used in the evaluation [12]

B. Experiment Results
This part shows the comparison of the adaptation methods using a real bandwidth trace (Fig. 6) obtained from a mobile network [12]. Fig. 7 shows the results of the experiments. It can be seen that the Threshold-based method tries to maintain a constant version, resulting in an unstable buffer (e.g. at t = 100s and t = 248s). On the contrary, the Aggressive method results in the most fluctuating version curve. However, compared to the other methods, the buffer of this method has the smallest variation. Meanwhile, our proposed method keeps the version unchanged only if the future buffer is estimated to be safe. For example, at t = 61.3s, the proposed method goes into the down-case and expects that the client can maintain 11 stable segments based on the estimated change of buffer level. However, at t = 84.2s, the segment bitrate is much higher than the actual throughput (4623kbps in comparison with 1464kbps), which results in a big mismatch between the estimated buffer level and the current level. So, the proposed method rebuilds the path and decides to lower the video quality to
avoid buffer underflows. The results of this experiment show that the client can maintain 6 stable segments out of the 11 expected ones. That means our proposed method can quickly capture the changes of buffer level and then adjust the video quality accordingly (e.g. at t = 125.7s, t = 235.6s, etc.). Therefore, the version index curve of our method is smoother than those of the other methods while the buffer is well protected.

Figure 7: Adaptation results of three adaptation methods using the complex bandwidth trace. (a) Adapted bitrate. (b) Resulting buffer level. (c) Adapted video quality.

TABLE III. STATISTICS OF ADAPTATION METHODS. EXCEPT THE BUFFER LEVEL, THE UNIT OF OTHER PARAMETERS IS VERSION INDEX.

Statistics            Aggressive method   Threshold-based method   Proposed method
Min buffer level      23.4s               3s                       7.7s
Average of versions   6.13                5.07                     6.17
Max version           8                   8                        8
Min version           3                   2                        4
Max switch            3                   4                        1
Number of switches    43                  22                       17

Some statistics provided in Table III reflect well the behaviors of the methods. As expected, our method has the smallest number of switches (only 17). Moreover, the minimum version is the highest and the maximum switch (i.e. version difference in a switch) is the smallest (only 1). Meanwhile, our method provides a much more stable buffer than that of the Threshold-based method. Therefore, the video clip provided by our method has a less negative impact on end users.

V. CONCLUSIONS
In this paper, we have presented an adaptation method for VBR video streaming over HTTP, aiming at providing smooth video quality. The method is based on a trellis that represents the possible changes of versions and the corresponding buffer levels in the near future. Based on the trellis, we presented a heuristic method to build a path of versions for the next few segments. The experimental results showed that the proposed method is effective in maintaining stable video quality under fluctuations of bandwidth and video bitrate.

ACKNOWLEDGMENT
The authors are grateful to Prof. Christian Timmerer of Klagenfurt University for providing the bandwidth trace used in this study.

REFERENCES
[1] T. C. Thang, Q.-D. Ho, J. W. Kang, A. T. Pham, "Adaptive Streaming of Audiovisual Content using MPEG DASH," IEEE Transactions on Consumer Electronics, vol. 58, no. 1, pp. 78-85, Feb. 2012.
[2] T. C. Thang, H. T. Le, A. T. Pham, Y. M. Ro, "An evaluation of bitrate adaptation methods for HTTP live streaming," IEEE Journal on Selected Areas in Communications, vol. 32, no. 4, pp. 693-705, Apr. 2014.
[3] S. Tullimas, T. Nguyen, R. Edgecomb, S. C. Cheung, "Multimedia streaming using multiple TCP connections," ACM Transactions on Multimedia Computing, Communications, and Applications, vol. 4, no. 2, pp. 1-20, Feb. 2008.
[4] T. V. Lakshman, A. Ortega, A. R. Reibman, "Variable bit-rate (VBR) video: Tradeoffs and potentials," Proceedings of the IEEE, vol. 86, no. 5, pp. 952-973, May 1998.
[5] T. C. Thang, H. T. Le, H. X. Nguyen, A. T. Pham, J. W. Kang, Y. M. Ro, "Adaptive video streaming over HTTP with dynamic resource estimation," Journal of Communications and Networks, vol. 15, no. 6, pp. 635-644, Dec. 2013.
[6] Y. Zhou, Y. Duan, J. Sun, Z. Guo, "Towards simple and smooth rate adaption for VBR video in DASH," in Proc. Visual Communications and Image Processing Conference, pp. 9-12, Valletta, Malta, Dec. 2014.
[7] H. T. Le, V. D. Nguyen, P. N. Nam, T. C. Thang, A. T. Pham, "Buffer-based bitrate adaptation for adaptive HTTP streaming," in Proc. of IEEE ATC2013, pp. 33-38, HoChiMinh City, Vietnam, Oct. 2013.
[8] T. C. Thang, A. T. Pham, H. X. Nguyen, P. L. Cuong, J. W. Kang, "Video streaming over HTTP with dynamic resource prediction," in Proc. of IEEE ICCE2012, pp. 130-135, Hue City, Vietnam, Aug. 2012.
[9] L. Rizzo, "Dummynet: A simple approach to the evaluation of network protocols," SIGCOMM Comput. Commun. Rev., vol. 27, no. 1, pp. 31-41, Jan. 1997.
[10] G. Van der Auwera, P. T. David, M. Reisslein, "Traffic and quality characterization of single-layer video streams encoded with H.264/MPEG-4 advanced video coding standard and scalable video coding extension," IEEE Transactions on Broadcasting, vol. 54, no. 3, pp. 698-718, Sep. 2008.
[11] T. Wiegand, G. J. Sullivan, G. Bjontegaard, A. Luthra, "Overview of the H.264/AVC video coding standard," IEEE Transactions on Circuits and Systems for Video Technology, vol. 13, no. 7, pp. 560-576, 2003.
[12] C. Müller, S. Lederer, C. Timmerer, "An evaluation of dynamic adaptive streaming over HTTP in vehicular environments," in Proc. ACM Multimedia Systems Conference 2012 and the 4th ACM Workshop on Mobile Video, North Carolina, Feb. 2012.