An Objective Multi-layer QoE Evaluation For TCP Video Streaming

Peng Yu, Fan Liu, Yang Geng, Wenjing Li, Xuesong Qiu
State Key Laboratory of Networking and Switching Technology
Beijing University of Posts and Telecommunications
Beijing 100876, China

Abstract

It is a challenge to effectively assess Quality of Experience (QoE) for TCP video streaming from network performance parameters. To resolve this problem, an objective hierarchical evaluation for Transmission Control Protocol (TCP) video streaming is proposed under video playback scenarios. QoE assessment for TCP video streaming is decomposed into two sub-steps. First, in consideration of the video playback performance parameters affecting QoE, the authors present three novel application-layer metrics. Further, the impact of network status on video playback performance is investigated and the authors propose high-level network-layer parameters. The correlation between the network-layer parameters and the application-layer metrics is then characterized through analysis and inference. Second, subjective tests are conducted to evaluate QoE from the application-layer metrics. Finally, the authors validate the analysis and the model by simulations and experiments in real networks. The experimental study shows that the proposed method performs well in assessing QoE of TCP video streaming.

Keywords: QoE; TCP; application-layer metrics; network parameters; subjective experiment

I. INTRODUCTION

Internet video has been growing fast in recent years. It was predicted that consumer Internet video traffic would comprise around 69 percent of all consumer Internet traffic in 2017, up from 57 percent in 2012 [1]. With ever-increasing video traffic, whether video stream quality is acceptable to end users is of great importance and urgency for network operators, content providers, streaming servers and ISPs [2].
As applications such as IP-based TV, video sharing websites and P2P streaming are rapidly and widely adopted in the Internet, hypertext transfer protocol (HTTP) video streaming over TCP is becoming more and more popular. In this paper, we focus on QoE assessment of TCP video streaming.

To evaluate QoE for HTTP/TCP videos from network performance parameters under video playback scenarios, the authors propose an objective hierarchical QoE evaluation for TCP video streaming. In consideration of the video playback performance parameters affecting QoE, the authors present three novel application-layer metrics: initial buffer delay, mean re-buffering duration and re-buffering frequency. Further, the way network status impacts video playback performance is investigated and the authors present network-layer metrics. The correlation from the network-layer parameters to the application-layer metrics is then analytically characterized. Besides, the authors perform subjective tests to evaluate QoE from the application-layer metrics. Finally, the authors validate the analysis and the model by simulations and experiments in real networks. The experimental study shows that the proposed method performs well in assessing QoE of TCP video streaming.

The remainder of the paper is structured as follows. Section II describes related work and highlights the method adopted in this article. Section III derives the correlation between network-layer parameters and application-layer metrics analytically, and simulation experiments are conducted to validate the theoretical analysis. Section IV describes the subjective experiments used to evaluate QoE from application-layer metrics; the results and our main findings are also presented in that section. Our conclusion is drawn in Section V.

II. RELATED WORK

Conventionally, Internet video traffic runs over user datagram protocol (UDP) instead of TCP.
The trend toward TCP streaming relies on the fact that HTTP/TCP multimedia applications are easier to deploy and use than UDP-based multimedia applications, given the wide use of network address translation (NAT) and firewalls [3]. Distinct from UDP-based streaming, reliable connection-oriented TCP-based streaming presents new features. The retransmission mechanism handles packet loss and avoids missing frames, so video quality does not degrade in various network environments. Moreover, progressive download technology lets end users watch video clips that are not yet completely downloaded.

With the combination of subjective and objective analysis, many metrics have been proposed for QoE assessment of UDP-based video streaming, such as the Peak Signal-to-Noise Ratio (PSNR) and Structural Similarity Index Measurement (SSIM) [4]. These assessment methods mainly rely on video artifacts such as slice error, blocking, ghosting and freeze frame. Because TCP transmission is reliable, these metrics cannot actually be applied to TCP video QoE assessment.

To improve the quality of TCP-based video streaming applications, metrics such as startup delay (namely, the initial buffer delay), the buffer underflow probability and the buffer overflow probability are taken into consideration in [3], but QoE metrics are ignored. Amy Csizmar Dalal et al. [5] assessed QoE of TCP video streams from objective, application-layer measurements. However, they consider only the influence of application metrics on video quality, which may provide little information for network optimization and quality improvement. A quality monitoring mechanism with a newly defined MOS based on the number and duration of stalling events is presented in [6]. For adaptive TCP-based video streaming, the average and maximal values of interruption delays and the Quantization Parameter are used to evaluate QoE [7]. Correlations among QoE and several impairment factors for TCP-based video streaming, such as the number of pauses, their duration and their temporal location, are considered in [8]. However, important parameters reflecting network performance, e.g. throughput and buffer size, are ignored. Taking network performance into consideration, [9] discusses a three-layer QoE assessment architecture, but the mean opinion score (MOS) values are obtained arbitrarily without enough verification.

Based on our previous work on TCP-based QoE assessment [10], the authors not only propose three simple application performance metrics to measure video streaming quality but also study how video quality is influenced by network throughput and buffer settings. In addition, a back propagation neural network (BPNN) is used to verify the efficiency of the proposed method.

978-3-901882-76-0 @2015 IFIP

III. NETWORK-LAYER PARAMETERS AND APPLICATION-LAYER METRICS

In this section, for TCP video streaming under playback scenarios, the authors first present three simple application-layer performance metrics that significantly influence QoE of video streaming, and the network-layer performance parameters that in turn affect these application-layer metrics. Secondly, the correlation from network-layer parameters to application-layer metrics is analytically derived.

A. Application-layer Performance Metrics

For TCP-based video streaming, if the TCP throughput is lower than the playback rate, the video playback will pause and wait for new video data [9]. The end users may then suffer re-buffering more than once with diverse durations, which crucially influences the end users' perceived quality. Considering the video playback state and its influence on QoE, the authors propose three quantifiable application-layer performance metrics to describe video playback status:

Initial buffer delay (denoted by D_init): This metric measures the duration between the time the video starts to be loaded and the time it starts to play.

Mean re-buffering duration (denoted by D_rebuf): This metric measures the average duration of re-buffering events over the whole video playback.

Re-buffering frequency (denoted by F_rebuf): This metric reflects the frequency of re-buffering events over the whole video playback.

B. Network-layer Performance Parameters

TCP video streaming is transmitted over various network conditions, resulting in diverse degradations of video quality. Network parameters, including packet loss, delay, packet duplication and re-ordering, reflect network performance and throughput. In this paper, the authors investigate how network status affects the video application-layer metrics and, further, user QoE. Accordingly, we measure high-level network performance parameters to characterize various network conditions.

In TCP video streaming transmission, the receiver holds a playback buffer to eliminate or reduce the effect of network throughput fluctuations. The structure of the playback buffer is depicted in Fig. 1. As shown, B_0 (bit) is the playback buffer in the receiver and is large enough to store several seconds of video. B_max (bit) indicates the maximum threshold for video playback and B_min (bit) the minimum threshold. While the video stream is loading, the initial delay lasts from the moment the buffer receives data until the buffer occupancy reaches B_max. During video transmission, a pause or re-buffering event occurs as soon as the buffer occupancy falls below B_min, and the video will not resume playing until the buffer occupancy surpasses B_max.

Fig. 1. Structure of video playback buffer in the receiver

The process of video streaming transmission and playback resembles a leaky bucket, with video transmission acting as data inflow and playback as data outflow. Accordingly, three scenarios may be encountered during video playback, illustrated in Fig. 2, in which η (bit/s) represents the average TCP throughput and λ (bit/s) denotes the video playback rate. When the average TCP throughput (η) is larger than the playback rate (λ), the buffer occupancy continues to increase as long as the buffer in the receiver is large enough, and the video plays out smoothly after the initial buffer delay. When the average TCP throughput equals the video playback rate, buffer occupancy in the receiver stays around B_max and the video suffers only the initial buffer delay, without pause events afterwards. When the average TCP throughput is lower than the video playback rate, the video playback will pause and wait for new video data to flow in. Only when the data stored in the buffer reaches B_max can playback recover.
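The leaky-bucket behaviour described above can be illustrated with a small discrete-time simulation. This is a minimal sketch rather than the authors' implementation: the function name simulate_playback, the time step dt and the example values are assumptions made for illustration, the throughput is held constant at its average η, and the buffer is assumed large enough never to overflow.

```python
def simulate_playback(eta, lam, b_min, b_max, duration, dt=0.01):
    """Simulate the receiver buffer as a leaky bucket.

    eta: constant (average) throughput in bit/s, the inflow.
    lam: video playback rate in bit/s, the outflow while playing.
    b_min, b_max: pause and resume thresholds in bits.
    Returns (initial_delay_seconds, list_of_rebuffering_durations_seconds).
    """
    buffer_bits = 0.0
    playing = False
    initial_delay = None   # stays None if playback never starts
    stall_start = None
    stalls = []
    for step in range(1, int(round(duration / dt)) + 1):
        now = step * dt
        buffer_bits += eta * dt              # data arriving from the network
        if playing:
            buffer_bits -= lam * dt          # data consumed by the player
            if buffer_bits < b_min:          # pause (re-buffering) event begins
                playing = False
                stall_start = now
        elif buffer_bits >= b_max:           # enough data buffered: (re)start playback
            playing = True
            if initial_delay is None:
                initial_delay = now
            else:
                stalls.append(now - stall_start)
    return initial_delay, stalls


if __name__ == "__main__":
    # Illustrative values only: 1 Mbit/s throughput, 2 Mbit/s playback rate.
    delay, stalls = simulate_playback(1e6, 2e6, b_min=0.5e6, b_max=2e6, duration=20)
    print("initial delay (s):", delay)
    print("re-buffering durations (s):", stalls)
```

With η below λ the sketch produces repeated pause and play cycles, while with η at or above λ only the initial delay remains, matching the three scenarios in the text.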
Fig. 2. Buffer scenarios in the receiver during video transmission and playback

Consequently, for a specific receiver whose buffer requests a video with a given playback rate over the Internet, network throughput determines the video application performance metrics and, further, user QoE. For a TCP Reno flow [11], the average TCP throughput η is a function of packet loss rate, round-trip time (RTT) and delay, which can be expressed as:

$$\eta = \frac{1}{R\sqrt{\dfrac{2bp}{3}} + T_0 \min\left(1,\ 3\sqrt{\dfrac{3bp}{8}}\right) p \left(1 + 32p^2\right)} \tag{1}$$

where R denotes the round-trip time, p is the packet loss rate, b is the number of packets acknowledged by an ACK, and T_0 is the retransmission timeout. In general, b is set to 2 and T_0 is set to four times the RTT. Thus, packet loss rate and RTT are measured to quantify network performance.

C. Correlation Function from Network-layer Parameters to Application-layer Metrics

To simplify the model, it is presumed that the end users do not interact with the video during playback, that is, they do not pause or seek forward/backward. Based on this assumption and analytic derivation, we establish the model below to correlate network-layer parameters with application-layer performance metrics.

1) Initial buffer delay (D_init (s))

The initial buffer delay refers to the duration from the moment data starts flowing into the receiver buffer until the buffer occupancy reaches the maximum threshold B_max, that is:

$$D_{init} = \frac{B_{max}}{\eta} \tag{2}$$

2) Mean re-buffering duration (D_rebuf (s))

Fig. 3 illustrates typical playback and pause events during video transmission. As depicted, at time t_1 the buffer occupancy drops below B_min and a pause event occurs. The client must wait until time t_max, when the buffer occupancy surpasses B_max. Whenever the buffer occupancy attains B_max, playback recovers and continues until the next pause at time t_2. Thus the period from t_1 to t_max represents a pause event and the period from t_max to t_2 a play event.

Fig. 3. Video playback and pause event during playback

During a pause event (video buffering), the mean re-buffering duration is the time the buffer occupancy takes to rise from the minimum threshold B_min to the maximum threshold B_max, that is:

$$D_{rebuf} = \begin{cases} 0, & \eta \ge \lambda \\ \dfrac{B_{max} - B_{min}}{\eta}, & \eta < \lambda \end{cases} \tag{3}$$

since in the case of η > λ or η = λ the video plays out smoothly without re-buffering.

3) Re-buffering frequency (F_rebuf (1/s))

If the TCP throughput η is larger than or equal to the video playback rate λ (that is, η > λ or η = λ), F_rebuf is set to 0. When η < λ, the re-buffering frequency is the reciprocal of the interval between two consecutive moments at which the buffer occupancy drops below the minimum threshold B_min, and is derived as below.

During a pause or buffering event, we have:

$$B_{max} - B_{min} = \eta\,(t_{max} - t_1) \tag{4}$$

During a play event, we have:

$$B_{max} - B_{min} = (\lambda - \eta)\,(t_2 - t_{max}) \tag{5}$$

From (4) and (5), the period of a complete pause-play cycle is given by:

$$t_2 - t_1 = \frac{B_{max} - B_{min}}{\eta} + \frac{B_{max} - B_{min}}{\lambda - \eta} = \frac{\alpha\lambda}{\eta(\lambda - \eta)} \tag{6}$$

where α is defined as B_max - B_min. Accordingly, the re-buffering frequency is given by:

$$F_{rebuf} = \frac{1}{t_2 - t_1} = \frac{\eta(\lambda - \eta)}{\alpha\lambda} \tag{7}$$

With (1)~(7) we obtain the quantitative correlation between network parameters and application-layer metrics.

D. Simulation Experiment and Analysis

In this part, a simulation experiment is performed to validate the correlation function from network-layer parameters to application-layer metrics.
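The correlation functions that this simulation is meant to validate, Eqs. (1) to (7), can be written down directly in code. The following is a hedged sketch, not the authors' code: all function names are hypothetical, and the packet size mss_bits used to convert the output of Eq. (1) into bit/s is an assumption, since the model in [11] expresses throughput in packets per second and the paper does not state the conversion it used.

```python
import math


def tcp_reno_throughput(p, rtt, b=2, t0=None):
    """Eq. (1): average TCP Reno throughput in packets per second.

    p: packet loss rate (0 < p < 1), rtt: round-trip time in seconds,
    b: packets acknowledged per ACK, t0: retransmission timeout in seconds
    (defaults to 4 * rtt, as stated in the paper).
    """
    if p <= 0 or rtt <= 0:
        # The formula diverges here; treating it as unbounded is an assumption.
        return math.inf
    if t0 is None:
        t0 = 4 * rtt
    denom = (rtt * math.sqrt(2 * b * p / 3)
             + t0 * min(1.0, 3 * math.sqrt(3 * b * p / 8)) * p * (1 + 32 * p ** 2))
    return 1.0 / denom


def throughput_bits(p, rtt, mss_bits=1460 * 8):
    """Convert Eq. (1) to bit/s; mss_bits is an assumed packet size."""
    return tcp_reno_throughput(p, rtt) * mss_bits


def initial_buffer_delay(b_max, eta):
    """Eq. (2): D_init = B_max / eta."""
    return b_max / eta


def mean_rebuffering_duration(b_min, b_max, eta, lam):
    """Eq. (3): zero when eta >= lam, otherwise (B_max - B_min) / eta."""
    return 0.0 if eta >= lam else (b_max - b_min) / eta


def rebuffering_frequency(b_min, b_max, eta, lam):
    """Eq. (7): zero when eta >= lam, otherwise eta (lam - eta) / (alpha lam)."""
    if eta >= lam:
        return 0.0
    alpha = b_max - b_min
    return eta * (lam - eta) / (alpha * lam)


if __name__ == "__main__":
    # Illustrative values only: 1% loss, 100 ms RTT, 2 Mbit/s video.
    eta = throughput_bits(p=0.01, rtt=0.1)
    print(initial_buffer_delay(2e6, eta),
          mean_rebuffering_duration(0.5e6, 2e6, eta, 2e6),
          rebuffering_frequency(0.5e6, 2e6, eta, 2e6))
```

Keeping Eq. (1) separate from the unit conversion makes the one assumption outside the paper, the packet size, easy to change.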
1) Simulation Experiment Set-up

Our simulation platform for evaluating the function from network-layer performance parameters to application-layer metrics is depicted in Fig. 4. A web multimedia server, running Apache Tomcat 8.0 to store video clips for the client to request and download, sits on an isolated subnet behind the router. The client runs Google Chrome, which plays the video in a monitoring page with an HTML5 script application that records the playback status of the video. The router, separating the web server from the client's subnet, introduces different packet loss rates and RTTs.

Fig. 4. Simulation platform for modeling from network QoS to application performance metrics

On the web server, a video widely popular on YouTube was stored for the client to download and play out. The video is H.264 encoded in FLV format and is 141 seconds long.

TABLE I. THE NETWORK-LAYER PARAMETERS EMULATED IN THE ROUTER

Network QoS            Parameters
network connection     TCP (Reno)
packet loss rate (%)   0 to 5% with 0.5% interval; 5% to 10% with 1% interval; ±0.25 packet loss burst
RTT (ms)               0 to 500 ms with 50 ms interval; 500 ms to 1 s with 100 ms interval; ±10 ms random variation

The router runs netem software [12] to control the level of network congestion between the server and the client. The network QoS parameters, packet loss rate and RTT, are described in Table I. Thus the experiment results are more reliable and valuable. Following the simulation setting in [10], the packet loss rate was varied from 0% to 10% to emulate real network packet loss. Besides, we added the optional correlation to simulate bursty packet losses. We also chose delay in the route between 0~500 ms to investigate the impact of network delay. Random variation was added to the network delay parameter to emulate network variability. As illustrated in Table I, we set 16 × 16 = 256 network conditions with various performance parameters.
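The grid of network conditions in Table I can be enumerated programmatically, which also confirms the count of 16 × 16 = 256. The sketch below is illustrative only. The mapping to netem arguments is an assumption and is not taken from the paper: the interface name eth0 is hypothetical, the "±0.25 packet loss burst" entry is read as a 25% loss correlation, and the whole RTT is applied as delay on a single interface with ±10 ms jitter.

```python
from itertools import product

# Packet loss rates (%) from Table I: 0 to 5 in steps of 0.5, then 6 to 10 in steps of 1.
LOSS_RATES = [0.5 * i for i in range(11)] + [float(v) for v in range(6, 11)]

# RTT values (ms) from Table I: 0 to 500 in steps of 50, then 600 to 1000 in steps of 100.
RTTS_MS = list(range(0, 501, 50)) + list(range(600, 1001, 100))


def network_conditions():
    """Return every (loss %, RTT ms) pair of the emulated grid."""
    return list(product(LOSS_RATES, RTTS_MS))


def netem_command(loss_pct, rtt_ms, dev="eth0", jitter_ms=10, loss_corr_pct=25):
    """Build a tc/netem command string for one condition (assumed mapping).

    netem accepts 'delay TIME JITTER' and 'loss PERCENT CORRELATION';
    dev, jitter_ms and loss_corr_pct are illustrative defaults.
    """
    return (f"tc qdisc replace dev {dev} root netem "
            f"delay {rtt_ms}ms {jitter_ms}ms loss {loss_pct:g}% {loss_corr_pct}%")


if __name__ == "__main__":
    conditions = network_conditions()
    print(len(conditions), "conditions")
    for loss, rtt in conditions[:3]:
        print(netem_command(loss, rtt))
```

The commands are only built as strings here; running them requires root privileges on the router and should be adapted to the actual interface and topology.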
In the client, we developed a small application based on Google Chrome to record the values of the three application-layer metrics while the video plays out. Under each network condition, the client requests and downloads the video from the web server three times, and we take the average as the value of each application-layer metric.

2) Objective Simulation Results and Analysis

Fig. 5 shows the histograms of the application-layer metrics under different network conditions. As shown in the histograms, initial buffer delay, mean re-buffering duration and re-buffering frequency all increase with the packet loss rate and RTT. Besides, packet loss rate and RTT affect the three application-layer metrics to different extents.

Fig. 6 compares the values of the application-layer metrics estimated by the proposed correlation function with the experimental results. As depicted, the initial buffer delay prediction shows a larger error than the predictions of mean re-buffering duration and re-buffering frequency. Since we assume network throughput to be steady and evaluate average throughput, the larger error may be attributed to the gradually changing congestion window when the TCP connection is initially established.

Fig. 5. The histograms of three application-layer metrics in various sets of packet loss rate and delay

Fig. 6. Comparison of the values of application-layer metrics estimated by the proposed correlation function with experimental results
IV. QOE EVALUATION

In Section III, the correlation from network-layer parameters to application-layer metrics was analytically derived and experimentally validated. In this section, we establish the mapping model from application-layer metrics to QoE.

A. Subjective Experiment Set-up

In Section III, the simulation for verifying the function from network-layer parameters to application-layer metrics was conducted, in which we set 16 × 16 = 256 different network scenarios and accordingly obtained 256 video sequences with various buffering. These 256 video sequences are available for the subjective experiment. After processing the video sequences, we performed subjective quality assessment to obtain mean opinion score (MOS) values.

We apply the Single Stimulus method [13] to conduct our subjective assessment. 140 subjects from all walks of life, aged 20 to 40, are selected to score the video sequences. In order to achieve a more accurate and stable MOS value, the 140 subjects are divided into four groups at random, each group with 35 people. Similarly, the video sequences are randomly separated into four sets. On account of video quantity and visual fatigue, each group watches and scores one set of video sequences. Besides, subjects watch and score a set of 64 video sequences in eight sessions, grading eight sequences each time. By means of subjective assessment, the authors obtain 35 different sets of MOS value data.

We analyze the sample data by means of the mean score and confidence interval. We apply the method proposed in [14] to screen the observers and exclude unreliable ones. The final subjective MOS value is the average of the subjective scores obtained in the subjective test.

After analysis of the MOS value data, a BPNN is designed to model the relationship between the application-layer metrics and QoE. The process of establishing the BPNN with the selected application-layer parameters is depicted in Fig. 7. We apply the methods proposed in [10] to implement our designed BPNN.

Fig. 7. The process of establishing BPNN model

B. Subjective Experiment Result and Analysis

After collecting the subjective MOS value data and designing the BPNN, the sample data is normalized and the network is trained. The performance of our BPNN model is presented in Fig. 8 in the form of regression R values. Regression R values measure the correlation between output and target, i.e. the correlation coefficient between estimated MOS and subjective MOS. From Fig. 8 we find that regression R is almost 100%, which means the estimated MOS and subjective MOS values nearly coincide.

Fig. 8. Training performance of designed BPNN model

Once the BPNN model has been trained with high performance, the mapping model is tested and validated. The authors chose another popular video on YouTube to perform the verification experiment. Combinations of the three application-layer factors, initial buffer delay, mean re-buffering duration and re-buffering frequency, were used to create 12 × 12 = 144 videos with various buffering. On one hand, estimated MOS data was obtained from our trained BPNN model. On the other hand, we selected 30 persons from various professions for subjective testing to obtain the subjective MOS values.

Afterwards, we compare the estimated MOS and the subjective MOS to illustrate the evaluation performance. We find that the Mean Squared Error (MSE) and regression R value are 0.0286809 and 96.0732% respectively. We can thus conclude that the estimated MOS and subjective MOS are highly correlated and that the evaluation method performs well.

Furthermore, we conduct regression analysis to compare the degree of influence of the three application-layer performance factors on QoE. The coefficients relating D_init, D_rebuf and F_rebuf to QoE are 0.1412, 0.1927 and 1.3111 respectively. So we identify re-buffering frequency as the main factor affecting end users' QoE, which is consistent with the conclusions in [9].

V. CONCLUSION

In this paper, an objective hierarchical QoE evaluation for TCP video streaming is proposed. The authors decompose QoE evaluation into two sub-steps: first, a correlation function from network-layer parameters to application-layer metrics, and second, a mapping model from application-layer metrics to QoE. In consideration of the video playback performance parameters affecting QoE, the authors present three novel application-layer metrics to quantify video playback status. Furthermore, by analyzing the impact of network conditions on the application-layer metrics, network-layer parameters are presented. The two sub-steps, correlation analysis and mapping model, are validated by simulations and experiments in real networks. The results show that the proposed method performs well in QoE evaluation of TCP video streaming. Thus, it may be useful when subjective tests are hard to carry out. Next we will explore settings of monitoring points for TCP-based video
transmissions. Moreover, the effects of content and user behavior on QoE under wireless networks should be discussed as well.

ACKNOWLEDGMENT

This research is supported by National Natural Science Foundation of China (61271187), National Key Technology R&D Program (2012BAH06B02) and Chinese Universities Scientific Fund (BUPT2014RC1104).

REFERENCES

[1] Cisco. Visual Networking Index: Forecast and Methodology, 2012-2017.
[2] Tavakoli S, Shahid M, Brunnstrom K, et al. Effect of content characteristics on quality of experience of adaptive streaming. 2014 Sixth International Workshop on Quality of Multimedia Experience (QoMEX), IEEE, 2014: 63-64.
[3] Yan J, Mühlbauer W, Plattner B. Analytical framework for improving the quality of streaming over TCP. IEEE Transactions on Multimedia, 2012, 14(6): 1579-1590.
[4] Weiwei Li, Rehman H U, Kaya D, et al. Video Quality of Experience in the Presence of Accessibility and Retainability Failures. 2014 10th International Conference on Heterogeneous Networking for Quality, Reliability, Security and Robustness (QShine), IEEE, 2014: 1-7.
[5] Ameigeiras P, Ramos-Munoz J J, Navarro-Ortiz J, et al. QoE oriented cross-layer design of a resource allocation algorithm in beyond 3G systems. Computer Communications, 2010, 33(5): 571-582.
[6] Eckert Marcus, Knoll Thomas Martin, and Schlegel Florian. Advanced MOS calculation for network based QoE Estimation of TCP streamed Video Service. 2013 7th International Conference on Signal Processing and Communication Systems (ICSPCS), 2013: 1-9.
[7] Singh Kamal Deep, Hadjadj-Aoul Yassine, Rubino Gerardo. Quality of experience estimation for adaptive HTTP/TCP video streaming using H.264/AVC. The 9th Annual IEEE Consumer Communications and Networking Conference, 2012: 127-131.
[8] Rodriguez D Z, Abrahao J, Begazo D C, et al. Quality metric to assess video streaming service over TCP considering temporal location of pauses. IEEE Transactions on Consumer Electronics, 2012, 58(3): 985-992.
[9] Mok R K P, Chan E W W and Chang R K C. Measuring the Quality of Experience of HTTP Video Streaming. 2011 IFIP/IEEE International Symposium on Integrated Network Management (IM), 2011: 485-492.
[10] Ruiyi Wang, Yang Geng, Yifan Ding, et al. Assessing the quality of experience of HTTP video streaming considering the effects of pause position. 2014 16th Asia-Pacific Network Operations and Management Symposium (APNOMS), 2014: 1-4.
[11] J. Padhye, V. Firoiu, D. Towsley, and J. Kurose. Modeling TCP Reno performance: A simple model and its empirical validation. IEEE/ACM Transactions on Networking, 2000, 8(2): 133-145.
[12] The Linux Foundation. Netem. http://www.linuxfoundation.org/collaborate/workgroups/networking/netem.
[13] I. R. Assembly. Methodology for the subjective assessment of the quality of television pictures. International Telecommunication Union, 2003.
[14] Dalal A C, Bouchard A K, Cantor S, et al. Assessing QoE of on-demand TCP video streams in real time. 2012 IEEE International Conference on Communications (ICC), IEEE, 2012: 1165-1170.