
Saarland University
Faculty of Mathematics and Informatics (MI)
Department of Computer Science

Master Thesis

Performance Evaluation of Dynamic Adaptive Streaming on a Global Scale

submitted by Pablo Gil Pereira on Aug 4th, 2017

Supervisor: Prof. Dr.-Ing. Thorsten Herfet
Advisor: Andreas Schmidt, M.Sc.
Reviewers: Prof. Dr.-Ing. Thorsten Herfet, Prof. Dr.-Ing. Holger Hermanns


Master Thesis
Pablo Gil Pereira
Performance Evaluation of Dynamic Adaptive Streaming on a Global Scale

With the advent of streaming services such as YouTube or Netflix, the Internet is facing multimedia distribution over many point-to-point connections using protocols that are not designed for multimedia transport. Dynamic Adaptive Streaming over HTTP (DASH) is the most prevalent solution in this area and builds on top of the fully reliable, but not time-sensitive TCP protocol. While this seems suboptimal from a theoretical perspective, the solution is still in wide use and enormous efforts are put into developing this approach even further. At the same time, it is unknown how well these solutions perform on a global scale and how large the gap between practical performance and theoretical bounds really is. Measuring DASH and TCP performance is done using Quality-of-Service (QoS) and Quality-of-Experience (QoE) metrics. This is due to the different views of network operators and users, who have varying interests but consider the same infrastructure. To this end, it is necessary to execute the evaluation inside global networking testbeds in order to get reliable data. The goal of this thesis is to use global network testbeds to provide insights into how well TCP performs on a global scale and how this impacts DASH as an application-layer protocol. In particular, the thesis includes the following tasks:

- Introduction of DASH concepts, including strengths, weaknesses, interconnections with TCP, opportunities for improvement and current directions of research.
- Implementation of evaluation tools for quantifying DASH performance within selected testbed environments.
- Evaluation of DASH performance, providing relationships between QoS and QoE metrics, to guide further improvement of DASH solutions.

Telecommunications Lab, Department of Computer Science
Prof. Dr.-Ing. Thorsten Herfet
Universität des Saarlandes, Campus Saarbrücken C6.3, 10. OG, Saarbrücken
Advisor: Andreas Schmidt, M.Sc.
Supervisor: Prof. Dr.-Ing. Thorsten Herfet

Abstract

There has been an increasing trend over the last years to deliver different types of applications, and in particular multimedia applications, over the Internet. The prevalent solution for video streaming over the Internet is Dynamic Adaptive Streaming over HTTP (DASH), which provides a new paradigm to overcome the Internet's restrictions for multimedia delivery by adapting its data rate to the network conditions at each point in time. DASH is usually deployed on top of protocols that were, in theory, not intended for multimedia delivery, such as HTTP and TCP. Cross-layer analyses are needed to understand the real effects of these protocols on DASH performance. This thesis focuses on the effects of TCP on DASH performance, and in particular on TCP's congestion control and transmission segmentation. The recently released BBR congestion control is said to provide better performance than the widely used CUBIC algorithm, but its effects on DASH performance are still unknown. Moreover, transparent transmission segmentation promises performance improvements when the TCP connection is established across multiple physical links with different characteristics. The objective of this thesis is to analyse these techniques and provide insights into their effects on DASH performance in terms of Quality-of-Service and Quality-of-Experience.

Acknowledgements

First and foremost, I want to thank Andreas Schmidt for providing so much support and valuable advice for this thesis. Working with him is something I enjoyed very much and it has been a truly enriching experience. I would also like to thank Prof. Dr.-Ing. Thorsten Herfet for giving me the opportunity to work at his chair. I also thank Prof. Dr.-Ing. Holger Hermanns for agreeing to review my thesis despite his busy schedule. I must also thank everyone at the Telecommunications Lab, especially those who also contributed to this thesis with a piece of advice. Thanks to all my friends from high school to university, from each of whom I have learnt something. In particular, thanks to Eloy, Andrés and Javi for making me feel like I never left home. I would also like to thank my parents, grandparents and brothers, namely Macarena, Emilio, Enrique, Maricarmen, Marta and Jorge, because without their love and support none of this would have been possible. Finally, I would also like to thank Kay for her constant support throughout this challenging year, but especially for being the best one.

Performance Evaluation of Dynamic Adaptive Streaming on a Global Scale

defended by Pablo Gil Pereira on August 28th, 2017

Supervisor: Prof. Dr.-Ing. Thorsten Herfet
Advisor: Andreas Schmidt, M.Sc.

Saarland University
Faculty of Mathematics and Informatics
Department of Computer Science

Summary

Over the last years there has been an increase in the use of video streaming services, to the point that 73% of Internet traffic in 2016 was video [? ]. Moreover, this trend is expected to continue, with video reaching 82% of total traffic by 2021. This growth is driven by the consolidation of the large content providers, such as YouTube or Netflix, which use the technology known as dynamic adaptive streaming to distribute video over the Internet. The main characteristic of adaptive streaming systems is their adaptation to the network conditions, which fluctuate over time. To this end, multiple representations of the same video are stored on the video server. Each of these representations has a different quality, and therefore a different bit rate, which grows as the quality increases. The client selects the quality that best fits the network conditions at each moment, according to different parameters such as the bandwidth or the amount of video stored in the application buffer. The popularity of adaptive streaming systems has resulted in the DASH (Dynamic Adaptive Streaming over HTTP) standard [? ].

Video streaming applications need a certain amount of resources so that the video reaches the client before it must be played, allowing fluent playback. Dynamic adaptive streaming uses the Internet protocols, which however do not guarantee exclusive access to resources, since they were designed to offer a best-effort service. The adaptation capability of adaptive streaming systems minimizes the effects of the fluctuation of the resources available to the client. These systems usually deliver the video over HTTP, which uses the TCP transport protocol. TCP implements the flow control and congestion control network functions, which govern the amount of data that can be transmitted to the network. These functions therefore control the amount of resources an application obtains, and consequently influence how the video is downloaded. Flow control is used by the receiver of a TCP connection to tell the sender the maximum amount of data it can accept. Congestion control, in turn, is used to adapt the sender's bit rate to the network conditions so that it does not congest the network. There are different congestion control implementations, and each of them uses different strategies that result in unique behaviour and performance. The most widely accepted implementation nowadays is CUBIC [? ], and BBR [? ] has recently been released, which could improve the performance achieved with CUBIC. Since these algorithms control the bit rate of the video server in DASH, the congestion control algorithm in use matters for its performance. Furthermore, in the last years a new technology known as transparent transmission segmentation has been developed, which provides performance improvements for TCP connections established over several links with different characteristics. This is achieved by dividing a single, usually long, TCP connection into two or more segments. If packets are lost in a segment, they do not need to be downloaded again from the server; instead, they are retransmitted directly within that segment. This technology therefore reduces loss recovery times, among other positive effects in different domains.

In this work, the CUBIC and BBR congestion control algorithms and transparent transmission segmentation have been analysed to quantify their effects on DASH performance. To this end, different performance metrics [?? ] have been used, which quantify what is known as Quality-of-Service (QoS). However, these metrics only consider the technical aspects of performance, so subjective aspects that affect the final user's perception are not taken into account. The subjective aspects are quantified by Quality-of-Experience (QoE) metrics, which measure the degree of satisfaction of users with a service. QoS and QoE metrics make performance analyses of DASH systems possible. Therefore, these metrics are used in this work to quantify DASH performance, considering both technical and subjective aspects. To run the experiments that compute these metrics, a testbed is needed on which a complete evaluation system can be deployed. For this purpose the GENI testbed [? ] has been used, which provides a distributed environment where virtual machines can be deployed and controlled remotely. Besides the virtual machines, GENI allows links of configurable capacity to be reserved between them. Taking all of the above into account, the main objective of this work is to evaluate the performance of DASH in order to find relationships between quality of service and quality of experience in this type of system. To this end, the DASH standard and its protocol stack have been studied, with the objective of analysing possible interconnections between the different layers that could affect DASH performance. The current state of the art regarding QoS and QoE metrics has also been studied, and the selected metrics have been implemented in an evaluation system, which has been used to carry out several experiments. The main contributions of this work resulting from these experiments are set out below.

Due to its on-off nature, DASH obtains worse results with BBR than with CUBIC, both in terms of QoS and QoE, in the presence of cross-traffic when the intermediate routers have small buffers. This is because, owing to its on-off behaviour, DASH operates under these circumstances in the fast recovery phase due to buffer overflow (see Section ??). Therefore, DASH needs large buffers when running over BBR, so that it can benefit from BBR's desirable characteristics. This is, however, a particular need of on-off traffic, since with constant traffic BBR is able to reach its operating point (Section ??). The poor performance of BBR in these circumstances translates into a worse selection of the video quality to download and a larger initial playback delay. By contrast, in the presence of losses and without cross-traffic, BBR achieves a higher bit rate than CUBIC. In a DASH system, this means a better video quality selection and a lower initial playback delay. Moreover, the experiments show that DASH is highly sensitive to losses when the selected video quality demands most of the available bandwidth. In this case, the client cannot download enough video in advance to avoid playback stalls. This last effect occurs with both algorithms, BBR and CUBIC. Finally, transparent transmission segmentation does not produce a significant increase in quality of service. However, a better quality of experience is achieved as a result of the reduced latency for lost packets. The main improvement is the reduction of the time playback is stopped during stalling periods. The trade-off is that a larger initial playback delay is obtained for segmented transmissions. This is considered acceptable, since the initial delay has a smaller effect on the quality of experience than playback stalling [? ].


Contents

1 Introduction
2 Fundamentals
  2.1 Dynamic Adaptive Streaming over HTTP
    2.1.1 System Description
    2.1.2 Segment Length
  2.2 TCP
    2.2.1 Network Functions
    2.2.2 CUBIC
    2.2.3 BBR
    2.2.4 Transparent Transmission Segmentation
    2.2.5 Cross-Layer Interactions with DASH
  2.3 Performance Metrics
    2.3.1 Quality-of-Service
    2.3.2 Quality-of-Experience
  2.4 GENI
3 Methodology and Metrics
  3.1 Evaluation System Overview
  3.2 DASH Client
  3.3 Quality of Experience Metric
  3.4 Video Dataset
  3.5 Data Analysis
  3.6 Experiments
4 Evaluation
  4.1 Congestion Control Algorithms Analysis
    4.1.1 Cross-Traffic Scenario
    4.1.2 Packet Losses Scenario
  4.2 Relayed Connection Performance Analysis
5 Conclusions and Future Work
  5.1 Conclusions
  5.2 Future Work
Bibliography

List of Figures

1.1 Internet Traffic Growth Forecast [? ]
1.2 MPEG-DASH Download Summary [? ]
1.3 Evaluation System Capturing Architecture
2.1 MPEG-DASH Download Summary [? ]
2.2 DASH Architecture [? ]
2.3 DASH Data Model [? ]
2.4 CUBIC Stages [? ]
2.5 TCP Connection Regions [? ]
2.6 BBR Reaction to BtlBw Changes [? ]
2.7 Scenario with Suboptimal Characteristics for TCP [? ]
2.8 Simplified Relay Process [? ]
2.9 DASH Segment Download Phases [? ]
2.10 Liu et al. Model Building Blocks [? ]
2.11 SQI Building Blocks
2.12 ITU-T P.1203 Building Blocks [? ]
2.13 GENI Architecture
3.1 Evaluation System Capturing Architecture
3.2 Open vSwitch Router Configuration
3.3 Evaluation System Analysis Architecture
3.4 dashif Deployment Architecture
3.5 Dataset Creation Process
4.1 Cross-Traffic Experiment Topology
4.2 BBR Results with Small Bottleneck
4.3 CUBIC Results with Small Bottleneck
4.4 BBR Results with Large Bottleneck
4.5 CUBIC Results with Large Bottleneck
4.6 BBR 10 Mbps Latency and Throughput
4.7 iperf Latency Evolution for the 10 Mbps Scenario
4.8 Losses Experiment Topology
4.9 QoS Metrics Evolution over Losses
4.10 QoE Metrics Evolution over Losses
4.11 Throughput Evolution with 0.001% Losses
4.12 Buffer Level Evolution
4.13 Throughput Evolution with CUBIC and 5% Losses
4.14 Relay Experiment Topology
4.15 Relay Implementation Status
4.16 Relayed vs. Simple Download - QoS Metrics
4.17 Relayed vs. Simple Download - QoE Metrics
4.18 Relayed vs. Simple Download - Latency Evolution with 5% Losses

List of Tables

2.1 ITU-T Y.1541 QoS Classes [? ]
2.2 DASH QoE Models Comparison
2.3 GENI Racks Modes Comparison
3.1 System Inputs and Outputs
3.2 HTTP Metrics
3.3 Temporal Quality Metrics
3.4 Liu et al. Constants [? ]
3.5 Dataset Specifications
3.6 Python Libraries [? ]
4.1 Nodes Configuration in Cross-Traffic Topology
4.2 Links Throughput in Cross-Traffic Topology
4.3 Cross-Traffic Experiments Results
4.4 Cross-Traffic Significance Tests Results
4.5 Nodes Configuration in Losses Topology
4.6 Nodes Configuration in Relay Topology

Chapter 1

Introduction

There has been a rapidly increasing usage of video streaming services over the last years, which accounted for 73% of all Internet traffic by 2016 [? ]. Additionally, this growth is expected to continue, reaching 82% of the traffic by 2021. This means that, considering the Internet traffic growth forecast depicted in Figure ??, there will be a threefold growth within these years. The consolidation of video content providers such as YouTube or Netflix is the main reason behind these developments, and the technology they use for video delivery is known as adaptive streaming.

Figure 1.1: Internet Traffic Growth Forecast [? ]

Adaptive streaming's main characteristic is that it can adapt itself to fluctuating network conditions. This is achieved by storing different representations of the same video, each of them encoded at a different quality, on the video server. Therefore, each representation has a different bitrate, which is higher as the video quality increases. The video clients can select the representation whose bitrate fits the current network conditions best, thus offering the best possible quality at each point in time (Figure ??). The representation selection algorithms, known as adaptation logics, are implemented on the client and use different parameters to estimate the network state, such as bandwidth, application buffer level or congestion. Owing to the popularity of adaptive streaming, standardization efforts were put into it, resulting in the Dynamic Adaptive Streaming over HTTP (DASH) standard, developed under MPEG and published as an ISO/IEC norm [? ].

Figure 1.2: MPEG-DASH Download Summary [? ]

Video streaming applications need a certain amount of resources to be fluently played on the clients, so that the video chunks arrive before they must be played. Adaptive streaming uses the Internet protocols, which however do not guarantee these resources, as they were designed to provide a best-effort service. The quality adaptation based on network conditions minimizes the effects of the resource fluctuations resulting from the best-effort implementation. As depicted in Figure ??, DASH is deployed on top of HTTP, which is used to fetch the video from the server. HTTP relies on the lossless, time-insensitive TCP transport protocol, which implements the flow and congestion control network functions. These functions govern the data transmission by determining how much data can be injected into the network. Therefore, they control how many resources the application gets, and thus they have an influence on how video chunks are downloaded. Flow control is used by a connection's receiver to tell the sender how much data it is willing to receive, and thus it depends on the receiver buffer. Congestion control, on the other hand, is designed to adapt the sender's data rate to the network conditions, so that the network does not get congested. There are different congestion control implementations, each of them using different strategies that result in unique behaviour and performance. The most accepted implementation nowadays is CUBIC [? ], while the recently released BBR [? ] is said to improve on its performance. The sender in a DASH system is the video server and these algorithms control its data rate, so they determine how much video is sent to the DASH client and how fast it is sent. Therefore, the congestion control algorithm in use is important for DASH performance. Besides, a promising technology developed in recent years is transparent transmission segmentation [? ], which provides performance enhancements for TCP connections over many links with different characteristics.

Figure 1.3: Evaluation System Capturing Architecture

This is achieved by dividing a single, long TCP connection into two or more segments. The node interconnecting the segments is known as a relay, and it keeps a copy of the transmitted packets. If packets are lost within a segment, they do not need to be fetched from the distant server, because the relay, which is closer to the client, can retransmit them. Therefore, this technology reduces loss recovery times, while it also has positive effects in other domains. One of the objectives of this thesis is analysing the congestion control and transparent transmission segmentation techniques and quantifying the performance improvement they can provide to DASH systems. The performance study is done by evaluating performance metrics, such as the ITU-T metrics for general services on the Internet [? ] or the ETSI metrics for DASH [? ]. These metrics quantify what is known as Quality of Service (QoS). Nevertheless, QoS metrics only consider the technical aspects of performance, and thus they do not quantify users' perception of the services, so that content and service providers cannot infer from these metrics how well-accepted their services are. Quality of Experience (QoE) metrics are defined to quantify a user's perception by measuring the degree of satisfaction with a service, considering perceptual factors. QoE metrics provide valuable information, such as how much users are willing to pay for a service or whether they want to cancel their subscription. Moreover, this information might be used to compare services in terms of user satisfaction or to develop QoE-aware systems or networks. As DASH provides video streaming in a novel manner, it introduces new factors to be considered for QoE modeling, and thereby previously existing approaches cannot be directly applied to these systems. This resulted in the development of multiple DASH QoE models over the last years [??? ], which are analysed within this thesis in order to choose the most suitable one for our experiments. QoS and QoE metrics allow us to do performance studies of DASH systems. These metrics are used in this thesis to quantify DASH performance, so that the provided results not only evaluate technical improvement, but also how this improvement is perceived by the user. Providing insights into the relationships between QoS and QoE metrics in DASH systems is another objective of this thesis. A testbed is needed to deploy the evaluation tools, so that experiments can be run and the system performance measured. There are several networking platforms available nowadays, among which GENI [? ] has been selected.

It provides a distributed testbed where virtual machines can be remotely deployed and configured. Besides these computation resources, the network resources connecting them can also be configured by selecting a link's throughput, thus providing complete control over the experiment. Finally, an evaluation system has been implemented for the performance experiments and deployed on the GENI testbed. This evaluation system follows the architecture depicted in Figure ??, consisting of a video server and a DASH client. The client is in charge of collecting QoS and QoE metrics for their later analysis.

Synopsis

The thesis is structured as follows: Chapter ?? provides a theoretical analysis of DASH systems and the TCP protocol, as well as the QoS and QoE metrics to be implemented and a description of the GENI testbed. Chapter ?? describes the methodology used to perform the experiments, which includes a complete description of the evaluation system's development and behaviour, the configuration of the deployed video dataset and the process used for the analysis. Chapter ?? describes the experiments with a detailed explanation of the obtained results and their analysis. Finally, Chapter ?? concludes this thesis and highlights future work.

Chapter 2

Fundamentals

Adaptive streaming solutions provide video streaming over HTTP, and thus on top of the reliable, time-insensitive TCP protocol. This seems suboptimal from a theoretical point of view, and cross-layer effects with TCP should be analysed to understand how DASH performance is affected in practice. QoS and QoE metrics are used to do so, measuring performance from an objective and a subjective point of view, respectively.

2.1 Dynamic Adaptive Streaming over HTTP

Adaptive streaming is the multimedia transmission technology used by major content providers nowadays. It provides video streaming on the Internet by adapting the data rate to the fluctuating network conditions, thus providing the best possible quality for the network conditions at each point in time.

2.1.1 System Description

The popularization of adaptive streaming solutions has resulted in the Dynamic Adaptive Streaming over HTTP (DASH) standard, developed under MPEG and published as an ISO/IEC norm [? ]. The most important feature of DASH systems is that they adapt to fluctuating network conditions by changing the quality of the transmitted video (Figure ??). This is achieved by storing multiple representations of the same video on the server, each with a different quality and consequently a different bitrate. The video quality selection algorithm is implemented on the client and is known as the adaptation logic, which is not standardized and might consider different parameters, depending on the implementation. Two elements are used to provide the service: the media presentation description (MPD) and the segment. The MPD is an XML file with enough information to access all the segments of a given video. This information consists of the segments' bit rates and URLs to locate them, although other parameters are also provided. The bit rate allows the clients to select the most suitable segment for the current network conditions, as each segment has a different bit rate depending on the quality it is encoded at.
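To illustrate the information a client reads from the MPD, the following is a minimal sketch (not the tooling used in this thesis) that parses a stripped-down MPD using Python's standard library. The element and attribute names follow the DASH schema, while the tiny document itself is invented for illustration.

```python
import xml.etree.ElementTree as ET

# A deliberately minimal, hypothetical MPD: one period, one video
# adaptation set, two representations at different bitrates.
MPD = """
<MPD xmlns="urn:mpeg:dash:schema:mpd:2011" type="static">
  <Period>
    <AdaptationSet mimeType="video/mp4">
      <Representation id="360p" bandwidth="1000000">
        <BaseURL>video_360p.mp4</BaseURL>
      </Representation>
      <Representation id="720p" bandwidth="3000000">
        <BaseURL>video_720p.mp4</BaseURL>
      </Representation>
    </AdaptationSet>
  </Period>
</MPD>
"""

NS = {"mpd": "urn:mpeg:dash:schema:mpd:2011"}

def representations(mpd_xml):
    """Yield (id, bandwidth in bit/s, URL) for every representation."""
    root = ET.fromstring(mpd_xml)
    for rep in root.iterfind(".//mpd:Representation", NS):
        url = rep.findtext("mpd:BaseURL", default="", namespaces=NS)
        yield rep.get("id"), int(rep.get("bandwidth")), url

for rep_id, bandwidth, url in representations(MPD):
    print(f"{rep_id}: {bandwidth / 1e6:.1f} Mbit/s -> {url}")
```

The bandwidth attribute is exactly what an adaptation logic compares against its throughput estimate when choosing the next segment.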

Figure 2.1: MPEG-DASH Download Summary [? ]

Figure 2.2: DASH Architecture [? ]

The segments are the media delivery unit, and thus they contain the binary representation of the video and audio. The system architecture is depicted in Figure ??, where the client first fetches the MPD file and afterwards downloads the media segments announced in it. The HTTP protocol is used for this purpose, with version HTTP/1.1 assumed in the standard. Although HTTP was not designed for multimedia delivery, being mainly implemented on top of the time-insensitive TCP protocol, it provides other advantages such as NAT and firewall traversal and deployability on Content Delivery Networks (CDNs), which explain the success of this technology. Additionally, the data model used by DASH systems follows the structure depicted in Figure ??, which has been designed to enable the quality adaptation behaviour. First of all, the complete video is divided into periods of time, each of which consists of several adaptation sets.

Figure 2.3: DASH Data Model [? ]

The adaptation sets within a period are interchangeable versions of the same media component, such as video or audio, and their objective is to load-balance the number of requests across multiple media sources. Moreover, there are multiple representations within every adaptation set, which are encoded versions of the media that provide the same content at different qualities. This allows clients to switch across representations to choose the quality that best fits the network conditions. Finally, representations are divided into segments, which are the access and delivery unit. Multiple profiles are defined in the ISO/IEC norm [? ] in order to cover the different needs of video streaming services, which can be summarized as live and on-demand services. The major difference lies in the instant character of live content, as it is gathered and encoded on the go. This also resulted in the definition of two types of MPD files, dynamic and static. While a static MPD remains unchanged during the whole session, dynamic MPDs must be reloaded by the clients as URLs of newly generated segments are constantly added. Nevertheless, dynamic MPDs are not commonly used because of the overhead introduced by the multiple MPD requests. Instead, static MPDs with preconfigured URLs are used in live services. Regarding the adaptation logic, it is not standardized, which makes it an active field of research due to the lack of a prevalent, well-accepted set of metrics to be considered. These algorithms usually behave conservatively at the beginning of the download, as low qualities are selected to avoid congesting the link while its characteristics are unknown, as well as to keep the initial delay as low as possible. Afterwards, there are two different approaches, being either more optimistic in switching to higher video qualities or more conservative in order to avoid quality oscillations. Comparisons of different approaches are provided in [?? ], which show that the performance of a DASH system depends on the selected adaptation logic; a minimal throughput-based logic is sketched below.
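As an illustration of the simplest family of such algorithms (a hedged sketch, not any specific player's implementation), the following picks the highest representation whose bitrate stays below a safety fraction of the measured throughput:

```python
def select_representation(bitrates_bps, measured_throughput_bps,
                          safety_margin=0.8):
    """Pick the highest bitrate below a fraction of the measured throughput.

    bitrates_bps: available representation bitrates, sorted ascending.
    measured_throughput_bps: e.g. a moving average over recent segments.
    safety_margin: keeps headroom so throughput dips do not stall playback.
    """
    budget = measured_throughput_bps * safety_margin
    chosen = bitrates_bps[0]          # conservative default: lowest quality
    for bitrate in bitrates_bps:
        if bitrate <= budget:
            chosen = bitrate
    return chosen

# Example: 6 Mbit/s measured, 80% margin -> 4.8 Mbit/s budget -> 3 Mbit/s wins.
print(select_representation([1_000_000, 3_000_000, 6_000_000], 6_000_000))
```

The safety margin embodies the conservative behaviour described above; an optimistic logic would raise it (or add buffer-level heuristics), at the cost of more quality oscillations.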

2.1.2 Segment Length

While the DASH architecture, data model and profiles are relatively generic, some technical details must be analysed, such as the segment length or the adaptation logic, as they influence the overall performance of the system. Regarding the optimal segment length, there is a trade-off between adaptability (short segments) and encoding efficiency (long segments). In order to perform seamless switching, there should be an I-frame at the beginning of each segment, so that references to previous frames are not needed. This is usually achieved by restricting the group-of-pictures (GOP) to the segment size. Therefore, the shorter the segment is, the more I-frames there are, which reduces the encoding efficiency, as I-frames need more bits than P-frames. Additionally, long segments are not desired in environments with high throughput fluctuations, such as wireless or overloaded connections, as the client would not be able to adapt to throughput changes as quickly as they occur. In these cases short segments are preferred, but they also introduce an overhead due to the increase in the number of requests. Depending on the system implementation, this might lead to opening a new TCP connection for each request, aggravating the effect. However, this can be mitigated by allowing the use of HTTP/1.1 persistent connections [? ] at the server, which allows the client to send several requests within the same TCP connection, at the expense of increased server complexity. The effects of segment length on video stalling and representation quality, which are common Quality-of-Experience metrics in DASH systems (see Section ??), are analysed in [? ], confirming the previously mentioned results. In particular, better representation quality is achieved with short segments (2 seconds), as a result of their adaptability to network conditions, which lets the client efficiently exploit the available throughput. On the contrary, fewer stalling events occur when segments are long (10 seconds), as more video is available in the player buffer at once.

2.2 TCP

Video streaming services require time guarantees, as the video chunks must be available at the client in advance in order to avoid stalling events. On the other hand, full reliability is not strictly required, mainly for live services, as retransmitted chunks might arrive at the client when they are no longer needed and decoders can conceal losses. DASH systems deliver video over HTTP, which is mostly implemented on top of the TCP protocol. Nevertheless, TCP is fully reliable and not time-sensitive, which makes it seem suboptimal for video streaming from a theoretical perspective. Therefore, TCP should be studied in order to understand how these drawbacks affect the overall performance of DASH systems.

2.2.1 Network Functions

Flow control is used by the receiver to tell the sender how much data it can accept. This information is included within the ACK header, where the receiver window (rwnd) denotes how many octets the receiver is willing to accept. Congestion control is designed to avoid overloading the network with more data than it can deliver. For this purpose the sender keeps track of how many bytes can be outstanding at any time with the congestion window (cwnd) parameter. As the bottleneck data rate is initially unknown, the algorithm starts with the slow start phase, whose purpose is to avoid congesting the network with a large burst of data at the beginning of the connection. The bottleneck data rate is determined with periodic increases of the cwnd until losses appear. Once losses arise, the algorithm reduces the sending rate and enters the congestion avoidance phase, where the cwnd is decreased to reach a stable state without network congestion. Error control provides reliability using Automatic Repeat-reQuest (ARQ) for the retransmission of erroneous data. Thereby, TCP keeps retransmitting the data upon timer expiration, unless a positive acknowledgement (ACK) arrives, confirming the data reception. As waiting for timer expiry is a slow reaction to losses, fast retransmit [? ] was developed, so that packet losses are detected by the sender upon the arrival of duplicate ACKs. Additionally, the fast recovery algorithm governs the data transmission after a loss and artificially inflates the cwnd by the number of segments that have traversed the network, which is equal to the number of received duplicate ACKs. Therefore, losses are overcome faster with these two algorithms. However, although this process has been sped up, TCP is still not a time-sensitive protocol as it provides full reliability, which means packets cannot be skipped when they expire. Network congestion might have an influence on latency, which increases when packets are buffered at a saturated link, and on application throughput, as congestion avoidance algorithms keep it below the bottleneck throughput when losses arise. Therefore, the effects of these algorithms on DASH systems should be studied due to their influence on the system performance. Multiple implementations have been released, such as Reno, Tahoe [? ], BIC [? ] or CUBIC [? ], where CUBIC is the most advanced option because of its RTT independence and smooth steady state. Additionally, Google's BBR [? ] algorithm has been released recently and is said to provide better performance than CUBIC.

2.2.2 CUBIC

CUBIC [? ] follows a logarithmic approach to increase the cwnd, an idea that was first introduced by the BIC algorithm. However, CUBIC improves on its predecessor by being faster when the cwnd is far from the maximum throughput and less aggressive when it is close to it, thus minimizing the possibility of packet losses.

Figure 2.4: CUBIC Stages [? ]

This behaviour is achieved by applying the cubic formula in Equation (2.1), where $W(t)$ is the size of the congestion window at time $t$ and $W_{max}$ is the window size at the last window reduction. The other parameters in the model are $\beta$, the factor used to decrement the window after a loss (Equation (2.2)), $C$, a scaling factor (Equation (2.4)), $t$, the time elapsed since the last window reduction, and $K$, which determines how fast $W_{max}$ is reached (Equation (2.3)). $W_{max}$ is computed as the value of $W$ at which the last loss event occurred.

$$W(t) = C\,(t - K)^3 + W_{max} \tag{2.1}$$

$$W(t_{loss} + 1) = \beta\, W_{max} \tag{2.2}$$

$$K = \sqrt[3]{\frac{W_{max}\,(1 - \beta)}{C}} \tag{2.3}$$

$$C = \frac{3\,\beta}{2 - \beta} \tag{2.4}$$

$$W_{tcp}(t) = W_{max}\,\beta + 3\,\frac{1 - \beta}{1 + \beta}\,\frac{t}{RTT} \tag{2.5}$$

The complete evolution of the cwnd is depicted in Figure ??, where two phases are clearly differentiated: the steady-state behaviour phase, which is the behaviour described so far, and the max probing phase. Once the cwnd grows past the maximum throughput ($W_{max}$) without losses, it means that either the network conditions have changed or the algorithm failed to probe the actual maximum in a previous iteration, and thus it searches for a new maximum within max probing. Additionally, unlike previous algorithms that update the window upon the arrival of an ACK, CUBIC's congestion window is updated in real time, as expressed in Equation (2.1). Although this is a major advantage in networks with high latencies, the cwnd grows more slowly in CUBIC than in other algorithms when the RTT is low. Therefore, a TCP mode is included to overcome this issue, such that the TCP window adjustment is emulated. If the window defined by CUBIC is smaller than the one defined by TCP (Equation (2.5)), the window size is set to $W_{tcp}$, thus increasing the growth rate in short-RTT networks.
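To make the window evolution concrete, the following is a small sketch that evaluates Equations (2.1)-(2.5) over time, including the TCP-mode floor. It is an illustration of the formulas above, not the Linux kernel implementation, and all parameter values are illustrative.

```python
def cubic_window(t, w_max, rtt, beta=0.7):
    """Congestion window t seconds after the last loss (Equations 2.1-2.5).

    beta is the multiplicative decrease factor: the window restarts from
    beta * w_max after a loss (Equation 2.2).
    """
    c = 3 * beta / (2 - beta)                  # scaling factor (Eq. 2.4)
    k = (w_max * (1 - beta) / c) ** (1 / 3)    # time to regain w_max (Eq. 2.3)
    w_cubic = c * (t - k) ** 3 + w_max         # cubic growth (Eq. 2.1)
    # TCP mode: emulate standard TCP growth and never fall below it (Eq. 2.5).
    w_tcp = w_max * beta + 3 * (1 - beta) / (1 + beta) * (t / rtt)
    return max(w_cubic, w_tcp)

# Window (in segments) during the 10 s after a loss at w_max = 100, RTT = 100 ms.
for t in range(0, 11, 2):
    print(f"t={t:2d}s  cwnd={cubic_window(t, w_max=100, rtt=0.1):7.1f}")
```

With beta = 0.7 the window restarts at 70% of W_max, flattens as it approaches W_max around t = K, and then accelerates again into the max probing region.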

Figure 2.5: TCP Connection Regions [? ]

2.2.3 BBR

BBR [? ] is based on the fact that a TCP connection can be seen as a single link characterized by the path's round-trip propagation time ($RTprop$) and the throughput of the bottleneck ($BtlBw$), which is the link with the lowest data rate in the whole path. These two variables constrain the performance of the connection, both represented in Figure ?? with a blue line for $RTprop$ and a green line for $BtlBw$. When the application does not generate enough data to fill the link, the connection is constrained by $RTprop$; otherwise, $BtlBw$ dominates. The best performance is achieved when the bottleneck data arrival rate is $BtlBw$, so that it runs at 100 percent utilization, and the in-flight data is at the same time $BDP = BtlBw \cdot RTprop$, which prevents bottleneck starvation without overloading the link. However, loss-based congestion control mechanisms such as CUBIC operate at the right side of the bandwidth-limited region (Figure ??). Therefore, although loss-based mechanisms operate at the maximum possible throughput, the packet latency is increased because of the buffer queues that build up at the bottleneck, which receives more data than it can deliver. Moreover, packet losses might appear due to buffer overflow, which is the mechanism used by these algorithms to detect congestion. Nevertheless, BBR is the first congestion control algorithm that works around the BDP operating point, thus improving TCP performance [? ].
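As a quick worked example (with illustrative numbers, not measurements from this thesis): for a bottleneck of 48 Mbit/s and a round-trip propagation time of 40 ms, the target amount of in-flight data is

$$BDP = BtlBw \cdot RTprop = 48\,\text{Mbit/s} \times 0.04\,\text{s} = 1.92\,\text{Mbit} = 240\,\text{kB},$$

i.e. roughly 160 full-size 1500-byte packets. Keeping more than this in flight only builds a queue at the bottleneck.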

When the BtlBw and BDP conditions are simultaneously met, they guarantee that no queues are formed at the bottleneck and the link is filled, which makes the system work at its maximum throughput while minimizing the RTT at the same time. $RTprop$ and $BtlBw$ must be continuously estimated in order to achieve the two aforementioned conditions, as they vary over the life of a connection. An unbiased estimator for $RTprop$ is given in Equation (2.6), where $\eta_t$ is the noise introduced by buffers and $W_R$ a window that typically goes from tens of seconds to minutes. Additionally, the data delivery rate, which is $\Delta data / \Delta t$, is used to estimate $BtlBw$, with $\Delta data$ known and $\Delta t$ greater than or equal to the true arrival interval. Therefore, as the data delivery rate is upper-bounded by the bottleneck capacity, Equation (2.7) is used as an unbiased estimator for $BtlBw$, where the observation window $W_B$ is typically six to ten RTTs. As these parameters might change during a connection's life, BBR uses a 10-second filter for $RTprop$ and a 10-RTT filter for $BtlBw$.

$$\widehat{RTprop} = RTprop + \min(\eta_t) = \min(RTT_t), \qquad t \in [T - W_R,\, T] \tag{2.6}$$

$$\widehat{BtlBw} = \max(deliveryRate_t), \qquad t \in [T - W_B,\, T] \tag{2.7}$$

BBR uses two variables to control the in-flight data: pacing_gain and cwnd_gain. The pacing_gain variable controls the sender's data rate, so that it stays close to BtlBw. The cwnd_gain variable controls the evolution of the cwnd, which is set to the value in Equation (2.8). The value of quanta is set to ensure full link utilization. More information about these two variables can be found in BBR's Internet-Draft [? ].

$$cwnd = cwnd\_gain \cdot BDP + quanta \tag{2.8}$$

The algorithm starts with a binary search to exponentially discover BtlBw within the startup phase, where the sending rate is doubled each round. In order to achieve a smooth behaviour, pacing_gain and cwnd_gain are set to $2/\ln(2)$ upon entry into this phase, which is the minimum value that allows doubling the sending rate each round. Later, while BtlBw keeps growing, pacing_gain and cwnd_gain smoothly grow at the same time. This process discovers the BtlBw at the expense of creating a queue at the bottleneck. When the algorithm detects the creation of a queue, which is done upon a sustained RTT increase, it changes to the drain phase. The queue is drained in one round by setting the pacing_gain variable to the inverse of the value used in the startup phase, so that the operating point is set around BDP. Once the queue is drained, the algorithm runs the ProbeBw phase. In this phase the sending rate is BtlBw most of the time and the in-flight data is maintained at BDP. However, over a period of 8 RTTs, BBR runs the gain cycling routine, where pacing_gain is set to 1.25 in order to detect BtlBw modifications. In case BtlBw has not changed, the RTT increases as a result of the queue created at the bottleneck, which is removed by setting pacing_gain to the inverse of the previously used value in the next RTprop. On the contrary, if BtlBw has changed, the delivery rate increases and BtlBw is immediately updated, resulting in a convergence to the new bottleneck rate in 3 gain cycles. The cwnd_gain value is always 2 within this phase. This BtlBw increase detection is depicted in the left graph of Figure ??. A sustained RTT increase indicates that the BtlBw has decreased and thus a queue has been created due to the excess of in-flight data. The cwnd is hence reduced in order to lower the amount of in-flight data, which reduces the RTT until the new operating point is reached. This behaviour is depicted in the right graph of Figure ??.

Figure 2.6: BBR Reaction to BtlBw Changes [? ]
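The two windowed filters are simple to express in code. The following is a hedged sketch of Equations (2.6) and (2.7) — an illustration, not Linux's tcp_bbr module: RTprop is the minimum RTT seen over a long window, BtlBw the maximum delivery rate over a short one, and their product gives the BDP operating point.

```python
import time
from collections import deque

class WindowedFilter:
    """Keep (timestamp, value) samples and expose min/max over a time window."""
    def __init__(self, window_s):
        self.window_s = window_s
        self.samples = deque()

    def update(self, value, now=None):
        now = time.monotonic() if now is None else now
        self.samples.append((now, value))
        # Drop samples that fell out of the observation window.
        while self.samples and self.samples[0][0] < now - self.window_s:
            self.samples.popleft()

    def min(self):
        return min(v for _, v in self.samples)

    def max(self):
        return max(v for _, v in self.samples)

# Equation (2.6): RTprop estimate = min RTT over ~10 s.
rtprop_filter = WindowedFilter(window_s=10.0)
# Equation (2.7): BtlBw estimate = max delivery rate over ~10 RTTs (~1 s here).
btlbw_filter = WindowedFilter(window_s=1.0)

# Feed illustrative samples: (RTT in s, delivery rate in bit/s).
for rtt, rate in [(0.043, 46e6), (0.040, 48e6), (0.051, 47e6)]:
    rtprop_filter.update(rtt)
    btlbw_filter.update(rate)

bdp_bits = btlbw_filter.max() * rtprop_filter.min()   # operating point
print(f"RTprop={rtprop_filter.min()*1000:.0f} ms, "
      f"BtlBw={btlbw_filter.max()/1e6:.0f} Mbit/s, BDP={bdp_bits/8/1000:.0f} kB")
```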

2.2.4 Transparent Transmission Segmentation

Figure 2.7: Scenario with Suboptimal Characteristics for TCP [? ]

TCP connections treat the transmission path as a single virtual link, where the characteristics of the individual physical links are aggregated. However, there are scenarios, such as the one depicted in Figure ??, where this single-link view might result in poor resource utilization. In a DASH download through this path, packets would arrive at the intermediate node with high probability, but they would likely be lost in the second segment of the connection. Moreover, as the retransmissions traverse the complete path again, they experience the high delay of the first segment of the path. This would not occur if the packet were retransmitted from the node, which is one of the gains of transparent transmission segmentation [? ]. An SDN solution for this scenario is presented in [? ], where the node runs relay software. Besides, there is a controller, which creates a new flow and configures the relay every time it detects a connection that should be relayed.
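To make the splitting idea concrete, the following is a minimal sketch of a (non-transparent) TCP relay that terminates the client-facing connection and opens a second one towards the server, so that retransmissions on either side stay local to that side. It is an illustration only; the SDN relay described next additionally rewrites packets so that both endpoints remain unaware of the split. Host names and ports are invented.

```python
import socket
import threading

def pipe(src, dst):
    """Copy bytes between the two spliced connections until EOF."""
    while chunk := src.recv(65536):
        dst.sendall(chunk)
    dst.close()

def relay(listen_port, server_addr):
    with socket.create_server(("", listen_port)) as srv:
        while True:
            client, _ = srv.accept()
            # Second TCP segment: losses towards the server are recovered
            # here, without involving the (possibly lossy) client-side link.
            upstream = socket.create_connection(server_addr)
            threading.Thread(target=pipe, args=(client, upstream)).start()
            threading.Thread(target=pipe, args=(upstream, client)).start()

# Illustrative deployment: relay listens on 8080, video server is hypothetical.
# relay(8080, ("video-server.example.org", 80))
```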

Figure 2.8: Simplified Relay Process [? ]

In the configuration process, the controller configures ports and buffer sizes and also tells the relay to open a connection to the destination server. Once the connection is open, the client can complete the handshake as usual, but this communication is already relayed. However, as depicted in Figure ??, although the relayed messages are rewritten by the node and the relay, the two ends of the connection are unaware of that, since they see the packets as if they had been sent directly. The relay solution has the drawback that the connection opening process lasts longer than in the traditional approach, because of the extra message exchanges with the controller. On the other hand, performance enhancements are also expected as a result of the relay segmenting the TCP connections, which reduces the latency of packet retransmissions.

2.2.5 Cross-Layer Interactions with DASH

The interactions between TCP CUBIC and DASH are studied in [? ], analysing the effects of DASH's on-off traffic pattern on TCP performance, with a focus on packet losses. Unlike applications with a steady data stream, which use most of the available throughput, on-off traffic suffers a throughput drop proportional to 2xRTT. The DASH segment download is divided into the three phases depicted in Figure ?? in order to study this poor efficiency. The first phase is the initial burst, where the sender quickly fills its cwnd, injecting a relatively high amount of data into the network in a short period of time. When the sender receives ACKs and has more data to transmit, the download is in the ACK clocking phase. Finally, the download is in the trailing ACKs phase when the sender waits for outstanding ACKs at the end of the download. It should be noted that several segments are downloaded within the same TCP connection, making use of HTTP persistent connections [? ]. Therefore, the sender does not need to rerun the slow start phase every time a new segment is downloaded. Moreover, the three phases in Figure ?? are independent of the congestion control phases.
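A back-of-the-envelope example (with illustrative numbers) shows why this idle time matters: if each 2-second video segment costs roughly one extra round trip of idle time at the request boundary, then for an RTT of 200 ms the link is idle for about $2 \cdot RTT = 0.4$ s out of every 2.4 s, i.e.

$$\frac{2 \cdot RTT}{T_{segment} + 2 \cdot RTT} = \frac{0.4\,\text{s}}{2.4\,\text{s}} \approx 17\%$$

of the achievable throughput is lost to the on-off pattern, before any loss effects are taken into account.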

Figure 2.9: DASH Segment Download Phases [? ]

The most desirable phase is ACK clocking, because all congestion control mechanisms are enabled as a result of the network pipeline being full. Therefore, fast retransmit and fast recovery allow losses to be overcome quickly and the cwnd is at its maximum. Steady streams keep the transmission in this phase, making the other phases negligible. However, stop-and-start applications such as DASH spend more time in initial burst and trailing ACKs, where packet losses are more harmful. This explains the efficiency differences observed in [? ]. The sudden increase of outstanding data in the initial burst might provoke losses at the bottleneck, which triggers congestion avoidance or even slow start, depending on the number of lost packets. Nevertheless, trailing ACKs is even less desirable, as idle time is spent waiting for ACKs and losses are more harmful there. Firstly, there is no fast recovery because the sender does not have new data to transmit, and fast retransmit only works if the loss occurs before the last 3 ACKs. Therefore, losses are not detected quickly. This lack of new data to transmit also affects the target cwnd in the presence of losses, because W_max is set to the outstanding bytes after a loss in CUBIC [? ]. As a result, the algorithm targets a throughput which is lower than the actual bottleneck data rate, thus producing a slower failure recovery. Finally, in the worst case, losses might be detected only upon timer expiration, triggering slow start. Esteban et al. [? ] suggest pacing, which spreads the transmission over the complete RTT, as a solution to the on-off pattern. The purpose of this solution is to reduce buffer overflows at the bottleneck. However, although losses are reduced in the initial burst phase, they are increased in trailing ACKs, where they are more harmful. Besides, less than 5% throughput improvement is achieved with pacing. Therefore, it is discarded as a suitable solution to this problem. Nevertheless, since Google's BBR does not use losses as its congestion detection mechanism, it might effectively reduce buffer

overflow in all the phases. This statement holds as long as the buffers at the bottleneck are large enough to endure BBR's throughput test cycles (see Section ??). A more recent study of cross-layer effects on DASH systems is provided in [? ], where results are presented from the QoE point of view. The authors show that CUBIC outperforms other congestion control algorithms when the available throughput is of the order of megabits per second. Additionally, QoE improvements are achieved with larger segments in combination with CUBIC. They also show that latency and packet losses have a greater influence on the QoE than the available throughput. To summarize, latency and packet loss have been proven to be essential for both TCP efficiency and DASH QoE. Besides, the CUBIC algorithm has provided the best performance so far among the most common and accepted congestion control algorithms, but Google's recent BBR algorithm promises to enhance both parameters. Therefore, given that the cross-layer interactions between BBR and DASH have not been studied yet, how much performance can be gained with it is an interesting research question.

2.3 Performance Metrics

Quality-of-Service (QoS) metrics are used to measure performance from a technical and objective point of view. However, these objective metrics usually fail to capture user satisfaction, as they do not consider subjective factors. Therefore, in order to have a complete view of a system's performance, subjective factors should also be measured by applying Quality-of-Experience (QoE) models.

2.3.1 Quality-of-Service

There has been a trend over the last years towards delivering any type of application via the Internet, applications which have different performance requirements for which the Internet was not designed. Therefore, QoS mechanisms for traffic prioritization and service differentiation were developed, so that end-to-end resources could be guaranteed. Although these mechanisms ensure QoS, they can only be implemented by service providers, and thus they cannot be applied to DASH solutions offering video streaming over-the-top. However, the metrics used by these mechanisms to define resource requirements are useful to characterize DASH systems, and thus to develop performance analyses. Traditional metrics are delay, jitter, throughput, and packet error and loss rates [? ], which are standardized by ITU-T Y.1540 [? ] and defined as follows:

IP packet Transfer Delay (IPTD). Time elapsed between the transmission of the first bit by the source and the reception of the last bit at the destination.

IP packet Delay Variation (IPDV). Also known as jitter, it is the difference between the IPTDs of a given group of IP packets.

IP packet Error Ratio (IPER). Ratio of errored IP packets to the total transmitted IP packets.

Class | IPTD   | IPDV  | IPLR    | IPER    | IPRR
0     | 100 ms | 50 ms | 1x10^-3 | 1x10^-4 | -
1     | 400 ms | 50 ms | 1x10^-3 | 1x10^-4 | -
2     | 100 ms | U     | 1x10^-3 | 1x10^-4 | -
3     | 400 ms | U     | 1x10^-3 | 1x10^-4 | -
4     | 1 s    | U     | 1x10^-3 | 1x10^-4 | -
5     | U      | U     | U       | U       | -
6     | 100 ms | 50 ms | 1x10^-5 | 1x10^-6 | 1x10^-6
7     | 400 ms | 50 ms | 1x10^-5 | 1x10^-6 | 1x10^-6

Table 2.1: ITU-T Y.1541 QoS Classes [? ] (U = unspecified)

IP packet Loss Ratio (IPLR). Ratio of lost IP packets to the total transmitted IP packets.

IP-layer section capacity. Traditionally known as throughput, it is the number of transferred bits within a given time interval.

ITU-T Y.1541 [? ] uses some of these metrics to define performance objectives for IP-based services depending on their resource needs, where video streaming services are included in class 4 [? ] as depicted in Table ??. The values in Table ?? can be used as a reference when analysing the performance of video streaming services. Besides, the ETSI TS [? ] defines DASH-specific QoS metrics related to HTTP transactions, quality switching events, average throughput, initial delay, buffer level, playlist and MPD information. These parameters measure DASH technical aspects that are related to the final user experience, and for that reason they are called Quality-of-Experience (QoE) metrics, so that they can be used to develop QoE models.

2.3.2 Quality-of-Experience

QoS metrics measure the performance only from a technical point of view, not quantifying the users' perception of the service, so that how well-accepted the service is cannot be inferred from these metrics. Therefore, another set of metrics is implemented to quantify the degree of satisfaction with services by considering perceptual factors; these are the QoE models. These models compute the Mean Opinion Score (MOS) [? ], which is the metric typically used to measure subjective performance. A model considers the satisfaction-related QoS metrics, which are weighted depending on their influence on the users' perception, so that the MOS is predicted as accurately as possible. Quantization, temporal and spatial quality are typically considered to model DASH QoE. The quantization quality measures the level of degradation due to quantization in the video encoding process. As DASH uses lossless protocols, the client receives a complete copy of the video, and thus the encoding process is the only source of impairments. The quantization quality is computed by Video Quality Assessment (VQA) methods.
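As a minimal illustration of what a full-reference VQA method computes (a simple stand-in for metrics such as VQM or SSIM, with invented frame data), per-frame PSNR between the encoded output and the original can be obtained as follows:

```python
import numpy as np

def psnr(reference, received, peak=255.0):
    """Peak signal-to-noise ratio between two frames of equal shape (in dB)."""
    mse = np.mean((reference.astype(np.float64)
                   - received.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")  # identical frames
    return 10 * np.log10(peak ** 2 / mse)

# Illustrative frames: a random "original" and a quantization-degraded copy.
rng = np.random.default_rng(0)
original = rng.integers(0, 256, size=(720, 1280), dtype=np.uint8)
degraded = (original // 8) * 8           # coarse quantization artefact
print(f"PSNR: {psnr(original, degraded):.1f} dB")
```

Full VQA metrics like VQM go well beyond this pixel-wise comparison, modeling perceptual features, but the input/output shape is the same: reference frames in, a quality score out.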

Figure 2.10: Liu et al. Model Building Blocks [? ]

The temporal quality, on the other hand, is related to the effect of initial delay and stalling on the quality perceived by the users. As the adaptation logics are implemented on the client, DASH mechanisms detect fluctuations in the network once they have already occurred, and consequently modify the requested bitrate. Thus, in the presence of throughput decreases or delays, the playout buffer might fill more slowly or even deplete, which might result in a large initial delay or in stalling events. This implies that initial delay and stalling must be considered when modeling QoE [? ]. Finally, DASH introduces a new perceptual dimension, as it changes the quality of the transmitted video [? ]. Users recognize the changes of quality, being pleased when the quality is increased and dissatisfied when it is decreased. Therefore, quality switches are the last factor to consider, known as spatial quality, as it measures the quality across contiguous frames.

Quality-of-Experience Metrics Comparison

The joint effect of the aforementioned QoE factors was not considered until the appearance of DASH, as the video quality switch factor is directly related to this new video streaming approach. Therefore, the development of new QoE models for DASH has become an active field in the last few years. Examples of such models are the Streaming QoE Index (SQI) [? ] and the one proposed by Liu, Dey, Ulupinar, Luby & Mao, 2015 [? ], as well as the ITU-T P.1203 standard [? ]. In Liu et al. [? ], subjective experiments are performed to detect the factors that should be considered in the different modules of the metric. The results of these experiments confirm the aforementioned quality aspects (quantization, temporal and spatial), although the authors went even further and designed specific metrics for them, resulting in the model depicted in Figure ??. In addition, different impairment functions based on these metrics are provided, so that a final QoE metric can be computed.
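The overall structure of such a model — three quality modules whose impairments are combined into a MOS prediction — can be sketched as follows. The weights and functional form below are invented placeholders for illustration; they are not the impairment functions of Liu et al., which are defined in the paper itself.

```python
def predicted_mos(quantization_quality, initial_delay_s, stall_durations_s,
                  quality_switch_magnitudes):
    """Toy DASH QoE model: start from the encoding quality (1-5 scale) and
    subtract temporal and spatial impairments. All weights are illustrative."""
    temporal_impairment = (0.05 * initial_delay_s            # mild effect
                           + 0.30 * sum(stall_durations_s))  # stalls hurt most
    spatial_impairment = 0.10 * sum(abs(m) for m in quality_switch_magnitudes)
    mos = quantization_quality - temporal_impairment - spatial_impairment
    return max(1.0, min(5.0, mos))   # clamp to the MOS scale

# One 2 s stall, 1 s of initial delay, two one-level quality switches.
print(predicted_mos(4.2, initial_delay_s=1.0,
                    stall_durations_s=[2.0],
                    quality_switch_magnitudes=[1, -1]))
```

Note that the stall term is weighted far more heavily than the initial delay, mirroring the observation elsewhere in this thesis that stalling affects QoE more than initial delay does.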

Figure 2.11: SQI Building Blocks

First of all, the VQM metric [? ] is computed in the Quantization Module in order to measure the objective quality of the video. Moreover, this metric is also used to measure the spatial quality of the video, together with the number of quality switches, by computing the average quality level and the magnitude of the quality switches. Finally, the Temporal Module considers the effects of initial delay and stalls. Video motion is also considered within this module, as the authors concluded that it influences the perceived quality in the presence of stalling events strongly enough to be included in the impairment functions. Once the quantization, temporal and spatial qualities are computed, the system returns the predicted MOS. The results of the metric have been validated by performing another subjective experiment, yielding a correlation value of 0.91 with respect to the real results. By contrast, only temporal and quantization factors are considered within the SQI metric, as shown in Figure ??. In the Quantization Module any VQA metric can be used, although only a subset of them is validated in [? ], with the best results obtained when SSIM and SSIMplus are used. Moreover, the VQA results are also needed in the Temporal Module, as this module takes them as a reference for the current quality, which is decreased in the presence of stalls and kept unchanged otherwise. Stalling events are considered separately and a memory effect is modeled. Thus, user dissatisfaction increases when the video stalls and fades out from the moment the video starts to play again. Additionally, the initial delay is treated as a stall at the beginning of the playback, and thus it is not modeled directly. In the end, the effects of all the stalling events are summed up as if they were independent. Finally, SQI's authors performed subjective experiments in order to validate the results of applying different VQA metrics, obtaining a correlation of 0.9 and a mean absolute error of 6 (on a [0,100] MOS scale) for the best metrics. However, the functions they used to obtain the MOS from the model are not provided, so these results are not reproducible without subjective experiments as a reference. The ITU-T P.1203 standard considers not only video quality, but audio quality as well. As depicted in Figure ??, there are two estimation modules, for audio [? ] and video [? ], as well as the quality integration module [? ], which considers stalls and quality switches.

Figure 2.12: ITU-T P.1203 Building Blocks [? ]

The video quality estimation module computes the effects of spatial up-scaling and video compression, as well as video smoothness with regard to its frame rate. The audio quality estimation module, on the other hand, reflects the impairments due to audio coding. Additionally, both metrics support 4 modes of operation, which are selected depending on the available input information and have increasing levels of complexity. The input information might be meta-data, frame headers and partial or complete access to the media stream; thus, the more information is available, the more precise the results are. The audio and video qualities, together with the stalling and quality switch information, are used to compute the final MOS within the quality integration module. In particular, this module considers the number, the total duration and the frequency of the stalling events, as well as the quality change rate and the direction of such changes. In the end, the predicted MOS is given with a correlation of ?? and a root-mean-squared error of 0.33 (on a [1,5] MOS scale) for the most complex mode, which is the one that provides the best results. A summary of the three metrics is given in Table ??, where one can see that, even though all the metrics have a similar correlation, the Liu et al. metric and ITU-T P.1203 have a more complex design, considering all the QoE factors previously mentioned. Moreover, the functions to compute the final MOS are not provided for the SQI. Thus, the SQI metric is discarded for these two reasons. Regarding the two remaining metrics, ITU-T P.1203 has also been discarded because its audio and video metrics would have to be implemented, while the VQM software is freely available.

                 Liu et al., 2015           SQI                     ITU-T P.1203
Error            -                          6 (MAE, [0,100])        0.33 (RMSE, [1,5])
Correlation      0.91                       0.9                     -
Quality Modules  Temporal, Spatial,         Temporal,               Temporal, Spatial,
                 Quantization               Quantization            Quantization
VQA              VQM                        Any                     Spatial up-scaling,
                                                                    Jerkiness, Video
                                                                    compression artefacts
Initial Delay    Modeled                    As stalling             As stalling
Stalling         Video motion,              Memory behaviour,       Stall number,
                 Stall number,              Stall duration,         Stall duration,
                 Stall duration             Stall number            Stall frequency
Level Variation  Average level,             -                       Change rate,
                 Number of switches,                                Change direction
                 Average switch magnitude
Results          MOS                        MOS (functions          MOS
                                            not provided)

Table 2.2: DASH QoE Models Comparison

Therefore, given that both have a similar correlation and implement all the QoE factors in a similar manner, the ease of implementation made Liu et al. the metric used for the remainder of this thesis.

2.4 GENI

Global Environment for Network Innovations (GENI) [? ] is a virtual laboratory intended for networking and distributed systems experiments, which provides a large-scale infrastructure of programmable network and compute resources. Moreover, the concept of slices is implemented for experiment isolation, so that each slice is composed of the set of resources assigned to an experiment.

The GENI network architecture is depicted in Figure ?? and consists of racks, SDN switches and WiMAX Base Stations. The SDN switches, together with the links that interconnect them, form the underlying GENI data plane. Therefore, when users reserve network resources, they are actually configuring an overlay network interconnecting multiple racks on top of this architecture. Different parameters are configurable, such as the link throughput and the link virtualization level (2 or 3). Additionally, the control plane is a usual Internet connection intended for user configuration access to the racks. Although there are exceptions, it does not provide high throughput, as it is just intended for configuration traffic, which typically has low bit rates.

The experiment isolation is achieved by the use of virtualization for both machine and link reservation. There are several virtualization modes with different technical specifications, so that GENI can provide multiple performance and isolation levels, making it suitable for any type of experiment regardless of its requirements.

Figure 2.13: GENI Architecture

Regarding link reservation, VLANs and GRE Tunnels are used to configure level-2 and level-3 connections between machines, respectively.

The compute resource virtualization depends on the deployed rack implementation. The most common ones are InstaGENI, which is a small ProtoGENI [? ] cluster, ExoGENI [? ] and the recent OpenGENI. As can be derived from Table ??, ExoGENI racks have the best specification among the three, and thus they are suitable for experiments with high performance requirements. In contrast, OpenGENI racks, which are intended to ease the deployment of new GENI sites at an affordable price, have the lowest technical characteristics and are still under development. Finally, InstaGENI racks are mid-range cost racks deployed at a large number of locations. Cisco GENI and Ciena GENI racks are also available, although their deployment is marginal and thus they are not considered further. Unlike network resources, not only virtual compute resources can be reserved, but also bare-metal machines, which guarantee a higher performance than virtual ones.

Furthermore, in order to allow wireless and mobility experiments, WiMAX Base Stations are deployed at some specific locations. As with other resources, access to WiMAX is also virtualized via WiMAX channels, so that channels are mapped to VLANs when assigned to a slice.

GENI racks run the Aggregate Manager (AM) function, which controls the user-aggregate interaction through the AM API. This API supports four operations: list available resources, request resources for a slice, check resource status in a slice and delete resources from a slice. The interaction with the AM API is performed via Resource Specifications (RSpecs), which are XML files describing resources. Thereby, users first request an Advertisement RSpec announcing the available resources in the aggregate, then form a Request RSpec with this information and send it to the AM.
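As an illustration of this workflow, the following sketch builds a minimal Request RSpec with Python's standard library. The namespace URI, sliver type and client identifiers are assumptions for illustration; in practice they would be taken from the Advertisement RSpec returned by the aggregate.

```python
import xml.etree.ElementTree as ET

# GENI RSpec v3 namespace (assumed; the Advertisement RSpec announces
# the exact schema in use).
NS = "http://www.geni.net/resources/rspec/3"
ET.register_namespace("", NS)

# Root element of a Request RSpec.
rspec = ET.Element("{%s}rspec" % NS, {"type": "request"})

# One VM node; client_id and sliver_type are illustrative values.
node = ET.SubElement(rspec, "{%s}node" % NS,
                     {"client_id": "client", "exclusive": "false"})
ET.SubElement(node, "{%s}sliver_type" % NS, {"name": "emulab-xen"})
ET.SubElement(node, "{%s}interface" % NS, {"client_id": "client:if0"})

# A link attaching the node's interface to the requested topology.
link = ET.SubElement(rspec, "{%s}link" % NS, {"client_id": "L1"})
ET.SubElement(link, "{%s}interface_ref" % NS, {"client_id": "client:if0"})

print(ET.tostring(rspec, encoding="unicode"))
```

The resulting XML would then be submitted through the AM API's resource request operation, after which the aggregate answers with the Manifest RSpec described below.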

Resources  ExoGENI                           InstaGENI                      OpenGENI
Compute    Control: 2x146 GB Disk,           Control: 4 TB Disk,            Control: 300 GB Disk,
           12 GB RAM, dual-socket            single-socket quad-core CPU,   32 GB RAM
           4-core 2.66 GHz CPU               12 GB RAM
           Experiment: 1x146 GB hard drive   Experiment: 1 TB Disk,         Experiment: 300 GB Disk,
           + 1x500+ GB secondary drive,      dual-socket six-core CPU,      32 GB RAM
           48 GB RAM, dual-socket            48 GB RAM
           6-core 2.66 GHz CPU
           VM: specifications                VM: 1 TB Disk,                 VM: specifications
           not available                     1 GB RAM                       not available
Network    Control: 1G downlink /            Control: 24 10/100/1000 Mb     Control: 48 GbE ports,
           10G uplink ports                  ports, 2 1 Gb ports            optional 4 10 Gb ports
           Data: 10G client /                Data: 48 1 Gb ports,           Data: 48 dual-speed 1/10 Gb
           40G uplink ports                  4 10 Gb ports                  ports, 4 40 Gb ports

Table 2.3: GENI Racks Comparison

Finally, the users receive a Manifest RSpec describing the resources that have actually been reserved for them.

To summarize, GENI provides a globally distributed platform for networking experiments, which allows the configuration of every element in the experiment. Moreover, its performance levels make it suitable for both high and low performance-demanding experiments. Hence, this networking platform is used to perform the experiments within this thesis, so that DASH performance can be analysed in a distributed, real environment instead of a virtual testbed.


Chapter 3

Methodology and Metrics

A complete evaluation system has been developed in order to analyse DASH performance. The system is composed of an on-line metrics collection tool, namely a DASH client deployed on GENI, and an off-line QoE computation and data analysis system.

3.1 Evaluation System Overview

Evaluating DASH performance on a global scale is the objective of this thesis, and thus a testbed providing multiple deployment locations is needed. For that reason GENI is used, which provides the generic experiment topology depicted in Figure ??. It comprises a video server, a DASH client collecting performance metrics and a network interconnecting them.

One of the advantages of DASH is that no dedicated video server is needed, so any web server can be used for video deployment. Therefore, the Apache web server has been selected for the experiments, as it is currently the most common server for content deployment on the Internet. Besides, the Cross-Origin Resource Sharing (CORS) mechanism must be enabled when serving DASH on top of Apache, as DASH clients request the MPD file and the video segments using XMLHttpRequests, which are subject to the Same-Origin Policy. Finally, the server is installed on a GENI VM and the video dataset is made available by simply placing it in a directory shared by Apache.

The DASH Industry Forum Reference Player (dashif), which is the selected client (Section ??), is deployed on another GENI rack, where it collects performance metrics that are stored in a log for later analysis. It is a JavaScript client, which must be made available on an HTTP server so that a browser can access it. In this case SimpleHTTPServer is used for the client deployment (see Section ??).
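To make the CORS requirement explicit, the following minimal Python sketch serves a directory while adding the required header; the actual deployment uses Apache with CORS enabled, so this server is only an illustration of the same mechanism (port and paths are arbitrary).

```python
# Minimal static file server that adds the CORS header DASH clients need.
from http.server import HTTPServer, SimpleHTTPRequestHandler

class CORSRequestHandler(SimpleHTTPRequestHandler):
    def end_headers(self):
        # Allow cross-origin XMLHttpRequests for the MPD and the segments.
        self.send_header("Access-Control-Allow-Origin", "*")
        SimpleHTTPRequestHandler.end_headers(self)

if __name__ == "__main__":
    # Serves the current directory (e.g. the video dataset) on port 8000.
    HTTPServer(("0.0.0.0", 8000), CORSRequestHandler).serve_forever()
```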

Figure 3.1: Evaluation System Capturing Architecture

Figure 3.2: Open vSwitch Router Configuration

The overlay network is configured on top of the physical GENI network in order to connect the server and the client. This network has different topologies depending on the experiment requirements (Chapter ??) and every node is a different VM configured as a router with Open vSwitch (OvS). OvS is a virtual switch that bridges connections between virtual machines and other ports. It allows us to configure the Linux virtual machines installed on GENI as routers by setting up the architecture depicted in Figure ??. Each network interface is attached to a virtual bridge created with OvS and all these bridges are connected to the Linux routing system, so that the traffic coming from other GENI aggregates can be routed. The standalone mode is enabled, so that OvS flow configurations are not needed and the incoming traffic is forwarded according to the routing table entries.

The final element in the capturing architecture is the Wireshark protocol analyzer. It is installed on the server and the client machines, so that the traffic can be analysed in both directions of the connection. Wireshark allows low-level protocol analysis and the collection of metrics that cannot be measured in the application layer, such as TCP retransmissions. Besides, it provides a graph tool that is useful for analysing CUBIC and BBR behaviour.

Finally, InstaGENI racks have been selected because none of the elements of the system has high resource requirements, since just a single element runs on each machine at a time. Additionally, the machines on the GENI racks are configured with Linux kernel 4.9, which is the first kernel version including the BBR congestion control algorithm by default, and the Ubuntu LTS 64-bit disk image.

Figure 3.3: Evaluation System Analysis Architecture

The latest Ubuntu LTS version is not suitable for the experiments, as Linux kernel 4.9 cannot be installed on it due to errors.

Besides the DASH system described so far, a data collection and processing system is implemented as depicted in Figure ??. It consists of three components: the DASH client, which collects video-stream-related metrics; the Quality-of-Experience component, which takes the video dataset and the previous metrics to compute the QoE metric off-line; and the data analysis component, where the results are analysed. All the inputs and outputs of the evaluation system are listed in Table ??.

The DASH client (Section ??) collects metrics related to the video stream (I.01), which is the only input of the system. The outputs of this stage are the HTTP metrics (O.03), which comprise all the QoS metrics collected by the client; the temporal metrics (O.02), which are related to initial delay and stalling events; and the buffer level (O.01).

The Quality-of-Experience component is introduced in detail in Section ??. The quantization quality is computed in the quantization module, resulting in the video quality metric (VQM) output (O.12), which is obtained by comparing the videos in the dataset (I.11) with the raw video (I.12). This output, together with the quality switches information (O.13), is used to compute the spatial quality in the spatial module. The temporal quality is derived in the temporal module, which needs the motion information extracted from the dataset. Finally, the predicted MOS (O.11) is given by the user experience module.

The collected data is analysed at the end of the process. First of all, outliers are detected and removed in the data mangling module, so that the statistical significance (O.21) can be computed in the significance test module. Besides, a graphical analysis is also possible thanks to the graphs (O.22) produced by the display module.

Name   Description
I.01   DASH video stream
I.11   Video dataset
I.12   Raw video
O.01   Buffer level log
O.02   Temporal metrics log
O.03   HTTP metrics log
O.11   Estimated MOS
O.12   Video Quality Metric
O.13   Quality switches
O.21   Statistical significance
O.22   Metrics graphs

Table 3.1: System Inputs and Outputs

3.2 DASH Client

The DASH client selected for the experiments is dashif, the reference player proposed by the DASH Industry Forum. It is a JavaScript player that implements the MPEG-DASH standard [? ] and works on any HTML5 browser supporting MediaSource Extensions and Encrypted Media Extensions. Being written in JavaScript, dashif does not require the installation of any libraries to be executed. This is an advantage in an environment such as GENI, where the disk image installed on the VM must be configured every time the resources are reserved. Additionally, it already implements a performance metric collection system, which makes it ideal for performance studies. The performance metrics, together with other information, can be accessed through an API.

dashif implements an adaptation logic based on the throughput record of the last downloads, the current throughput estimation and the buffer level. The video quality selection is mainly governed by the primary bandwidth-based rules (see the sketch after this list), which are:

ThroughputRule uses the averaged throughput of the last segment downloads (3 by default) and is mostly responsible for quality increases.

AbandonRequestsRule tracks the real-time throughput to detect large instantaneous throughput changes. In case the estimated download time gets significantly higher than the segment duration, the download is abandoned and the same segment is requested at a lower quality in order to prevent re-buffering events.

Besides, two secondary rules based on the buffer level are also implemented to control quality decreases, which are:

BufferOccupancyRule overrides the ThroughputRule when a quality decrease is requested and there is enough video in the buffer, so that slower downloads can be absorbed to maintain the quality level until the throughput increases again. This rule prevents quality oscillations, which decrease the users' perceived quality [? ].

InsufficientBufferRule forces a quality decrease when the buffer is empty, trying to get the data as fast as possible and thus minimize stalling time.
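The following sketch illustrates the flavour of such a throughput-driven selection. It is not the actual dash.js implementation; the bitrate ladder, the 3-sample window and the safety factor are illustrative assumptions.

```python
# Simplified throughput-based quality selection in the spirit of the
# ThroughputRule; BITRATES holds one bitrate (kbps) per representation.
BITRATES = [100, 250, 500, 1000, 2000, 3000, 5000]

def average_throughput(samples, window=3):
    """Average the measured throughput (kbps) of the last downloads."""
    recent = samples[-window:]
    return sum(recent) / len(recent)

def select_quality(samples, safety_factor=0.9):
    """Pick the highest representation whose bitrate fits the estimate."""
    estimate = average_throughput(samples) * safety_factor
    candidates = [i for i, b in enumerate(BITRATES) if b <= estimate]
    return max(candidates) if candidates else 0

# Example: three measured segment throughputs in kbps.
print(select_quality([2600, 2800, 2700]))  # -> 4, the 2000 kbps level
```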

These four rules govern the basic dashif behaviour, although the BOLA algorithm [? ] has lately been implemented as well, but is not activated by default. BOLA treats the quality selection as a utility maximization problem considering only the buffer level, unlike most state-of-the-art algorithms, which rely on different network throughput estimations.

Regarding the performance metrics, the player already implements a metric system that periodically (each second by default) collects information related to the following parameters:

Buffer Length is the buffer length in seconds.

Bitrate Downloading is the bitrate of the representation being downloaded.

Index Downloading is the index of the representation being downloaded.

Current Index is the index of the representation being played.

Max Index is the maximum index available in the MPD.

Dropped Frames is the number of frames dropped by the rendering pipeline since the beginning of the playback.

Latency is the minimum, average and maximum latency in seconds over the last 4 downloaded segments.

Download is the minimum, average and maximum download time in seconds over the last 4 downloaded segments.

Ratio is the minimum, average and maximum ratio of the segment playback time to the total download time over the last 4 downloaded segments.

However, a new metrics collection system has been implemented, so that metrics are not collected periodically but at the end of the playback. Besides, new metrics have also been implemented to measure the QoE of the video session. The metrics are divided into HTTP metrics, which are extracted from the HTTPRequest class; temporal quality metrics, which collect temporal information about the video playback; and specific DASH metrics, which are the previously mentioned Buffer Length and Dropped Frames metrics and are still collected periodically.

The HTTP metrics are depicted in Table ?? and are collected for each of the played segments. It must be noted that the number of played and downloaded segments might differ, as the experiment duration might be shorter than the video duration.

Metric      Definition                           Implementation                                    Unit
Bitrate     Bitrate of the segment               Extracted from the segment URL                    kbps
Index       Representation index of the segment  Mapping from bitrate to index                     -
Throughput  Available network throughput         Segment length divided by download time           kbps
Latency     Network latency                      Time between request sent and first               s
                                                 response byte arrival

Table 3.2: HTTP Metrics

Metric               Definition                          Implementation                            Unit
Experiment Duration  Configured experiment length        Not dashif-related                        mm:ss
Segment Length       Configured segment length           Not dashif-related                        s
Initial Delay        Time between play button click      Time between play button click and        ms
                     and actual video playback start     PLAYBACK_STARTED event
Number               Total number of stalling events     stall_num++ on BUFFER_EMPTY               -
Duration             Summation of the duration of        Summation of all stalling event           ms
                     every stalling event                durations

Table 3.3: Temporal Quality Metrics

Therefore, the number of played segments is equal to the experiment length divided by the segment length, which is the number of HTTPRequest objects from which the metrics are retrieved. Additionally, although segment downloads can be abandoned, this does not interfere with the collection system, as only complete requests are stored in the HTTPRequest list. Regarding the implementation of the metrics, the segment index extraction is unfortunately dataset dependent, as the API does not provide this information per segment request. Therefore, the proposed solution is a mapping from segment bitrate to index, as it is a direct and unambiguous mapping for each video quality. However, an alternative solution to this problem is provided in Section ??. Moreover, the segment length in bytes is obtained from the Content-Length HTTP header for the throughput computation.

The temporal quality metrics are depicted in Table ?? and none of them is computed by the dashif player. Experiment Duration and Segment Length are configurable parameters unrelated to dashif, so they are extracted from the arguments passed to the program. For the Initial Delay and the stalling-related metrics, on the other hand, the events provided by the API have been used. The PLAYBACK_STARTED event is raised when the playback starts after having been paused, so only its first occurrence is considered to obtain the initial delay. BUFFER_EMPTY is triggered when the playback stalls and BUFFER_LOADED when the buffer is filled again. Therefore, the duration of a stalling event is computed as depicted in equation ?? and finally added to the overall stalling duration.
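The same computation can be reproduced off-line from the collected logs. The sketch below assumes the events are available as (timestamp in ms, event name) tuples; the event names mirror the dashif API, while the log format is an assumption.

```python
# Off-line computation of stalling statistics from a player event log.
def stalling_stats(events):
    """Return (number of stalls, total stalling duration in ms)."""
    stalls, total, empty_at = 0, 0, None
    for ts, name in events:
        if name == "BUFFER_EMPTY":
            stalls += 1
            empty_at = ts
        elif name == "BUFFER_LOADED" and empty_at is not None:
            total += ts - empty_at  # equation (3.1) per stalling event
            empty_at = None
    return stalls, total

log = [(1000, "BUFFER_EMPTY"), (1350, "BUFFER_LOADED"),
       (9000, "BUFFER_EMPTY"), (9800, "BUFFER_LOADED")]
print(stalling_stats(log))  # -> (2, 1150)
```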

Figure 3.4: dashif Deployment Architecture

$$stall\_duration = time(BUFFER\_LOADED) - time(BUFFER\_EMPTY) \qquad (3.1)$$

The dashif client creates a log with the collected metrics. However, as JavaScript applications cannot write to the host's file system, Google Chrome's filesystem API is used to store it, creating a file for each of the metric types (HTTP, temporal and DASH). This is one of the reasons for selecting Google Chrome as the browser to execute the video player.

Together with Google Chrome, all the elements necessary for the dashif deployment are depicted in Figure ??. As it is JavaScript software, it has to be published on an HTTP server in order to be requested by the client through a web browser. Therefore, it is deployed on top of SimpleHTTPServer, a Python server that provides a simple and fast way to make content available for download. In addition, as mentioned before, Google Chrome is selected for the client download and execution, as it provides the filesystem API and all the extensions and codecs needed for video streaming by default. However, as browsers are typically operated by humans, an automation system must be used to run the experiments on GENI. Therefore, the Selenium library for Python is also part of the architecture, as it allows automatic control of web browsers. All these elements are deployed within the same GENI rack. Finally, Xvfb is used to virtualize a video interface on the GENI rack, because none is available and one is needed for Google Chrome execution.

3.3 Quality of Experience Metric

The QoE model selected for the experiments, as discussed in Section ??, is the Liu et al. model [? ]. It uses three impairment functions to reflect users' behaviour, which consider the effects of initial delay, stalls and level variation on the users' perception of the video quality. The impairment of the initial delay is linearly modeled, as shown in equation ??, where L_ID is the length of the initial delay and α is a constant listed in Table ??, together with all the other constants used in the model.

Table 3.4: Liu et al. Constants [? ]: α, a, b, c, d, k, B1, B2, MVTh, C1 and C2

$$I_{ID} = \min(\alpha \cdot L_{ID}, 100) \qquad (3.2)$$

The impairment function for stalling events (equation ??) needs the total stalling duration (D_ST) and the total number of stalling events (N_ST). Additionally, as stalls are more harmful for videos with high motion, the amount of motion in a video is considered in the model (AMVM), based on the motion vectors within the video. The impairment due to motion is upper-bounded by the threshold MVTh, resulting in a piecewise equation for the stalling impairment.

$$I_{ST} = \begin{cases} a \cdot D_{ST} + b \cdot N_{ST} - c \cdot \frac{D_{ST}}{N_{ST}} + d \cdot AMVM & \text{if } AMVM < MVTh \\ a \cdot D_{ST} + b \cdot N_{ST} - c \cdot \frac{D_{ST}}{N_{ST}} + d \cdot MVTh & \text{if } AMVM \geq MVTh \end{cases} \qquad (3.3)$$

First of all, in order to compute the amount of motion, the Motion Vector Magnitude (MVM) is computed for each 16x16 macroblock in the horizontal and vertical directions of the frame, following equation ??. Here N_x and N_y are the numbers of 16x16 macroblocks, and m_{ij,x} and m_{ij,y} the projections of the motion vectors on the x and y directions, respectively. Once the MVM is computed for all macroblocks, the metric is averaged for each frame (equation ??).

$$MVM_{ij} = \sqrt{\left(\frac{m_{ij,x}}{N_x}\right)^2 + \left(\frac{m_{ij,y}}{N_y}\right)^2} \qquad (3.4)$$

$$\overline{MVM} = \frac{1}{N_x N_y} \sum_{i=1}^{N_x} \sum_{j=1}^{N_y} MVM_{ij} \qquad (3.5)$$

Finally, the Average Motion Vector Magnitude (AMVM) is computed by averaging over the M frames of the video, as depicted in equation ??.

$$AMVM = \frac{1}{M} \sum_{k=1}^{M} \overline{MVM}_k \qquad (3.6)$$

The impairment function for level variation considers two sources of quality degradation, namely the video quality level (equation ??) and the level fluctuation (equation ??). The effect of the quality level is computed as a weighted average of the objective video quality, measured through the VQM metric [? ]. Additionally, the weights are exponential terms that model the exponential growth of annoyance with the duration of the low level.
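To make the motion computation concrete, the following is a small NumPy sketch of equations (3.4)-(3.6); the array shapes and the random example data are illustrative assumptions, not part of the model.

```python
import numpy as np

def frame_mvm(mv_x, mv_y):
    """Per-frame MVM: eq. (3.4) per macroblock, averaged as in eq. (3.5)."""
    n_y, n_x = mv_x.shape
    mvm = np.sqrt((mv_x / n_x) ** 2 + (mv_y / n_y) ** 2)
    return mvm.mean()

def amvm(frames):
    """Average the per-frame MVM over all M frames, eq. (3.6)."""
    return sum(frame_mvm(x, y) for x, y in frames) / len(frames)

# Example with two random motion vector fields; a 720p frame has
# 80x45 macroblocks of 16x16 pixels.
rng = np.random.default_rng(0)
frames = [(rng.normal(size=(45, 80)), rng.normal(size=(45, 80)))
          for _ in range(2)]
print(amvm(frames))
```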

The term D_i indicates how long the level of segment i is maintained and T is the duration of the segments. In addition, the sign(x) function used in equation ?? is defined in ?? to reflect the stronger effect of decreasing switches compared to increasing switches. Both terms are combined to compute the level variation impairment, as shown in equation ??.

$$P_1 = \frac{1}{N} \sum_{i=1}^{N} VQM_i \cdot e^{k \cdot \frac{D_i}{T}} \qquad (3.7)$$

$$P_2 = \frac{1}{N} \sum_{i=1}^{N-1} \left(VQM_i - VQM_{i+1}\right)^2 \cdot \mathrm{sign}(VQM_{i+1} - VQM_i) \qquad (3.8)$$

$$\mathrm{sign}(x) = \begin{cases} 1, & \text{if } x > 0 \\ 0, & \text{otherwise} \end{cases} \qquad (3.9)$$

$$I_{LV} = B_1 \cdot P_1 + B_2 \cdot P_2 \qquad (3.10)$$

Once the impairment functions are computed, they are put together to compute the factor R (equation ??), with C1 and C2 again listed in Table ??. The R factor measures the user experience: the higher the factor, the better the experience.

$$R = 100 - I_{ID} - I_{ST} - I_{LV} + C_1 \sqrt{I_{ID} \cdot I_{ST} + I_{LV}} + C_2 \sqrt{I_{ST} \cdot I_{LV}} \qquad (3.11)$$

$$DASH_{MOS} = 1 + 0.035 \cdot R + 7 \cdot 10^{-6} \cdot R \cdot (R - 60) \cdot (100 - R) \qquad (3.12)$$

Finally, the R factor is used to compute the DASH MOS (equation ??), which is given in the range [1, 4.5].
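Since the model is ultimately a chain of closed-form expressions, it can be sketched compactly in Python. The constant values below are placeholders rather than the values from [? ], and the R-to-MOS mapping follows the reconstructed equation (3.12); the sketch only illustrates how the impairments combine.

```python
import math

# Placeholder constants; the actual values are given in Table 3.4 / [? ].
C = dict(alpha=0.02, a=0.1, b=1.0, c=0.05, d=0.2, k=0.02,
         B1=1.0, B2=1.0, MVTh=30.0, C1=0.1, C2=0.1)

def i_id(l_id_ms):
    return min(C["alpha"] * l_id_ms, 100)                    # eq. (3.2)

def i_st(d_st, n_st, amvm):
    if n_st == 0:
        return 0.0
    motion = min(amvm, C["MVTh"])                            # eq. (3.3)
    return (C["a"] * d_st + C["b"] * n_st
            - C["c"] * d_st / n_st + C["d"] * motion)

def i_lv(vqm, durations, seg_len):
    n = len(vqm)
    p1 = sum(q * math.exp(C["k"] * d / seg_len)
             for q, d in zip(vqm, durations)) / n            # eq. (3.7)
    p2 = sum((vqm[i] - vqm[i + 1]) ** 2 * (vqm[i + 1] > vqm[i])
             for i in range(n - 1)) / n                      # eq. (3.8)
    return C["B1"] * p1 + C["B2"] * p2                       # eq. (3.10)

def mos(iid, ist, ilv):
    r = (100 - iid - ist - ilv
         + C["C1"] * math.sqrt(iid * ist + ilv)
         + C["C2"] * math.sqrt(ist * ilv))                   # eq. (3.11)
    return 1 + 0.035 * r + 7e-6 * r * (r - 60) * (100 - r)   # eq. (3.12)

# One session: 2 s initial delay, one 600 ms stall, three quality levels.
print(mos(i_id(2000), i_st(0.6, 1, 20.0),
          i_lv([0.3, 0.4, 0.35], [4, 8, 4], 4)))
```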

Video Quality Metric

The Video Quality Metric (VQM) [? ] is a VQA metric for the prediction of objectively perceived quality that detects perceptual changes in the spatial, temporal and chrominance properties of the video. It is a full-reference metric, as it compares the raw and the processed videos in order to extract the quality parameters, which are indicators of the overall video distortion. Seven different parameters are considered within the model. Four parameters measure spatial properties of the luminance component Y, two extract features from the chrominance components C_B and C_R, and the last one considers the effects of contrast and motion. These parameters are:

si_loss detects decrease or loss of spatial information.

si_gain measures improvements to quality.

hv_loss detects shifts of edges from horizontal and vertical orientation to diagonal orientation.

hv_gain detects shifts of edges from diagonal orientation to horizontal and vertical orientation.

chroma_spread detects changes in the spread of the distribution of color samples.

chroma_extreme detects severe localized color impairments.

ct_ati_gain includes the effects of motion and spatial detail in the model.

These parameters are linearly combined to compute the VQM with maximum objective-to-subjective quality correlation. The VQM lies in the [0,1] range, where lower values mean better video quality. Although this metric is a QoE model in itself, as it provides a mapping from technical parameters to subjective quality, it cannot be directly applied to DASH systems because it does not consider the factors introduced in Section ??.

Regarding the software implementing this metric, the National Telecommunications and Information Administration (NTIA), which developed the VQM metric, provides its own software, but it is obsolete, as it does not support the latest video formats. In contrast, the vqmetric tool implements several video quality metrics, among which is the VQM. The currently supported formats are mp4 and y4m (YUV420), which must be considered when selecting the video dataset for the experiments. In addition, the vqmetric tool is still under development and resolution scaling is not supported yet, which restricts the dataset to videos at a single resolution. Apart from these two tools, there is a lack of available VQM implementations. The vqmetric tool has been selected because it supports current video formats, which implies that its limitations must be considered when choosing the dataset for the experiments. Moreover, its current limitations will eventually be overcome as new versions are released.

3.4 Video Dataset

A well-accepted, freely available video dataset for DASH is presented in [? ], which includes various videos with different characteristics in terms of content generation, ranging from animation and movie sequences to high-motion content such as sport. In addition, the videos are compressed into different qualities and resolutions, as well as segmented into different lengths. All these aspects make it convenient for studying the video-source-related effects on DASH performance in different environments. For instance, devices with different screen sizes, such as laptops or smartphones, can be used thanks to the multiple available resolutions, or the most suitable segment length can be selected for a given scenario (see Section ??).

Parameter          Value                       Unit
Resolution         1280x720                    -
Input format       YUV420                      -
Output format      MPEG-4                      -
Duration           1                           min
Frames per second  24                          fps
Segment length     2, 4, 6                     s
GOP size           48, 96, 144                 frames
Data rate          0.1, 0.25, 0.5, 1, 2, 3, 5  Mbps

Table 3.5: Dataset Specifications

However, in spite of its desirable characteristics, the VQM tool selected for the experiments (Section ??) can only be used with datasets whose sequences are at a single resolution. Therefore, the multiple resolutions prevent the VQM computation with the selected tool, due to the current lack of resolution scaling. This problem has been addressed by generating a new dataset compatible with vqmetric.

The fundamental premise of this dataset creation is that the computation time should be kept as low as possible, because both video compression and VQM computation are demanding processes in terms of computation time. In addition, a wide range of video qualities should be considered, so that the DASH client can adapt to several network conditions. Finally, the video should be segmented into multiple lengths in order to allow studies of the segment length's effects on DASH performance.

Taking the computation time limitation into account, the dataset is composed of a single video sequence, so that there is no need to repeat the process for different sequences. The Big Buck Bunny video has been selected, which is an open-source animation film intended for video research. The raw video files are freely available on the Xiph organization webpage. As depicted in Table ??, a 720p resolution has been selected, so that HD video is provided. In addition, the video sequences are 1 minute long in order to minimize the amount of data the encoder and vqmetric work with, and thus the computation time. This duration has been selected as it is considered sufficient to study the cross-layer interactions with TCP, because the cwnd updates occur on a much shorter time scale. The number of bytes corresponding to 1 minute is extracted from the original sequence according to equation ??, which provides the number of bytes per frame in the YUV420 format. The values width and height are the numbers of pixels in the horizontal and vertical directions.

$$frame\_size = \frac{3}{2} \cdot width \cdot height \qquad (3.13)$$
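A short sketch of this extraction follows, assuming a headerless .yuv input file (a .y4m container additionally carries stream and frame headers that would have to be skipped); the file names are illustrative.

```python
# Cut one minute of raw YUV420 video, based on equation (3.13).
WIDTH, HEIGHT, FPS, SECONDS = 1280, 720, 24, 60

frame_size = 3 * WIDTH * HEIGHT // 2     # bytes per frame, eq. (3.13)

with open("big_buck_bunny_720p24.yuv", "rb") as src, \
     open("bbb_1min.yuv", "wb") as dst:
    for _ in range(FPS * SECONDS):       # copy frame by frame
        dst.write(src.read(frame_size))
```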

Regarding the codification options, three segment lengths between the optimal values exposed in Section ?? have been selected. Besides, in order to allow seamless switching across representations, the GOP size is fixed to the length of the segments. Thus, each of the three GOP size values in Table ?? corresponds to a segment length of 2, 4 and 6 seconds, respectively. Finally, there are seven data rates, where the values below 1 Mbps are configured so that clients can recover from losses quickly, while the higher data rates provide good quality when there is enough throughput.

This dataset lacks a variety of content types, so only the characteristics of animation content are considered in the experiments. In addition, a single resolution has been used for all the codifications, which results in a worse coding efficiency [? ]. However, the dataset has a relatively short generation time and is vqmetric-friendly, which allows us to compute the QoE metric. Moreover, DASH clients can adapt to different network conditions thanks to the several available video qualities. Finally, how the segment length influences QoS and QoE can also be analysed with this dataset, thanks to the three encoded segment lengths. Therefore, being aware of its limitations and advantages, this dataset is considered suitable for the analysis of the relationships between QoS and QoE in DASH environments.

Figure 3.5: Dataset Creation Process

The dataset creation process follows the structure depicted in Figure ??, using the DASHEncoder tool [? ]. DASHEncoder provides both a segmented and a non-segmented copy of the video for each of the configured data rates, as well as the MPD file ready for the dataset deployment. It uses the MP4Box library for the MPD generation, but pinned at revision 3744, which creates an MPD that is not compliant with the DASH standard [? ] and is thus not supported by the dashif client. Therefore, the latest version of MP4Box is used instead, taking the encoded video as input. In particular, the non-segmented video has been selected for the MPD creation, as it allows an easier deployment on the server. Non-segmented files are typically used for on-demand services, while segmented deployment is used for live video, as the segments are generated on the fly. Finally, MP4Box provides the MPD and the init files, which are video files with the needed initialization segment, so that they can be deployed on the video server.

3.5 Data Analysis

A data analysis process is needed in order to process and understand the raw data resulting from the experiments. First, the raw data should be processed in order to find outliers that could deteriorate the results and to compute aggregated statistics, so that the data can be easily understood. In addition, once the data is processed, significance tests are needed to determine whether the results occurred by chance or are indeed statistically significant. Finally, graphs and plots should be generated in order to visualize and present the data.

The data is processed in the data mangling module, where it is filtered to discard outliers. The Iglewicz-Hoaglin method [? ] has been selected, which is based on the median absolute deviation (MAD) of the data. This method classifies data as outliers based on rule ??, where z_i is defined in equation ?? and MAD(x) in equation ??.

$$\text{if } |z_i| > 3.5 \Rightarrow x_i \text{ is an outlier} \qquad (3.14)$$

$$z_i = \frac{x_i - \mathrm{median}(x)}{\mathrm{MAD}(x)} \qquad (3.15)$$

$$\mathrm{MAD}(x) = \mathrm{median}\left(\left|x_i - \mathrm{median}(x)\right|\right) \qquad (3.16)$$

Besides the outliers, the system also filters erroneous throughput results. The dashif client, in the presence of high losses, sometimes fails to compute the download time. As a result, erroneously high throughput values are reported by the client. This is just a sporadic effect and thus these values are discarded. Finally, the data is also averaged in this module, because multiple executions of the same experiment are performed in order to allow a subsequent statistical analysis. The median is computed for the integer-valued samples, while the mean is used for the real-valued ones.

The significance test module takes the processed data as input in order to perform different hypothesis tests. The two-sample t-test is the selected test, because it allows the comparison of two distributions and tests whether the means of these distributions are the same. The two hypotheses for the two-sided t-test are defined in ?? and the test statistic is defined in equation ??, where n is the sample size and S_{X1X2} the combined standard deviation (equation ??).

$$H_0: \bar{X}_1 = \bar{X}_2 \quad \text{vs.} \quad H_1: \bar{X}_1 \neq \bar{X}_2 \qquad (3.17)$$

$$t = \frac{\bar{X}_1 - \bar{X}_2}{S_{X_1 X_2} \cdot \sqrt{\frac{2}{n}}} \qquad (3.18)$$

$$S_{X_1 X_2} = \sqrt{\frac{1}{2}\left(S_{X_1}^2 + S_{X_2}^2\right)} \qquad (3.19)$$
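The following sketch shows how the data mangling and significance test modules can be realized with NumPy and SciPy; the sample values are invented for illustration, and scipy.stats.ttest_ind implements the pooled two-sample t-test of equations (3.18) and (3.19).

```python
import numpy as np
from scipy import stats

def remove_outliers(x, threshold=3.5):
    """Drop samples whose MAD-based z-score exceeds the threshold."""
    x = np.asarray(x, dtype=float)
    med = np.median(x)
    mad = np.median(np.abs(x - med))     # eq. (3.16)
    z = (x - med) / mad                  # eq. (3.15)
    return x[np.abs(z) <= threshold]     # eq. (3.14)

# Two latency samples (ms); the 410 ms value is filtered as an outlier.
a = remove_outliers([183, 179, 185, 410, 181, 180])
b = remove_outliers([156, 158, 153, 157, 154, 155])

# Two-sided t-test for equal means; H0 is rejected if p < 0.05.
t, p = stats.ttest_ind(a, b)
print(t, p)
```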

Library     Description
matplotlib  Plot production, data visualization
NumPy       Scientific computing, array operations
SciPy       Statistical tests

Table 3.6: Python Libraries

Jupyter Notebook, together with Python, is used for the data visualization and graph generation within the display module. Jupyter Notebook is a multi-language scripting tool that has been used to run Python scripts and visualize the generated graphs. In addition, Python provides powerful libraries for data analysis; the libraries used within this thesis are listed in Table ??, together with the tasks they were used for.

Chapter 4

Experiments Evaluation

Different experiments have been run in the course of this thesis in order to analyse DASH performance. The effects of TCP congestion control algorithms on DASH have been studied by observing how DASH QoS and QoE parameters vary under different circumstances. Besides, some experiments were done with the relay software, so that the performance enhancement produced in relayed DASH downloads can be evaluated.

4.1 Congestion Control Algorithms Analysis

Google's recently released BBR congestion control algorithm is said to provide better performance than CUBIC [? ], as it relies on congestion-based rather than loss-based congestion detection. However, the performance enhancement BBR can produce for DASH applications is still unknown. Different experiments have been performed in order to compare the performance of CUBIC and BBR, providing insights in terms of QoS and QoE measures.

4.1.1 Cross-Traffic Scenario

An application's performance depends directly on the amount of resources it has access to. However, the Internet protocols were not designed to provide exclusive access to communication resources, and thus multiple data flows compete for the resources when they share at least one link. Congestion control mechanisms regulate the access to resources in TCP connections by limiting the amount of data to be sent. Moreover, these algorithms show different fairness characteristics [? ? ], which results in performance differences depending on the selected algorithm. Therefore, how DASH performance is impacted by such differences is one of our research questions.

The topology depicted in Figure ?? has been deployed on GENI to study the effects of cross-traffic on DASH applications. Two different aggregates have been selected for the deployment of the nodes, so that realistic latency values are obtained, resulting in a latency of 80 ms.

Figure 4.1: Cross-Traffic Experiment Topology

Node                            Software
client                          dashif, Apache, Wireshark
server                          Apache, Wireshark
router1, router2                Open vSwitch
cross-traffic1, cross-traffic2  iperf3

Table 4.1: Nodes Configuration in Cross-Traffic Topology

However, these values do not necessarily coincide with typical Internet values, as GENI uses the Internet2 network to connect aggregates, which provides better performance parameters than the public Internet.

In addition to the cross-traffic effects, this topology also allows the analysis of a bottleneck's influence on DASH performance for different TCP implementations, which is achieved by configuring L5 with a lower throughput than the other links. As mentioned in Section ??, the difference between CUBIC and BBR lies in the operation point, which depends on the bottleneck's throughput and the length of the routers' buffers at that bottleneck. While BBR keeps the buffers empty most of the time, CUBIC uses losses due to buffer overflows to detect congestion, thus increasing the latency.

The nodes in Figure ?? are configured according to Table ??. The client node is the DASH client, which sends requests to the video server deployed on the server node. Additionally, an Apache server is deployed on client to be used as a log server, so that the metrics collected by dashif can be extracted from GENI. Wireshark instances listen to the video transactions in both directions of the download. The router1 and router2 nodes are Open vSwitch routers, as described in Section ??. Finally, the cross-traffic is generated on the cross-traffic1 and cross-traffic2 nodes with the iperf3 tool, which performs network performance measurements by generating TCP or UDP greedy traffic.
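A possible automation of this cross-traffic generation, sketched with Python's subprocess module; the peer host name and the test duration are assumptions, and iperf3 must be available on the nodes. The congestion control algorithm of a sender can be selected kernel-wide, e.g. with sysctl -w net.ipv4.tcp_congestion_control=bbr.

```python
import subprocess
import sys

# Started on cross-traffic1 as "server" and on cross-traffic2 as
# "client"; the peer name and the 600 s duration are assumptions.
if sys.argv[1] == "server":
    # Greedy-traffic sink.
    subprocess.run(["iperf3", "-s"])
elif sys.argv[1] == "client":
    # Greedy TCP traffic towards the opposite machine.
    subprocess.run(["iperf3", "-c", "cross-traffic1", "-t", "600"])
```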

Link            Small Bottleneck  Large Bottleneck
L1, L2, L3, L4  10 Mbps           20 Mbps
L5              3 Mbps            10 Mbps

Table 4.2: Links Throughput in Cross-Traffic Topology

In this experiment, the links are configured with the throughputs in Table ??, with the same value for the up and down links. Moreover, two different sets of values are configured, so that experiments can be done with a small and a large bottleneck. The throughput in the small bottleneck scenario is set to the integer throughput immediately above the data rate required by the best video quality, which is 3 Mbps. In the other scenario, the selected value is high enough to allow the download of the best video quality even in the presence of cross-traffic.

The cross-traffic is generated in both directions of the connection, so that both the video download and its ACKs are affected. This is done by executing two iperf servers, one on each of the cross-traffic machines, with a respective iperf client on the opposite machine testing the connection to that server by sending greedy traffic. The cross-traffic machines are configured with the same congestion control algorithm as client and server.

Results

The results in Table ?? are the averaged values of the outlier-free sample obtained after running each cross-traffic experiment 50 times. Specifically, the median has been computed for integer values and the mean for the non-integer ones. The segment length selected for these experiments is 4 seconds.

For the 3 Mbps scenario, BBR outperforms CUBIC in latency and initial delay, while BBR gets on average more retransmissions than CUBIC. Although the throughput measurements differ by approximately 80 kbps, this is not enough to state that CUBIC achieves a significantly higher throughput than BBR. Moreover, the video plays fluently without stalling events for both algorithms, and the downloaded video quality also coincides in both cases.

Regarding the experiments with the 10 Mbps bottleneck, a side effect was found that implied a change in the experiment configuration. High losses, causing an average of 40 retransmissions per second, appeared in both iperf clients when they were run in parallel in the BBR scenario. This behaviour continued over long periods of time, preventing the convergence to the BDP operation point. Therefore, a single cross-traffic client has been used as a solution. In order to modify the scenario as little as possible, the cross-traffic is generated in the same direction as the video downloads. Therefore, only the sporadic and short ACKs do not experience the cross-traffic influence.

In the 10 Mbps scenario there is a performance enhancement for the majority of the metrics. Lower latency values are achieved, being similar for both algorithms, as well as a lower initial delay, although, unlike in the previous scenario, BBR gets a higher value than CUBIC. In addition, a significant difference appears between the throughputs of the two algorithms. The median downloaded video quality also increases, especially for CUBIC.

                         Small Bottleneck      Large Bottleneck
Congestion Control       CUBIC      BBR        CUBIC      BBR
Bottleneck               3 Mbps     3 Mbps     10 Mbps    10 Mbps
Latency                  183 ms     156 ms     109 ms     112 ms
Throughput               1063 kbps  986 kbps   2409 kbps  1726 kbps
Initial Delay            6563 ms    4744 ms    2321 ms    3133 ms
Stalling Number          0          0          1          1
Total Stalling Duration  0 ms       0 ms       -          329 ms
Retransmissions Number   -          -          -          -
Quality Index            -          -          -          -

Table 4.3: Cross-Traffic Experiments Results

X1                           X2                             p-value  Test Result
BBR-throughput(3 Mbps)       CUBIC-throughput(3 Mbps)       1.21e-7  H1 accepted
BBR-latency(10 Mbps)         CUBIC-latency(10 Mbps)         0.002    H1 accepted
BBR-stall-duration(10 Mbps)  CUBIC-stall-duration(10 Mbps)  0.5      H0 not rejected
BBR-retransmissions(3 Mbps)  BBR-retransmissions(10 Mbps)   0.8      H0 not rejected

Table 4.4: Cross-Traffic Significance Tests Results

While the retransmissions number remains almost the same for BBR, it decreases for CUBIC. Finally, stalling events appear on average once per video session, with a duration of approximately 300 ms.

As similarities have been found among the values of some of the metrics, the statistical significance of these results has been evaluated by applying the two-sample t-test (Section ??), with the hypotheses defined as in expression ??. The confidence level for the tests is 95% and they have been performed only for those metrics in Table ?? that show close values. Thus, in the 3 Mbps bottleneck scenario, the test comparing the throughputs returns a p-value of 1.21e-7, which means that the null hypothesis is rejected. Likewise, in the 10 Mbps bottleneck scenario, the test for the latency has a p-value of 0.002, meaning the null hypothesis is rejected as well. However, the null hypothesis has not been rejected for the stalling duration, with a p-value of 0.5. These results are collected in Table ??.

$$H_0: \bar{X}_1 = \bar{X}_2 \quad \text{vs.} \quad H_1: \bar{X}_1 \neq \bar{X}_2 \qquad (4.1)$$

Besides, the retransmissions number for BBR is similar in both scenarios. Therefore, a significance test has been run to check their similarity. The p-value in this case is 0.8 and thus the test fails to reject the null hypothesis.

Figure ?? corresponds to the small bottleneck scenario using BBR and represents the throughput, the latency and the downloaded video quality over time. After a sudden increase at the beginning of the connection, the throughput stabilizes around 1 Mbps, agreeing with the results in Table ??.

Figure 4.2: BBR Results with Small Bottleneck (throughput, latency and quality index over time)

The DASH client does not request any video segment during the silences, thus resulting in the depicted on-off behaviour. Regarding the latency, sudden oscillations with high amplitude are common, even though there are also short periods where the latency is almost constant. Finally, quality 3 has been the most requested quality within the whole session, and the longest period of time without switches has been spent at quality 2.

Figure ??, in contrast, also stems from the small bottleneck scenario, but in this case with CUBIC. The throughput shows a behaviour similar to BBR's (Figure ??), with an on-off pattern, including long silent periods in the middle of the download. However, throughput fluctuations are more common and have a higher amplitude. Unlike BBR, the CUBIC transmission starts with a throughput below the average for the first segment downloads, after which, despite the fluctuations, it increases. The latency is more stable with CUBIC than with BBR. Its fluctuations have a lower amplitude, although the average value is higher in this case (Table ??). This is because latency values close to the actual link latency are less frequently achieved with CUBIC. In addition, comparing the quality index graph in Figure ?? with the one in Figure ??, the quality switch behaviour is practically identical for both congestion control algorithms, although CUBIC spends less time downloading quality 2.

Figures ?? and ?? provide the same graphs seen so far, but for the large bottleneck scenario. The main differences between these graphs and the small bottleneck ones are quantitative, as they present a similar behaviour. The throughput evolutions show fewer silent periods for both algorithms, the stabilization periods in CUBIC's latency graph are longer, and CUBIC achieves a better video quality than BBR.

Figure 4.3: CUBIC Results with Small Bottleneck (throughput, latency and quality index over time)

Discussion

The presented results allow some remarks regarding the dashif client behaviour that are important for the subsequent analysis of the congestion control algorithms. In the throughput graphs depicted in Figures ??-??, there are silent periods in which no data is sent. This silence appears because the client has enough data in its application buffer, and thus it does not request more video segments until they are necessary. This results in the well-known on-off DASH behaviour [? ]. In addition, competing traffic needs some convergence time until the resources are fairly shared [? ? ]. Therefore, DASH on-off traffic might experience unfair resource sharing in the presence of cross-traffic, since the resources at its disposal when transmitting are acquired by other applications during the silent periods. Thus, DASH must go through the convergence period again when it resumes transmission and, depending on the on-off pattern, it might never reach the fair sharing zone.

Besides, dashif might request more video segments than necessary. This happens when there is enough video in the application buffer, but at a low quality. In that case, the client can afford to request the same segments it has already downloaded, but not yet played, at a higher quality. If the new segments arrive on time, they are played instead of the lower-quality ones previously downloaded. As our detection of the segment quality played at each instant was designed under the assumption that the downloaded segments are always played (Section ??), this finding implies that the detection mechanism sometimes reports a lower quality than the one actually played, leading to QoE underestimation.

Figure 4.4: BBR Results with Large Bottleneck (throughput, latency and quality index over time)

Finally, although CUBIC and BBR achieve an average throughput close to 1 Mbps in the 3 Mbps experiments (Table ??), the quality index graphs in Figures ?? and ?? show that quality 2, which needs 500 kbps, is downloaded for a major period of time. Thus, almost half of the throughput is wasted. A better throughput exploitation would have been achieved if a 750 kbps quality had been added to the dataset.

Regarding the congestion control analysis, one would expect lower latency results for the BBR executions, as it is shown in [? ] that it works at the optimal latency operation point. Moreover, considering that buffer overflows are minimized with BBR [? ], fewer retransmissions are expected for BBR than for CUBIC. However, Table ?? shows contradicting results. Figure ?? depicts the latency and the throughput for the 10 Mbps scenario with BBR configured. This graph shows details of the throughput and latency graphs in Figure ??, so that BBR's latency and throughput dynamics can be analysed in order to understand why the latency and retransmission results are worse than expected.

The first singular behaviour in the execution is the latency increase marked by box number 1. This increase might be considered a result of the congestion produced by the data transmission after the preceding silent period. In this case, the data excess would be queued at the bottleneck and the algorithm would eventually adapt its data rate to correct it. However, the sudden, high latency increases following it, which are marked by box number 2, tell the opposite, because the Wireshark capture shows that the packets arriving at each of the peaks are the ACKs of the packets sent at the beginning of the transmission, after fast retransmit. Thereby, these packets were likely lost. Moreover, as only the cross-traffic is being transmitted before the start of the video download, the routers' buffers are empty as a result of iperf working at the operation point.

Figure 4.5: CUBIC Results with Large Bottleneck (throughput, latency and quality index over time)

Therefore, the reason for the losses at the beginning of the transmission might be buffer overflow at the routers. Loss events do not only occur at the beginning, but also later in the download, such as the latency increase marked by box number 3, with the subsequent peak, indicating ACK arrivals after retransmission.

The buffer sizes in the routers depend on their configuration. Open vSwitch has been configured for these experiments, but Open vSwitch does not have an influence on buffer sizes, as mentioned in its implementation guide. However, the operating system and the network drivers do buffer packets. The Byte Queue Limits (BQL) feature has been included in the Linux kernel since version 3.3. BQL limits the hardware queues by setting a limit that reduces the latency produced by excessive queueing without sacrificing throughput. Therefore, the limit is set to the minimum amount of bytes needed to prevent hardware starvation between two consecutive transmissions. It varies according to the network latency dynamics and does not depend on the throughput. The limit is typically much smaller than the queues in the hardware. Thereby, the buffer size in the routers used here, which run Linux kernel 4.9, is limited by BQL.

The small buffer size configured by BQL has a different effect on CUBIC and BBR. It has a desirable effect on CUBIC, reducing the latency while allowing a faster congestion detection due to buffer overflow: the time between the onset of network congestion and the first packet loss due to buffer overflow depends directly on the buffer size. Therefore, the smaller the buffer, the faster the congestion is detected.

Figure 4.6: BBR 10 Mbps Latency and Throughput

However, BBR was designed under the assumption that routers have large buffers [? ]. BBR periodically sends more data than the network can deliver in order to detect changes in the available throughput, where a latency increase means no changes, since the data excess creates a queue at the congested bottleneck. Therefore, if the buffer is not big enough, the data excess might suffer losses within this process.

A new set of experiments has been run in order to check whether the previous claim about BQL buffer sizes and BBR holds for the scenario in Figure ??. This experiment consists of two iperf clients running in parallel, one transmitting from cross-traffic2 to cross-traffic1 and the other from server to client. Figure ?? shows the latency graph for one of the flows, where the latency finally converges to the optimum and losses no longer appear. Thus, the data excess produced by the gain cycling is buffered without losses. Moreover, it can also be inferred from this result that small buffer sizes do not prevent constant BBR traffic from working at the BDP operation point.

Besides, BQL affects DASH on-off traffic in a different manner than constant traffic. In a normal execution with BBR configured, a new transmission of an on-off application after a silence causes a queue due to the data excess, and thus a latency increase. However, if the buffer size is not big enough, the new transmission causes losses, which has two effects, according to the results in Figure ??. Firstly, the latency fluctuates as a result of the losses, reaching high values in some cases. In addition, as the download spends most of the time on loss recovery because of the losses and the on-off behaviour, throughput oscillations are produced. Both CUBIC and BBR spend most of the time in the loss recovery phase, which reduces the differences between them. The similar behaviour is a result of the fast recovery algorithm taking control of the cwnd, thus disabling the congestion control until the losses are overcome.

Figure 4.7: iperf Latency Evolution for the 10 Mbps Scenario (latency over time)

Therefore, although applications with a constant traffic pattern do not essentially need large buffers when working with BBR, undesirable latency and throughput patterns appear for DASH when competing with cross-traffic, due to its on-off behaviour. The analogous behaviours result in similar QoS and QoE metrics, as depicted in Table ??.

The 10 Mbps scenario shows that the video quality for CUBIC is close to the best one, while BBR fails to achieve a good video quality. As dashif selects the quality depending on the throughput (Section ??), the throughput difference depicted in Table ?? explains these results. This difference shows that BBR fails to achieve a resource sharing with on-off applications on congested links as fair as CUBIC's. However, this behaviour is not expected to persist in scenarios with larger buffers, as losses will be drastically reduced for BBR, while CUBIC's latency increases due to the queues.

The other remarkable difference regarding the QoE metrics, besides the video quality in the 10 Mbps scenario, is the initial delay. The dashif client always selects the same quality for the first request. Therefore, the segment size is constant for all the first requests and the download time only depends on the latency and the achieved throughput. In the 3 Mbps scenario, Figure ?? shows that CUBIC achieves a lower throughput than BBR at the beginning of the connection. Additionally, CUBIC also has on average a higher latency than BBR (Table ??). These two factors explain BBR's lower initial delay in the 3 Mbps scenario. However, CUBIC achieves a better initial delay for the 10 Mbps bottleneck, where the difference is approximately 800 ms on average. As Table ?? shows that both algorithms achieve a similar latency, the throughput is now the only variable affecting the initial delay. The throughput graphs in Figures ?? and ??, together with the results in Table ??, suggest that CUBIC gets a better share than BBR as a result of being more aggressive, which explains the different throughputs and thus the better initial delay for DASH.

Figure 4.8: Losses Experiment Topology

Node     Software
client   dashif, Apache, Wireshark
server   Apache, Wireshark
router1  Open vSwitch, NetEm
router2  Open vSwitch

Table 4.5: Nodes Configuration in Losses Topology

4.1.2 Packet Losses Scenario

BBR provides a throughput enhancement with respect to CUBIC [? ] in the presence of losses. The results in [? ] prove that latency and losses are more relevant for DASH QoE than the available throughput. However, it has not yet been studied whether this holds for BBR as well. Therefore, experiments have been performed enforcing losses in order to compare how BBR's and CUBIC's reactions to losses affect DASH performance.

The topology depicted in Figure ?? has been deployed on GENI to perform these experiments, where the different nodes are configured according to Table ??. The client and server nodes respectively run the DASH client and server, while router1 and router2 are Open vSwitch routers. Moreover, the losses are emulated with the NetEm software, which provides a network emulation functionality that allows the emulation of delay as well as packet loss, duplication, corruption and re-ordering. Finally, links L1 and L2 are configured with 10 Mbps in each communication direction, while L3 has been configured to the integer throughput immediately above the data rate required by the best video quality, which is 3 Mbps.
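As an illustration, losses of this kind can be enforced on router1 through NetEm's tc interface; in the sketch below the interface name and the loss rate are placeholders, and the commands require root privileges.

```python
import subprocess

def set_loss(interface="eth1", loss="1%"):
    """Attach a netem qdisc that drops the given fraction of packets."""
    subprocess.run(["tc", "qdisc", "add", "dev", interface,
                    "root", "netem", "loss", loss], check=True)

def clear_loss(interface="eth1"):
    """Remove the emulation again after the experiment."""
    subprocess.run(["tc", "qdisc", "del", "dev", interface, "root"],
                   check=True)

if __name__ == "__main__":
    set_loss(loss="1%")
```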

Results

The results obtained in these experiments focus on the evolution of the QoS and QoE metrics over the loss probabilities mentioned before. The evolution of the QoS parameters is depicted in Figure 4.9, where Figure 4.9a shows the evolution of the throughput and Figure 4.9b that of the latency.

Figure 4.9: QoS Metrics Evolution over Losses ((a) Throughput, (b) Latency; CUBIC vs. BBR over the loss rate)

The throughput behaviour coincides with the results provided in [? ], where CUBIC experiences its throughput drop at a lower loss percentage. On the contrary, BBR's throughput remains almost constant even for high losses, until a sharp drop beyond 10% losses. Regarding the latency (Figure 4.9b), the differences between CUBIC and BBR are of the order of 10 ms for low losses. In addition, CUBIC's latency converges to BBR's latency for high losses. This is an unexpected result that is addressed later in the discussion.

Figure 4.10 shows the results for the QoE parameters. CUBIC downloads higher video qualities for low losses, although this is inverted for high losses, where BBR achieves better video qualities. Besides, there is a large difference in the initial delay, as CUBIC's initial delay increases exponentially with the losses, whereas BBR's initial delay only increases for losses beyond 10%. Additionally, both algorithms experience a 200 ms initial delay for losses below 0.1%. Finally, the stalling evolution is depicted in Figures 4.10c and 4.10d. This behaviour, like the latency in the QoS metrics, is unexpected: the higher the losses are, the fewer stalling events occur, and thus a lower total stalling duration is achieved. Moreover, CUBIC outperforms BBR for losses below 10%. The high stalling duration depicted in Figure 4.10d for high losses shows that, although there is a decrease in the number of stalls, they have a longer duration.

Discussion

The objective of these experiments is to understand how the QoE parameters are affected by the different throughputs obtained by CUBIC and BBR in the presence of losses. Figure 4.10a shows that dashif selects approximately the same quality for low losses with BBR and CUBIC, as a result of both algorithms achieving roughly the same throughput. However, as BBR gets higher throughputs than CUBIC for high losses, it also downloads better qualities. This is a result of the throughput being the main rule for the quality selection in the dashif client (see Section ??).
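To make the rate-based rule tangible, the sketch below shows the general shape of throughput-driven quality selection: pick the highest representation whose bitrate fits within a safety-scaled throughput estimate. This is a simplified illustration of the principle, not the actual dashif/dash.js logic, and the bitrate ladder is an assumed example.

    def select_quality(bitrates_bps, estimated_throughput_bps, safety=0.9):
        # Highest representation whose bitrate fits within a safety
        # margin of the estimated throughput.
        budget = estimated_throughput_bps * safety
        best = 0
        for i, rate in enumerate(sorted(bitrates_bps)):
            if rate <= budget:
                best = i
        return best

    ladder = [500_000, 1_000_000, 2_000_000, 3_000_000]  # example bitrates
    print(select_quality(ladder, estimated_throughput_bps=2_400_000))  # -> 2

Under such a rule, any loss-induced drop in the throughput estimate translates directly into a lower quality index, which is what the CUBIC curves in Figure 4.10a reflect.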

Figure 4.10: QoE Metrics Evolution over Losses ((a) Quality Index, (b) Initial Delay, (c) Stalling Number, (d) Stalling Duration; CUBIC vs. BBR over the loss rate)

Therefore, the quality level selected with BBR is less sensitive to losses than with CUBIC.

Regarding the initial delay, both algorithms achieve approximately 2 seconds of initial delay for low losses. As the impact of the initial delay on QoE is not severe [? ], this value is considered acceptable. Nevertheless, when the losses are increased, CUBIC's initial delay grows exponentially with the loss rate, resulting in much higher values for losses beyond 5%. A video segment's download time depends on three parameters: the segment size, the network latency and the throughput. The dashif client always requests quality 4 at the beginning of the download, so the segment size is constant for all downloads. Additionally, Figure 4.9b shows that BBR and CUBIC achieve a similar latency for high losses. Therefore, the initial delay directly depends on the throughput in this case. Since BBR achieves a much higher throughput than CUBIC in the high-losses scenario, its initial delay remains close to the optimum, while CUBIC's grows.

The last QoE metrics are related to the stalling events. An interesting result is that the highest number of stalling events, and in turn the longest stalling durations, appear in the low-losses scenarios. Figure 4.11 depicts the evolution of the throughput for a single video session with 0.001% losses.

The DASH client constantly requests new video segments, which can be derived from the lack of silences over the complete download. Such silences appear when there is enough video in the application buffer, so that the client can afford to wait before requesting more segments. In addition, the download time of the segments is approximately 4 seconds, which coincides with their length. Therefore, the application buffer level is likely to deplete in these scenarios. This is confirmed by Figure 4.12a, which depicts the buffer level of the same video session over time. On the contrary, this effect is no longer present for losses beyond 1%. Figure 4.12b depicts the buffer level for a download with 5% losses, which also shows a low buffer level at the beginning of the playback, where stalling events are therefore likely. However, after the first 20 seconds there is enough video in the buffer that no stalls occur for the remainder of the playback. Therefore, DASH applications are more sensitive to losses when the selected video quality demands all the available throughput, especially at the beginning of the download. However, if the client is able to download enough video within the first seconds of the session, stalling events are less frequent.

The stalling for low losses would no longer be a problem if the dashif client used buffer information instead of throughput as the main rule for the quality selection. Most of the stalls in these cases have been detected at the beginning of the download, which is a result of the client automatically increasing the selected quality after the first segment download. A more conservative approach, based on the buffer level, could make its decisions only on the amount of data already received, and thus only increase the quality when stalls are less likely (a simplified sketch of such a rule is given at the end of this section). The BOLA adaptation logic [? ] implements such a buffer-based approach, but it has not been activated for these experiments.

Besides, CUBIC outperforms BBR in stalling number and duration in most of the cases. For losses below 0.01%, where both algorithms achieve the same throughput, the stalling number is roughly the same. However, the number of stalls drops together with the throughput. Figure 4.12b shows the buffer level evolution for 5% losses, where CUBIC achieves a better buffer occupancy, without stalls at the beginning of the playback. This is a result of downloading a less demanding video quality, and thus getting faster downloads despite the throughput drop, as can be seen in Figure 4.13. Finally, for high losses, where CUBIC gets almost zero throughput, BBR performs better as a result of its higher throughput.

To summarize, BBR's higher throughput under losses results in a better initial delay and downloaded video quality. However, CUBIC still gets better results regarding stalls, especially for higher loss rates. Moreover, DASH is highly sensitive to losses when the quality index demands most of the available throughput, resulting in stalling events even for losses around 0.001%.
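As announced above, a minimal buffer-threshold rule can illustrate the buffer-based alternative: the quality is raised only once a safety reserve of video has been accumulated, independently of the throughput estimate. This is a simplified sketch in the spirit of buffer-based adaptation, not the actual BOLA algorithm; the reserve and step values are assumptions.

    def select_quality_by_buffer(buffer_level_s, num_qualities,
                                 reserve_s=8.0, step_s=4.0):
        # Stay at the lowest quality until a safety reserve is built up,
        # then allow one quality step per additional step_s of buffered video.
        if buffer_level_s <= reserve_s:
            return 0
        steps = int((buffer_level_s - reserve_s) // step_s)
        return min(steps, num_qualities - 1)

    for level_s in [2, 8, 12, 20, 40]:
        print(level_s, "->", select_quality_by_buffer(level_s, num_qualities=5))

With such a rule, the risky quality increase right after the first segment, which causes most of the early stalls observed here, would be deferred until the buffer could absorb a slow download.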

Figure 4.11: Throughput Evolution with 0.001% Losses (BBR and CUBIC)

4.2 Relayed Connection Performance Analysis

The relay implementation might enhance DASH performance by reducing the latency in the presence of losses [? ]. In order to prove this, the topology depicted in Figure 4.14 has been deployed on GENI, emulating the suboptimal scenario for TCP introduced in Section ??. The long delay is achieved by deploying the server on one aggregate and the relay and the client on a different one. In addition, as in Section ??, losses are emulated on the link between the client and the relay with NetEm. The software running on each machine is summarized in Table 4.6. Finally, the links L1 and L2 are configured with 3 Mbps in both directions.

    Node      Software
    client    dashif, Apache, Wireshark
    server    Apache, Wireshark
    relay     Open vSwitch, NetEm, Relay

Table 4.6: Nodes Configuration in Relay Topology

The current status of the relay implementation is depicted in Figure 4.15. The controller is not configured yet, and thus the relay must be configured beforehand to open the connection to the server by installing a relay rule on it. Once the connection is open, the relay returns the input port, to which the requests to the relayed service must be sent, and the output port, which is used to forward the packets sent by the server to the client. Therefore, although the segmentation is not yet transparent, as the relay's IP address and port must be specified on the client, the connection is successfully relayed.

Additionally, the relay has six buffers, whose sizes are set when configuring the relay rule. The application buffer stores the packets sent by the two sides of the connection, so that they can be retransmitted in case they are lost. The rib, rob, sib and sob are, respectively, the receiver and sender input and output buffers.
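To make the configuration step concrete, the following sketch models what installing a relay rule amounts to; the type and field names (RelayRule and friends) as well as all byte values are hypothetical stand-ins, since the relay's actual interface is not reproduced here.

    from dataclasses import dataclass

    @dataclass
    class RelayRule:
        # Hypothetical model of a relay rule: the upstream target
        # plus the sizes of the six relay buffers described above.
        server_host: str
        server_port: int
        app_buffer: int   # retransmission store shared by both sides
        rib: int          # receiver input buffer
        rob: int          # receiver output buffer
        sib: int          # sender input buffer
        sob: int          # sender output buffer

    # For a server-to-client video flow, the rib, application buffer and
    # sob carry the bulk of the data; rob and sib can stay at defaults.
    BULK = 4 * 1024 * 1024   # illustrative size
    DEFAULT = 64 * 1024      # illustrative default size
    rule = RelayRule("server", 80, app_buffer=BULK, rib=BULK, sob=BULK,
                     rob=DEFAULT, sib=DEFAULT)
    # Installing the rule would return the input and output ports to use
    # (hypothetical step; the controller is not configured yet).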

Figure 4.12: Buffer Level Evolution ((a) 0.001% loss, (b) 5% loss; CUBIC vs. BBR)

Figure 4.13: Throughput Evolution with CUBIC and 5% Losses

These four buffers should be large enough to deliver the sent and received packets to the application buffer at their arrival rate. In a DASH download most of the data flows from the server to the client, as the server sends the video while the client only sends video requests and ACKs. Therefore, the application buffer, the rib and the sob hold most of the transferred data, and in turn the sizes of these buffers must be configured to allow the download, while the sib and rob buffers can be kept at their default size. Although the other three buffers might each have a different optimal size, they are correlated, because a video download follows the chain rib - application buffer - sob in the relay. Thus, the three buffers have been configured to the same size for the experiments, so that "buffer size" refers to the size of these three buffers from now on.
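As a rough plausibility check for these sizes, one 4-second segment at the top quality (just under the 3 Mbps configured on the links) amounts to roughly 1.5 MB, which the rib - application buffer - sob chain must be able to absorb; the computation below is purely illustrative.

    segment_duration_s = 4        # segment length used in the experiments
    top_bitrate_bps = 3_000_000   # upper bound for the best quality's rate
    segment_bytes = top_bitrate_bps * segment_duration_s // 8
    print(segment_bytes)          # 1500000 bytes, i.e. ~1.5 MB per segment

Buffer sizes of at least this order keep a full in-flight segment resident in the relay, which is what the retransmission argument in the following paragraphs relies on.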

Figure 4.14: Relay Experiment Topology

Figure 4.15: Relay Implementation Status

Besides, how well the relay can work in theory depends on the application buffer size. If this buffer only holds data between its arrival and its departure time, the client will be able to download the video, but the data will not remain in the relay, so each time a packet is lost the relay needs to fetch the data from the server again, reproducing the behaviour of a regular TCP connection. On the contrary, for larger buffer sizes, packets remain in the relay for a time t that depends on the buffer size, so that larger sizes achieve larger values of t. If the fast recovery algorithm is activated within this period t, the relay can retransmit the lost packet without requesting it from the server again, thus reducing the latency.

Regarding the relay's TCP connections, a connection to the server is opened as soon as the flow rule is configured, and the relay listens on the input port waiting for requests. When the client closes the connection, the relay closes its connection to the server as well as the sockets it is listening on. Afterwards, a new connection to the server is opened and a new socket is bound to the same ports as the previous one, so that new requests can be handled without reconfiguring the relay. This behaviour allows the experiments to be automated, as there is no need to reconfigure the relay each time a new request is sent. However, high-loss scenarios cannot be executed with the current relay implementation, as the dashif client opens a new connection after abandoning a segment download due to long stalling periods, which is a result of excessive packet losses.
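The lifecycle just described can be condensed into a minimal sketch: the upstream connection is pre-opened when the rule is installed, request bytes are shuttled upstream, and both sockets are reopened on the same ports after the client disconnects, so that consecutive runs need no reconfiguration. This is a simplified, single-client, one-directional illustration, not the thesis relay implementation.

    import socket

    def run_relay(input_port, server_addr):
        # Serve one client at a time on a fixed input port.
        listener = socket.create_server(("", input_port))
        while True:
            upstream = socket.create_connection(server_addr)  # per rule/run
            client, _ = listener.accept()
            try:
                while True:
                    data = client.recv(65536)
                    if not data:            # client closed this run
                        break
                    upstream.sendall(data)  # responses would flow back likewise
            finally:
                client.close()
                upstream.close()            # next iteration reopens upstream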
