Minimizing Application-Level Delay of Multi-Path TCP in Wireless networks: A Receiver-Centric Approach


Se-Yong Park, Changhee Joo, Yongseok Park, and Saewoong Bahk
Department of ECE and INMC, Seoul National University, Korea
Ulsan National Institute of Science and Technology, Korea
Digital Media & Communication R&D Center, Samsung Electronics, Korea
Email: psy@netlab.snu.ac.kr, cjoo@unist.ac.kr, yongseok.park@samsung.com, sbahk@snu.ac.kr

Abstract

Multi-Path TCP (MPTCP) has attracted much attention as a promising technology to improve throughput performance of wireless devices that support multi-homed heterogeneous networks. Although MPTCP provides significant increase in network capacity, it may suffer from poor delay performance since the delay tends to be aligned with the worst-performing path: packets delivered through a short-delay subflow have to wait in the reordering buffer for packets being transmitted over a long-delay subflow. In this paper, we investigate the application-level delay performance of streaming traffic over MPTCP, and develop an analytical framework to take into account non-negligible network queuing delay and the interplay of congestion control between multiple subflows. We design a simple threshold-based subflow traffic allocation scheme that aims to minimize user-level delay and develop a receiver-centric traffic splitting control (R-TSC) that can be tuned to user preferences. The client-side R-TSC solution facilitates incremental deployment of low-delay streaming service over MPTCP. Through simulation and testbed experiments using commercial LTE and WiFi networks, we demonstrate significant performance gains over the standard MPTCP protocol.

I. INTRODUCTION

Multi-Path TCP (MPTCP) is an emerging technology for multi-homed wireless devices to exploit multiple communication paths in parallel. Many mobile smart devices already have multiple network interfaces such as Bluetooth, WiFi, and cellular (3G/LTE).
MPTCP has attracted significant attention as a promising transport-layer solution in smart devices to provide seamless handover and to exploit path diversity through opportunistic transmissions over heterogeneous wireless networks. We consider a high-quality live streaming service scenario where MPTCP has been adopted for real-time applications in multi-homed wireless environments. TCP has been widely used for real-time applications such as Skype, Facetime and online games or high quality video streaming service [5] [7] by establishing two-way communication channels in the presence of network address translation (NAT) devices and firewalls. By exploiting multiple paths, high-quality streaming services will be provided through MPTCP.

A key performance metric of real-time applications is delay. It has been reported in [7] [1] that TCP often suffers from large delays in wireless networks due to excessively large buffer installation at access points (APs) to compensate for capacity fluctuation of wireless channels. In MPTCP, long-delay paths can aggravate the delay performance since packets arriving at the receiver through short-delay paths may need to wait for out-of-order packets arriving through long-delay paths.

A number of works on MPTCP have mainly focused on throughput performance and fairness between MPTCP subflows, and developed congestion control schemes that orchestrate subflows to coexist with conventional single-path TCPs [14] [17]. In [14] and [15], MPTCP congestion control was studied and the Linked Increases Algorithm (LIA) was proposed, which has been standardized by IETF [13]. In [16], an extension of TCP-Vegas for MPTCP is considered to exploit RTT variations as a congestion signal. In [17], the possibility that MPTCP-LIA hurts the throughput of other competing connections has been noticed, and the authors propose a congestion control scheme, named O-LIA, which aims to achieve Pareto-optimal fairness.
In other works, session management schemes for MPTCP to support mobility have been proposed in [18], and MPTCP performance has been evaluated in real wireless networks [19]. Recently, several works have studied the delay aspect of MPTCP. In [], it has been shown that round-robin packet allocation over subflows can lead to large delays at the receiver that impact packet reordering. Least-RTT-First (LRF) allocation has been proposed as a solution to address the problem. However, although LRF allocation removes the long-term delay mismatch between subflows, the interplay between LRF allocation and MPTCP-LIA congestion control can produce large delays [1]. In [], the authors investigated application-level delay of MPTCP and developed subflow rate allocation at the sender to mitigate the problem. The aforementioned works do not take into account the interplay with congestion control, which leads to complications and unexpected results. The proposed solutions also require control at the sender (i.e., server) side. In client-server systems, it is hard for server-centric solutions to accommodate diverse client preferences [8] [3].

In this paper, we develop an analytical framework to understand subflow behavior of MPTCP-LIA under time-varying RTT conditions, and design receiver-centric subflow rate allocation schemes that aim to minimize application-level delay of MPTCP. Our main contributions are: We show that fixed RTT models fail to adequately capture TCP dynamics, and develop a time-varying RTT model to account for TCP transmission rate control. We formulate an MPTCP delay minimization problem and solve it through subflow rate allocation. We design a simple threshold-based solution that takes into consideration both time-varying RTTs and the interplay between subflows. We extend our solution to the receiver-side algorithm. We develop a subflow performance estimation method at the receiver, and modulate the subflow rates by exploiting three duplicate acknowledgments (3-dup-ACKs). Through simulation and testbed experiments in commercial LTE and WiFi networks, we evaluate our model and demonstrate significant performance gains of our receiver-centric approach.

The rest of this paper is organized as follows. Section II describes our system model for application-level delay of MPTCP. In Section III, we develop an analytical framework for MPTCP delay dynamics accounting for the time-varying RTTs. In Section IV, we formulate an application-level delay minimization problem and develop a threshold-based server-centric MPTCP solution. We extend it to a receiver-centric solution that does not require any modification at the server. We verify our proposed scheme through experimental measurements and simulations in Section VI.

II. SYSTEM MODEL

We consider an MPTCP connection with a set $\mathcal{R}$ of subflows with non-zero rate that deliver traffic generated from a multimedia streaming application. The packet arrival from the application can be modeled as a stochastic process with an arbitrary distribution with mean rate $f$.
We assume that each subflow has a fixed two-way path, and each path has a single bottleneck link over the forward data path and no bottleneck link over the backward path. At the bottleneck link, the capacity share of subflow $r$ is denoted by $c_r$ and we assume drop-tail queueing. We denote the network delay, i.e., round-trip time (RTT), of subflow $r$ at time $t$ as $T^R_r(t)$, which equals the sum of a time-constant component $T^p_r$ and a time-varying component $T^q_r(t)$. The former accounts for any fixed delays including signal propagation, packet processing, and signal transmission, and corresponds to the minimum round-trip time. The latter may be dominated by the queueing delay at the bottleneck queue. Thus, the RTT of subflow $r$ is given as $T^R_r(t) = T^p_r + T^q_r(t)$. Let $x_r(t)$ denote the transmission rate of subflow $r$. Since MPTCP adopts the sliding window technique for congestion control, we have $x_r(t) = w_r(t) / T^R_r(t)$, where $w_r(t)$ denotes the congestion window size at time $t$.

Fig. 1. Application-level end-to-end delay of MPTCP is modeled as three components: send buffer queueing delay, network delay, and reordering buffer delay at the receiver.

Each subflow changes its own congestion window $w_r(t)$ in an Additive Increase and Multiplicative Decrease (AIMD) manner for compatibility with conventional single-path TCP flows [4]. AIMD has been shown to operate stably under a variety of network environments [6], and we assume that the subflow transmission rate converges to an equilibrium point, denoted by $\hat{x}_r$. Let $\hat{\mathbf{x}}$ denote its vector. We define application-level delay of MPTCP as the delay from the time when a packet is injected into MPTCP to the time when the packet is delivered to the peer application at the receiver. The network delay is only one component of the application-level delay. We classify delay into three subcomponents as illustrated in Fig. 1.
Sender buffer queuing delay: When an MPTCP packet is injected by the application at the sender, it first enters the send buffer and waits for a transmission opportunity over a subflow. We assume that there is some randomness in the network and the inter-service time of the send buffer follows a random process with the exponential distribution of mean rate $\sum_{r \in \mathcal{R}} \hat{x}_r$. Since packet arrivals to the send buffer have an arbitrary distribution with mean rate $f$, we model the send buffer queueing delay as a G/M/1 queueing system. Let $T^s$ denote the expected send buffer queueing delay. From Kingman's formula [4], we obtain

$$T^s(\hat{\mathbf{x}}) \approx \frac{f / \sum_{r \in \mathcal{R}} \hat{x}_r}{\sum_{r \in \mathcal{R}} \hat{x}_r - f} \left( \frac{1 + C_a^2}{2} \right), \tag{1}$$

where $C_a$ denotes the ratio of the standard deviation of the inter-arrival time to the mean. Thus, the send buffer queueing delay is a function of the sum of subflow rates and decreases with the sum rate.

Network delay: Since packets in subflow $r$ follow a fixed path, we model the sliding window mechanism of the subflow as a closed-loop queueing system in a virtual circuit network [], [3]. In particular, from our assumption of network randomness, we model it as an M/M/1 closed-loop system with mean $\hat{x}_r^{-1}$.

Footnote 1: We assume that the distributions of the inter-arrival time and the service time follow the exponential distribution. The empirical reason is described in Section VI-A.

Let $T^n_r$ denote the expected network delay over the forward path, i.e., $T^n_r = T^R_r - T^p_r$. From $\hat{x}_r = w_r / T^R_r$ and the closed-loop result $T^R_r = (w_r + c_r T^p_r) / c_r$, we obtain

$$T^n_r(\hat{x}_r) = \frac{c_r T^p_r}{c_r - \hat{x}_r} - T^p_r. \tag{2}$$

Thus, the network delay of subflow $r$ decreases as its transmission rate $\hat{x}_r$ decreases.

Reordering buffer delay: Packets from different subflows are likely to experience different network delays and they may arrive out of order at the receiver. As shown in Fig. 1, some packets (e.g., packets 2, 3, 4 in Fig. 1) have to wait in the reordering buffer at the receiver until next-expected packets (e.g., packet 1 in Fig. 1) arrive. Let $T^o_r$ denote the expected reordering buffer delay for packets of subflow $r$ at the receiver. Since it comes from the mismatch in network delay between subflows, we can express its upper bound by their maximum difference

$$T^o_r(\hat{\mathbf{x}}) \le \max_{i \in \mathcal{R}} T^n_i(\hat{x}_i) - T^n_r(\hat{x}_r). \tag{3}$$

Let $T(\hat{\mathbf{x}})$ denote the application-level delay of MPTCP given the subflow transmission rate $\hat{\mathbf{x}}$. In our model, from (1), (2), and (3), we obtain the delay bound

$$T(\hat{\mathbf{x}}) \le \max_{r \in \mathcal{R}} \left( T^s(\hat{\mathbf{x}}) + T^n_r(\hat{x}_r) + T^o_r(\hat{\mathbf{x}}) \right) \le T^s(\hat{\mathbf{x}}) + \max_{r \in \mathcal{R}} T^n_r(\hat{x}_r). \tag{4}$$

Note that the send buffer delay is a monotonically decreasing function of the subflow rate sum, and the network delay of each subflow is an increasing function of its subflow rate. Hence, there is a trade-off relationship between the send buffer delay and the network delays. In this work, we aim to minimize the application-level delay bound (4) through subflow rate allocation of MPTCP. To do so, we develop a practical threshold-based solution that controls each subflow rate, complying with MPTCP congestion control. Furthermore, we are interested in a receiver- or client-side solution to facilitate incremental deployment by end users.

III. UNDERSTANDING TCP DELAY DYNAMICS

Application-level delay performance of MPTCP depends on the subflow transmission rate $\hat{\mathbf{x}}$ as shown in (4), where each subflow rate is under control through the congestion window size. There are several different MPTCP congestion controllers [14] [17].
Among those, the Linked Increases Algorithm (MPTCP-LIA), viewed as a de facto standard [13], is known to achieve high throughput and fair coexistence with conventional single-path TCP flows [14], [15]. In this section, we investigate the performance of MPTCP-LIA with respect to subflow congestion windows.

A. Inapplicability of simple fixed-RTT model

In analysis of TCP performance, it is often assumed that RTT is fixed and the network queueing delay is negligible (i.e., $T^R_r(t) = T^p_r$) [15]. Under the fixed RTT assumption, the network delay $T^n_r$ and the maximum reordering delay become constant, and the application-level delay is determined by the send buffer delay. Hence, from (4), the optimal solution is to set each subflow rate to its maximum (i.e., $\hat{x}_r = c_r$). Under this assumption, MPTCP would always achieve optimal delay performance since the per-subflow AIMD controller consumes available bandwidth in a greedy manner.

However, our experimental results in real wireless networks demonstrate that greedy subflow rate allocation is not delay-optimal, and application-level delay performance significantly depends on the transmission rate of each individual subflow. We establish an MPTCP connection with two subflows 0 and 1, which are connected through LTE and WiFi networks, respectively. The application generates VBR traffic with mean rate 35 Mbps, and the minimum RTT values of each subflow are $T^p_0 = 3$ ms and $T^p_1 = 15$ ms, respectively. We measure application-level packet delay under different values of bottleneck capacity. In the first experiment, we vary the LTE link capacity $c_0$ between 4 and 6 Mbps, and fix the WiFi link capacity $c_1$ to 15 Mbps. From measurements of application-level packet delay, we found that the delay distributions for different LTE link capacities remain similar. In the second experiment, we fix the LTE link capacity $c_0$ to 4 Mbps and vary the WiFi link capacity $c_1$ between 8 Mbps and 3 Mbps. In contrast to the first experiment, we observe a significant performance difference as a function of WiFi link capacity.
Increasing the rate of the small-rate subflow is much more effective in delay performance improvement than increasing the rate of the large-rate subflow. From these results we conclude that the fixed-RTT model is not suitable for understanding the MPTCP delay performance. This motivates us to investigate the delay dynamics of MPTCP and develop a model that takes into account the impact of time-varying RTT.

B. Single-flow model with time-varying RTT

We first investigate the RTT dynamics of a single subflow and its effect on the transmission rate. Subsequently, we extend the result to multiple subflow scenarios. We build a simple RTT model as a function of the (subflow) window size, where we introduce two phases to estimate transmission rate under the time-varying RTT. Let us assume that the bottleneck capacity $c_r$ and the minimum RTT $T^p_r$ of subflow $r$ are known. Let $W^{BDP}_r$ denote the minimum bandwidth-delay product (BDP) of subflow $r$, i.e., $W^{BDP}_r = c_r T^p_r$. Let $W_r$ denote the maximum number of in-flight packets allowed for subflow $r$, which equals the sum of $W^{BDP}_r$ and the maximum available buffer space for subflow $r$ along the path. Practically, it can be regarded as the (average) window size $w_r(t)$ when a packet loss occurs. Note that $W^{BDP}_r$ can be considered as the window size when the bottleneck link queue starts to build up. When the congestion window increases beyond this point (i.e., when $w_r(t) > W^{BDP}_r$), packets of subflow $r$ will experience queueing delay that equals $(w_r(t) - W^{BDP}_r) / c_r$. Hence, we can write the RTT as

$$T^R_r(t) = T^p_r + \frac{\max[0,\, w_r(t) - W^{BDP}_r]}{c_r} = \max\left[ T^p_r,\, \frac{w_r(t)}{c_r} \right]. \tag{5}$$

Fig. 2. Experimental measurements of the RTT (ms) and the window size (Kbyte) of TCP in commercial WiFi and LTE networks: (a) WiFi network, (b) LTE network.

We verify our simple RTT model (5) through experiments in WiFi and LTE networks. We measure the RTT of each packet (using the TCP timestamp option) and the window size $w_r(t)$ when its corresponding ACK is received at the sender. We conduct the experiments with different wireless bottleneck link capacities. In all the cases, the last-hop wireless link is set as the bottleneck, i.e., wired links always have capacities larger than 1 Mbps and the wireless link capacity, which is smaller than 1 Mbps, is controlled by using background traffic through other devices. Fig. 2 shows our measurement results, where the network delay estimated based on our model (5) is represented as a solid line for each bottleneck capacity $c_r$. We observe that $T^p_r \approx$ ms for the WiFi network and $T^p_r \approx 4$ ms for the LTE network. The measurement results match well with our analysis and show a linear relationship between the RTT and the congestion window where the slope is determined by $c_r$. In the LTE network, we observe the impact of the minimum delay $T^p_r$ when the window size is small.

Using (5) we estimate the transmission rate $\hat{x}_r$ of subflow $r$. Let us consider typical cycles of the window evolution of TCP between packet losses as shown in Fig. 3. Since Eq. (5) is piecewise linear, we divide each cycle into two phases: the constant RTT phase for $T^R_r(t) = T^p_r$ (or for $w_r(t) \le W^{BDP}_r$), and the linear RTT phase for $T^R_r(t) = w_r(t) / c_r$ (or for $w_r(t) > W^{BDP}_r$).

Constant RTT phase: The RTT is constant and the window size inflates at a fixed rate of one segment per $T^p_r$. In Fig. 3, it is the dark gray shaded area starting from $t_1$.
Linear RTT phase: The increase of the window size decelerates because, as the RTT increases, it takes longer for the receiver to return an ACK to the sender [19]. From (5) and the fact that the window inflates by one segment per $T^R_r(t)$, the trace of the window size has a concave shape as shown in Fig. 3, which results in a longer cycle period than in the fixed-RTT model. When the window size reaches $W_r$, a packet will be dropped and the window size is reduced to $\beta W_r$ by the AIMD algorithm. In Fig. 3, the linear RTT phase happens between $t_2$ and $t_3$.

Fig. 3. Window evolution cycle of a single subflow, showing the constant RTT phase and the linear RTT phase.

Let $\tau_{r,1}$ and $\tau_{r,2}$ denote the duration of the constant RTT phase and the linear RTT phase, respectively, which can be calculated from (5) by summing the RTTs during each phase:

$$\tau_{r,1} = \sum_{w = \beta W_r}^{W^{BDP}_r} T^p_r = T^p_r \left( W^{BDP}_r - \beta W_r \right), \qquad \tau_{r,2} = \sum_{w = W^{BDP}_r}^{W_r} \frac{w}{c_r}. \tag{6}$$

Let $\hat{x}_{r,1}$ and $\hat{x}_{r,2}$ denote the expected transmission rate during $\tau_{r,1}$ and $\tau_{r,2}$, respectively. The average transmission rate $\hat{x}_r$ can be written as their weighted sum, i.e.,

$$\hat{x}_r = \frac{\tau_{r,1}}{\tau_{r,1} + \tau_{r,2}} \hat{x}_{r,1} + \frac{\tau_{r,2}}{\tau_{r,1} + \tau_{r,2}} \hat{x}_{r,2}, \tag{7}$$

where

$$\hat{x}_{r,1} = \frac{\sum_{w = \beta W_r}^{W^{BDP}_r} w}{\tau_{r,1}} = \frac{W^{BDP}_r + \beta W_r + 1}{2 T^p_r}, \qquad \hat{x}_{r,2} = \frac{\sum_{w = W^{BDP}_r}^{W_r} w}{\tau_{r,2}} = \frac{\sum w}{\sum w / c_r} = c_r. \tag{8}$$

From $\beta W_r \le W^{BDP}_r \le W_r$ and the default $\beta$ of TCP Reno ($\beta = 1/2$), the average transmission rate of a single subflow can be calculated as

$$\hat{x}_r = \frac{\frac{3}{4} c_r}{1 - \left( W^{BDP}_r / W_r \right) + \left( W^{BDP}_r / W_r \right)^2}. \tag{9}$$

We verify (9) through simulations with a single-path TCP (TCP-Reno) in a saturated network (i.e., the sender always has data packets to send). We measure the average transmission rate of the TCP flow by varying the bottleneck link capacity and the amount of the buffer space. The results are shown in Fig. 4, where the x-axis represents the (normalized) buffer space: there is less buffer space as $W^{BDP}_r / W_r$ increases, and zero buffer space when $W^{BDP}_r / W_r = 1$. We observe that i) our analysis (9) is accurate and provides good estimation under different network conditions, ii) when there is a sufficient amount of the buffer space, packet loss hardly occurs and the transmission rate is bounded by the link capacity $c_r$, iii) the setting around $W^{BDP}_r / W_r = 0.5$ would be sufficient to fully exploit the link capacity, and iv) previously simplified models for TCP transmission rate under the fixed RTT assumption [1] [3] do not take into consideration the time-varying queueing delay and provide inaccurate estimation (dashed line) when a large amount of the buffer space is available.

Fig. 4. Average transmission rate (Mbps) of single TCP over various link capacities, plotted against $W^{BDP}_r / W_r$; for each link capacity the curves compare the analysis, the simulation, and the fixed RTT model.

IV. MINIMIZING APPLICATION-LEVEL DELAY OF MPTCP

From Eqs. (1), (2), and (5), the application-level delay of MPTCP is a function of subflow transmission rates, and the problem can be rewritten as

$$\begin{aligned} \text{minimize} \quad & \frac{f / \sum_{r \in \mathcal{R}} \hat{x}_r}{\sum_{r \in \mathcal{R}} \hat{x}_r - f} \left( \frac{1 + C_a^2}{2} \right) + \max_{r \in \mathcal{R}} \left\{ \frac{c_r T^p_r}{c_r - \hat{x}_r} - T^p_r \right\} \\ \text{subject to} \quad & \sum_{r \in \mathcal{R}} \hat{x}_r \ge f, \qquad \hat{x}_r \le c_r \ \text{ for all } r \in \mathcal{R}. \end{aligned} \tag{10}$$

Note that the first term of the objective function depends on the sum of subflow rates (instead of individual subflow rates). Provided that the sum rate is fixed, the second term can be minimized when all the subflows with non-zero $\hat{x}_r$ have the same network delay $\left( \frac{c_r T^p_r}{c_r - \hat{x}_r} - T^p_r \right)$. Further, the function is a convex function of $\hat{\mathbf{x}}$ and thus its solution can be easily obtained, e.g., by an iterative solution such as the gradient descent method.
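As a rough illustration of problem (10), the sketch below evaluates the delay bound from (1), (2), and (4) and minimizes it with a one-dimensional search. It relies on the observation above that, for a fixed sum rate, the active subflows should share the same network delay, so every subflow is driven to a common network delay d and only d is searched. The function names, the ternary search (used here in place of gradient descent), the search limit d_max, and the example numbers are assumptions made for illustration; rates are expressed in packets per second.

```python
def send_buffer_delay(rates, f, ca):
    """Eq. (1): Kingman-type approximation of the send buffer delay."""
    total = sum(rates)
    return (f / total) / (total - f) * (1.0 + ca * ca) / 2.0


def network_delay(x, c, t_p):
    """Eq. (2): queueing part of the RTT of one subflow at rate x < c."""
    return c * t_p / (c - x) - t_p


def delay_bound(rates, caps, t_ps, f, ca):
    """Eq. (4): send buffer delay plus the largest subflow network delay."""
    worst = max(network_delay(x, c, tp) for x, c, tp in zip(rates, caps, t_ps))
    return send_buffer_delay(rates, f, ca) + worst


def rates_for_delay(d, caps, t_ps):
    """Invert Eq. (2): rate at which each subflow has network delay d."""
    return [c * d / (d + tp) for c, tp in zip(caps, t_ps)]


def minimize_delay(caps, t_ps, f, ca, d_max=10.0, iters=200):
    """Search the common network delay d that minimizes the bound (4)."""
    if sum(rates_for_delay(d_max, caps, t_ps)) <= f:
        raise ValueError("capacity too small for the offered rate f")
    # Bisection: smallest d whose sum rate still exceeds f (feasibility).
    lo, hi = 0.0, d_max
    for _ in range(iters):
        mid = (lo + hi) / 2.0
        if sum(rates_for_delay(mid, caps, t_ps)) > f:
            hi = mid
        else:
            lo = mid

    def objective(d):
        return delay_bound(rates_for_delay(d, caps, t_ps), caps, t_ps, f, ca)

    # Ternary search on the convex objective over [hi, d_max].
    a, b = hi, d_max
    for _ in range(iters):
        m1 = a + (b - a) / 3.0
        m2 = b - (b - a) / 3.0
        if objective(m1) < objective(m2):
            b = m2
        else:
            a = m1
    d = (a + b) / 2.0
    return rates_for_delay(d, caps, t_ps), objective(d)


if __name__ == "__main__":
    caps = [4000.0, 1500.0]  # hypothetical capacities (packets/s)
    t_ps = [0.030, 0.015]    # hypothetical minimum RTTs (s)
    print(minimize_delay(caps, t_ps, f=3500.0, ca=1.0))
```

Reducing the problem to a single variable keeps the sketch free of external solvers; a general-purpose convex solver applied directly to the rate vector would be an equally reasonable choice.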
However, although the optimal subflow rate allocation, denoted by $\hat{\mathbf{x}}^*$, is found, there remains the difficulty in controlling the rate of each subflow. In practice, we cannot simply fix the transmission rate of subflow $r$ to $\hat{x}^*_r$, because doing so would give up other advantages of TCP congestion control such as adaptability to system dynamics, stability under a wide range of network environments, fairness, etc. [5]. Further, the MPTCP fairness criteria and the coupled control between subflows make the problem more challenging. To this end, we keep the window-based congestion control of MPTCP-LIA with the AIMD algorithm, and introduce a per-subflow threshold such that the sender shrinks the subflow window $w_r(t)$ by $\beta$ when $w_r(t)$ grows over the threshold. This controls the average transmission rate around the target value $\hat{x}^*_r$ by restricting the number of in-flight packets in each subflow. The approach makes our solution attractive as a practical one to improve the streaming delay in MPTCP, because it can be easily implemented without losing the other advantages of TCP congestion control.

Let $\bar{W}_r$ denote the window threshold of subflow $r$. Then we need to find the threshold vector $\bar{\mathbf{W}} = \{\bar{W}_r\}$ such that $\hat{\mathbf{x}}(\bar{\mathbf{W}}) = \hat{\mathbf{x}}^*$. It is not straightforward since the subflow window controls of MPTCP-LIA are coupled with each other. In the following, we investigate the MPTCP-LIA congestion control in detail and develop a practical approximate solution.

Let $w_{tot}(t) := \sum_{r \in \mathcal{R}} w_r(t)$. MPTCP-LIA controls the window $w_r(t)$ of subflow $r$ as follows. On receipt of a valid ACK, subflow $r$ increases the congestion window $w_r(t)$ by $\min\left\{ \frac{a(t)}{w_{tot}(t)}, \frac{1}{w_r(t)} \right\}$, where

$$a(t) := w_{tot}(t) \, \frac{\max_{i \in \mathcal{R}} \left( w_i(t) / (T^R_i(t))^2 \right)}{\left( \sum_{i \in \mathcal{R}} w_i(t) / T^R_i(t) \right)^2}. \tag{11}$$

On detection of a loss, subflow $r$ decreases the congestion window $w_r(t)$ by a factor of $\beta$ ($= 1/2$). If there is only one subflow, it is equivalent to the TCP-Reno congestion control, and we can directly use (9) to estimate $\hat{x}_r$ from $\bar{W}_r$. We now assume that there are at least two subflows with non-zero window size. From (11), it can be easily seen that $\frac{a(t)}{w_{tot}(t)} < \frac{1}{w_r(t)}$ for all $r \in \mathcal{R}$.
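The window update described above can be sketched as follows. The function names and the list-based representation of subflow state are assumptions for illustration; windows are in packets, RTTs in seconds, and the increase is applied once per ACK as in (11).

```python
def lia_a(windows, rtts):
    """Eq. (11): coupling parameter a(t) of MPTCP-LIA."""
    w_tot = sum(windows)
    best = max(w / (t * t) for w, t in zip(windows, rtts))
    rate_sum = sum(w / t for w, t in zip(windows, rtts))
    return w_tot * best / (rate_sum * rate_sum)


def on_ack(r, windows, rtts):
    """Increase the window of subflow r by min{a / w_tot, 1 / w_r}."""
    a = lia_a(windows, rtts)
    windows[r] += min(a / sum(windows), 1.0 / windows[r])


def on_loss(r, windows, beta=0.5):
    """Multiplicative decrease of subflow r by the factor beta."""
    windows[r] *= beta


if __name__ == "__main__":
    windows = [10.0, 10.0]  # hypothetical window sizes (packets)
    rtts = [0.1, 0.1]       # hypothetical RTTs (s)
    on_ack(0, windows, rtts)
    print(windows)
```

With a single subflow the increase reduces to 1/w_r per ACK, that is, roughly one packet per RTT, which matches the TCP-Reno behaviour noted in the text.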
Then the fluid model [] of MPTCP-LIA window control can be written as

$$\dot{w}_r(t) = \frac{w_r(t)}{T^R_r(t)} \left( \frac{a(t)}{w_{tot}(t)} \left( 1 - q_r(t) \right) - \frac{w_r(t)}{2} q_r(t) \right), \tag{12}$$

where $q_r(t)$ denotes the packet loss probability of subflow $r$, and the terms describe the increase and decrease rates of the window size. Let $\alpha_r(t)$ denote the amount of window increment per RTT provided no packet loss. From (11) and (12), we have

$$\alpha_r(t) = \frac{w_r(t) \, a(t)}{w_{tot}(t)} = w_r(t) \, \frac{\max_{i \in \mathcal{R}} \left( w_i(t) / (T^R_i(t))^2 \right)}{\left( \sum_{i \in \mathcal{R}} w_i(t) / T^R_i(t) \right)^2}. \tag{13}$$

Let $b$ denote the subflow with the maximum rate divided by RTT, i.e., $b := \arg\max_{i \in \mathcal{R}} \frac{w_i(t)}{(T^R_i(t))^2}$. We assume that $b$ remains unchanged for a long time, which commonly occurs when a subflow dominates the other in both capacity and delay. Under this assumption, we can rewrite (13) as

$$\alpha_r(t) = w_r(t) \, \frac{x_b(t) / T^R_b(t)}{\left( \sum_{i \in \mathcal{R}} x_i(t) \right)^2}. \tag{14}$$

This implies that when subflow $i$ ($\ne b$) reduces its transmission rate (e.g., due to temporal wireless fading), subflow $r$ ($\ne i$) will have a larger $\alpha_r(t)$, increase its transmission rate more quickly, and compensate for the performance loss of subflow $i$.

Footnote 2: In [15], each subflow of MPTCP should increase its window size no faster than the single-path TCP would, and should decrease as quickly as the single-path TCP. As a result, each subflow of MPTCP should not get more than the transmission rate of a single-path TCP.

In order to understand the delay dynamics of MPTCP subflows, we have to take into account time-varying RTT and divide the period into two phases (per subflow) as in Section III-B. Due to the coupling of MPTCP-LIA window control across the subflows, the complete analysis needs to divide the time into a number of phases that grows with $|\mathcal{R}|$, where $|\cdot|$ denotes the cardinality of the set. Thus, the complete analysis will result in high computational complexity, which makes our solution hardly scalable for many subflows and causes significant energy consumption that needs to be avoided in mobile devices. We circumvent the difficulty by iteratively calculating the window evolution of subflow $r$ under the assumption that the transmission rates of other subflows are fixed. Specifically, in (14), $x_i(t)$ is replaced with $\hat{x}_i$ for $i \ne r$, and $T^R_b(t)$ with its long-term average $\hat{T}^R_b$. In the following, we estimate the transmission rate of subflow $r$ for two disjoint cases: $r \ne b$ and $r = b$.

When $r \ne b$: we can rewrite (14) as

$$\alpha_r(t) = \frac{w_r(t) \, \hat{x}_b / \hat{T}^R_b}{\left( \frac{w_r(t)}{T^R_r(t)} + \sum_{i \in \mathcal{R}_{-r}} \hat{x}_i \right)^2},$$

where $\mathcal{R}_{-r} := \mathcal{R} \setminus \{r\}$. We consider a typical cycle of the window evolution of subflow $r$ and divide it into two phases as before. Let $\tau_{r,1}$ and $\tau_{r,2}$ denote the duration of each phase, respectively. Let $\alpha_{r,1}(t)$ and $\alpha_{r,2}(t)$ denote the window increment rate in each phase, respectively.
Since $T_r^R(t) = T_r^p$ in the constant RTT phase and $w_r(t)/T_r^R(t) = c_r$ in the linear RTT phase, we obtain

$$\alpha_{r,1}(t) = \frac{w_r(t)\, \hat{x}_b / \hat{T}_b^R}{\left( \frac{w_r(t)}{T_r^p} + \sum_{i \in \mathcal{R}_r} \hat{x}_i \right)^2}, \qquad \alpha_{r,2}(t) = \frac{w_r(t)\, \hat{x}_b / \hat{T}_b^R}{\left( c_r + \sum_{i \in \mathcal{R}_r} \hat{x}_i \right)^2}. \tag{15}$$

In the constant RTT phase, (14) and the definition of $b$ bound $\alpha_{r,1}(t)$ above by 1, which implies that the window size increases more slowly than in TCP-Reno. On the other hand, in the linear RTT phase, the window inflates at rate $\alpha_{r,2}(t)/T_r^R(t)$, which is constant since both $\alpha_{r,2}(t)$ and $T_r^R(t)$ are linear functions of $w_r(t)$ from (5) and (15).

When $r = b$: we have $T_b^R(t) = T_b^p$ in the constant RTT phase and $w_b(t)/T_b^R(t) = c_b$ in the linear RTT phase. Under the assumption of constant rates of the other subflows, we obtain the window increment rate of subflow $b$ in each phase as

$$\alpha_{b,1}(t) = \frac{\left( w_b(t) / T_b^p \right)^2}{\left( \frac{w_b(t)}{T_b^p} + \sum_{i \in \mathcal{R}_b} \hat{x}_i \right)^2}, \qquad \alpha_{b,2}(t) = \frac{c_b^2}{\left( c_b + \sum_{i \in \mathcal{R}_b} \hat{x}_i \right)^2}, \tag{16}$$

respectively. The above result implies that the window of subflow $b$ evolves following a concave function in the linear RTT phase. Since subflow $b$ often achieves the best throughput, MPTCP-LIA lets the best-performing subflow stay at a higher transmission rate, i.e., in the linear RTT phase, for a longer time.

From (6), (15) and (16), we calculate the average transmission rate of subflow $r$ in each phase as

$$\hat{x}_{r,1} = \frac{1}{\tau_{r,1}} \int_{t_1}^{t_2} \frac{w_r(t)}{T_r^p}\, dt, \tag{17}$$

$$\hat{x}_{r,2} = \frac{1}{\tau_{r,2}} \int_{t_2}^{t_3} \frac{w_r(t)}{T_r^R(t)}\, dt, \tag{18}$$

respectively, where

$$\tau_{r,1} = t_2 - t_1 = \int_{W_r/2}^{W_r^{BDP}} \frac{T_r^p}{\alpha_r(t)}\, dw_r(t), \qquad \tau_{r,2} = t_3 - t_2 = \int_{W_r^{BDP}}^{W_r} \frac{w_r(t)/c_r}{\alpha_r(t)}\, dw_r(t).$$

Suppose that the MPTCP sender has knowledge of $W^{BDP} = \{W_r^{BDP}\}$, $c = \{c_r\}$ and $\hat{T}^R = \{\hat{T}_r^R\}$, and obtains an optimal solution $\hat{x}^*$ to (1) using a numerical method. From (17) and (18), the sender finds $W = \{W_r\}$ such that $\hat{x}(W, W^{BDP}, c, \hat{T}^R)$ is sufficiently close to $\hat{x}^*$. Then, MPTCP-LIA achieves $\hat{x}^*$ by halving $w_r(t)$ whenever $w_r(t)$ becomes greater than $W_r$.
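As a rough numerical illustration of how a window threshold maps to an average rate, the sketch below integrates one window cycle of a subflow $r \neq b$ from $W_r/2$ to $W_r$, using the increment in (15) and the fact that the window grows at rate $\alpha_r/T_r^R$ (so $dt = (T_r^R/\alpha_r)\,dw$). It is a simplified stand-in under stated assumptions, not the authors' exact procedure: the function name, the midpoint integration rule, and the units (packets and packets per second) are all choices made for this example.

```python
def cycle_average_rate(W, w_bdp, c, t_p, x_b, t_b, others_rate, steps=10000):
    """Average rate of a subflow r != b over one window cycle [W/2, W].

    W           : window threshold at which the window is halved (packets)
    w_bdp       : bandwidth-delay product c * t_p (packets)
    c, t_p      : bottleneck capacity (packets/s) and minimum RTT (s)
    x_b, t_b    : fixed rate and average RTT of the best subflow b
    others_rate : fixed sum of the rates of all subflows other than r
    """
    k = x_b / t_b                      # \hat{x}_b / \hat{T}_b^R in (15)
    lo = W / 2.0
    dw = (W - lo) / steps
    elapsed = 0.0                      # accumulated time over the cycle
    sent = 0.0                         # accumulated packets over the cycle
    for n in range(steps):
        w = lo + (n + 0.5) * dw        # midpoint rule
        if w <= w_bdp:                 # constant RTT phase: T = t_p
            rtt, rate = t_p, w / t_p
        else:                          # linear RTT phase: w / T = c
            rtt, rate = w / c, c
        alpha = w * k / (rate + others_rate) ** 2
        dt = rtt / alpha * dw          # dt = (T / alpha) dw
        elapsed += dt
        sent += rate * dt
    return sent / elapsed


c, t_p = 1000.0, 0.05
w_bdp = c * t_p
for W in (w_bdp, 1.5 * w_bdp, 2.0 * w_bdp):
    print(W, cycle_average_rate(W, w_bdp, c, t_p, 2000.0, 0.05, 2000.0))
```

With these hypothetical inputs, the average rate grows with the threshold and reaches the capacity $c$ once $W = 2W^{BDP}$, because the halved window then never falls below the bandwidth-delay product.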
Since the search for $W$ can be implemented in various ways, we employ a simple exhaustive search as follows. We first limit the search range to $[W_r^{BDP}, 2W_r^{BDP}]$ for each subflow, since the AIMD operation suggests that a smaller window $w_r(t) < W_r^{BDP}$ leads to link underutilization and a larger window $w_r(t) > 2W_r^{BDP}$ causes an additional delay without increasing the throughput. In this range, we sequentially search until

$$\left| \hat{x}_r^* - \hat{x}_r(W, W^{BDP}, c, \hat{T}^R) \right| < \epsilon, \tag{19}$$

for some small $\epsilon > 0$ and all $r \in \mathcal{R}$. Our exhaustive search finishes quickly when the number of subflows is small. However, the search space increases exponentially with the number of subflows, and thus finding a good $W$ with low complexity remains an interesting open problem.

V. RECEIVER-CENTRIC TRAFFIC SPLITTING

Our threshold-based solution in Section IV requires the information on the application (i.e., $f$) and the subflow paths (i.e., $\{T_r^p\}$, $W^{BDP}$, $c$, $\hat{T}^R$), and controls the subflow rate $\hat{x}$ to minimize the application-level delay of MPTCP. Since all the information can be easily obtained at the sender, one can implement the solution at the sender. However, the sender is often agnostic to user preferences such as quality of experience and network preference. Further, it often takes long for an innovation to be adopted at the sender due to service provider policies, computation load at the server, etc. [28]-[30]. To accelerate the innovation without intervention from service providers and to facilitate deployment by end users, we develop a receiver-centric solution that does not require any change at the sender.
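The exhaustive search with stopping condition (19) from Section IV can be sketched as a plain grid search. The sketch below is only illustrative: `search_thresholds`, the grid resolution, and especially `toy_rate_model` (a made-up linear relation between threshold and rate) are assumptions introduced here; a real implementation would plug in the rate model obtained from (17) and (18).

```python
import itertools


def toy_rate_model(W, w_bdp, c):
    """Hypothetical placeholder: rate grows linearly with the threshold."""
    return [ci * wi / (2.0 * bi) for wi, bi, ci in zip(W, w_bdp, c)]


def search_thresholds(target, w_bdp, rate_model, eps, grid=21):
    """Scan W_r over [W_r^BDP, 2 W_r^BDP] for every subflow jointly.

    Returns the first candidate W whose predicted rates are within eps of
    the target rates for all subflows, or None if no grid point qualifies.
    """
    axes = []
    for bdp in w_bdp:
        step = bdp / (grid - 1)
        axes.append([bdp + i * step for i in range(grid)])
    # The joint search space has grid ** len(w_bdp) points, i.e. it grows
    # exponentially with the number of subflows.
    for candidate in itertools.product(*axes):
        rates = rate_model(candidate)
        if all(abs(t - x) < eps for t, x in zip(target, rates)):
            return candidate
    return None


w_bdp = (50.0, 20.0)        # packets, hypothetical
c = (1000.0, 400.0)         # packets per second, hypothetical
model = lambda W: toy_rate_model(W, w_bdp, c)
print(search_thresholds((750.0, 300.0), w_bdp, model, eps=1.0))
```

The nested product over per-subflow grids makes the cost of the search explicit, which is why it is only practical for a small number of subflows.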

Fig. 5. System structure for receiver-centric traffic splitting control (R-TSC). Blocks shown: application layer; subflow traffic control; traffic rate estimation; optimal transmission rate; subflow parameter estimation; ACK/DupACK feedback over wireless networks 1 to n.

In developing the receiver-centric MPTCP solution that minimizes the application-level delay bound (4), the receiver should be able to collect the necessary information on the application and the subflow paths, and control the transmission rate of each subflow. The receiver estimates the information for each subflow path ($T_r^R(t)$, $T_r^p$, $c_r$, $W_r^{BDP}$, $W_r$) by observing incoming packets. The application information on the streaming rate ($f$) is estimated from the play-back buffer progress. From these, we can calculate the optimal subflow transmission rate $\hat{x}$ as before. However, since the receiver cannot directly control the congestion window size, we induce the window reduction by letting the receiver intentionally generate 3-dup-ACKs, which trigger a retransmission and halve the window size at the sender. The overall system structure of our R-TSC is shown in Fig. 5, and the detailed procedures to collect the information and the condition to invoke the 3-dup-ACKs are given below:

1) Estimate the subflow path information: For each subflow $r$, the receiver takes an adaptive approach to estimating $W_r^{BDP}$, the bottleneck link capacity $c_r$, and the average RTT $\hat{T}_r^R$, since they may change over time according to the system dynamics. From the RTT measurements $T_r^R(t)$ obtained with the TCP timestamp option, we choose the minimum RTT $T_r^p$ as the lowest RTT value ever observed. Also, the receiver estimates the window size $w_r(t)$ by counting the number of incoming packets during an RTT period $T_r^R(t)$ and sets the transmission rate $\hat{x}_r(t) = w_r(t)/T_r^R(t)$. The subflow bandwidth $c_r$ is set to $w_r(t)/T_r^R(t)$ if $T_r^R(t) > T_r^p$. Then, we have $W_r^{BDP} = c_r T_r^p$, and obtain the maximum number of in-flight packets $W_r$ as the window size $w_r(t)$ when a packet loss is detected.
2) Estimate the application rate information: The receiver estimates the streaming rate $f$ using the play-back buffer. To elaborate, letting $d_b(t)$ denote the amount of multimedia streaming data in the application play-back buffer at time $t$, the receiver can obtain the estimate $\hat{f}$ of the application rate as the net buffered data per unit time plus the sum of subflow rates:

$$\hat{f} = \frac{d_b(t) - d_b(t - \Delta t)}{\Delta t} + \sum_{r \in \mathcal{R}} \hat{x}_r(t), \tag{20}$$

where $\Delta t$ is the interval between play-back buffer measurements.

3) Find an optimal solution $\hat{x}$: Since the receiver has all the necessary information about the application and the subflow paths, it can find an optimal solution $\hat{x}$ to (1) using a numerical method, e.g., the gradient method.

4) Control subflow rates by generating intentional 3-dup-ACKs: To achieve the target subflow rate allocation $\hat{x}$, the receiver effectively sets the window threshold $W_r$ of subflow $r$. Note that $W_r$ has been used in our sender-centric solution to halve the window size $w_r(t)$ when it becomes greater than $W_r$. The receiver induces the window reduction by intentionally generating 3-dup-ACKs when the estimated $w_r(t)$ is greater than $W_r$. To highlight the receiver-side operation, we denote the threshold by $W_r^{Dup}$. As before, it is obtained from the exhaustive search with (19) at the receiver. The sender reacts to this (fake) loss event by halving its window size, and as a result, we achieve the subflow traffic allocation $\hat{x}$ that minimizes the application-level delay (4).

The overall algorithm at the receiver is described in Algorithm 1.

VI. PERFORMANCE EVALUATION

In this section, we verify our TCP queueing delay model in modern wireless networks through experiments in LTE networks, and the transmission rate model of MPTCP-LIA using ns-3 [11]. Finally, we evaluate the performance of our solutions through testbed experiments using Android devices.

A. TCP queueing model verification

In real wireless networks, the bursty transmission and reception of TCP break the assumption of Sec. II (i.e., exponentially distributed arrival and service intervals).
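One simple way to quantify how far a measured interval trace is from the exponential assumption is the ratio of the variance to the square of the mean, which equals 1 for an exponential distribution. The following minimal Python sketch computes this ratio for raw intervals and for intervals smoothed by an unweighted moving average; the function names and the synthetic bursty trace are assumptions made for illustration only and are not measurement data.

```python
from statistics import mean, pvariance


def moving_average(values, n):
    """Unweighted moving average over a sliding window of n samples."""
    return [mean(values[i:i + n]) for i in range(len(values) - n + 1)]


def variance_to_squared_mean(values):
    """Ratio of variance to squared mean (1 for an exponential distribution)."""
    m = mean(values)
    return pvariance(values, m) / (m * m)


# Synthetic bursty trace: three back-to-back packets, then one long gap.
bursty = [0.0, 0.0, 0.0, 4.0] * 50

raw_ratio = variance_to_squared_mean(bursty)
smoothed_ratio = variance_to_squared_mean(moving_average(bursty, 4))
```

For this synthetic trace the raw ratio is well above 1, while grouping four packets as a unit removes the burst structure entirely; real traces would fall somewhere in between, depending on the window length.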
According to the measurement results, there are several reasons for packet bursts. In LTE networks, the MAC layer operation (or channel state) of wireless networks often transmits 7 or 8 packets at a time. We conjecture that this is due to the channel-aware scheduling and packet aggregation policy of the LTE provider. In WiFi networks, similar traffic patterns are observed due to A-MPDU and block ACK. Fig. 6-(a) shows the distribution of inter-arrival time at the receiver, and about 8% of packets arrive with zero interval (i.e., arrive as a burst). The inter-arrival time at the receiver also affects the ACK inter-arrival time at the server, as shown in Fig. 6-(b), since the receiver generates an ACK upon a packet reception. A notable difference is the mean rate, which is halved because the delayed ACK option causes one ACK to be generated per two received packets. The delayed ACK, which is commonly used in most smart phones, makes the traffic burstier since an ACK leads to two or more packet transmissions. Fig. 6-(c) shows the distribution of packet inter-departure time at the server. About a half of the packets are transmitted with zero interval at the server, and the distribution shows a large variance compared to the exponential distribution at the same transmission rate.

Algorithm 1 Algorithm of R-TSC

On receiving a packet in subflow r:
    subseqnum : subflow seq. no. of the received packet
    nextseqnum : next expected subflow seq. no.
    if subseqnum = nextseqnum then
        Update variables T_r^R and T_r^p
        EstimateSubflowPath()   /* see below */
        Calculate x̂ by solving (1)
        Search for W_r^Dup (= W_r) that satisfies (19)
        if w_r ≤ W_r^Dup then
            nextseqnum ← nextseqnum + 1
        end if
        /* when w_r > W_r^Dup, duplicate ACKs will be generated by not increasing nextseqnum */
    end if
    Transmit an ACK with nextseqnum
    Put the packet in the reordering buffer

EstimateSubflowPath():
    δt : fixed time duration for updating w_r, x̂_r and c_r.
    t_now : current time, t_last : time of the last update.
    d_b(t) : data amount in the play-back buffer at time t.
    if t_now − t_last ≥ δt then
        w_r ← α w_r + (1 − α) pkts · T_r^R / (t_now − t_last)
        x̂_r ← β x̂_r + (1 − β) pkts / (t_now − t_last)
        f̂ ← (d_b(t_now) − d_b(t_last)) / (t_now − t_last) + Σ_{r∈R} x̂_r
        if T_r^R > T_r^p then
            c_r ← x̂_r
        end if
        pkts ← 0
        t_last ← t_now
    else
        pkts ← pkts + 1
    end if

TABLE I
THE RATIO OF VARIANCE AND THE SQUARE OF AVERAGE

c_a in receiver
    Measurement         15.6
    MA - samples        8.91
    MA - 4 samples      3.877
    MA - 1 samples      1.18

c_s in server
    Measurement         33.984
    MA - 3 samples      1.49
    MA - 7 samples      5.147
    MA - 37 samples     1.194

Nonetheless, we found that the interval distributions can be approximated by an exponential distribution by taking several packets as a unit. We evaluate the similarity of the interval distributions of TCP to the exponential distribution through the ratio of the variance to the square of the average ($c_a$ and $c_s$; in the M/M/1 queueing model, $c_a = c_s = 1$). As shown in Table I, as the number of samples of the unweighted moving average (MA) gets larger, the inter-departure time at the server and the inter-arrival time at the receiver get close to the exponential distribution. Also, in Fig. 6, the distributions of the MA show little difference from the exponential distribution of the same mean rate when the numbers of MA samples are 1 and 37, respectively. These sample counts are quite small compared to the BDP and the transmission window size.

B. Transmission rate model evaluation

We evaluate our model by comparing the numerical results with simulation results. We consider a multi-homed mobile scenario, where an MPTCP connection is established through LTE and WiFi networks. Subflow 0 is established over the LTE network with wireless capacity $c_0$ and subflow 1 over the WiFi network with capacity $c_1$. In both networks, the last-hop wireless link is the bottleneck and the propagation delay is $T_0^p = T_1^p = 5$ ms. To focus on the impact of MPTCP-LIA congestion control, we keep the send buffer always full such that the transmission rate of each subflow is determined by its congestion window. For different bottleneck link rates, we measure the transmission rate of each subflow while varying the maximum number of in-flight packets $W$, which is done by changing the buffer size at the bottleneck link. The simulation results are compared with the numerical results from (19).

Fig. 7 shows the results with three different network settings. In the first simulation with $(c_0, c_1)$ = ( , 5) Mbps (which results in $(W_0^{BDP}, W_1^{BDP})$ = (17, 3) Kbytes) and $W_1$ = 41 Kbytes, we change $W_0$. Fig. 7(a) shows that the simulation results and the analytical results are well matched. As we increase $W_0$, the rate $\hat{x}_0$ of subflow 0 increases linearly up to $W_0 = 2W_0^{BDP}$ and saturates the capacity $c_0$ when $W_0 \geq 2W_0^{BDP}$. Note that when $W_0 \geq 2W_0^{BDP}$, the numerical results are a bit higher than the simulation results, which is due to the overhead that is not taken into account in the analysis (i.e., header length, ACK, etc.). In the second simulation, we fix $W_0$ to 14 Kbytes and vary $W_1$ from 1 to 1 Kbytes. Fig. 7(b) shows similar results.
The results of the two simulations demonstrate that our analysis results are also well matched with the simulation results under rate changes of either the large-rate subflow or the small-rate subflow. The high accuracy in the subflow transmission rate estimation is the advantage of our model over the fixed-RTT model. In the third simulation, we consider the scenarios in which the capacity of one subflow changes. We set $W_0$ = 73 Kbytes, $W_1$ = 68 Kbytes, and $c_0$ = 4 Mbps while varying $c_1$ from 1 to 4 Mbps. Fig. 7(c) shows the results. Again, unlike the fixed-RTT model in [14], [15], our model provides accurate estimation of the transmission rates for different rates of the small-rate subflow.

Fig. 6. (a) Distribution of packet inter-arrival time at the receiver. (b) Distribution of ACK inter-arrival time at the server. (c) Distribution of packet inter-departure time at the server. Each panel plots percentiles against the interval (s) for the experiments, their moving average (MA), and an exponential distribution of the same mean rate. The capacity of the LTE network is about 56.4 Mbps and the minimum RTT is about 5 ms. The BDP of the path is about 44.8 packets and the transmission window is over 1 packets.

Fig. 7. Comparison of simulation and numerical results: subflow transmission rates $x = (x_0, x_1)$ (average transmission rate in Mbps) and the maximum number of in-flight packets $W$. Results predicted by analysis (19) are well matched with the simulation results. (a) $c_0$ = Mbps, $c_1$ = 5 Mbps, $W_1$ = 41. Kbyte. (b) $c_0$ = Mbps, $c_1$ = 5 Mbps, $W_0$ = 14. Kbyte. (c) $W_0$ = 73.43 Kbyte, $W_1$ = 68.35 Kbyte, $c_0$ = 4 Mbps.

C. Evaluation through testbed experiments

For experimental evaluation, we developed a testbed with an MPTCP server connected to an MPTCP client. The MPTCP server runs Ubuntu 14.04 on a desktop computer with MPTCP implemented in the kernel [12]. The server has two Ethernet interfaces, which are connected to the client through the LTE and WiFi networks, respectively. We use a commercial LTE network (SK telecom, South Korea), and a self-configured WiFi network with a home access point (Cisco Air-SAP16I). We separate their routing paths such that there is no overlap. The client is an Android smart phone (Nexus 5) that runs Android 4.4 with the MPTCP kernel implementation [12]. Between the server and the client, we establish an MPTCP connection with two subflows, where subflow 0 passes through the LTE network and subflow 1 through the WiFi network.
For the WiFi network, we control the link capacity $c_1$ in [6, 35] Mbps by fixing the modulation and coding rate (MCR). For the LTE network, since we cannot directly control the bottleneck link capacity $c_0$, we generate background traffic using other devices and set $c_0$ in [15, 1] Mbps. The minimum RTT is 3 ms ($T_0^p$) for the LTE network, and 15 ms ($T_1^p$) for the WiFi network. Under the controlled bottleneck link capacity, we keep the send buffer always full and measure the maximum number of in-flight packets ($W$) and the maximum RTT (max $T^R$). We show the results in Table II. In our experiments, wireless loss rarely occurs and most packet losses are caused by buffer overflow at the wireless link.

TABLE II
NETWORK PERFORMANCE UNDER CONTROLLED BOTTLENECK LINK CAPACITY

LTE networks
    c_0           W_0^BDP        W_0             max T_0^R
    71.3 Mbps     73.8 Kbyte     7571.4 Kbyte    89.7 ms
    55.1 Mbps     11.6 Kbyte     6983.6 Kbyte    991.5 ms
    7.9 Mbps      17.1 Kbyte     7477.1 Kbyte    9. ms
    15.1 Mbps     57.9 Kbyte     6861.9 Kbyte    3545.1 ms

WiFi networks
    c_1           W_1^BDP        W_1             max T_1^R
    33.5 Mbps     64.3 Kbyte     719.6 Kbyte     6. ms
    1.99 Mbps     4. Kbyte       716.4 Kbyte     51.1 ms
    15.76 Mbps    3.6 Kbyte      7.3 Kbyte       335.7 ms
    4.7 Mbps      9.8 Kbyte      78.4 Kbyte      1167.8 ms

In both LTE and WiFi networks, we observe that $W$ is much greater than $W^{BDP}$, and max $T^R$ significantly exceeds the minimum RTT $T^p$. This implies that both networks have a large amount of buffer space to accommodate the dynamics of wireless systems, which often causes the well-known bufferbloat problem [7], [8].

We now set $f$ = 35.85 Mbps, and evaluate the delay performance of our solution in comparison with MPTCP-LIA. We use the same setting with $(c_0, c_1)$ = (4, 1) Mbps. We test both MPTCP-LIA schedulers with a conventional receiver (CR) and our R-TSC.

Fig. 8. Performance comparison of MPTCP-LIA with a conventional receiver (CR) and with our solution, in terms of congestion window, total transmission rate, and delay performance, where the bottleneck link capacities $(c_0, c_1)$ = (4, 1) Mbps and the application rate $f$ = 35.85 Mbps. (a) MPTCP-LIA with a conventional receiver. (b) MPTCP-LIA with R-TSC receiver. (c) MPTCP-LIA (LRF) with a conventional receiver. Each panel plots, over time (s), the windows $w_0$ (LTE) and $w_1$ (WiFi) in Kbyte, the total transmission rate in Mbps, and the send buffer queuing delay, network + reordering delay, and application-level delay in ms.

Fig. 9. Cumulative distribution (%) of application-level delays (ms) of MPTCP-LIA with and without R-TSC, for MPTCP-LIA (LRF), MPTCP-LIA (RR), and MPTCP-LIA with R-TSC. (a) $(c_0, c_1)$ = (4, 15) Mbps, $f$ = 35.85 Mbps. (b) $(c_0, c_1)$ = (4, 1) Mbps, $f$ = 35.85 Mbps. (c) $(c_0, c_1)$ = (4, 8) Mbps, $f$ = 35.85 Mbps.

Fig. 8 shows the window evolution, the total transmission rate, and the delay of the two subflows. When MPTCP-LIA distributes packets over the subflows in a round-robin manner, the small-rate subflow often suffers from large queuing delay, which leads to poor application-level delay performance as shown in Fig. 8(a).
In contrast, when R-TSC is used at the receiver, the window sizes of both subflows are kept under control through 3-dup-ACKs to balance the transmission rates, and the solution achieves low application-level delay as shown in Fig. 8(b).

Thus far, we assumed that MPTCP-LIA uses the basic round robin (RR) scheduler to distribute the packets in the send buffer. An alternative method to reduce the application-level delay at the sender is to use the Least-RTT-First (LRF) scheduler [20], i.e., to allocate packets from the send buffer to the subflow with the minimum RTT first. It has been shown that MPTCP-LIA with LRF is effective in reducing the reordering delay. However, our experiments show that it often creates large delay during the initial period. Fig. 8(c) shows the experimental results of MPTCP-LIA (LRF) with the conventional receiver (CR): subflow 1 (WiFi) initially increases its window size faster than subflow 0 (LTE) owing to its short RTT. Once subflow 0 has a larger window size (i.e., $b = 0$), it hinders the window inflation of subflow 1 under MPTCP-LIA congestion control as in (15). It takes about 1 seconds for subflow 1 to reach a comparable window size. Since the small window size results in a low transmission rate, it suffers from large delay (up to 4 seconds) that is incurred at the send buffer. This is consistent with observations made in [1]. Our R-TSC solution removes such long initial delay when it is matched with MPTCP-LIA (LRF), in which case the experimental results are similar to Fig. 8(b). (They are omitted due to space constraints.)

To investigate application-level delay distributions, we generate traffic at rate $f$ = 35.85 Mbps and set the LTE link capacity to 4 Mbps, so that the application rate is smaller than the LTE link capacity. Under different WiFi link capacities, we measure packet delay during the initial second period. Fig. 9 shows the cumulative distribution of packet delays. We observe that MPTCP-LIA with the R-TSC receiver achieves significantly better delay performance than that with the conventional receiver (see footnote 3). For MPTCP-LIA (RR), the capacity $c_1$ of the small-rate subflow greatly impacts the delay performance. As $c_1$ decreases, the application-level delay increases due to the mismatch between the network delays. When LRF is employed, there exists an initial period of long delay of 5 1 seconds, which depends on the difference of the minimum RTTs.

Footnote 3: When R-TSC is used at the receiver, both MPTCP-LIA (RR) and MPTCP-LIA (LRF) achieve similar delay performance. In the paper, we only show the delay distribution of MPTCP-LIA (RR) with R-TSC and omit that of MPTCP-LIA (LRF).

VII. CONCLUSION

In this paper, we aimed at minimizing the application-level delay of MPTCP through precise per-subflow transmission rate allocation. To this end, we analyzed the relationship between TCP transmission rate and time-varying queueing delay, and we investigated the impact of the per-subflow maximum window threshold on its rate allocation. The problem is challenging due to the strong coupling between subflows in MPTCP-LIA congestion control. We used approximation methods to develop a sender-side subflow-rate allocation scheme to minimize application-level delay. We extended it to develop a receiver-side solution, named R-TSC, which facilitates incremental deployment without any support from the service providers. By generating intentional three duplicate acknowledgments (3-dup-ACKs) as necessary, R-TSC leads to a split of the streaming traffic into subflows that significantly reduces application-level delay. We evaluated our client-side solution through simulation and testbed experiments in commercial LTE and WiFi networks. The results show that R-TSC significantly improves the performance of MPTCP-LIA.

ACKNOWLEDGMENT

This work was supported by the Samsung Electronics DMC research center and an IITP grant funded by the Korea government (MSIP) (No. B16-15-164, Research on Near-Zero Latency Network for 5G Immersive Service).

REFERENCES

[1] J. Padhye, V. Firoiu, D. Towsley, and J. Kurose, "Modeling TCP throughput: a simple model and its empirical validation," ACM SIGCOMM Computer Communication Review, pp. 303-314, Oct. 1998.
[2] F. P. Kelly, A. Maulloo, and D. Tan, "Rate control in communication networks: shadow prices, proportional fairness and stability," Journal of the Operational Research Society, vol. 49, pp. 237-252, 1998.
[3] S. Floyd and K. Fall, "Promoting the Use of End-to-End Congestion Control in the Internet," IEEE/ACM Transactions on Networking, Aug. 1999.
[4] S. Floyd, M. Handley, and J. Padhye, "A Comparison of Equation-Based and AIMD Congestion Control," ACIRI Technical Report. [Online]. Available: http://www.aciri.org/tfrc/aimd.pdf, May 2000.
[5] A. Afanasyev, N. Tilley, P. Reiher, and L. Kleinrock, "Host-to-Host Congestion Control for TCP," IEEE Communications Surveys & Tutorials, vol. 12, no. 3, 2010.
[6] D. Chiu and R. Jain, "Analysis of the Increase/Decrease Algorithms for Congestion Avoidance in Computer Networks," Journal of Computer Networks and ISDN, vol. 17, no. 1, Jun. 1989.
[7] H. Jiang, Y. Wang, K. Lee, and I. Rhee, "Tackling bufferbloat in 3G/4G networks," in Proc. of IEEE IMC, Nov. 2012.
[8] J. Gettys and K. Nichols, "Bufferbloat: dark buffers in the internet," Communications of the ACM, vol. 55, issue 1, Jan. 2012.
[9] S. Alfredsson, G. D. Giudice, J. Garcia, A. Brunstrom, L. D. Cicco, and S. Mascolo, "Impact of TCP congestion control on bufferbloat in cellular networks," in IEEE WoWMoM, 2013.
[10] H. Im, C. Joo, T. Lee, and S. Bahk, "Receiver-side TCP Countermeasure to Bufferbloat in Wireless Access Networks," submitted to IEEE Transactions on Mobile Computing, 2015.
[11] NS-3 module for MPTCP. [Online]. Available: http://code.google.com/p/mptcp-ns3/.
[12] MPTCP - Linux Kernel implementation. [Online]. Available: http://mptcp.info.ucl.ac.be/.
[13] C. Raiciu, M. Handley, and D. Wischik, "Coupled congestion control for multipath transport protocols," IETF RFC 6356, Oct. 2011.
[14] D. Wischik, M. Handley, and C. Raiciu, "Control of multipath TCP and optimization of multipath routing in the Internet," in Proc. NetCOOP, 2009.
[15] D. Wischik, C. Raiciu, A. Greenhalgh, and M. Handley, "Design, implementation and evaluation of congestion control for multipath TCP," in Proc. of ACM NSDI, Jun. 2011.
[16] Y. Cao, M. Xu, and X. Fu, "Delay-based Congestion Control for Multipath TCP," in IEEE ICNP, 2012.
[17] R. Khalili, N. Gast, M. Popovic, and J. L. Boudec, "MPTCP is not pareto-optimal: performance issues and a possible solution," IEEE/ACM Transactions on Networking, vol. 21, no. 5, Oct. 2013.
[18] C. Paasch, G. Detal, F. Duchene, C. Raiciu, and O. Bonaventure, "Exploring mobile/wifi handover with multipath TCP," in ACM SIGCOMM workshop on Cellular networks: operations, challenges, and future design, 2012.
[19] Y. Chen, Y. Lim, R. J. Gibbens, E. M. Nahum, R. Khalili, and D. Towsley, "A measurement-based study of MultiPath TCP performance over wireless networks," in ACM IMC, 2013.
[20] C. Paasch, S. Ferlin, Ö. Alay, and O. Bonaventure, "Experimental Evaluation of Multipath TCP Schedulers," in ACM SIGCOMM Capacity Sharing Workshop (CSWS 14), 2014.
[21] Y. Chen and D. Towsley, "Bufferbloat and Delay Analysis of Multipath TCP in Wireless networks," in IFIP Networking, 2014.
[22] S. Park, C. Joo, Y. Park, and S. Bahk, "Impact of Traffic Splitting on the Delay Performance of MPTCP," in IEEE ICC, 2014.
[23] M. Schwartz, Telecommunication networks: protocols, modeling and analysis, Prentice Hall, Jan. 1987.
[24] P. G. Harrison and N. M. Patel, Performance modeling of communication networks and computer architecture, Addison-Wesley, 1992.
[25] X. Zhang, Y. Xu, H. Hu, Y. Liu, Z. Guo, and Y. Wang, "Profiling Skype Video Calls: Rate Control and Video Quality," in IEEE INFOCOM, Mar. 2012.
[26] A. Finamore, M. Mellia, M. M. Munafò, R. Torres, and S. G. Rao, "Youtube everywhere: impact of device and infrastructure synergies on user experience," in ACM IMC, 2011.
[27] S. Lee, H. Roh, H. Lee, and K. Chung, "Enhanced TFRC for high quality video streaming over high bandwidth delay product networks," Journal of Communications and Networks, 16(3), 344-354.
[28] N. Spring, M. Chesire, M. Berryman, V. Sahasranaman, T. Anderson, and B. Bershad, "Receiver based management of low bandwidth access links," in Proc. IEEE INFOCOM, Tel Aviv, Mar. 2000, pp. 245-254.
[29] P. Mehra, A. Zakhor, and C. De Vleeschouwer, "Receiver-driven bandwidth sharing for TCP," in Proc. IEEE INFOCOM, San Francisco, Apr. 2003.
[30] D. Ros and M. Welzl, "Less-than-Best-Effort Service: A Survey of End-to-End Approaches," IEEE Communications Surveys and Tutorials, vol. 15, no. 2, 2013.