Shaping the Linux kernel MPTCP implementation towards upstream acceptance

Similar documents
MultiPath TCP : Linux Kernel implementation

Utilizing Datacenter Networks: Centralized or Distributed Solutions?

Recap. TCP connection setup/teardown Sliding window, flow control Retransmission timeouts Fairness, max-min fairness AIMD achieves max-min fairness

MPTCP: Design and Deployment. Day 11

Improving Multipath TCP. PhD Thesis - Christoph Paasch

Linux IP Networking. Antonio Salueña

Multipath TCP for FreeBSD

A Programming Model for Applicationdefined Multipath TCP Scheduling

TCP/misc works. Eric Google

Application-defined Multipath TCP Scheduling

Transport Layer Review

UNIVERSITY OF OSLO Department of Informatics. Late data choice with the Linux TCP/IP stack. Master Thesis. Erlend Birkedal

CSCI-GA Operating Systems. Networking. Hubertus Franke

Transport layer. UDP: User Datagram Protocol [RFC 768] Review principles: Instantiation in the Internet UDP TCP

Transport layer. Review principles: Instantiation in the Internet UDP TCP. Reliable data transfer Flow control Congestion control

Initial connection setup. Adding subflow setup. Three-way handshake with MP_CAPABLE Exchange 64 bit key(key-a, Key-B)

Designing a Resource Pooling Transport Protocol

MultiPath TCP : Linux Kernel Implementation

Status Update About COLO FT

Probe or Wait : Handling tail losses using Multipath TCP

Lecture 20 Overview. Last Lecture. This Lecture. Next Lecture. Transport Control Protocol (1) Transport Control Protocol (2) Source: chapters 23, 24

A proposal for MPTCP Robust session Establishment (MPTCP RobE) enable full multipath capability for MPTCP Markus Amend, Eckard Bogenfeld, Andreas

6.1 Internet Transport Layer Architecture 6.2 UDP (User Datagram Protocol) 6.3 TCP (Transmission Control Protocol) 6. Transport Layer 6-1

Internet Engineering Task Force Internet-Draft Obsoletes: 6824 (if approved)

ECE 650 Systems Programming & Engineering. Spring 2018

Zhang Chen Zhang Chen Copyright 2017 FUJITSU LIMITED

6. The Transport Layer and protocols

BLEST: Blocking Estimation-based MPTCP Scheduler for Heterogeneous Networks. Simone Ferlin, Ozgu Alay, Olivier Mehani and Roksana Boreli

Summary of last time!

Latest Developments with NVMe/TCP Sagi Grimberg Lightbits Labs

Multipath TCP: Overview, Design, and Use-Cases

Linux Networking: tcp. TCP context and interfaces

Announcements. No book chapter for this topic! Slides are posted online as usual Homework: Will be posted online Due 12/6

No book chapter for this topic! Slides are posted online as usual Homework: Will be posted online Due 12/6

Operating Systems. 17. Sockets. Paul Krzyzanowski. Rutgers University. Spring /6/ Paul Krzyzanowski

TCP over Wireless PROF. MICHAEL TSAI 2016/6/3

bitcoin allnet exam review: transport layer TCP basics congestion control project 2 Computer Networks ICS 651

Speeding up Linux TCP/IP with a Fast Packet I/O Framework

What is an L3 Master Device?

Mobile Communications Chapter 9: Mobile Transport Layer

Outline Computer Networking. TCP slow start. TCP modeling. TCP details AIMD. Congestion Avoidance. Lecture 18 TCP Performance Peter Steenkiste

Outline 9.2. TCP for 2.5G/3G wireless

L41 - Lecture 5: The Network Stack (1)

Part1: Lecture 2! Beyond TCP!

Summary of last time!

Implementation and Evaluation of Coupled Congestion Control for Multipath TCP

8. TCP Congestion Control

Transport Layer. Application / Transport Interface. Transport Layer Services. Transport Layer Connections

TCP Sendbuffer Advertising. Costin Raiciu University Politehnica of Bucharest

Communications Software. CSE 123b. CSE 123b. Spring Lecture 10: Mobile Networking. Stefan Savage

Quick announcement. CSE 123b Communications Software. Last class. Today s issues. The Mobility Problem. Problems. Spring 2003

Multipath TCP. Prof. Mark Handley Dr. Damon Wischik Costin Raiciu University College London

CSE 4215/5431: Mobile Communications Winter Suprakash Datta

Chapter 6 Congestion Control and Resource Allocation

Internet Layers. Physical Layer. Application. Application. Transport. Transport. Network. Network. Network. Network. Link. Link. Link.

Different Layers Lecture 21

NETWORK PROGRAMMING. Instructor: Junaid Tariq, Lecturer, Department of Computer Science

Exploring Alternative Routes Using Multipath TCP

Network Research and Linux at the Hamilton Institute, NUIM.

The Transport Layer: TCP & Reliable Data Transfer

Transport Protocols and TCP: Review

Introduction to TCP/IP networking

Congestion Control for High-Bandwidth-Delay-Product Networks: XCP vs. HighSpeed TCP and QuickStart

NTRDMA v0.1. An Open Source Driver for PCIe NTB and DMA. Allen Hubbe at Linux Piter 2015 NTRDMA. Messaging App. IB Verbs. dmaengine.h ntb.

Computer Communication Networks Midterm Review

Got Loss? Get zovn! Daniel Crisan, Robert Birke, Gilles Cressier, Cyriel Minkenberg, and Mitch Gusat. ACM SIGCOMM 2013, August, Hong Kong, China

This sequence diagram was generated with EventStudio System Designer (

TCP Tuning for the Web

CSE 123A Computer Netwrking

Effect of SCTP Multistreaming over Satellite Links

High bandwidth, Long distance. Where is my throughput? Robin Tasker CCLRC, Daresbury Laboratory, UK

Switch Configuration message sent 1 (1, 0, 1) 2

Different Layers Lecture 20

Basic Reliable Transport Protocols

Congestion Control In The Internet Part 2: How it is implemented in TCP. JY Le Boudec 2014

4. What is the sequence number of the SYNACK segment sent by spinlab.wpi.edu to the client computer in reply to the SYN? Also Seq=0 (relative

Lecture 3: The Transport Layer: UDP and TCP

NT1210 Introduction to Networking. Unit 10

TCP. CSU CS557, Spring 2018 Instructor: Lorenzo De Carli (Slides by Christos Papadopoulos, remixed by Lorenzo De Carli)

UNIT IV TRANSPORT LAYER

UDP, TCP, IP multicast

Chapter 4 End-to-End Protocol Layer

Protocol Overview. TCP/IP Performance. Connection Types in TCP/IP. Resource Management. Router Queues. Control Mechanisms ITL

Fall 2012: FCM 708 Bridge Foundation I

Congestion Control In The Internet Part 2: How it is implemented in TCP. JY Le Boudec 2014

Multipath TCP. Prof. Mark Handley Dr. Damon Wischik Costin Raiciu University College London

Just enough TCP/IP. Protocol Overview. Connection Types in TCP/IP. Control Mechanisms. Borrowed from my ITS475/575 class the ITL

Memory mapped netlink

Lecture 11. Transport Layer (cont d) Transport Layer 1

The case for ubiquitous transport-level encryption

Transport Protocols. Raj Jain. Washington University in St. Louis

Fast Retransmit. Problem: coarsegrain. timeouts lead to idle periods Fast retransmit: use duplicate ACKs to trigger retransmission

Tuning, Tweaking and TCP

9th Slide Set Computer Networks

Chapter 23 Process-to-Process Delivery: UDP, TCP, and SCTP

The Controlled Delay (CoDel) AQM Approach to fighting bufferbloat

Light: A Scalable, High-performance and Fully-compatible User-level TCP Stack. Dan Li ( 李丹 ) Tsinghua University

ETSF10 Internet Protocols Transport Layer Protocols

Multiple unconnected networks

TSIN02 - Internetworking

Transcription:

Shaping the Linux kernel MPTCP implementation towards upstream acceptance Doru-Cristian Gucea Octavian Purdila

Agenda MPTCP in a nutshell Use-cases Basics Initial Linux kernel implementation Implementation alternatives Towards upstream submission Questions

MPTCP in nutshell Transport level multi-path solution Unmodified applications and network Works at least as well as regular TCP Works when a regular TCP would work Falls back to regular TCP if needed Fair with TCP, moves traffic away from congestion

Improved mobility with MPTCP

Throughput during a subway trip

Other MPTCP use-cases Power improvements via race to idle VM migration across different network domains Multi-WiFi: take advantage of multiple APs Improved throughput and reliability in the data-center

MPTCP basics 3G celltower

MPTCP basics SYN MP_CAPABLE X 3G celltower

MPTCP basics 3G celltower SYN/ACK SYN/ACK MP_CAPABLE MP_CAPABLE Y

MPTCP basics 3G celltower ACK ACK MP_CAPABLE MP_CAPABLE

MPTCP basics 3G celltower STATE CWND Snd.SEQNO Rcv.SEQNO

MPTCP basics 3G celltower STATE CWND Snd.SEQNO Rcv.SEQNO

MPTCP basics 3G celltower STATE CWND Snd.SEQNO Rcv.SEQNO

MPTCP basics 3G celltower STATE CWND Snd.SEQNO Rcv.SEQNO SYN/ACK JOIN X

MPTCP basics 3G celltower STATE CWND Snd.SEQNO Rcv.SEQNO STATE B CWND Snd.SEQNO Rcv.SEQNO

MPTCP basics 3G celltower DATA DATA SEQ SEQ A DSEQ: DSEQ: 1 STATE CWND Snd.SEQNO Rcv.SEQNO STATE B CWND Snd.SEQNO Rcv.SEQNO

MPTCP basics 3G celltower DATA DATA SEQ SEQ A DSEQ: DSEQ: 1 STATE CWND Snd.SEQNO Rcv.SEQNO DATA SEQ B DSEQ: 2 STATE B CWND Snd.SEQNO Rcv.SEQNO

Initial Linux kernel implementation Implementation done directly at the TCP level Good performance Low overhead when falling back to TCP Intrusive changes that increases the complexity of the TCP stack

Key MPTCP structures TCP socket, visible to userspace Meta socket Master socket Sub-flow socket Sub-flow socket TCP sockets, not visible to userspace

Pros/cons of using TCP sockets Re-use TCP code that transfers data to/from userspace (tcp_sendmsg, tcp_recvmsg) Makes MPTCP transparent to the application (including fallback) Some TCP functions now must deal with 3 cases TCP socket MPTCP sub-flow socket MPTCP meta socket

Creating the master socket Meta socket SYN,MP SYN,ACK,MP ACK,MP mptcp_sk_clone Master socket

Fallback to TCP Meta socket SYN,MP SYN,ACK ACK mpc =false Plain TCP socket

TCP queues (simplification) Receive q tcp_recvmsg tcp_sendmsg Write q out of order q tcp_retransmit_skb tcp_write_xmit Backlog q tcp_transmit_skb tcp_v4_rcv

MPTCP receive path mptcp_sk_ready Sub-flow receive q mptcp_backlog_rcv Sub-flow out of order q Meta backlog q tcp_recvmsg Meta receive q Meta out of order q tcp_v4_rcv mptcp_sk_ready

MPTCP send path tcp_sendmsg Meta write q mptcp_retransmit_skb mptcp_write_xmit Mptcp scheduler Sub-flow write q

MPTCP DSS options Maps meta seq to sub-flow seq 20 bytes Does not fit into skb->cb Save them in the skb data in the space reserved for the TCP header Everytime pskb_copy() is called from the TCP stack we need to copy DSS manually Save DSS here Head Reservered for MAC/IP/TCP header Data Tail End

Alternative approaches Userspace implementation Requires infrastructure to pass / receive MPTCP options from userspace Use control messages for sendmsg/recvmsg Needs new userspace ABIs to handle the connection handshake Complete rewrite Create a separate layer on top of TCP (similar with how NFS, CIFS uses TCP)

Towards a separate MPTCP layer Eliminate the branches in the TCP code that deals with MPTCP subflow and meta sockets Isolate the MPTCP sub-flow functionality MPTCP meta sockets are not TCP sockets create a new protocol to deal with them Eliminate code duplication between TCP and MPTCP

MPTCP sub-flow specific code in TCP MPTCP connection hand-shake Client side Server side MPTCP receive window MPTCP send and receive path hooks MPTCP coupled congestion code

Isolate MPTCP connection handshake TX Connection socket operations are used to abstract and isolate IPv4 and IPv6 By defining MPTCP specific connection socket operation we isolated the TX part of MPTCP subflow handshake Connect socket operations queue_xmit send_check con_request syn_recv_sock...

Isolate MPTCP connection handshake RX On the receive side request socket operations are used to abstract MD5 code Unfortunately these operations are not enough to isolate MPTCP code but... Request socket operations md5_lookup calc_md5_hash We noticed significant code duplication between the IPv4 and IPv6 paths

Isolate MPTCP connection handshake RX Added new operations to abstract and isolate the IPv4 and IPv6 paths With that we also isolated the RX part of MPTCP sub-flow handshake Request socket operations init_req cookie_init_seq cookie_init_seq route_req send_synack queue_hash_add init_seq

Isolate MPTCP receive window and send path Introduce a new structure to abstract some TCP socket operations Specific operations for the meta socket, sub-flow socket and regular TCP TCP socket operations write_xmit write_wakeup select_initial_window init_buffer_space set_rto...

git diff v3.18..mptcp_trunk --stat (>10) include/linux/tcp.h 85 + include/net/sock.h 11 + include/net/tcp.h 194 ++net/core/sock.c 35 + net/ipv4/af_inet.c 27 + net/ipv4/inet_connection_sock.c 21 + net/ipv4/tcp.c 182 ++net/ipv4/tcp_fastopen.c 28 + net/ipv4/tcp_input.c 320 +++net/ipv4/tcp_ipv4.c 202 ++net/ipv4/tcp_minisocks.c 95 + net/ipv4/tcp_output.c 254 ++ net/ipv4/tcp_timer.c 81 + net/ipv6/tcp_ipv6.c 274 +++

Separate MPTCP meta layer Rationale: the meta socket is not a TCP socket Create a new IP protocol level socket for MPTCP socket(af_inet, SOCK_STREAM, IPROTO_TCP TCPEXT_MPTCP)

WIP: early allocation of the master socket Allocate the master socket as soon as the meta socket is created Falling back to TCP adds overhead as we now go through the meta socket MPTCP is not transparent at the application level* Simplifies the connect path: Connect of meta socket translates to connect on master socket Avoids cloning the meta socket and changes in the inet connection layer

WIP: receive path sk_data_ready (on sub-flow sockets) scan the rx queue update DSS mapping mark sub-flow and wake-up meta socket tcp_recvmsg Meta receive q mptcp_recvmsg (on meta socket) wait_event(wq, ready_sub-flows) for all ready sub-flows lock sock(sub-flow) Meta out of order q mptcp_recvmsg Sub-flow receive q tcp_read_sock(sub-flow) recv actor clone SKB and add to meta socket clear sub-flow ready bit release_sock(sub-flow) tcp_recvmsg (meta socket, O_NONBLOCK)

WIP: send path mptcp_sendmsg Meta write q tcp_send_skb? MPTCP Reinjection work queue Sub-flow write q tcp_push

Conclusions MPTCP has interesting use-cases and it is used commercially The initial Linux kernel implementation had large TCP stack changes We have been steadily reducing changes to the TCP stack We believe a separate MPTCP layer should help us reduce TCP changes even more and help us manage the complexity

Thank you!