Transparent TCP Recovery

Transparent Recovery with Chain Replication
Robert Burgess, Ken Birman, Robert Broberg, Rick Payne, Robbert van Renesse
October 26, 2009

Motivation
Us: a server. Them: a client. There is a TCP connection between us; we keep some application state, and so do they. Then the server fails and is revived, but two things are still missing. With a persistent store, the server can checkpoint its application state and recover it after the failure. But what recovers the connection state?

Why bother?
As soon as the client tries to send, it gets back an RST. The client's natural reaction is "I'll just try again!"

The client
- Humans see "Connection Reset"
- When should the client retry? When should it give up?
- What session maps to a new connection?
- Some protocols would need re-authentication
- Some protocols already respond actively: BGP assumes the link is lost, and resync is slow

What are the possibilities?

The network stack
- Has the state and the logic
- No copies or context switches
- Fork the network stack
- Redundant logging
- Synchronous replication
- End-to-end: only use the kernel for efficiency

Network stack wrappers [FT-TCP]
- Still some kernel advantages
- No change to server or client
- Two kernel modules
- Must interpose on socket calls
- Synchronous replication

The server
- User-level networking
- Connection state becomes part of application state
- Can't leverage the OS
- Significant server changes

A proxy [CRAFT, I-]
- Splice (spoof) separate connections
- Replicate for fault tolerance
- Recover by reconnecting
- State machine replication
- Connections aren't really connected

A man in the middle [Morris, ST-]
- Little or no overhead
- Guesswork
- No control over the server
- What about recovery?

R
- Little or no overhead
- No guesswork
- Can control the server
- Chain replication (see the sketch below)
- Machine-independent
- Replicas don't grok TCP
- But what about recovery?
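
The replica's chain-replication role is deliberately simple: append each packet to a log and pass it down the chain, without ever parsing it. Below is a minimal sketch of that log-and-forward loop, assuming a hypothetical length-prefixed record format between neighbours in the chain; connection setup and the tail's acknowledgement path are omitted.

```c
/* Sketch of one non-tail chain replica: read a record from the
 * predecessor, append it to a local log, forward it to the successor.
 * Record framing, file name, and descriptors are illustrative. */
#include <arpa/inet.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

static ssize_t read_full(int fd, void *buf, size_t n)
{
    size_t off = 0;
    while (off < n) {
        ssize_t r = read(fd, (char *)buf + off, n - off);
        if (r <= 0)
            return r;                    /* error or EOF */
        off += r;
    }
    return (ssize_t)off;
}

int main(void)
{
    /* Placeholders: in a real replica these would be sockets to the
     * predecessor and successor in the chain (setup omitted). */
    int pred_fd = 0, succ_fd = 1;
    FILE *log = fopen("packet.log", "ab");   /* durable local copy */
    if (!log) { perror("fopen"); return 1; }

    for (;;) {
        uint32_t len_net;
        if (read_full(pred_fd, &len_net, sizeof len_net) <= 0)
            break;
        uint32_t len = ntohl(len_net);
        unsigned char *rec = malloc(len);
        if (!rec || read_full(pred_fd, rec, len) <= 0)
            break;

        /* 1. Append to the local log; the replica never parses TCP. */
        fwrite(&len_net, sizeof len_net, 1, log);
        fwrite(rec, 1, len, log);
        fflush(log);

        /* 2. Forward down the chain; the tail would acknowledge instead. */
        write(succ_fd, &len_net, sizeof len_net);
        write(succ_fd, rec, len);
        free(rec);
    }
    fclose(log);
    return 0;
}
```

Because the replica never interprets what it logs, it stays agnostic of TCP and can run on any machine in the network.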

Recovery
- How we recover determines what we must replicate
- If we replicate state machines and we control the stack, we can fail over TCBs
- Otherwise we must restart the connection with an unchanged stack:
  - The recovery process notifies the chain and makes a new connection
  - Packets are modified to spoof the real client, and the server handshakes normally (the chain maintains the spoofing)
  - The connection is replayed from a checkpoint
  - Client traffic can continue, ignored until the replay catches up (see the sketch below)
  - Connection recovered!
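
A minimal sketch of the "ignored until it catches up" decision. The structure and names here are illustrative assumptions rather than the actual implementation: the point is only that live client segments are dropped while logged data is replayed, and that TCP retransmission covers whatever was dropped.

```c
/* Per-segment decision while a connection is being replayed. */
#include <stdint.h>

enum conn_mode { NORMAL, REPLAYING };

struct conn {
    enum conn_mode mode;
    uint32_t replay_next;   /* next client byte the replay will feed to the server */
    uint32_t client_next;   /* highest client byte seen on the live connection */
};

/* "a >= b" in 32-bit TCP sequence space (wraparound-safe). */
static int seq_geq(uint32_t a, uint32_t b) { return (int32_t)(a - b) >= 0; }

/* Called for each live client segment; returns 1 to forward, 0 to ignore. */
int on_client_segment(struct conn *c, uint32_t seg_seq, uint32_t seg_len)
{
    if (c->mode == REPLAYING) {
        /* Track how far the real client has gotten while we replay. */
        if (seq_geq(seg_seq + seg_len, c->client_next))
            c->client_next = seg_seq + seg_len;
        /* Once the replay covers everything the client has sent, resume. */
        if (seq_geq(c->replay_next, c->client_next))
            c->mode = NORMAL;
        else
            return 0;       /* ignore; the client will retransmit */
    }
    return 1;               /* forward (with spoofing/patching) to the server */
}
```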

Replaying client data
- Replay from the beginning
  - No need for the server to tell a recovered connection from a new one
  - Must be deterministic
  - Memory-intensive in the common case
  - Slow recovery
- Replay from an explicit checkpoint
  - Can be requested from the server or an administrator
  - May need to distinguish recovered connections (getpeername)
  - Bounded data to store and replay
- One step further: hold ACKs until checkpointed (sketched below)
  - No need to store or replay packets at all!
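
One way to read "hold ACKs until checkpointed" is as a release decision on each server-to-client segment. A minimal sketch follows; last_ckpt_seq is an assumed per-connection value (the highest client sequence number covered by a checkpoint), and the queuing of held segments is left to the driver.

```c
#include <arpa/inet.h>     /* ntohl */
#include <netinet/tcp.h>   /* struct tcphdr (Linux-style field names) */
#include <stdint.h>

/* "a is before b" in 32-bit TCP sequence space (wraparound-safe). */
static int seq_before(uint32_t a, uint32_t b) { return (int32_t)(a - b) < 0; }

/* Decide whether a server->client segment may be released now.  If it
 * would acknowledge client bytes that no checkpoint covers yet, hold it:
 * the client then keeps those bytes in its own retransmission buffer,
 * so we never need to log or replay them ourselves. */
int may_release(const struct tcphdr *th, uint32_t last_ckpt_seq)
{
    if (!th->ack)
        return 1;                                   /* nothing to hold */
    return !seq_before(last_ckpt_seq, ntohl(th->ack_seq));
}
```

Held segments would be released as soon as a new checkpoint advances last_ckpt_seq past their acknowledgement numbers.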

It can't be that simple...
- Initial sequence numbers
  - We can pick the correct client sequence number, but the server picks its own
  - We'll have to patch up all future packets: change the sequence number, recompute the checksum (sketched below)
- Fragmentation
  - Assume that endpoints use a reasonable MSS, or that the driver program handles reassembly
- Selective acknowledgements
  - SACKs are advisory only: let them flow normally, and do nothing to recover those packets if they are lost
- MD5 security (the TCP MD5 signature option)
  - Adds an additional checksum keyed with a symmetric key
  - The administrator must provide key information
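
A minimal sketch of the patching step, assuming the man in the middle knows the fixed offset delta between the server's old and new initial sequence numbers. The helper names are illustrative, and a real implementation would more likely use an incremental checksum update (RFC 1624) than a full recomputation.

```c
#include <arpa/inet.h>     /* ntohs, ntohl, htons, htonl */
#include <netinet/ip.h>    /* struct ip */
#include <netinet/tcp.h>   /* struct tcphdr (Linux-style field names) */
#include <stdint.h>
#include <string.h>

static uint16_t csum_fold(uint32_t sum)
{
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)~sum;
}

/* TCP checksum over the pseudo-header plus segment.  The caller zeroes
 * th->check first; the result can be stored into th->check as-is. */
static uint16_t tcp_checksum(const struct ip *iph, const struct tcphdr *th,
                             int tcp_len)
{
    uint32_t sum = 0;
    const uint16_t *w = (const uint16_t *)&iph->ip_src;  /* src + dst words */
    int i;

    for (i = 0; i < 4; i++)
        sum += w[i];
    sum += htons(IPPROTO_TCP);
    sum += htons((uint16_t)tcp_len);

    w = (const uint16_t *)th;
    for (i = 0; i + 1 < tcp_len; i += 2)
        sum += *w++;
    if (i < tcp_len) {                 /* odd trailing byte, zero-padded */
        uint16_t last = 0;
        memcpy(&last, w, 1);
        sum += last;
    }
    return csum_fold(sum);
}

/* Patch one packet in place.  delta = new server ISN - old server ISN.
 * Server->client: shift the sequence number back to what the client
 * expects.  Client->server: shift the acknowledgement forward to what
 * the recovered stack expects.  Then recompute the TCP checksum. */
void patch_seq(struct ip *iph, struct tcphdr *th, int to_client, uint32_t delta)
{
    int tcp_len = ntohs(iph->ip_len) - iph->ip_hl * 4;

    if (to_client)
        th->seq = htonl(ntohl(th->seq) - delta);
    else if (th->ack)
        th->ack_seq = htonl(ntohl(th->ack_seq) + delta);

    th->check = 0;
    th->check = tcp_checksum(iph, th, tcp_len);
}
```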

Implementation
- librtcp or administrator functions:
  - Set up the server (with key information)
  - Send checkpoints to the replicas
  - Recover the connection
- Replica functions:
  - Process a packet (agnostic of transport)
- Potentially many driver programs; the current rtcp program uses Linux netfilter QUEUE (sketched below):
  1. Pull a packet off the queue
  2. Let the library process the packet
  3. Permit delivery
- In the future, Feather-Weight Pipes!
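
For concreteness, here is a minimal sketch of such a driver loop using the libnetfilter_queue (NFQUEUE) userspace API; the original driver targeted the netfilter QUEUE mechanism of the day, so details may differ, and rtcp_process_packet() is a hypothetical stand-in for the library entry point.

```c
/* Build with -lnetfilter_queue and direct packets to the queue, e.g.:
 *   iptables -A INPUT -p tcp --dport 80 -j NFQUEUE --queue-num 0
 * (the port and queue number are only examples). */
#include <stdio.h>
#include <sys/socket.h>
#include <netinet/in.h>                  /* ntohl */
#include <linux/netfilter.h>             /* NF_ACCEPT */
#include <libnetfilter_queue/libnetfilter_queue.h>

/* Hypothetical stand-in for the librtcp entry point: log/forward the
 * packet along the chain and patch it in place if needed. */
static void rtcp_process_packet(unsigned char *pkt, int len)
{
    (void)pkt; (void)len;
}

static int cb(struct nfq_q_handle *qh, struct nfgenmsg *nfmsg,
              struct nfq_data *nfa, void *data)
{
    struct nfqnl_msg_packet_hdr *ph = nfq_get_msg_packet_hdr(nfa);
    unsigned char *payload;
    int len = nfq_get_payload(nfa, &payload);   /* 1. pull packet off the queue */
    (void)nfmsg; (void)data;

    rtcp_process_packet(payload, len);          /* 2. let the library process it */

    /* 3. permit delivery, handing back the (possibly modified) payload */
    return nfq_set_verdict(qh, ntohl(ph->packet_id), NF_ACCEPT, len, payload);
}

int main(void)
{
    struct nfq_handle *h = nfq_open();
    struct nfq_q_handle *qh;
    char buf[65536];
    int fd, rv;

    if (!h) { fprintf(stderr, "nfq_open failed\n"); return 1; }
    qh = nfq_create_queue(h, 0, &cb, NULL);      /* queue number 0 */
    if (!qh) { fprintf(stderr, "nfq_create_queue failed\n"); return 1; }
    nfq_set_mode(qh, NFQNL_COPY_PACKET, 0xffff); /* copy whole packets */

    fd = nfq_fd(h);
    while ((rv = recv(fd, buf, sizeof(buf), 0)) > 0)
        nfq_handle_packet(h, buf, rv);           /* dispatches to cb() */

    nfq_destroy_queue(qh);
    nfq_close(h);
    return 0;
}
```

Because the verdict hands the packet buffer back to the kernel, the library can patch sequence numbers, or hold a segment by deferring its verdict, before delivery is permitted.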

Implementation status
- Replicas log and forward packets
- Connections are tracked and kept up to date
- No checkpointing or recovery yet
- No MD5 option handling yet

Conclusion
- R: chain replication enables cheap consistency
- Leverages existing stacks
- Platform-independent
- Can run on any machine in the network
- Perhaps minor changes to the server (or an automatic administrator) to checkpoint and recover
- Simple, simple, simple, simple. Fast?