Transparent Recovery with Chain Replication
Robert Burgess, Ken Birman, Robert Broberg, Rick Payne, Robbert van Renesse
October 26, 2009
Motivation
- There is a connection between us (the server) and them (the client)
- We keep some application state... and so do they
- The server fails... and is revived, but two things are still missing
- Look! A persistent store: the server can checkpoint, and on revival, recover
- That recovers the application state; what recovers the connection state?
Why bother?
- As soon as the client tries to send... RST!
- The client thinks "I'll just try again!" - but it isn't that simple
The client
- Humans see "Connection Reset"
- When should the client retry? When should the client give up?
- What session maps to a new connection?
- Some protocols would need re-authentication
- Some protocols already respond actively: BGP assumes the link is lost, and resync is slow
What are the possibilities?
The network stack
- Has the state and the logic
- No copies or context switches
- Fork the network stack
- Redundant logging
- Synchronous replication
- End-to-end: only use the kernel for efficiency
Network stack wrappers [FT-TCP]
- Still some kernel advantages
- No change to server or client
- Two kernel modules
- Must interpose on socket calls
- Synchronous replication
The server
- User-level networking: server state includes the connection state
- Can't leverage the OS
- Significant server changes
A proxy [CRAFT, I-TCP]
- Splice (spoof) separate connections
- Replicate for fault-tolerance: state machine replication
- Recover by reconnecting
- Connections aren't really connected
A man in the middle [Morris, ST-TCP]
- Little or no overhead
- Guesswork
- No control over the server
- What about recovery?
rtcp
- Little or no overhead
- No guesswork
- Can control the server
- Chain replication
- Machine-independent: replicas don't grok TCP
- But what about recovery?
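The chain replication the slide relies on can be sketched as follows (class and function names here are hypothetical illustrations, not the librtcp API): an update enters at the head, each replica stores it before forwarding, and only the tail acknowledges, so an acknowledged update is on every replica.

```python
class Replica:
    """One link in a chain; stores an update before forwarding it."""
    def __init__(self):
        self.log = []         # locally stored updates
        self.successor = None # next replica toward the tail

    def handle(self, update):
        self.log.append(update)               # store locally first
        if self.successor is not None:        # not the tail: forward
            return self.successor.handle(update)
        return ("ack", update)                # tail: acknowledge

def make_chain(n):
    replicas = [Replica() for _ in range(n)]
    for a, b in zip(replicas, replicas[1:]):
        a.successor = b
    return replicas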
Recovery
- How we recover determines what we must replicate
- If we replicate state machines, and we control the stack, we can fail over TCBs
Recovery
- We must restart the connection, with an unchanged stack
- A recovery process notifies the chain and makes a new connection
- Packets are modified to spoof the real client
- TCP handshakes normally (the chain maintains the spoofing)
Recovery
- The connection is replayed from the checkpoint
- Client traffic can continue; it is ignored until the replay catches up
- Connection recovered!
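The replay step above might look like this sketch (simplified to plain byte offsets rather than real TCP sequence arithmetic; `recover` is a hypothetical name, not part of librtcp):

```python
def recover(logged, next_off, live_segments):
    """logged: client bytes saved since the last checkpoint.
    next_off: byte offset just past the end of `logged`.
    live_segments: (offset, data) pairs arriving during recovery."""
    delivered = [logged]              # replay the saved bytes first
    for off, data in live_segments:
        if off < next_off:            # already covered by the replay...
            continue                  # ...so live traffic is ignored
        if off == next_off:           # replay has caught up: deliver
            delivered.append(data)
            next_off += len(data)
    return b"".join(delivered)
```

Live client segments that overlap the replay are dropped; once the replay catches up, new data flows through normally.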
Replaying client data
- Replay from the beginning
  - No need for the server to tell recovered connections from new ones
  - Server must be deterministic
  - Memory-intensive in the common case; slow recovery
- Replay from an explicit checkpoint
  - Can be requested from the server or by an administrator
  - May need to distinguish recovered connections (getpeername)
  - Bounded data to store and replay
- One step further: hold ACKs until checkpointed
  - No need to store or replay packets at all!
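The "hold ACKs until checkpointed" idea reduces to one comparison (a sketch with an assumed helper name, not the authors' code): never acknowledge client bytes beyond the last server checkpoint, so the client's own retransmission machinery covers anything a failed server could lose.

```python
def ack_to_send(client_sent_upto, checkpointed_upto):
    # Acknowledge only bytes already covered by a server checkpoint;
    # the client's normal retransmission logic resends the rest, so
    # the server never acks data it cannot recover after a crash.
    return min(client_sent_upto, checkpointed_upto)
```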
It can't be that simple...
- Initial sequence numbers
  - We can pick the correct client sequence number, but the server picks its own
  - We'll have to patch up all future packets: change the sequence, recompute the checksum
- Fragmentation
  - Assume that endpoints use a reasonable MSS, or that the driver program handles reassembly
- Selective acknowledgements
  - SACKs are advisory only: let them flow normally, and do nothing to recover those packets if lost
- MD5 security
  - Adds an additional checksum with a symmetric key; the administrator must provide key information
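The sequence patch-up might look like this sketch (hypothetical helper, not librtcp): add a constant delta to the 32-bit sequence field of every outgoing segment, then recompute the TCP checksum over the pseudo-header and segment.

```python
import struct

def ones_complement_sum(data):
    """Fold a byte string into a 16-bit ones'-complement sum (RFC 1071)."""
    if len(data) % 2:
        data += b"\x00"
    s = sum(struct.unpack("!%dH" % (len(data) // 2), data))
    while s >> 16:
        s = (s & 0xFFFF) + (s >> 16)
    return s

def patch_seq(tcp_segment, delta, src_ip, dst_ip):
    """Shift the sequence number by delta and fix the checksum."""
    seq = struct.unpack("!I", tcp_segment[4:8])[0]
    seg = bytearray(tcp_segment)
    seg[4:8] = struct.pack("!I", (seq + delta) & 0xFFFFFFFF)
    seg[16:18] = b"\x00\x00"   # zero the checksum field before summing
    # IPv4 pseudo-header: src, dst, zero byte, protocol 6 (TCP), length
    pseudo = src_ip + dst_ip + struct.pack("!BBH", 0, 6, len(seg))
    csum = ~ones_complement_sum(pseudo + bytes(seg)) & 0xFFFF
    seg[16:18] = struct.pack("!H", csum)
    return bytes(seg)
```

A valid TCP segment's words, summed together with its pseudo-header in ones' complement, give 0xFFFF, so correctness is easy to check after patching.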
Implementation
- librtcp
  - Server or administrator functions: set up the server (with key information), send checkpoints to replicas, recover the connection
  - Replica functions: process a packet (agnostic of transport)
- Potentially many driver programs; the current rtcp program uses Linux netfilter QUEUE:
  1. Pull a packet off the queue
  2. Let the library process the packet
  3. Permit delivery
- In the future, Feather-Weight Pipes!
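The three-step driver loop above can be sketched like this (the queue class here is a stub standing in for the kernel's netfilter QUEUE interface, which the real rtcp program uses; none of these names come from librtcp):

```python
class StubQueue:
    """Stand-in for the kernel packet queue used by the driver."""
    def __init__(self, pkts):
        self.pkts = list(pkts)
        self.verdicts = []
    def get(self):
        return self.pkts.pop(0) if self.pkts else None
    def set_verdict(self, pkt, verdict):
        self.verdicts.append((pkt, verdict))

def driver_loop(queue, process_packet):
    while True:
        pkt = queue.get()                  # 1. pull a packet off the queue
        if pkt is None:
            break
        process_packet(pkt)                # 2. let the library process it
        queue.set_verdict(pkt, "ACCEPT")   # 3. permit delivery
```

Keeping the loop this thin is what lets the library stay agnostic of how packets are captured: another driver could feed it packets from a different source without changing the processing code.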
Implementation status
- Replicas log and forward packets
- Connections are tracked and kept up to date
- No checkpointing or recovery yet
- No MD5 option handling yet
Conclusion
- rtcp: chain replication enables cheap consistency
- Leverages existing stacks; platform-independent; can run on any machine in the network
- Perhaps minor changes to the server (or an automatic administrator) to checkpoint and recover
- Simple, simple, simple, simple. Fast?