Consistency in SDN. Aurojit Panda, Wenting Zheng, Xiaohe Hu, Arvind Krishnamurthy, Scott Shenker

Distributed SDN Today
[Figure labels: Replicated (x3), Consistency Layer]
The consistency layer sequences events.
Today: Paxos, Raft, etc. are used to implement serializability.

Our Approach
Consistent Policy Database: consistent view of policy.
Consistency Layer: eventual correctness.
Independent (x3): respond instantaneously.

Performance
Allows greater scalability and resilience.
Faster convergence: we do better than when consistency is used.
[Figure: two CDF plots comparing SCL and Coordination. Left: Convergence Time in Data Centers (x-axis: convergence time, ms). Right: Convergence Time in AS topology (x-axis: convergence time, s).]

Correctness
Our approach ensures:
Eventually, all controllers agree on the sequence of network events seen.
Eventually, each controller and the network agree on the state of the network.
Therefore, the computed and installed states are eventually correct.
This assumes deterministic controllers and idempotent switch updates.
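
The two assumptions on this slide can be illustrated with a small sketch. It is not the authors' implementation: the `Event`, `Switch`, and `Controller` classes and the `compute_links` function are hypothetical names invented here, and the "agreed sequence" is modeled simply as sorting by an assumed sequence number. The point is only that if every controller eventually sees the same events, computes deterministically, and installs idempotently, the installed state ends up correct regardless of delivery order or duplicates.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True, order=True)
class Event:
    """A link event with an assumed, totally ordered sequence number."""
    seq: int
    link: tuple
    up: bool


def compute_links(events):
    """Deterministic: the same set of events always yields the same link set."""
    links = set()
    for ev in sorted(events):      # replay in the agreed order
        if ev.up:
            links.add(ev.link)
        else:
            links.discard(ev.link)
    return frozenset(links)


@dataclass
class Switch:
    installed: frozenset = frozenset()

    def install(self, links):
        # Idempotent: the state is overwritten, so repeats are harmless.
        self.installed = links


@dataclass
class Controller:
    seen: set = field(default_factory=set)

    def on_event(self, ev, switch):
        self.seen.add(ev)          # a duplicate event is absorbed by the set
        switch.install(compute_links(self.seen))


events = [
    Event(1, ("s1", "s2"), True),
    Event(2, ("s2", "s3"), True),
    Event(3, ("s1", "s2"), False),
]
switch = Switch()
a, b = Controller(), Controller()
for ev in events:                  # controller a sees the events in order
    a.on_event(ev, switch)
for ev in reversed(events):        # controller b sees them reversed
    b.on_event(ev, switch)
b.on_event(events[0], switch)      # and receives one event twice
```

While controller b is still catching up it may install stale state, which is why the guarantee is only eventual: once both controllers have seen all three events, every further install writes the same link set.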

What about Consistency?
Correctness: needed to ensure flow tables and controllers are correct.
Programmability: needed to make it easier to program networks.
Performance: needed for faster convergence.

Why Does This Work?
Networks are open-world systems.
Open world: truth resides in an external entity (e.g., the network).
Closed world: truth resides in the system itself (e.g., a database).
With open-world systems:
Truth can be recovered from the external system.
Consistency with ground truth is more important than consistency within the system.
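
A minimal sketch of the open-world idea, under stated assumptions: `Network` is a hypothetical stand-in for the external system that holds ground truth, and `probe` and `resync` are names invented for this illustration, not part of any SDN controller API. A controller that missed an event, or that starts with no state at all, recovers by asking the network rather than by agreeing with its peers.

```python
class Network:
    """Hypothetical stand-in for the external system holding ground truth."""

    def __init__(self, links):
        self._links = set(links)

    def fail_link(self, link):
        self._links.discard(link)

    def probe(self):
        """Return the links that are actually up right now."""
        return frozenset(self._links)


class Controller:
    def __init__(self):
        self.view = frozenset()    # local belief; may be stale or empty

    def resync(self, network):
        """Overwrite the local belief with what the network reports."""
        self.view = network.probe()
        return self.view


net = Network([("s1", "s2"), ("s2", "s3")])
c1, c2 = Controller(), Controller()
c1.resync(net)

net.fail_link(("s1", "s2"))        # c1 misses this event; c2 has no state yet
stale = c1.view != net.probe()     # c1 now disagrees with ground truth

c1.resync(net)
c2.resync(net)                     # both recover from the network itself
```

After the resync the two controllers hold the same view without having exchanged any messages with each other, which is the sense in which consistency with ground truth matters more than consistency within the system.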

Why is this relevant?

Sources of Network Updates
Planned updates: policy updates, link recovery, etc.
  Transition: working network to working network.
  Goal: maintain correctness during the transition.
  Consistency helps (required?).
Network events: link failures, switch failures, etc.
  Transition: broken network to working network.
  Goal: minimize the time until connectivity is restored.
  Consistency adds latency.

Edge-Core Separation
[Figure labels: Endhost, Edge, Fabric]
Fabric: provides connectivity (routing, traffic engineering).
Edge: richer policies (ACLs, traffic priorities).

Conclusion
Existence proof that controller consistency is not necessary.
In fact, it slows down network recovery in response to failures.
Should we require consistency for SDN controllers?
The question is similar to the ACID vs. NoSQL debate in data stores.

Open Questions
What about data plane consistency?
It ensures each packet is processed according to a consistent policy.
Do we need data plane consistency?
For planned updates: it helps with correctness during policy changes.
For network events: it adds latency before connectivity is restored.