ACMS: The Akamai Configuration Management System. A. Sherman, P. H. Lisiecki, A. Berkheimer, and J. Wein

ACMS: The Akamai Configuration Management System
A. Sherman, P. H. Lisiecki, A. Berkheimer, and J. Wein
Instructor: Fabian Bustamante
Presented by: Mario Sanchez

The Akamai Platform
Over 15,000 servers at the time of the paper (now 56,000+)
Deployed in 1,200+ different ISP networks
In 60+ countries (now 70+)

Motivation
Customer profiles:
Customers need to maintain close control over the manner in which their web content is served
Customers need to configure different options that determine how their content is served by the CDN
Akamai's internal services and processes:
Need for frequent updates or reconfigurations

Akamai Configuration Management System (ACMS)
Supports configuration propagation and management
Accepts and disseminates distributed submissions of configuration information
Key properties: availability, reliability, asynchrony, consistency, persistent storage

Problem
The set of end clients is widely dispersed
At any point in time some servers may be down or have connectivity problems
Configuration changes are generated from widely dispersed places
Strong consistency requirements

Assumptions
Configuration files vary in size from a few hundred bytes up to 100 MB
Most updates must be distributed to every Akamai node
There is no particular arrival pattern of submissions
The Akamai CDN will continue to grow
Submissions may originate from a number of distinct applications running at distinct locations on the Akamai CDN

Assumptions (continued)
Each submission of a configuration file foo completely overwrites the earlier submitted version of foo
For each configuration file there is either a single writer or multiple idempotent (non-competing) writers

Requirements
High fault-tolerance and availability: multiple entry points
Correctness: unique ordering of versions
Acceptance guarantee: quorum of entry points
Persistent fault-tolerant storage: asynchronous delivery avoids downtime
Efficiency and scalability: quick server synchronization
Security

Approach
The architecture splits the requirements into two tiers:
Front end (update management): a small set of Storage Points (SPs)
Back end (delivery): the entire Akamai CDN

Architecture: Quorum-based Replication
[Diagram: publishers submit updates to an Accepting SP within the set of Storage Points; updates propagate from the Storage Points to the edge servers.]

Quorum-based Replication
A quorum is defined as a majority of the ACMS SPs
Any update submission must be replicated to and agreed upon by a quorum
A majority of operational and connected SPs should be maintained
Every future majority overlaps with the earlier majority that agreed on a file

Quorum-based Replication (continued)
Each SP maintains connectivity by exchanging liveness messages with its peers
If there is no quorum of SPs, the system halts and refuses to accept updates (monitored by the NOCC)
SPs are located in distinct ISPs' networks to reduce the probability of a correlated outage
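
To make the quorum rule concrete, here is a minimal sketch (not from the paper; the five SP names and helper functions are assumptions) showing that any majority quorum always overlaps with any earlier majority:

```python
# Minimal sketch of majority quorums (illustrative only; not ACMS code).
from itertools import combinations

SPS = ["A", "B", "C", "D", "E"]          # hypothetical Storage Point names

def quorum_size(n: int) -> int:
    """A quorum is any majority of the SPs."""
    return n // 2 + 1

def is_quorum(members: set) -> bool:
    return len(members) >= quorum_size(len(SPS))

# Any two majorities share at least one SP, so a future quorum always
# contains at least one SP that saw the earlier agreed-upon file.
q = quorum_size(len(SPS))
for q1 in combinations(SPS, q):
    for q2 in combinations(SPS, q):
        assert set(q1) & set(q2), "majorities must overlap"
print("every pair of majority quorums overlaps")
```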

Acceptance Algorithm
Phases:
Replication: the Accepting SP copies the update to at least a quorum of the SPs
Agreement: Vector Exchange protocol

Acceptance Algorithm (continued)
[Diagram: the Accepting SP replicates the temporary file, identified by a UID such as foo.a.1234 for configuration file foo, to its peer SPs before running the agreement algorithm.]
A publisher contacts an Accepting SP
The Accepting SP first creates a temporary file with a unique filename (UID)
The Accepting SP sends this file to a number of SPs
If replication succeeds, the Accepting SP initiates an agreement algorithm called Vector Exchange
Upon success, the Accepting SP accepts the submission and all SPs upload the new file
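
A compressed sketch of this accepting-side flow follows; the function names, message shapes, and stubbed helpers are assumptions for illustration, not the ACMS API:

```python
# Illustrative sketch of the Accepting SP flow (all names are hypothetical);
# the helpers are stubbed so the sketch runs end-to-end.

PEERS = ["B", "C", "D", "E"]            # the other four Storage Points
QUORUM = (len(PEERS) + 1) // 2 + 1      # majority of all five SPs

temp_store = {}

def store_temp_file(uid, contents):
    temp_store[uid] = contents

def replicate(peer, uid, contents):
    # Stub: pretend every peer acknowledges the copy.
    return True

def vector_exchange(uid):
    # Stub for the agreement phase described on the next slides.
    return True

def accept_submission(filename, contents, serial):
    # 1. Create a temporary file under a unique identifier (UID),
    #    e.g. foo.a.1234 for configuration file "foo".
    uid = f"{filename}.a.{serial}"
    store_temp_file(uid, contents)

    # 2. Replication phase: copy the update to at least a quorum of SPs.
    acks = 1 + sum(replicate(p, uid, contents) for p in PEERS)
    if acks < QUORUM:
        return False            # cannot guarantee quorum persistence; reject

    # 3. Agreement phase: run Vector Exchange on the UID.
    return vector_exchange(uid)  # True -> reply "Accept" to the publisher

print(accept_submission("foo", b"new config", 1234))
```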

Acceptance Algorithm (continued): Vector Exchange
The VE vector is a bit vector with one bit corresponding to each Storage Point
A 1-bit indicates that the corresponding Storage Point knows of a given update
When a majority of the bits are set to 1, agreement has occurred and it is safe for any SP to upload this latest update

Acceptance Algorithm (continued)
The Accepting SP sets its own bit to 1 and the rest to 0, then broadcasts the vector along with the UID of the update to the other SPs
Any SP that sees the vector sets its corresponding bit to 1 and re-broadcasts the modified vector to the rest of the SPs
Each SP learns of the agreement independently when it sees a quorum of bits set
When the Accepting SP that initiated the VE instance learns of the agreement, it accepts the submission of the publishing application
When any SP learns of the agreement, it uploads the file

Acceptance Algorithm Continued A S P S P B A initiates and broadcasts a vector: A:1 B:0 C:0 D:0 E:0 S P C sets its own bit and re-broadcasts: A:1 B:0 C:1 D:0 E:0 E D S P S P C D sets its bit and rebroadcasts A:1 B:0 C:1 D:1 E:0 Any SP learns of the agreement when it sees a majority of bits set.

Recovery
The recovery protocol is called Index Merging
SPs continuously run the background recovery protocol with one another
The downloadable configuration files are represented on the SPs in the form of an index tree
The SPs merge their index trees to pick up any missed updates from one another
The Download Points also need to sync up state

Architecture Optimization: Download Points
[Diagram: a tier of Download Points (DPs) sits between the Storage Points and the Receivers to offload delivery.]

The Index Tree
Configuration files are organized into a tree and split into groups:
Group index file: lists the UIDs of the latest agreed-upon updates for each file in the group
Root index file: lists all group index files together with the latest modification timestamps of those indexes
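
A minimal sketch of how such a two-level index tree might be represented (the field names and layout are assumptions based on the slide, not the paper's on-disk format):

```python
# Illustrative two-level index tree (names and fields are assumptions).
from dataclasses import dataclass, field
from typing import Dict

@dataclass
class GroupIndex:
    timestamp: int                      # last-modified time of this index
    latest_uid: Dict[str, str] = field(default_factory=dict)
    # maps configuration file name -> UID of its latest agreed-upon update

@dataclass
class RootIndex:
    groups: Dict[str, GroupIndex] = field(default_factory=dict)
    # maps group name -> that group's index (with its timestamp)

root = RootIndex()
root.groups["customer-configs"] = GroupIndex(
    timestamp=1700000000,
    latest_uid={"foo": "foo.a.1234", "bar": "bar.c.77"},
)
```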

The Index Merging Algorithm
A snapshot is a hierarchical index structure that describes the latest versions of all accepted files
Each SP updates its own snapshot when it learns of a quorum agreement
For full recovery, each SP only needs to merge in snapshots from (majority − 1) other SPs
SPs merge the index files themselves (using their timestamps), not just the data listed in those indexes

Timestamping Rules
1. When a Storage Point includes new information inside an index file, its next iteration must increase the file's timestamp
2. An SP should always set its index's timestamp to the highest timestamp for that index among its peers (even if it does not include new information)
(Snapshots are also used by the edge servers to detect changes)
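
The two rules could be combined with index merging roughly as follows. This is a simplified sketch that reuses the hypothetical GroupIndex class from the earlier sketch; it is not the ACMS implementation:

```python
# Illustrative merge of one group's index from a peer, applying the two
# timestamping rules above (hypothetical helper, heavily simplified).

def merge_group_index(local: "GroupIndex", peer: "GroupIndex") -> None:
    changed = False
    for fname, uid in peer.latest_uid.items():
        # Simplification: in ACMS the UID embedding the later timestamp wins;
        # here we only fill in files the local SP has never seen.
        if fname not in local.latest_uid:
            local.latest_uid[fname] = uid
            changed = True

    if changed:
        # Rule 1: including new information must advance the index timestamp.
        local.timestamp = max(local.timestamp, peer.timestamp) + 1
    else:
        # Rule 2: otherwise adopt the highest timestamp seen among peers.
        local.timestamp = max(local.timestamp, peer.timestamp)
```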

Correctness Requirement
T is the maximum allowed clock skew between any two communicating SPs
The communication protocol enforces this bound by rejecting liveness messages from SPs whose clocks are at least T seconds apart
No two SPs that accept updates for the same file can have a clock skew of more than 2T seconds
Updates for the same file submitted at least 2T+1 seconds apart will be ordered correctly
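
A small sketch of the skew check implied by these rules; the value of T, the message shape, and the version-ordering key are assumptions for illustration:

```python
# Illustrative clock-skew guard on liveness messages (not ACMS code).
import time

T = 5.0   # hypothetical maximum allowed skew, in seconds

def accept_liveness(peer_clock, local_clock=None):
    """Reject liveness messages from peers whose clocks differ by >= T seconds."""
    if local_clock is None:
        local_clock = time.time()
    return abs(peer_clock - local_clock) < T

# Per the slide, any two SPs accepting updates for the same file are within 2T
# of each other, so versions submitted at least 2T + 1 seconds apart order
# correctly by a key like (timestamp, accepting-SP id).
def version_key(timestamp, accepting_sp):
    return (timestamp, accepting_sp)
```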

Data Delivery
Processes on edge servers subscribe to specific configuration files via their local Receiver process
A Receiver checks for updates to its subscription tree by making recursive HTTP IMS (If-Modified-Since) requests
If the updates match any subscriptions, the Receiver downloads the files
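
A minimal sketch of one such conditional poll, assuming hypothetical URLs and a single-level fetch rather than the full recursive walk of the index tree:

```python
# Illustrative Receiver poll using HTTP If-Modified-Since (URLs are made up).
import urllib.error
import urllib.request
from email.utils import formatdate

def poll(url, last_fetch_epoch):
    """Return new content if the resource changed since last_fetch_epoch, else None."""
    req = urllib.request.Request(url)
    req.add_header("If-Modified-Since", formatdate(last_fetch_epoch, usegmt=True))
    try:
        with urllib.request.urlopen(req, timeout=10) as resp:
            return resp.read()                    # 200: index or file changed
    except urllib.error.HTTPError as err:
        if err.code == 304:
            return None                           # unchanged since last poll
        raise

# A real Receiver would walk the tree recursively: poll the root index, then
# any group indexes it lists that changed, then the files a subscription
# matches, e.g. poll("http://storage-point.example/root_index", 1700000000).
```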

Evaluation
Workload of the ACMS front end over a 48-hour period in the middle of a work week
14,276 total file submissions on the system
Five operating Storage Points
"Avg time" is the period from the time an Accepting SP is first contacted by a publishing application until it replies with Accept

Size range   Avg file size   Distribution   Avg time (s)
0K-1K        290 B           40%            0.61
1K-10K       3K              26%            0.63
10K-100K     22K             23%            0.72
100K-1M      167K            7%             2.23
1M-10M       1.8M            1%             13.63
10M-100M     51M             3%             199.87

Propagation Time Distribution
A random sampling of 250 Akamai nodes
The average propagation time is approximately 55 seconds

Propagation times for various file sizes
Mean and 95th-percentile delivery time for each submission
The average propagation time is the average time for each file to propagate to 95% of its recipients
99.95% of updates arrived within three minutes
The remaining 0.05% were delayed due to temporary network connectivity issues

Discussion
Push-based vs. pull-based updates
The effect of increasing the number of SPs on efficiency
The effect of having fewer nodes in the quorum
The effect of having a variable-sized quorum
Consistency vs. availability trade-offs in quorum selection
The trade-off of having high cacheability

Impressions
Just a cool implementation, nothing more? Anything new? Sheer size?
Solves a specific problem
Thank you!
Credit: This presentation was based on the work of Parya Moinzadeh, UIUC