Surviving congestion in geo-distributed storage systems


1 Surviving congestion in geo-distributed storage systems
Brian Cho (University of Illinois at Urbana-Champaign)
Marcos K. Aguilera (Microsoft Research Silicon Valley)

2 Geo-distributed data centers
- Web applications are increasingly deployed across geo-distributed data centers, e.g., social networks, online stores, messaging
- App data is replicated across data centers
  - Disaster tolerance
  - Access locality

3 Congestion between geo-distributed data centers
- Limited bandwidth between data centers, e.g., leased lines, MPLS VPN
  - Bandwidth is expensive: ~K $/Mbps [SprintMPLS]
  - Provision for typical (not peak) usage
- Many machines in each data center

4 Congestion delay between geo-distributed data centers
- Congestion can cause significant delays
  - TCP messaging delay increases to the order of seconds (see figure)
  - Observed across Amazon EC2 data centers [Kraska et al]
- Users do not tolerate delays (<s) [Nielsen]
- FIGURE: RPC round trip delay under congestion (0-30s)

5 Replication techniques applied to geo-distributed data centers
- Weak consistency, e.g., Amazon Dynamo, Yahoo PNUTS, COPS
  - Good performance: updates can be propagated asynchronously
  - Semantics undesirable in some cases (e.g., writes get re-ordered across replicas)
- Strong consistency, e.g., ABD, Paxos, available in Google Megastore, Amazon SimpleDB
  - Avoids the many problems of weak consistency
  - Must wait for updates to propagate across data centers
  - App delay requirements are difficult to meet under congestion

6 Contributions
- Vivace: a strongly consistent key-value store that is resilient to congestion across geo-distributed data centers
- Approach
  - New algorithms send a small amount of critical information across data centers in separate prioritized messages
- Challenges
  - Still provide strong consistency
  - Keep prioritized messages small
  - Avoid delay overhead in absence of congestion

7 Vivace algorithms
- Enhance previous strongly consistent algorithms
- Prioritize a small amount of critical information across sites

8 Vivace algorithms
- Enhance previous strongly consistent algorithms
- Prioritize a small amount of critical information across sites
- Two algorithms:
  1. Read/write algorithm
     - Very simple
     - Based on traditional quorum algorithm [ABD]
     - Linearizable read() and write()
     - read() contains a write-back phase
  2. State machine replication algorithm
     - More complex, details in paper

9 Traditional quorum algorithm: write
- <WRITE,key,val,ts>: val is large (compared with key & ts)

10 Traditional quorum algorithm: write
- <WRITE,key,val,ts>, answered by <ACK-WRITE>

11 Traditional quorum algorithm: write
- <WRITE,key,val,ts>, <ACK-WRITE>: write done

12 Traditional quorum algorithm: read
- <READ,key>

13 Traditional quorum algorithm: read
- <READ,key>, answered by <ACK-READ,val,ts>: large val

14 Traditional quorum algorithm: read
- Step 2, writeback: ensures strong consistency (linearizability)
- <WRITE,key,val,ts>: large val, again!

15 Traditional quorum algorithm: read
- Step 2, writeback: ensures strong consistency (linearizability)
- <WRITE,key,val,ts> (large val, again!), answered by <ACK-WRITE>

16 Traditional quorum algorithm: read
- read done
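
The traditional read and write walked through in slides 9-16 can be sketched as a small in-memory simulation. This is only an illustration, not the authors' code: the `Replica` class, the function names, and the choice to pass the timestamp in from the caller are all assumptions made for the example. It shows why the large val crosses the network twice on a read: once in the reply and once more in the writeback.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """In-memory stand-in for one storage replica (hypothetical helper)."""
    store: dict = field(default_factory=dict)  # key -> (ts, val)

    def on_write(self, key, val, ts):
        # Handle <WRITE,key,val,ts>: keep only the value with the newest ts.
        cur_ts, _ = self.store.get(key, (0, None))
        if ts > cur_ts:
            self.store[key] = (ts, val)
        return "ACK-WRITE"

    def on_read(self, key):
        # Handle <READ,key>: reply with the contents of <ACK-READ,val,ts>.
        ts, val = self.store.get(key, (0, None))
        return val, ts


def quorum_write(quorum, key, val, ts):
    """Send the (large) val to every replica in the quorum and await acks."""
    acks = [r.on_write(key, val, ts) for r in quorum]
    assert all(a == "ACK-WRITE" for a in acks)


def quorum_read(quorum, key):
    """Phase 1: collect (val, ts) replies. Phase 2: write back the newest."""
    replies = [r.on_read(key) for r in quorum]
    val, ts = max(replies, key=lambda reply: reply[1])
    quorum_write(quorum, key, val, ts)  # writeback carries the large val again
    return val


replicas = [Replica() for _ in range(3)]
quorum_write(replicas[:2], "k", "v1", ts=1)  # replicas 0 and 1 acknowledge
result = quorum_read(replicas[1:], "k")      # replicas 1 and 2 answer
```

Because any two majorities of three replicas overlap, the read sees the value written to a different majority, and the writeback leaves the previously stale replica up to date.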

17 Vivace: write
- New quorum of local replicas (diagram also shows Replica 2 and Replica 3)

18 Vivace: write
- Step 1: val sent locally in <W-LOCAL,key,val,ts>

19 Vivace: write
- Step 1: <W-LOCAL,key,val,ts>, answered by <ACK-W-LOCAL>

20 Vivace: write
- Step 2, prioritized: <W-TS,key,ts>
- No val: small message!

21 Vivace: write
- Step 2, prioritized: <W-TS,key,ts> (no val: small message!), answered by <ACK-W-TS>

22 Vivace: write
- write done
- Replicas 1, 2, 3 have a consistent view of key & ts, but no val (yet)

23 Vivace: write
- Step *: <W-REMOTE,key,val,ts>
- Replicas 1, 2, 3 add val to their consistent view of key & ts
- val is still large, but not in critical path
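
A minimal sketch of the write steps above, written as an in-memory simulation. The `Replica` class, the function name, and the returned list of deferred sends are hypothetical. The sketch also assumes that the local quorum is a separate set of replicas from replicas 1, 2, 3 and that the caller supplies the timestamp; the slides do not spell out either point.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """In-memory stand-in for a replica (hypothetical helper)."""
    ts: dict = field(default_factory=dict)    # key -> newest timestamp seen
    vals: dict = field(default_factory=dict)  # (key, ts) -> val

    def on_w_local(self, key, val, ts):
        self.vals[(key, ts)] = val
        return "ACK-W-LOCAL"

    def on_w_ts(self, key, ts):
        self.ts[key] = max(ts, self.ts.get(key, 0))
        return "ACK-W-TS"

    def on_w_remote(self, key, val, ts):
        self.vals[(key, ts)] = val


def vivace_write(local_quorum, replicas, key, val, ts):
    # Step 1: the large val travels only to the local replicas.
    for r in local_quorum:
        assert r.on_w_local(key, val, ts) == "ACK-W-LOCAL"
    # Step 2 (prioritized): only key and ts cross data centers.
    for r in replicas:
        assert r.on_w_ts(key, ts) == "ACK-W-TS"
    # The write is done here. Step *: the W-REMOTE sends are returned as
    # deferred calls so they stay off the critical path.
    return [lambda r=r: r.on_w_remote(key, val, ts) for r in replicas]


local = [Replica(), Replica()]
replicas = [Replica() for _ in range(3)]
pending = vivace_write(local, replicas, "k", "big value", ts=7)
state_at_write_done = [(r.ts["k"], ("k", 7) in r.vals) for r in replicas]
for send in pending:  # background propagation of the large val
    send()
```

At the point the write returns, every replica knows the timestamp but none holds the value; the value arrives only when the deferred sends run.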

24 write comparison
- Traditional algorithm: 1 remote RTT
- Vivace algorithm: 1 prioritized remote RTT + 1 local RTT

25 Vivace: read
- Step 1, prioritized: <R-TS,key>
- Only ask for ts

26 Vivace: read
- Step 1, prioritized: <R-TS,key>, answered by <ACK-R-TS,ts>: small message

27 Vivace: read
- Step 2: <R-DATA,key,ts>
- Ask for data with largest ts

28 Vivace: read
- Step 2: <R-DATA,key,ts>, answered by <ACK-R-DATA,val>
- Large val, but wait for only one reply (common case: local)

29 Vivace: read
- Step 3, prioritized: <W-TS,key,ts>
- Writeback only small ts

30 Vivace: read
- Step 3, prioritized: <W-TS,key,ts>, answered by <ACK-W-TS>

31 Vivace: read
- read done
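
The three read steps above can be sketched in the same in-memory style. Again this is an illustration under assumptions, not the authors' implementation: the `Replica` class and function names are hypothetical, and "wait for only one reply" is modelled by asking the local replicas first and stopping at the first replica that holds the value.

```python
from dataclasses import dataclass, field

_MISSING = object()  # marks "this replica does not hold val for that ts yet"


@dataclass
class Replica:
    """In-memory stand-in for a replica (hypothetical helper)."""
    ts: dict = field(default_factory=dict)    # key -> newest timestamp seen
    vals: dict = field(default_factory=dict)  # (key, ts) -> val

    def on_r_ts(self, key):
        return self.ts.get(key, 0)            # contents of <ACK-R-TS,ts>

    def on_r_data(self, key, ts):
        return self.vals.get((key, ts), _MISSING)

    def on_w_ts(self, key, ts):
        self.ts[key] = max(ts, self.ts.get(key, 0))
        return "ACK-W-TS"


def vivace_read(local_quorum, replicas, key):
    # Step 1 (prioritized): ask only for timestamps, keep the largest.
    ts = max(r.on_r_ts(key) for r in replicas)
    # Step 2: ask for the data with that ts; one reply is enough.
    val = _MISSING
    for r in list(local_quorum) + list(replicas):
        val = r.on_r_data(key, ts)
        if val is not _MISSING:
            break
    if val is _MISSING:
        raise LookupError(f"no replica holds {key!r} at ts={ts}")
    # Step 3 (prioritized): write back only the small ts.
    for r in replicas:
        assert r.on_w_ts(key, ts) == "ACK-W-TS"
    return val


local = [Replica(vals={("k", 5): "big value"})]
replicas = [Replica(ts={"k": 5}), Replica(ts={"k": 5}), Replica(ts={"k": 3})]
result = vivace_read(local, replicas, "k")
```

In this run the value is served by the local replica, and the writeback brings the lagging replica's timestamp up to 5 without sending the value again.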

32 read comparison
- Traditional algorithm: 2 remote RTTs
- Vivace algorithm: 2 prioritized remote RTTs + 1 local RTT
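
The write and read comparisons reduce to simple round-trip arithmetic, sketched below. The formulas follow the slides; the millisecond figures in the example are made-up inputs chosen only to show the effect of congestion, not measurements from the talk.

```python
def traditional_write_ms(remote_rtt):
    return remote_rtt                      # one WRITE/ACK-WRITE round


def vivace_write_ms(prioritized_rtt, local_rtt):
    return prioritized_rtt + local_rtt     # W-TS round + W-LOCAL round


def traditional_read_ms(remote_rtt):
    return 2 * remote_rtt                  # READ round + writeback round


def vivace_read_ms(prioritized_rtt, local_rtt):
    return 2 * prioritized_rtt + local_rtt  # R-TS + W-TS rounds, R-DATA local


# Hypothetical inputs: a congested link delays normal traffic to 5000 ms,
# prioritized messages still see 100 ms, and the local RTT is 1 ms.
congested, prioritized, local = 5000, 100, 1
write_pair = (traditional_write_ms(congested), vivace_write_ms(prioritized, local))
read_pair = (traditional_read_ms(congested), vivace_read_ms(prioritized, local))
```

With these inputs the write takes 5000 ms versus 101 ms and the read 10000 ms versus 201 ms. Without congestion the two remote RTTs are equal, so the only extra cost in this model is the single local RTT.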

33 Evaluation topics
- Practical prioritization setup
- Delay with congestion
  - KV-store operations
  - Twitter clone web app operations
- Delay without congestion
  - Overhead of Vivace algorithms compared to traditional algorithms

34 Evaluation setup
- cluster <-> Amazon EC2 Ireland
- DSCP bit prioritization on local router's egress port
- Congestion generated with iperf
- Diagram: cluster (Illinois) and Amazon EC2 (Ireland), with the label "prioritization applied here only"
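
The slides do not say how individual messages are marked for the router to prioritize. One common way, shown here purely as an assumption, is for the sender to set the DSCP bits on a socket through the standard `IP_TOS` option. The helper names and the default of DSCP 46 (Expedited Forwarding) are illustrative choices, and whether the option is honored depends on the operating system and the network.

```python
import socket


def dscp_to_tos(dscp: int) -> int:
    """DSCP occupies the upper six bits of the former IPv4 TOS byte."""
    if not 0 <= dscp <= 63:
        raise ValueError("DSCP must be in the range 0..63")
    return dscp << 2


def make_prioritized_udp_socket(dscp: int = 46) -> socket.socket:
    """Create a UDP socket whose outgoing packets carry the given DSCP.

    Hypothetical helper: a router configured for DSCP-based prioritization
    could then place these packets ahead of unmarked bulk traffic.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, dscp_to_tos(dscp))
    return sock
```

In a design like the one on the slides, only the small timestamp messages would be sent on such a socket, while the large val messages would use an unmarked one.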

35 Evaluation
- Does prioritization work in practice?
- Simple ping experiment
- Prioritized messages bypass congestion
- Router-based prioritization is effective

36 Evaluation
- How well does Vivace perform under congestion?
- FIGURE, KV-store operations: (a) Read algorithms, (b) Write algorithms, (c) State machine algorithms
- FIGURE, Twitter-clone operations: (a) Post tweet, (b) Read user timeline, (c) Read friends timeline

37 Evaluation
- How well does Vivace perform under congestion?
- FIGURE: (a) Read algorithms
- Plot annotations: avoids congestion delays; 2 remote RTTs; TCP resend on packet loss; 2 prioritized remote RTTs + local RTT; buffering delay

38 Evaluation
- What is the overhead of Vivace without congestion? (Results in paper)
- No measurable overhead compared to traditional algorithms
- Extra message phases are not harmful

39 Conclusion
- Proposed two new algorithms
  - Read/write (simple, in talk)
  - State machine (more complex, in paper)
- Both algorithms avoid delay due to congestion by prioritizing a small amount of critical information, while
  - Still providing strong consistency
  - Keeping prioritized messages small
  - Avoiding delay overhead in absence of congestion
  - Using a practical prioritization infrastructure
- Careful use of prioritized messages can be an effective strategy in geo-distributed data centers


More information

Interoute Use Case. SQL 2016 Always On in Interoute VDC. Last updated 11 December 2017 ENGINEERED FOR THE AMBITIOUS

Interoute Use Case. SQL 2016 Always On in Interoute VDC. Last updated 11 December 2017 ENGINEERED FOR THE AMBITIOUS Interoute Use Case SQL 2016 Always On in Interoute VDC Last updated 11 December 2017 ENGINEERED FOR THE AMBITIOUS VERSION HISTORY Version Date Title Author 1 11 / 12 / 17 SQL 2016 Always On in Interoute

More information

Random Early Drop with In & Out (RIO) for Asymmetrical Geostationary Satellite Links

Random Early Drop with In & Out (RIO) for Asymmetrical Geostationary Satellite Links Proc. Joint Int l Conf. IEEE MICC 001, LiSLO 001, ISCE 001, Kuala Lumpur, Malaysia Oct 1-4, 001 Random Early Drop with In & Out (RIO) for Asymmetrical Geostationary Satellite Links Tat Chee Wan, Member,

More information

Distributed Systems. 27. Engineering Distributed Systems. Paul Krzyzanowski. Rutgers University. Fall 2018

Distributed Systems. 27. Engineering Distributed Systems. Paul Krzyzanowski. Rutgers University. Fall 2018 Distributed Systems 27. Engineering Distributed Systems Paul Krzyzanowski Rutgers University Fall 2018 1 We need distributed systems We often have a lot of data to ingest, process, and/or store The data

More information

WarpTCP WHITE PAPER. Technology Overview. networks. -Improving the way the world connects -

WarpTCP WHITE PAPER. Technology Overview. networks. -Improving the way the world connects - WarpTCP WHITE PAPER Technology Overview -Improving the way the world connects - WarpTCP - Attacking the Root Cause TCP throughput reduction is often the bottleneck that causes data to move at slow speed.

More information

Exam - Final. CSCI 1680 Computer Networks Fonseca. Closed Book. Maximum points: 100 NAME: 1. TCP Congestion Control [15 pts]

Exam - Final. CSCI 1680 Computer Networks Fonseca. Closed Book. Maximum points: 100 NAME: 1. TCP Congestion Control [15 pts] CSCI 1680 Computer Networks Fonseca Exam - Final Due: 11:00am, May 10th, 2012 Closed Book. Maximum points: 100 NAME: 1. TCP Congestion Control [15 pts] a. TCP Tahoe and Reno have two congestion-window

More information

CSE 544 Principles of Database Management Systems. Alvin Cheung Fall 2015 Lecture 14 Distributed Transactions

CSE 544 Principles of Database Management Systems. Alvin Cheung Fall 2015 Lecture 14 Distributed Transactions CSE 544 Principles of Database Management Systems Alvin Cheung Fall 2015 Lecture 14 Distributed Transactions Transactions Main issues: Concurrency control Recovery from failures 2 Distributed Transactions

More information

Homework 2 COP The total number of paths required to reach the global state is 20 edges.

Homework 2 COP The total number of paths required to reach the global state is 20 edges. Homework 2 COP 5611 Problem 1: 1.a Global state lattice 1. The total number of paths required to reach the global state is 20 edges. 2. In the global lattice each and every edge (downwards) leads to a

More information

Data Storage Revolution

Data Storage Revolution Data Storage Revolution Relational Databases Object Storage (put/get) Dynamo PNUTS CouchDB MemcacheDB Cassandra Speed Scalability Availability Throughput No Complexity Eventual Consistency Write Request

More information

Riak. Distributed, replicated, highly available

Riak. Distributed, replicated, highly available INTRO TO RIAK Riak Overview Riak Distributed Riak Distributed, replicated, highly available Riak Distributed, highly available, eventually consistent Riak Distributed, highly available, eventually consistent,

More information

CPSC 426/526. Cloud Computing. Ennan Zhai. Computer Science Department Yale University

CPSC 426/526. Cloud Computing. Ennan Zhai. Computer Science Department Yale University CPSC 426/526 Cloud Computing Ennan Zhai Computer Science Department Yale University Recall: Lec-7 In the lec-7, I talked about: - P2P vs Enterprise control - Firewall - NATs - Software defined network

More information

No book chapter for this topic! Slides are posted online as usual Homework: Will be posted online Due 12/6

No book chapter for this topic! Slides are posted online as usual Homework: Will be posted online Due 12/6 Announcements No book chapter for this topic! Slides are posted online as usual Homework: Will be posted online Due 12/6 Copyright c 2002 2017 UMaine School of Computing and Information S 1 / 33 COS 140:

More information

CSE 5306 Distributed Systems

CSE 5306 Distributed Systems CSE 5306 Distributed Systems Fault Tolerance Jia Rao http://ranger.uta.edu/~jrao/ 1 Failure in Distributed Systems Partial failure Happens when one component of a distributed system fails Often leaves

More information

CS519: Computer Networks. Lecture 5, Part 1: Mar 3, 2004 Transport: UDP/TCP demux and flow control / sequencing

CS519: Computer Networks. Lecture 5, Part 1: Mar 3, 2004 Transport: UDP/TCP demux and flow control / sequencing : Computer Networks Lecture 5, Part 1: Mar 3, 2004 Transport: UDP/TCP demux and flow control / sequencing Recall our protocol layers... ... and our protocol graph IP gets the packet to the host Really

More information

CTCP (Circuit TCP) v1.0

CTCP (Circuit TCP) v1.0 Outline C (Circuit ) v1.0 Helali Bhuiyan and Malathi Veeraraghavan {helali,mv5g}@virginia.edu Purpose of C Requirements to run C code C Components C Operation C details C Usage Usage of C across an Internet

More information

HyperDex. A Distributed, Searchable Key-Value Store. Robert Escriva. Department of Computer Science Cornell University

HyperDex. A Distributed, Searchable Key-Value Store. Robert Escriva. Department of Computer Science Cornell University HyperDex A Distributed, Searchable Key-Value Store Robert Escriva Bernard Wong Emin Gün Sirer Department of Computer Science Cornell University School of Computer Science University of Waterloo ACM SIGCOMM

More information

Replicated State Machine in Wide-area Networks

Replicated State Machine in Wide-area Networks Replicated State Machine in Wide-area Networks Yanhua Mao CSE223A WI09 1 Building replicated state machine with consensus General approach to replicate stateful deterministic services Provide strong consistency

More information

CSE 5306 Distributed Systems. Fault Tolerance

CSE 5306 Distributed Systems. Fault Tolerance CSE 5306 Distributed Systems Fault Tolerance 1 Failure in Distributed Systems Partial failure happens when one component of a distributed system fails often leaves other components unaffected A failure

More information

TCP over Wireless PROF. MICHAEL TSAI 2016/6/3

TCP over Wireless PROF. MICHAEL TSAI 2016/6/3 TCP over Wireless PROF. MICHAEL TSAI 2016/6/3 2 TCP Congestion Control (TCP Tahoe) Only ACK correctly received packets Congestion Window Size: Maximum number of bytes that can be sent without receiving

More information

Dynamo: Amazon s Highly Available Key-value Store. ID2210-VT13 Slides by Tallat M. Shafaat

Dynamo: Amazon s Highly Available Key-value Store. ID2210-VT13 Slides by Tallat M. Shafaat Dynamo: Amazon s Highly Available Key-value Store ID2210-VT13 Slides by Tallat M. Shafaat Dynamo An infrastructure to host services Reliability and fault-tolerance at massive scale Availability providing

More information

Ovid A Software-Defined Distributed Systems Framework. Deniz Altinbuken, Robbert van Renesse Cornell University

Ovid A Software-Defined Distributed Systems Framework. Deniz Altinbuken, Robbert van Renesse Cornell University Ovid A Software-Defined Distributed Systems Framework Deniz Altinbuken, Robbert van Renesse Cornell University Ovid Build distributed systems that are easy to evolve easy to reason about easy to compose

More information

Scalable Causal Consistency for Wide-Area Storage with COPS

Scalable Causal Consistency for Wide-Area Storage with COPS Don t Settle for Eventual: Scalable Causal Consistency for Wide-Area Storage with COPS Wyatt Lloyd * Michael J. Freedman * Michael Kaminsky David G. Andersen * Princeton, Intel Labs, CMU The Key-value

More information

Outline. Circuit Switching. Circuit Switching : Introduction to Telecommunication Networks Lectures 13: Virtual Things

Outline. Circuit Switching. Circuit Switching : Introduction to Telecommunication Networks Lectures 13: Virtual Things 8-5: Introduction to Telecommunication Networks Lectures : Virtual Things Peter Steenkiste Spring 05 www.cs.cmu.edu/~prs/nets-ece Outline Circuit switching refresher Virtual Circuits - general Why virtual

More information

Distributed Systems. 10. Consensus: Paxos. Paul Krzyzanowski. Rutgers University. Fall 2017

Distributed Systems. 10. Consensus: Paxos. Paul Krzyzanowski. Rutgers University. Fall 2017 Distributed Systems 10. Consensus: Paxos Paul Krzyzanowski Rutgers University Fall 2017 1 Consensus Goal Allow a group of processes to agree on a result All processes must agree on the same value The value

More information