Volley: Automated Data Placement for Geo-Distributed Cloud Services


Volley: Automated Data Placement for Geo-Distributed Cloud Services
Authors: Sharad Agarwal, John Dunagan, Navendu Jain, Stefan Saroiu, Alec Wolman, Harbinder Bhogan
7th USENIX Symposium on Networked Systems Design and Implementation (NSDI 2010)
Presented by Martin Beyer, December 14, 2010

Introduction: Cloud Services
- Users from all continents want to collaborate through cloud services
- Users do not accept high latencies
- Cloud services deal with highly dynamic data (e.g. a Facebook wall)
- Question: placement of user & application data

Introduction: Worldwide Distribution
- Serve all users from the best datacenter (DC) with respect to user-perceived latency
- Cloud service providers use many geographically dispersed DCs
- Which data should be stored at which datacenter?
- Interdependencies between data items
- Minimize the operational cost of the datacenters
  - Inter-DC traffic due to data sharing or interdependencies
  - Provisioned capacity at each DC

Introduction: Replication
- Data replication for fault tolerance
  - Hardware failures
  - Natural disasters
- Replication for availability
  - Large-scale outages
  - No single point of failure
- Replicas need to communicate frequently
  - Synchronization
  - Ensure consistency

Introduction: Impacts of Data Placement
- Latency increases between distant locations
  - Move data near the users that most frequently access it
- The amount of inter-DC traffic influences bandwidth costs
  - Colocate data items
- Capacity skew among DCs increases hardware costs
  - Distribute data uniformly among DCs

Introduction: Approaches to Data Placement
How can we find a good data placement that reduces latency and operational cost?
- Full replication at each datacenter
  - Lowest latency for the users
  - Excessive costs for DC operators
- A single DC holds all data
  - No inter-DC traffic
  - Many unhappy users due to high latency
- Partition data across multiple DCs
  - Finding a good placement is a challenging problem
  - Need to analyze patterns of data access
  - Must process on the order of 10^8 objects

Outline
1. Introduction
2. Analysis of Cloud Services
3. Data Placement
4. Evaluation
5. Conclusion

Analysis of Cloud Services: Challenges of Data Placement
- Cloud services deal with highly dynamic data
  - High update rates lead to stale replicas
  - Updates need to be visible worldwide
- Collaboration around the world
  - Users work together on a shared data item
- Data interdependencies
  - Publish-subscribe mechanisms; friend-of-a-friend relationships
  - Can be modeled as a dependency graph
- These services generate huge data sets
  - Need solutions for efficient analysis of the dependency graph

Analysis of Cloud Services: Challenges of Data Placement (cont'd)
- Applications change frequently
  - Need to continuously adapt to changing usage patterns
- Increasing user mobility
  - When should data be migrated to a new location?
- Infrastructure can change
  - Capacity limits or latencies between DCs may change

Analysis of Cloud Services: Network Traces
- Datacenter applications collect workload traces
- Month-long logs from Live Mesh and Live Messenger
- The analysis focuses on three aspects:
  - Shared data
  - Data interdependencies
  - User mobility

Analysis of Cloud Services: Live Mesh
- File & application synchronization
- Cloud storage
- Data feeds

Analysis of Cloud Services: Live Messenger
- Instant messaging
- Video conferencing
- Continuous group conversations
- Contact status updates

Analysis of Cloud Services: Facebook
- The Facebook wall connects users to all of their friends
- Users can receive updates via RSS feeds
- Interdependencies between walls and RSS feeds
[Diagram: Client 1 and Client 2 with their Facebook walls and the RSS feeds that depend on them]

Analysis of Cloud Services: Data Sharing in Live Mesh
- Clients access Live Mesh through a web frontend
- Updates to device connectivity status
- Multiple queue items can subscribe to a publish-subscribe object
[Diagram: Client 1 and Client 2 connect through frontends to a device-connectivity object, a queue, and a pub-sub object]

Analysis of Cloud Services: Data Interdependencies in Live Mesh
- A change to a document creates an update message at the publish-subscribe object
- Queue objects receive a copy of that message
- Long tail of very popular data items

Analysis of Cloud Services: Geographically Distant Data Sharing
- Compute a sharing centroid for each data item: the weighted mean of the locations of the users that access it
- A large amount of sharing occurs between distant clients

Analysis of Cloud Services: Client Mobility
- Geo-location database (quova.com) maps IP addresses to geographic regions
- A client's centroid is computed from all locations where the client contacted the service
- Large movements are observed in the Live Messenger trace

Outline
1. Introduction
2. Analysis of Cloud Services
3. Data Placement
4. Evaluation
5. Conclusion

Data Placement: Known Heuristics
- Determine the user's location
- Move data to the closest datacenter for that user, with the goal of reducing user latency
- Ignores major sources of operational cost:
  - WAN bandwidth between DCs
  - Overprovisioned datacenter capacity due to skewed load

Data Placement: Volley's Approach
- Volley optimizes data placement for latency and makes it possible to limit operational costs
- Correlates application logs into a graph that captures a global view of data accesses
- Analyzes data interdependencies and user behavior within cloud services
- Computes a data placement and outputs recommendations for which data should be migrated where

Data Placement: Volley in a Nutshell
- Input: logs & models
  - Datacenter logs in a distributed storage system
  - Models for cost, capacity and latency
  - Constraints on placement
- Iterative optimization algorithm running on a distributed computing framework
- Output: migration recommendations
[Diagram: logs from DC 1-3 in distributed storage, together with the capacity, cost, and latency models and the placement constraints, feed into Volley, which produces a migration proposal]

Data Placement: Requirements for Logs
- Capture the logical flow of control across components
  - Used to construct the dependency graph
- Provide unique identifiers (data items: GUID, users: IP address)
- Request log record:
  - Timestamp
  - Source entity: IP or GUID
  - Request size
  - Destination entity: GUID
  - Transaction ID: traces a request through the logs
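
To make the record format concrete, here is a minimal sketch of such a log record as a Python dataclass; the field names and types are illustrative assumptions, not Volley's actual schema.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class RequestLogRecord:
    """One request observed by the service, with the fields listed above.
    Names and types are illustrative, not Volley's actual log schema."""
    timestamp: float          # when the request was observed
    source_entity: str        # client IP address or GUID of the requesting data item
    request_size_bytes: int   # size of the request
    destination_entity: str   # GUID of the data item being accessed
    transaction_id: str       # correlates all records belonging to one logical request
```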

Data Placement: Logged Events
- Live Mesh trace:
  - Changes to files
  - Device connectivity
- Live Messenger trace:
  - Login/logoff events
  - Participants in each conversation
  - Number of messages between users

Data Placement: Datacenter Cost Model
- Cost per transaction, covering resources such as RAM, disk and CPU
- Capacity model for all DCs, e.g. the amount of data stored at each DC
- Cost model for all DCs
- These models change on slower time scales
- They specify the hardware provisioning needed in the DCs to run the service
- Required network bandwidth
- Charging model for the service's use of network bandwidth
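
One plausible way to encode such models as input to a placement tool is a small per-DC table like the sketch below; all field names, units and numbers are made-up placeholders for illustration, not values from the paper.

```python
# Hypothetical capacity and cost model for three datacenters. Every value is a
# placeholder; the paper does not publish these numbers or this exact format.
datacenter_model = {
    "dc_us_west": {
        "capacity_tb": 500,                # amount of data the DC may hold
        "cost_per_million_requests": 0.8,  # CPU/RAM/disk cost per transaction volume
        "inter_dc_bandwidth_gbps": 10,     # provisioned WAN bandwidth
        "cost_per_gb_transferred": 0.05,   # charging model for network bandwidth
    },
    "dc_eu_west": {
        "capacity_tb": 300,
        "cost_per_million_requests": 1.0,
        "inter_dc_bandwidth_gbps": 10,
        "cost_per_gb_transferred": 0.07,
    },
    "dc_asia_east": {
        "capacity_tb": 200,
        "cost_per_million_requests": 1.2,
        "inter_dc_bandwidth_gbps": 5,
        "cost_per_gb_transferred": 0.09,
    },
}
```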

Data Placement: Additional Inputs
- Location of each data item
- Latency model
  - Network coordinate system: nodes are embedded in an n-dimensional space specified by the model
  - The locations of nodes in this space determine the predicted latency between them
- Constraints on placement
  - Replication at distant datacenters
  - Legal constraints
- Together these allow Volley to make placement decisions
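
As a rough illustration of how a coordinate-based latency model can be used, the sketch below embeds each node at a point in an n-dimensional space and predicts latency from the distance between points; the embedding, dimensionality and scale factor are assumptions, not the model Volley actually uses.

```python
import math

def predicted_latency_ms(coord_a, coord_b, ms_per_unit=1.0):
    """Predict the latency between two nodes from their network coordinates.

    Each node is a point in an n-dimensional space; predicted latency is
    assumed to be proportional to the Euclidean distance between the points.
    """
    distance = math.sqrt(sum((a - b) ** 2 for a, b in zip(coord_a, coord_b)))
    return distance * ms_per_unit

# Example: two nodes embedded in a 3-dimensional coordinate space.
print(predicted_latency_ms((0.0, 10.0, 3.0), (4.0, 7.0, 3.0)))  # 5.0
```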

Data Placement: Algorithm
- Phase 1: Compute an initial placement
- Phase 2: Iteratively move data to reduce latency
- Phase 3: Collapse data to datacenters

Data Placement: Phase 1 (Initial Placement)
- Map each data item to the weighted average of the geographic coordinates of the clients that access it
  - Weight = amount of communication between client and data item
  - For each data item, compute a weighted spherical mean: interpolate between 2 initial points (clients), then average in additional points
- Some data items may never be accessed directly by a client
  - Move them near the already placed data items
- Ignores data interdependencies!
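
The sketch below shows one way to compute the weighted spherical mean this slide describes: convert client locations to unit vectors, interpolate between the first two, then fold each additional client in by its share of the accumulated weight. Function names and the folding order are illustrative assumptions, not the exact code from the paper.

```python
import math

def to_unit_vector(lat_deg, lon_deg):
    """Convert latitude/longitude in degrees to a 3-D unit vector."""
    lat, lon = math.radians(lat_deg), math.radians(lon_deg)
    return (math.cos(lat) * math.cos(lon),
            math.cos(lat) * math.sin(lon),
            math.sin(lat))

def to_lat_lon(v):
    """Convert a 3-D unit vector back to latitude/longitude in degrees."""
    x, y, z = v
    return (math.degrees(math.asin(max(-1.0, min(1.0, z)))),
            math.degrees(math.atan2(y, x)))

def slerp(a, b, t):
    """Spherical interpolation: move fraction t of the way from point a to point b."""
    dot = max(-1.0, min(1.0, sum(x * y for x, y in zip(a, b))))
    omega = math.acos(dot)
    if omega < 1e-12:          # points (nearly) coincide
        return a
    ka = math.sin((1 - t) * omega) / math.sin(omega)
    kb = math.sin(t * omega) / math.sin(omega)
    return tuple(ka * x + kb * y for x, y in zip(a, b))

def weighted_spherical_mean(points, weights):
    """Start at the first point, then fold in each additional point by its
    share of the accumulated weight."""
    mean, total = points[0], weights[0]
    for p, w in zip(points[1:], weights[1:]):
        total += w
        mean = slerp(mean, p, w / total)
    return mean

# Example: a data item accessed from Seattle (weight 3) and London (weight 1)
# ends up much closer to Seattle.
clients = [to_unit_vector(47.6, -122.3), to_unit_vector(51.5, -0.1)]
print(to_lat_lon(weighted_spherical_mean(clients, [3.0, 1.0])))
```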

Data Placement: Phase 1 (Initial Placement)
[Figure slide; no text transcribed]

Data Placement: Phase 2 (Iteratively Improve Placement)
- Move data items closer to the users and other data items with which they frequently interact
- For each data item, determine a movement toward another node
  - Higher current latency and more communication increase the contracting force
  - Updates to the placement pull nodes together
- Data items are movable; client locations are fixed
- Replicas are treated as separate data items that interact frequently
- Effect: reduced latency and reduced inter-DC traffic (when data items become colocated)
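
A very simplified sketch of one such iteration is given below, using a flat 2-D coordinate space instead of spherical coordinates: every data item moves to the weighted mean of its communication partners, with weights that grow with both traffic volume and current distance, while client locations stay fixed. The data model and the exact weighting rule are assumptions for illustration, not Volley's actual update rule.

```python
# One Phase 2-style iteration in a flat 2-D space (Volley itself works with
# spherical coordinates). Distant, chatty partners pull hardest.

def distance(p, q):
    return ((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2) ** 0.5

def phase2_iteration(item_pos, client_pos, edges):
    """edges: (data_item, partner, traffic) triples, where partner is either a
    client or another data item."""
    new_pos = {}
    for item, pos in item_pos.items():
        wx = wy = wsum = 0.0
        for src, partner, traffic in edges:
            if src != item:
                continue
            ppos = client_pos.get(partner, item_pos.get(partner))
            weight = traffic * (1.0 + distance(pos, ppos))  # contracting force
            wx += weight * ppos[0]
            wy += weight * ppos[1]
            wsum += weight
        new_pos[item] = (wx / wsum, wy / wsum) if wsum else pos
    return new_pos

# Example: one data item shared by two clients is pulled toward the heavier one.
clients = {"client_a": (0.0, 0.0), "client_b": (10.0, 0.0)}
items = {"item_1": (5.0, 0.0)}
print(phase2_iteration(items, clients, [("item_1", "client_a", 8.0),
                                        ("item_1", "client_b", 2.0)]))
```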

Data Placement: Phase 3 (Collapse Data to Datacenters)
- Move each data item to its nearest datacenter
- If a DC exceeds its specified capacity:
  - Identify the data objects with the fewest accesses
  - Move them to the next closest DC
- The number of iterations is bounded by the number of DCs
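
The sketch below illustrates this spill-over idea in a flat 2-D coordinate space: every item first maps to its nearest datacenter, and any datacenter over its capacity pushes its least-accessed items to the next closest one, for at most as many rounds as there are datacenters. Capacities here are counted in items, and all names are illustrative assumptions.

```python
# Phase 3-style collapse of item coordinates onto capacity-limited datacenters.

def _dist(p, q):
    return ((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2) ** 0.5

def phase3_collapse(item_pos, dc_pos, capacity, accesses):
    # For every item, rank the datacenters from closest to farthest.
    ranking = {item: sorted(dc_pos, key=lambda dc: _dist(pos, dc_pos[dc]))
               for item, pos in item_pos.items()}
    placement = {item: ranking[item][0] for item in item_pos}
    for _ in range(len(dc_pos)):              # at most #DCs rounds of spill-over
        for dc in dc_pos:
            resident = [i for i, d in placement.items() if d == dc]
            overflow = len(resident) - capacity[dc]
            if overflow <= 0:
                continue
            # Evict the items with the fewest accesses to their next-closest DC.
            for item in sorted(resident, key=lambda i: accesses[i])[:overflow]:
                rank = ranking[item].index(dc)
                if rank + 1 < len(ranking[item]):
                    placement[item] = ranking[item][rank + 1]
    return placement

# Example: two DCs capped at one item each force the less-accessed item to spill.
dcs = {"dc_west": (0.0, 0.0), "dc_east": (10.0, 0.0)}
items = {"a": (1.0, 0.0), "b": (2.0, 0.0)}
print(phase3_collapse(items, dcs, {"dc_west": 1, "dc_east": 1}, {"a": 100, "b": 5}))
```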

Data Placement: Output (Migration Proposals)
- Migration itself is application-specific
  - Supports diverse datacenter applications
- Proposal record:
  - Entity: GUID
  - New datacenter
  - Average latency change per request
  - Ongoing bandwidth change per day
  - One-time migration bandwidth
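
A proposal record could be represented along these lines; the field names and units are illustrative assumptions rather than Volley's actual output format.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class MigrationProposal:
    """One migration recommendation, with the fields listed on this slide."""
    entity_guid: str                       # data item that should move
    new_datacenter: str                    # proposed destination DC
    latency_change_per_request_ms: float   # average latency change per request
    bandwidth_change_per_day_bytes: int    # ongoing inter-DC bandwidth change
    migration_bandwidth_bytes: int         # one-time bandwidth to perform the move
```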

Outline
1. Introduction
2. Analysis of Cloud Services
3. Data Placement
4. Evaluation
5. Conclusion

Evaluation: Test Environment
- Month-long Live Mesh trace
  - Compute the placement on week 1
  - Evaluate the placement on weeks 2-4
- 12 datacenters as potential locations
- Capacity limit: 10% of all data at each DC
- Analytic evaluation using the network coordinate system

Evaluation: Heuristics
- commonip: place data near the IP with the most frequent access; optimizes for latency
- onedc: place all data in one datacenter; optimizes for zero inter-DC traffic
- hash: place data according to a hash function; optimizes for zero capacity skew
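
For concreteness, the three baselines can be sketched roughly as follows; helper names and tie-breaking behavior are assumptions, not the exact implementations evaluated in the paper.

```python
import hashlib

def place_commonip(accesses_by_ip, nearest_dc):
    """commonip: place each item in the DC nearest to the IP that accesses it most often."""
    return {item: nearest_dc[max(by_ip, key=by_ip.get)]
            for item, by_ip in accesses_by_ip.items()}

def place_onedc(items, chosen_dc):
    """onedc: place every item in a single datacenter."""
    return {item: chosen_dc for item in items}

def place_hash(items, dcs):
    """hash: spread items uniformly by hashing their GUIDs across the datacenters."""
    return {item: dcs[int(hashlib.md5(item.encode()).hexdigest(), 16) % len(dcs)]
            for item in items}

# Example: three items hashed across two datacenters.
print(place_hash(["item-1", "item-2", "item-3"], ["dc_west", "dc_east"]))
```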

Evaluation: Capacity Skew & Inter-DC Traffic
[Figure slide; no text transcribed]

Evaluation: Latency
- Volley achieves lower latency than commonip, while also providing lower capacity skew and fewer inter-DC messages

Evaluation: Live System Latency
- Live Mesh prototype
  - Frontend: allows clients to connect to any DC
  - Document Service: stores the IP addresses of the clients
  - Publish-Subscribe Service: notifies about changes in the Document Service
  - Message Queue Service: buffers messages from the Publish-Subscribe Service
- Live Mesh trace replayed from 109 client nodes scattered around the world
- Setup: 20 VMs across 12 DCs, plus the 109 client nodes

Evaluation: Live System Latency
- External sources of noise:
  - Fewer client locations than in a real-world scenario
  - The connectivity of the simulated clients does not conform to Volley's latency model

Evaluation: Impact of Iteration Count on Capacity Skew
- Most objects do not move after phase 1
- Capacity skew is smoothed out in phase 3

Evaluation: Impact of Iteration Count on Client Latency
- Latency remains stable after a few iterations of phase 2
- Almost no latency penalty from phase 3

Evaluation: Re-Computation
- Volley should be re-run frequently
- Stale placements increase request latency due to client mobility
- Inter-datacenter traffic increases due to new objects that cannot yet be placed intelligently
- Changing access patterns may require data movement
- New clients need to be served

Evaluation: Migrated Objects
- Percentage of objects that move when the placement computed after week X is compared to the week-1 placement
- Most old objects do not move

Conclusion: Summary
- Automatic recommendations for data placement under constraints
- The placement can be controlled to
  - take resource usage into account (cost & capacity models)
  - ensure replication (constraints on placement)
- Application independence, allowing for specialized migration mechanisms
- The analysis of cloud services highlighted the trends that motivated Volley: shared data, data interdependencies and user mobility
- The evaluation shows that Volley simultaneously reduces latency and operational costs
  - Improvement over the state-of-the-art heuristic

Conclusion: Open Questions
- Volley handles placement decisions within a cloud service
  - Extension: output recommendations to DC operators to upgrade their DCs or build new ones?
- Can new objects be registered in a way that gives them a good initial placement?
- Volley handles replicas as separate data items
  - Is there a better alternative for modeling replicas?

Thanks for your attention