Data Center Services and Optimization
Sobir Bazarbayev, Chris Cai
CS538, October 18, 2011

Outline
- Background
- Volley: Automated Data Placement for Geo-Distributed Cloud Services, by Sharad Agarwal, John Dunagan, Navendu Jain, Stefan Saroiu, Alec Wolman, Harbinder Bhogan, NSDI '10
- Wide Area Placement of Data Replicas for Fast and Highly Available Data Access, by Fan Ping, Xiaohu Li, Christopher McConnell, Rohini Vabbalareddy, Jeong-Hyon Hwang, DIDC '11
- Dynamic Co-Scheduling of Distributed Computation and Replication, by Huadong Liu, Micah Beck, and Jian Huan, CCGRID '06
- Questions

Data Replica Placement
- Data replication goals:
  - availability
  - reliability
  - load balancing
  - low user latency
- There is a great need to place data replicas automatically to achieve the goals above

Challenges
- Scalability
- Solutions that address operator costs
- Taking advantage of geographically distributed infrastructures

Volley: Automated Data Placement for Geo-Distributed Cloud Services
- Cloud services span globally distributed datacenters
- Volley: an automated mechanism for data placement
- Main goals:
  - minimize user latencies
  - lower operator costs
- Designed to run on SCOPE (a scalable MapReduce-style platform)
- Evaluated on Live Mesh (a cloud storage service)
[Sharad Agarwal, John Dunagan, Navendu Jain, Stefan Saroiu, Alec Wolman, Harbinder Bhogan. NSDI, April 2010.]

Volley
- Volley is motivated by:
  - shared data (e.g., Google Docs)
  - data interdependencies (e.g., Facebook)
  - datacenters reaching their capacity limits
  - user mobility
- Main challenges:
  - scalability
  - service operator costs
  - dynamic data sharing

[Volley Slides, Page 7, http://www.usenix.org/events/nsdi10/tech/slides/agarwal.pdf]

[Volley Slides, Page 8, http://www.usenix.org/events/nsdi10/tech/slides/agarwal.pdf]

Volley Design
Applications provide the following information to Volley:
- a log of processed requests
- requirements on RAM, disk, and CPU per transaction
- a capacity cost model
- a model of latency between datacenters, and between datacenters and clients
- the current location of every data item

Volley Algorithm
- Phase 1: Compute Initial Placement
  - calculate the geographic centroid for each data item based on client locations
  - ignore data interdependencies
- Phase 2: Iteratively Move Data to Reduce Latency
  - refine the centroid for each data item iteratively
  - consider both client locations and data interdependencies
  - use a weighted spring model and a spherical coordinate system
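The Phase 1 idea can be illustrated with a small Python sketch. This is not Volley's actual procedure: it assumes a simple approximation in which each client location becomes a 3D unit vector, the vectors are averaged with access counts as weights, and the result is projected back onto the sphere. The function names and the (latitude, longitude, weight) input format are hypothetical.

```python
import math

def to_unit_vector(lat_deg, lon_deg):
    """Convert latitude/longitude in degrees to a 3D unit vector."""
    lat, lon = math.radians(lat_deg), math.radians(lon_deg)
    return (math.cos(lat) * math.cos(lon),
            math.cos(lat) * math.sin(lon),
            math.sin(lat))

def weighted_centroid(accesses):
    """Approximate the weighted geographic centroid of client accesses.

    accesses: list of (lat_deg, lon_deg, weight) tuples, where weight is
    an assumed access count for that client location.
    Returns (lat_deg, lon_deg) of the centroid projected onto the sphere.
    """
    x = y = z = 0.0
    for lat, lon, weight in accesses:
        vx, vy, vz = to_unit_vector(lat, lon)
        x += weight * vx
        y += weight * vy
        z += weight * vz
    norm = math.sqrt(x * x + y * y + z * z)
    if norm == 0.0:
        raise ValueError("centroid undefined: weighted positions cancel out")
    # Clamp to guard against rounding slightly outside [-1, 1].
    lat = math.degrees(math.asin(max(-1.0, min(1.0, z / norm))))
    lon = math.degrees(math.atan2(y, x))
    return lat, lon

# Two equally weighted clients on the equator, 90 degrees apart.
print(weighted_centroid([(0.0, 0.0, 1.0), (0.0, 90.0, 1.0)]))
```

Working on the sphere rather than averaging raw latitudes and longitudes avoids the wrap-around problem at the 180-degree meridian, which is presumably one reason the slide mentions a spherical coordinate system.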

Volley Algorithm (cont.)
- Phase 3: Iteratively Collapse Data to Datacenters
  - modify the data placement to satisfy datacenter capacity constraints
  - for datacenters over their capacity, move the least accessed items to the next closest datacenter
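A simplified way to picture Phase 3 is a single greedy pass, sketched below in Python. It is an assumption-laden stand-in for the iterative procedure on the slide: capacity is counted in items rather than RAM, disk, or CPU, and each item comes with a precomputed list of datacenters ordered nearest first. The function name and input shapes are hypothetical.

```python
def collapse_to_capacity(items, capacity):
    """Assign items to datacenters without exceeding capacity.

    items: dict item_id -> (access_count, [datacenters, nearest first])
    capacity: dict datacenter -> maximum number of items (assumed unit)
    Returns dict item_id -> datacenter.
    """
    placement = {}
    load = {dc: 0 for dc in capacity}
    # Place the most accessed items first, so that when a datacenter
    # fills up it is the least accessed items that spill over.
    by_popularity = sorted(items.items(), key=lambda kv: kv[1][0], reverse=True)
    for item_id, (access_count, nearest_first) in by_popularity:
        for dc in nearest_first:
            if load[dc] < capacity[dc]:
                placement[item_id] = dc
                load[dc] += 1
                break
        else:
            raise ValueError(f"no capacity left for item {item_id!r}")
    return placement

items = {
    "a": (100, ["dc1", "dc2"]),
    "b": (10, ["dc1", "dc2"]),
    "c": (50, ["dc1", "dc2"]),
}
print(collapse_to_capacity(items, {"dc1": 2, "dc2": 2}))
```

In this example all three items prefer dc1, which holds only two, so the least accessed item "b" lands in its next closest datacenter.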

Volley Evaluation
- Live Mesh, from June 2009
- 12 geographically diverse datacenters
- month-long evaluation
- analytical evaluation using a latency model (Agarwal, SIGCOMM '09) based on 49.9 million measurements across 3.5 million end-hosts
- live experiment using PlanetLab clients
[Volley Slides, Page 18, http://www.usenix.org/events/nsdi10/tech/slides/agarwal.pdf]

Volley Evaluation
Results are compared to the following heuristics:
- commonIP: place data closest to the IP address that accessed it most often
- oneDC: put all data in one datacenter
- hash: hash data to datacenters so as to optimize for load balancing
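The three baseline heuristics are simple enough to sketch in a few lines of Python. The function names, the choice of SHA-256 for hashing, and the `nearest_dc` lookup table (mapping a client IP to its closest datacenter) are all illustrative assumptions, not details taken from the Volley evaluation.

```python
import hashlib
from collections import Counter

def place_onedc(item_id, datacenters):
    """oneDC: every item goes to the same (here: the first) datacenter."""
    return datacenters[0]

def place_hash(item_id, datacenters):
    """hash: spread items evenly by hashing the item identifier."""
    digest = hashlib.sha256(item_id.encode("utf-8")).digest()
    return datacenters[int.from_bytes(digest[:8], "big") % len(datacenters)]

def place_commonip(access_ips, nearest_dc):
    """commonIP: use the datacenter closest to the most frequent client IP.

    access_ips: list of client IP strings that accessed the item
    nearest_dc: assumed lookup table, client IP -> closest datacenter
    """
    most_common_ip, _count = Counter(access_ips).most_common(1)[0]
    return nearest_dc[most_common_ip]

datacenters = ["us-west", "eu-north", "asia-east"]
accesses = ["192.0.2.1", "198.51.100.7", "192.0.2.1"]
lookup = {"192.0.2.1": "eu-north", "198.51.100.7": "us-west"}
print(place_onedc("doc-1", datacenters))
print(place_hash("doc-1", datacenters))
print(place_commonip(accesses, lookup))
```

None of these baselines looks at data interdependencies or capacity, which is the gap Volley is designed to close.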

[Volley Slides, Page 20, http://www.usenix.org/events/nsdi10/tech/slides/agarwal.pdf]

[Volley Slides, Page 21, http://www.usenix.org/events/nsdi10/tech/slides/agarwal.pdf]

Volley Summary
- Volley solves the following problems:
  - partitioning user data across multiple datacenters
  - reducing operator cost and user latency
- Main results from the Live Mesh evaluation:
  - reduces datacenter capacity skew by over 2x
  - reduces inter-datacenter traffic by over 1.8x
  - reduces user latency by 30% at the 75th percentile
[Volley Slides, Page 23, http://www.usenix.org/events/nsdi10/tech/slides/agarwal.pdf]

Wide Area Placement of Data Replicas for Fast and Highly Available Data Access (Ping et al.)
- Similar goal to Volley
- Volley:
  - automated data placement that considers data interdependencies
  - minimizes user latency and operator cost
- Wide Area Placement:
  - focuses on data replica placement
  - maximizes a user-defined objective function

Problem Statement
- Given a target degree of replication r, choose r out of all possible locations to store the replicas such that a system optimization goal is achieved.
- The optimization goal is expressed as a user-defined objective function of access delay d and availability a, e.g.
  - o(d, a) = 1/d
  - o(d, a) = 1/(d * sqrt(1 - a))

Replication Location Selection Algorithm
- Basically brute force
- Considers every possible choice of r locations out of all candidates
- But this can be done efficiently (why?)
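A brute-force search of this kind can be sketched in Python as follows. How d and a are computed for a set of replicas is not stated on the slides, so the sketch makes two assumptions: d is the average over clients of the delay to the nearest replica, and a is the probability that at least one replica is up, treating failures as independent. The input tables and function names are hypothetical, and the sketch does not include whatever makes the real algorithm efficient.

```python
import math
from itertools import combinations

def objective(d, a):
    """Example objective from the problem statement: 1 / (d * sqrt(1 - a)).

    Assumes a < 1, otherwise the expression divides by zero.
    """
    return 1.0 / (d * math.sqrt(1.0 - a))

def evaluate(replicas, delay, availability):
    """Score one candidate set of replica locations.

    delay: dict client -> dict location -> delay (assumed measurements)
    availability: dict location -> probability the location is up
    """
    # Assumption: each client reads from its closest replica.
    d = sum(min(row[loc] for loc in replicas) for row in delay.values()) / len(delay)
    # Assumption: data is available if any replica is up (independent failures).
    a = 1.0 - math.prod(1.0 - availability[loc] for loc in replicas)
    return objective(d, a)

def best_placement(candidates, r, delay, availability):
    """Try every r-subset of candidates and keep the highest-scoring one."""
    return max(combinations(candidates, r),
               key=lambda subset: evaluate(subset, delay, availability))

delay = {
    "client1": {"A": 10, "B": 50, "C": 100},
    "client2": {"A": 100, "B": 50, "C": 10},
}
availability = {"A": 0.9, "B": 0.9, "C": 0.9}
print(best_placement(["A", "B", "C"], 2, delay, availability))
```

The number of subsets grows as "n choose r", so exhaustive enumeration like this is only practical for small candidate sets or small r.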

[Subsequent graphs are borrowed from Fan Ping et al.]

Evaluation
Results are compared to the following replica placement strategies:
- Random: choose r locations randomly
- Optimal [delay] or [availability] or [delay, availability]: calculate replica locations assuming the future data accesses are known (infeasible in practice)
- Most Available Nodes: choose the r locations with the highest availability
- Optimized [delay] or [availability] or [delay, availability]: use the replica location selection algorithm discussed above

Average Access Delay vs Number of Replicas

Availability vs Number of Replicas

Dynamic Co-Scheduling of Distributed Computation and Replication
- Volley:
  - high data interdependency
  - automates data placement to minimize user latency and operator cost
- Wide Area Placement:
  - k-way replicated data for lower latency and higher availability
  - automates replica placement to maximize a user-defined objective function (latency, availability)
- This paper:
  - addresses data movement in a different environment
  - considers data movement in distributed systems that are both computation- and data-intensive
  - shares the goal of the previous two papers: minimizing latency

Dynamic Co-Scheduling of Distributed Computation and Replication
- A dynamic co-scheduling algorithm that integrates the scheduling of computation with runtime data movement
- Goals:
  - improve server utilization
  - minimize latency
- Overview:
  - adaptively measures server performance (computation power, data transfer rate)
  - dynamically assigns tasks to servers
  - directs data movement between servers
[Huadong Liu, Micah Beck, and Jian Huan. Sixth IEEE International Symposium on Cluster Computing and the Grid (CCGRID), 2006.]

Adaptive Scheduling of Computation
- Assign as many tasks as possible to fast servers; avoid being stalled by slow or faulty servers
- Three generic mechanisms:
  - dynamically ranked pool of servers
  - two-level priority queue of tasks
  - competition-avoidant task management scheme
[Dynamic Co-Scheduling of Distributed Computation and Replication, Page 4]
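The slide only names the three mechanisms, so the Python sketch below is one possible reading rather than the paper's design. It assumes that the first queue level holds tasks never assigned, that the second level holds assigned but unfinished tasks that an idle server may repeat, that "competition avoidance" means a server is never handed a task it already holds, and that servers are ranked by an exponentially smoothed completion rate. The class and method names are hypothetical.

```python
from collections import deque

class CoScheduler:
    """Toy scheduler: ranked servers plus a two-level task queue."""

    def __init__(self, tasks):
        self.fresh = deque(tasks)   # level 1: tasks never assigned
        self.pending = deque()      # level 2: assigned, possibly unfinished
        self.assigned = {}          # task -> set of servers working on it
        self.done = set()
        self.rate = {}              # server -> smoothed tasks per second

    def next_task(self, server):
        """Hand out fresh work first, then let idle servers repeat pending work."""
        if self.fresh:
            task = self.fresh.popleft()
            self.pending.append(task)
            self.assigned[task] = {server}
            return task
        for task in self.pending:
            # Skip finished tasks and tasks this server already holds.
            if task not in self.done and server not in self.assigned[task]:
                self.assigned[task].add(server)
                return task
        return None

    def report_done(self, server, task, seconds, alpha=0.5):
        """Record completion and update the server's smoothed rate."""
        self.done.add(task)
        observed = 1.0 / seconds
        previous = self.rate.get(server)
        self.rate[server] = (observed if previous is None
                             else alpha * observed + (1 - alpha) * previous)

    def ranked_servers(self):
        """Servers ordered fastest first by smoothed completion rate."""
        return sorted(self.rate, key=self.rate.get, reverse=True)

scheduler = CoScheduler(["t1", "t2"])
print(scheduler.next_task("fast"), scheduler.next_task("slow"))
scheduler.report_done("fast", "t1", seconds=1.0)
print(scheduler.next_task("fast"))  # the fast server repeats the slow server's task
```

With this arrangement a fast server that runs out of fresh tasks picks up work still held by a slow server, so a single slow or faulty server cannot stall the whole job.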

Dynamic Scheduling of Replication
- Fast servers can help slow servers by repeating tasks on replicas
- Some partitions reside only on a set of slow servers
- Such partitions can be moved to fast servers if the time spent on data movement does not exceed the gain from migrating the task
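The migration rule in the last bullet reduces to a single comparison, sketched below in Python. The parameter names and the way the gain is estimated (task time on the slow server minus task time on the fast server) are assumptions for illustration; the paper may estimate these quantities differently.

```python
def should_migrate(partition_bytes, transfer_rate, slow_task_seconds, fast_task_seconds):
    """Decide whether moving a partition to a fast server pays off.

    partition_bytes: size of the partition to move
    transfer_rate: measured bytes per second between the two servers
    slow_task_seconds: estimated task time if it stays on the slow server
    fast_task_seconds: estimated task time on the fast server
    """
    transfer_seconds = partition_bytes / transfer_rate
    gain_seconds = slow_task_seconds - fast_task_seconds
    # Migrate only if the data movement costs no more than it saves.
    return transfer_seconds <= gain_seconds

# 100 MB at 10 MB/s takes 10 s to move and saves 40 s of compute time.
print(should_migrate(100e6, 10e6, slow_task_seconds=60, fast_task_seconds=20))
# 1 GB at 10 MB/s takes 100 s to move, more than the 40 s saved.
print(should_migrate(1e9, 10e6, slow_task_seconds=60, fast_task_seconds=20))
```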

[Dynamic Co-Scheduling of Distributed Computation and Replication, Page 7]

Summary
- The co-scheduling algorithm maximizes server utilization and minimizes application execution time
- It greatly improves both server utilization and application performance, even with a small number of replicas

Questions
- As datacenters become more and more geographically distributed, how important is it to develop good techniques to automate data placement?
- When are applications good candidates for data replica placement algorithms?
- Which of the techniques we have described fit the following applications?
  - Google Docs
  - Facebook status updates and wall posts
  - Flickr