Dynamic Load-Balanced Multicast for Data-Intensive Applications on Clouds

Contents: Introduction; Multicast on parallel distributed systems; Multicast on P2P systems; Multicast on clouds; High performance multicast algorithms; Implementation; Performance Evaluation; Summary; Conclusions

The Problem: Deploy large data sets from the cloud's storage facility to all compute nodes as fast as possible. Network performance within clouds is dynamic, and keeping network monitoring data and topology information up to date is almost impossible.

Common multicast algorithms construct one or more spanning trees from network topology and monitoring data in order to maximize available bandwidth and avoid bottleneck links.

We propose two high-performance multicast algorithms. They combine optimization ideas from multicast algorithms used in parallel distributed systems and in P2P systems, and they efficiently transfer large amounts of data stored in Amazon S3 to multiple Amazon EC2 nodes.

Multicast on parallel distributed systems (1/2): Typical examples are the collective operation algorithms in message-passing systems such as MPI. The target environment: 1) network performance is high and stable; 2) network topology does not change; 3) available bandwidth between nodes is symmetric.

Multicast on parallel distributed systems (2/2): These algorithms construct one or more optimized spanning trees by using network topology information and other monitoring data. The data is forwarded from the root node along the trees, which makes this a sender-driven technique.
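
As a rough illustration of topology-based tree construction, the sketch below builds a spanning tree that always attaches the next node over the highest-bandwidth link available (Prim's algorithm on bandwidths). The node names and bandwidth figures are hypothetical, and real MPI libraries use more elaborate schemes; this only shows why such algorithms depend on accurate, stable monitoring data.

```python
import heapq

def widest_spanning_tree(bandwidth, root):
    """Return {child: parent} for a maximum-bandwidth spanning tree.

    `bandwidth[a][b]` is the measured bandwidth of link a-b
    (hypothetical monitoring data, assumed symmetric).
    """
    parent = {}
    visited = {root}
    # heapq is a min-heap, so bandwidths are negated to pop the widest link first.
    heap = [(-bw, root, peer) for peer, bw in bandwidth[root].items()]
    heapq.heapify(heap)
    while heap:
        _, src, dst = heapq.heappop(heap)
        if dst in visited:
            continue
        visited.add(dst)
        parent[dst] = src
        for peer, bw in bandwidth[dst].items():
            if peer not in visited:
                heapq.heappush(heap, (-bw, dst, peer))
    return parent

# Hypothetical bandwidth measurements in MB/s.
links = {
    "root": {"a": 100, "b": 10},
    "a": {"root": 100, "b": 80, "c": 5},
    "b": {"root": 10, "a": 80, "c": 60},
    "c": {"a": 5, "b": 60},
}
tree = widest_spanning_tree(links, "root")
```

In this example the slow root-b and a-c links are avoided. If the measurements become stale, the tree may route data over what is by then a bottleneck link.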

Multicast on P2P systems (1/6): Typical examples are file-sharing applications and data-streaming protocols. The target environment: 1) network performance is very dynamic; 2) nodes can join and leave; 3) available bandwidth between nodes can be asymmetric.

Multicast on P2P systems (2/6): New nodes keep joining the topology, so relying on network topology information is not a good idea.

Multicast on P2P systems (3/6): Divide the data into small pieces and exchange them with neighbor nodes. This receiver-driven technique can improve throughput.

Multicast on P2P systems (4/6): All nodes tell their neighbors which pieces they have and request the pieces they lack.
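
The receiver-driven exchange described above can be sketched as a small simulation. It assumes a fixed neighbor list, one requested piece per node per round, and a deterministic piece choice; real protocols such as BitTorrent use richer selection policies, so the function and variable names here are illustrative only.

```python
def exchange_round(have, neighbors):
    """Run one round; return the number of pieces transferred.

    Every node looks at what its neighbors advertise and requests
    one piece it still lacks (receiver-driven).
    """
    # Snapshot the advertisements so requests in this round only see
    # pieces that existed when the round started.
    advertised = {node: set(pieces) for node, pieces in have.items()}
    transferred = 0
    for node in have:
        for peer in neighbors[node]:
            missing = advertised[peer] - have[node]
            if missing:
                have[node].add(min(missing))  # deterministic pick for the sketch
                transferred += 1
                break
    return transferred

# Three fully connected nodes that start with disjoint pieces of a 4-piece file.
neighbors = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
have = {0: {0, 1}, 1: {2}, 2: {3}}
rounds = 0
while exchange_round(have, neighbors) > 0:
    rounds += 1
```

No node needs a global view of the topology: each decision uses only the piece lists advertised by direct neighbors.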

Multicast on P2P systems (5/6) and (6/6): [figures]

Example: Mob optimizes multicast communication between multiple homogeneous clusters connected via a WAN. It is based on BitTorrent and takes node locality into account to reduce the number of wide-area transfers.

Each node in a mob steals an equal part of all data from peers in remote clusters.

Each node in a mob distributes the stolen pieces locally.
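
A minimal sketch of the two steps above, assuming a mob is the set of nodes in one cluster and that pieces are assigned round-robin (the assignment policy and all names are assumptions, not Mob's actual implementation). It shows that every piece is fetched over the WAN by exactly one local node and is then available for local distribution.

```python
def mob_steal_plan(num_pieces, mob):
    """Assign every piece to exactly one node of the mob.

    The assigned node fetches ("steals") the piece from a remote cluster;
    round-robin assignment is an assumption made for this sketch.
    """
    plan = {node: [] for node in mob}
    for piece in range(num_pieces):
        plan[mob[piece % len(mob)]].append(piece)
    return plan

mob = ["n0", "n1", "n2", "n3"]  # hypothetical node names in one cluster
plan = mob_steal_plan(10, mob)

# Each piece crosses the WAN once for the whole cluster, not once per node.
wan_transfers = sum(len(pieces) for pieces in plan.values())
share_sizes = sorted(len(pieces) for pieces in plan.values())
```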

Cloud systems (1/2): Clouds are based on large clusters in which nodes are densely connected. Storage resources provided by clouds are fully or partly virtualized. A multicast algorithm for clouds therefore cannot assume anything about the exact physical infrastructure.

Cloud systems (2/2): The uplink and downlink of a virtual compute node can be affected by other virtual compute nodes that are running on the same physical host. Routing changes and load balancing also affect network performance. The data to distribute is often located in the cloud's storage service, which introduces another potential shared bottleneck.

Cloud platform used in our experiments: Amazon EC2/S3, that is, Amazon Elastic Compute Cloud (EC2) and Simple Storage Service (S3). EC2 provides virtualized computational resources that can range from one to hundreds of nodes as needed. The virtual machines, which run on top of Xen, are called instances. The Xen hypervisor is an open-source standard for virtualization.

Amazon EC2/S3: S3 stores files as objects in a unique name space called a bucket. Buckets have to be created before objects can be put into S3. Each bucket has a location and an arbitrary but globally unique name. The size of objects in a bucket can currently range from 1 byte to 5 gigabytes.

Multicast on clouds: Multicast operations on clouds should fulfill the following requirements: 1) maximized utilization of the available aggregate download throughput from cloud storage; 2) minimization of the multicast completion time of each node; 3) no dependence on monitoring network throughput or on estimating the network topology.

High performance multicast algorithms: Features of our algorithms: 1) they construct an overlay network on clouds without network topology information; 2) they optimize the total throughput dynamically; 3) they increase the download throughput by letting nodes cooperate with each other.

How do they work? The algorithms divide the data to download from the cloud storage service over all nodes and then exchange the data via a mesh overlay network. The first, the non-steal algorithm, lets each node download an equal share of all data. The second, the steal algorithm, uses work stealing to counter the effect of heterogeneous download bandwidth.

Van de Geijn non-steal algorithm: Scatter phase: the root node divides the data to be multicast into blocks of equal size, depending on the number of nodes. The blocks are then sent to each corresponding node using a binomial tree. Allgather phase: the missing blocks are exchanged and collected by using the recursive doubling technique.
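
The two phases above can be sketched as a block-bookkeeping simulation. It assumes the number of nodes is a power of two and tracks only which node holds which block, not the data itself; the function names are illustrative and not taken from any MPI library.

```python
def binomial_scatter(num_nodes):
    """Scatter blocks 0..num_nodes-1 from node 0 along a binomial tree.

    Returns ({node: set_of_blocks}, number_of_steps).
    """
    assert num_nodes > 0 and num_nodes & (num_nodes - 1) == 0, "power of two assumed"
    holding = {0: (0, num_nodes)}  # node -> half-open range of blocks it holds
    span, steps = num_nodes, 0
    while span > 1:
        half = span // 2
        for node, (lo, hi) in list(holding.items()):
            mid = lo + half
            holding[node] = (lo, mid)  # keep the lower half
            holding[mid] = (mid, hi)   # send the upper half to node `mid`
        span = half
        steps += 1
    return {node: {lo} for node, (lo, _) in holding.items()}, steps

def recursive_doubling_allgather(blocks):
    """Exchange blocks in place until every node holds all of them.

    In step k each node swaps everything it has with the node whose
    rank differs in bit k. Returns the number of steps.
    """
    num_nodes = len(blocks)
    distance, steps = 1, 0
    while distance < num_nodes:
        snapshot = {node: set(held) for node, held in blocks.items()}
        for node in blocks:
            blocks[node] |= snapshot[node ^ distance]
        distance *= 2
        steps += 1
    return steps

blocks, scatter_steps = binomial_scatter(8)
gather_steps = recursive_doubling_allgather(blocks)
```

With 8 nodes, each phase completes in log2(8) = 3 steps, after which every node holds all 8 blocks.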

The non-steal algorithm has a scatter phase and an allgather phase. Once a node finishes downloading its assigned range of pieces, it waits until all the other nodes have finished too and then exchanges information with its neighbors in the mesh.
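
A minimal sketch of the equal-share assignment, assuming pieces are numbered from 0 and split into contiguous ranges by node rank (the exact split used by the authors is not given in the slides). The second function illustrates the cost of waiting for all nodes: the download phase lasts as long as the slowest node needs. The bandwidth values are hypothetical, in pieces per second.

```python
def assigned_range(rank, num_nodes, num_pieces):
    """Half-open range [start, end) of pieces that node `rank` downloads."""
    start = rank * num_pieces // num_nodes
    end = (rank + 1) * num_pieces // num_nodes
    return start, end

def download_phase_time(bandwidths, num_pieces):
    """Seconds until the slowest node has downloaded its own share."""
    n = len(bandwidths)
    times = []
    for rank, bw in enumerate(bandwidths):
        start, end = assigned_range(rank, n, num_pieces)
        times.append((end - start) / bw)
    return max(times)

ranges = [assigned_range(rank, 4, 10) for rank in range(4)]
slowest = download_phase_time([4.0, 4.0, 4.0, 1.0], 16)
```

Three of the four nodes finish their 4 pieces after 1 second, but the phase ends only when the slow node has finished its own 4 pieces.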

The steal algorithm also has a scatter phase and an allgather phase, and each node starts with an assigned range of pieces. When a node has finished downloading its pieces, it asks other nodes whether they have any work remaining. As a result, the amount of data each node downloads is proportional to its download bandwidth.
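
The effect of work stealing can be sketched with a small event-driven simulation. It assumes that an idle node steals one piece at a time from the node with the most remaining work; how much the real algorithm steals per request is not stated in the slides, so this policy, the bandwidth values (pieces per second), and all names are assumptions.

```python
import heapq
from collections import deque

def simulate_steal(num_pieces, bandwidths):
    """Return (finish_time, pieces_downloaded_per_node)."""
    n = len(bandwidths)
    # Every node starts with an equal, contiguous range of pieces.
    queues = [deque(range(r * num_pieces // n, (r + 1) * num_pieces // n))
              for r in range(n)]
    downloaded = [0] * n
    # Heap of (time_when_node_becomes_idle, node); all nodes are idle at t = 0.
    events = [(0.0, node) for node in range(n)]
    heapq.heapify(events)
    finish = 0.0
    while events:
        now, node = heapq.heappop(events)
        if queues[node]:
            queues[node].popleft()  # next piece from the node's own range
        else:
            victim = max(range(n), key=lambda v: len(queues[v]))
            if not queues[victim]:
                continue            # no work left anywhere: this node is done
            queues[victim].pop()    # steal from the tail of the victim's range
        downloaded[node] += 1
        done_at = now + 1.0 / bandwidths[node]
        finish = max(finish, done_at)
        heapq.heappush(events, (done_at, node))
    return finish, downloaded

finish, downloaded = simulate_steal(16, [4.0, 4.0, 4.0, 1.0])
```

In this run the three fast nodes take over most of the slow node's range, so the slow node downloads far fewer pieces than its initial share of 4, and the download phase ends well before the 4 seconds the slow node would need for that share on its own.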

Implementation: The algorithms are implemented on top of Ibis, a Java-based grid programming environment. Each node is notified of the identity of new nodes that join the Ibis network, as well as of its own unique rank assigned by Ibis. The steal and non-steal algorithms are implemented as an extension to the S3MulticastChannel object. S3MulticastChannel manages an S3Connection object and a number of PeerConnection objects. The S3Connection object downloads pieces from S3, while the PeerConnection objects send and receive messages to and from other nodes.

PERFORMANCE EVALUATION (1/3): We compared the multicast performance of the flat tree, non-steal, and steal algorithms on Amazon EC2/S3. We only used EC2/S3 sites located in Europe, with the small EC2 instance type. The piece size was 32 KB.

PERFORMANCE EVALUATION (2/3): The best completion time is similar across all algorithms: about 100 s. The slowest completion times differ: at least one node in the flat tree takes approximately 200 s, but none of the nodes in either the non-steal or the steal algorithm takes more than 140 s.

PERFORMANCE EVALUATION (3/3): As the number of nodes increases, the time spent in phase 1 decreases and the time spent in phase 2 increases, so the total time spent in phase 1 and phase 2 remains nearly constant. 32 nodes take slightly more time than 16 nodes.

Summary: We have reviewed different multicast algorithms for parallel distributed and P2P systems. We have presented two high-performance multicast algorithms. We have focused on Amazon EC2/S3.

CONCLUSIONS: We evaluated our algorithms on EC2/S3 and showed that they are scalable and consistently achieve high throughput. As a result, all nodes can download files from S3 quickly, even when the network performance changes while the algorithm is running.

Take-home message: As a best practice, use the high-performance multicast algorithms: the nodes cooperate with each other to get the data from cloud storage, which increases performance.

Thank you!