Live Objects Krzys Ostrowski, Ken Birman, Danny Dolev Cornell University, Hebrew University* (*) Others are also involved in some aspects of this project; I'll mention them when their work arises.

Live Objects in an Active Web Imagine a world of Live Objects... and an Active Web created with drag and drop.

Live Objects in an Active Web The user builds applications much like PowerPoint: drag things onto a live document or desktop, customize them via a properties sheet, then share the live document. Opening a document joins a session; a new instance can obtain a state checkpoint, and all see every update. The platform offers privacy, security, and reliability properties.
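To make the session model concrete, here is a minimal Python sketch (hypothetical names, not the Live Objects API) of a replica that joins a shared document by obtaining a state checkpoint and then applying every subsequent update:

```python
# Minimal sketch (not the Live Objects API): what "opening a document joins a
# session" could look like. A new replica fetches a state checkpoint, then
# applies every later update so all replicas converge on the same state.
class SessionReplica:
    def __init__(self):
        self.state = {}

    def join(self, session):
        # New instance obtains a checkpoint before receiving live updates.
        self.state = dict(session.checkpoint())
        session.subscribe(self.apply_update)

    def apply_update(self, key, value):
        # Every replica sees every update.
        self.state[key] = value


class Session:
    """Hypothetical in-process session standing in for a multicast group."""
    def __init__(self):
        self._state, self._subscribers = {}, []

    def checkpoint(self):
        return dict(self._state)

    def subscribe(self, callback):
        self._subscribers.append(callback)

    def publish(self, key, value):
        self._state[key] = value
        for cb in self._subscribers:
            cb(key, value)


session = Session()
session.publish("status", "ok")
replica = SessionReplica()
replica.join(session)          # gets the checkpoint {"status": "ok"}
session.publish("status", "alert")
print(replica.state)           # {'status': 'alert'}
```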

When would they be useful? Build a disaster response system in the field (with no programming needed!), with coordinated planning and plan execution. Create role-playing simulations and games. Integrate data from web services into databases and spreadsheets. Visualize complex distributed state. Track business processes, the status of major projects, even the state of an application.

Big deal? We think so! It is very hard to build distributed systems today; if non-programmers can do the job, the number of such applications will soar. Live objects are robust to the extent that our platform is able to offer properties such as security, privacy protection, fault-tolerance, and stability. Live objects might be a way to motivate users to adopt a trustworthy technology.

The drag and drop world It needs a global namespace of objects: video feeds, other data feeds, live maps, etc. Our thinking: download them from a repository or (rarely) build new ones. Users make heavy use of live documents and share other kinds of live objects. And this gives rise to a world with lots of live traffic and huge numbers of live objects, where any given node may be in lots of object groups.

Overlapping groups [Figure: multicast groups supporting live objects (control events, background radar images, ATC events, radar track updates, weather notifications) overlapping across nodes running live applications]

...posing technical challenges How can we build a system that can sustain high data rates in groups, can scale to large numbers of overlapping groups, and can guarantee reliability and security properties? Existing multicast systems can't solve these problems!

Existing technologies won't work Kind of technology, and why we rejected it:
IP multicast, pt-to-pt TCP: too many IPMC addresses; too many TCP streams.
Software group multicast solutions ("heavyweight"): protocols designed for just one group at a time; overheads soar; instability in large deployments.
Lightweight groups: nodes get undesired traffic; data sent indirectly.
Publish-subscribe bus: unstable in large deployments; data sent indirectly.
Content-filtering event notification: very expensive; nodes see undesired traffic; high-latency paths are common.
Peer-to-peer overlays: similar to the content-filtering scenario.

Steps to a new system! 1. First, we'll look at group overlap and show that we can simplify a system with overlap and focus on a single cover set with a regular, hierarchical overlap. 2. Next, we'll design a simple fault-tolerance protocol for high-speed data delivery in such systems. 3. We'll look at its performance (and arrive at surprising insights that greatly enhance scalability under stress). 4. Last, we'll ask how our solution can be enhanced to address the need for stronger reliability and security.

Coping with Group Overlap In a nutshell: start by showing that even if groups overlap in an irregular way, we can decompose the structure into a collection of overlaid cover sets. Cover sets will have regular overlap, clean hierarchical inclusion, and other good properties.

Regular Overlap [Figure: groups mapped onto nodes with regular overlap] Likely to arise in a data center that replicates services and automates the layout of services on nodes.

Live Objects: irregular overlap. Likely, because users will have different interests.

Tiling an irregular overlap Build some (small) number of regularly overlapped sets of groups ("cover sets") such that each group is in one cover set, cover sets are nicely hierarchical, and traffic is as concentrated as possible. Seems hard: there are O(2^|G|) possible cover sets. In fact we've developed a surprisingly simple algorithm that works really well. Ymir Vigfusson has been helping us study this:

Algorithm in a nutshell 1. Remove tiny groups and collapse identical ones. 2. Pick a big, busy group; look for another big, busy group with extensive overlap; given multiple candidates, take the one that creates the largest regions of overlap. 3. Repeat within the overlap regions (if they are large enough). [Figure: two groups A and B, partitioned into nodes only in group A, nodes in both A and B, and nodes only in group B] A sketch of the greedy step follows.
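The following Python sketch (assumed names and data structures; it omits traffic weighting and the recursive step of repeating inside overlap regions) illustrates the greedy pairing idea:

```python
# Minimal sketch (not the QSM implementation) of the greedy tiling idea:
# repeatedly pick a big group, pair it with the group whose membership
# overlaps it most, and split members into overlap regions.
def tile(groups, min_size=2):
    """groups: dict of group name -> set of node ids. Returns (label, nodes)
    regions. "Big, busy" is approximated here by group size alone."""
    # Step 1: drop tiny groups and collapse identical memberships.
    kept = {}
    for name, members in groups.items():
        if len(members) >= min_size and frozenset(members) not in (
                frozenset(m) for m in kept.values()):
            kept[name] = set(members)

    regions = []
    while kept:
        # Step 2: pick the biggest remaining group ...
        a = max(kept, key=lambda g: len(kept[g]))
        members_a = kept.pop(a)
        if not kept:
            regions.append((a, members_a))
            break
        # ... and pair it with the group that overlaps it the most.
        b = max(kept, key=lambda g: len(kept[g] & members_a))
        members_b = kept.pop(b)
        overlap = members_a & members_b
        regions.append((f"{a}&{b}", overlap))      # nodes in both A and B
        regions.append((a, members_a - overlap))   # nodes only in A
        regions.append((b, members_b - overlap))   # nodes only in B
    return [(label, nodes) for label, nodes in regions if nodes]
```

For example, tile({"A": {1, 2, 3}, "B": {2, 3, 4}}) yields the three regions pictured on the slide: nodes in both A and B ({2, 3}), nodes only in A ({1}), and nodes only in B ({4}).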

Why this works In general, it wouldn't work! But many studies suggest that groups would have power-law popularity distributions, seen in studies of financial trading systems and RSS feeds and explained by preferential attachment models. In such cases the overlap has hidden structure, and the algorithm finds it! It also works exceptionally well for obvious cases such as exact overlap or hierarchical overlap.

It works remarkably well! Lots of processes join 10% of thousands of groups with Zipf-like (α = 1.5) popularity. [Figures: regions per node and number of heavily loaded regions vs. number of groups (thousands), shown for all regions and for the 95% most loaded regions] Nodes end up in very few regions (a 100:1 ratio), and even fewer busy regions (a 1000:1 ratio)!

Effect of different stages Each step of the algorithm concentrates load. [Figure: load distribution for the initial groups, after removing small or identical groups, and after running the algorithm]

...but not always It works very poorly with uniform random topic popularity. It works incredibly well with artificially generated power-law popularity of a type that might arise in some real systems, or with artificial group layouts (as seen in IBM WebSphere). But the situation for human preferential-attachment scenarios is unclear; right now we're studying it.

Digression: Power Laws Zipf: the popularity of the kth-ranked group is proportional to 1/k^α. A law of nature.

Zipf-like things Web page visitors, outlinks, inlinks. File sizes. Popularity and data rates for equity prices. Network traffic from collections of clients. Frequency of word use in natural language. Income distribution in Western society... and many more things.

Dangers of common belief Everyone knows that if something is Zipf-like, instances will look like power-law curves. Reality? These models are just approximate. With experimental data, try to extract a statistically supported model. With groups, people plot log-log graphs (the x-axis is topic popularity rank; the y-axis counts subscribers), which gives something that looks more or less like a straight line with a lot of noise.
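As an illustration (a minimal sketch with synthetic data; the α value and counts are made up, not measurements from the talk), this is roughly how Zipf-like subscription counts behave on log-log axes:

```python
# Minimal sketch (synthetic data, assumed alpha): generate Zipf-like topic
# popularity and look at it on log-log axes, where a power law is a line.
import math

def zipf_subscribers(num_topics, total_subs, alpha=1.5):
    # Popularity of the k-th ranked topic is proportional to 1/k**alpha.
    weights = [1.0 / (k ** alpha) for k in range(1, num_topics + 1)]
    total = sum(weights)
    return [round(total_subs * w / total) for w in weights]

subs = zipf_subscribers(num_topics=1000, total_subs=100_000)
for rank in (1, 10, 100, 1000):
    s = subs[rank - 1]
    print(rank, s, round(math.log10(rank), 2), round(math.log10(max(s, 1)), 2))
# On log-log axes, log(subscribers) vs. log(rank) falls roughly on a straight
# line with slope -alpha; real measurements scatter noisily around that line.
```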

Dangers of common belief [Figure: log-log scatter plot of measured group popularity against a fitted power law with α = 2.1]

But... much of the structure is in the noise. Would our greedy algorithm work on real-world data? Hard to know: Live Objects aren't widely used in the real world yet. For some guesses of how the real world would look, the region-finding algorithm should work; for others, it might not. A mystery until we can get more data!

When in doubt... why not just build one and see how it does?

Building Our System First, build a live objects framework: basically, a structure for composing components, with a type system and a means of activating components. The actual components may not require code, but if they do, that code can be downloaded from remote sites. The user opens live documents or applications; this triggers our runtime system, and it activates the objects. The objects make use of communication streams that are themselves live objects.

Example Even our airplanes were mashups: four objects (at least), with type-checked event channels connecting them. Most apps will use a lot of objects. [Figure: XNA display interface, airplane model, GPS coordinates (x, y, z, t), and multicast protocol objects connected by event channels]
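A minimal Python sketch (hypothetical class names, not the actual framework) of the same composition idea: components exchanging typed events over a channel.

```python
# Minimal sketch (hypothetical names): components wired by an event channel,
# mirroring the airplane mashup. The "multicast protocol" object would feed
# the channel over the network; here a local publish stands in for it.
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class GpsFix:
    x: float
    y: float
    z: float
    t: float

class EventChannel:
    """Carries events of one type from producers to consumers."""
    def __init__(self):
        self._sinks: List[Callable[[GpsFix], None]] = []
    def connect(self, sink: Callable[[GpsFix], None]) -> None:
        self._sinks.append(sink)
    def publish(self, fix: GpsFix) -> None:
        for sink in self._sinks:
            sink(fix)

class Display:
    def draw(self, fix: GpsFix) -> None:
        print(f"airplane at ({fix.x}, {fix.y}, {fix.z}) at time {fix.t}")

class AirplaneModel:
    """Consumes GPS fixes and drives the display."""
    def __init__(self, display: Display):
        self.display = display
    def on_fix(self, fix: GpsFix) -> None:
        self.display.draw(fix)

channel = EventChannel()
plane = AirplaneModel(Display())
channel.connect(plane.on_fix)
channel.publish(GpsFix(1.0, 2.0, 3.0, 0.5))
```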

When is an X an object? Given the choice of implementing X or A+B: use one object if the functionality is self-contained; use two or more if there is a shared function and then a plug-in specialization function. The idea is a bit like plug-and-play device drivers: it enables us to send an object to a strange environment and then configure it on the fly to work properly in that particular setting.

Type checking Live objects are type-checked. Each component exposes interfaces; events travel on these and have types, and the types must match. In addition, objects may constrain their peers: "I expect this from my peer," "I provide this to my peer," "here's a checker I would like to use." There are multiple opportunities for checking: design time, mashup time, runtime.
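A small sketch (assumed structure, not the real type system) of the mashup-time check that peers' event types and constraints match:

```python
# Minimal sketch (assumed structure): check at mashup time that the event
# types two endpoints exchange match, and run any extra peer checkers.
from dataclasses import dataclass, field
from typing import Callable, List

@dataclass
class Endpoint:
    provides: set      # event type names this component emits to its peer
    expects: set       # event type names it requires from its peer
    checkers: List[Callable[["Endpoint"], bool]] = field(default_factory=list)

def can_connect(a: Endpoint, b: Endpoint) -> bool:
    # Each side must provide everything the other expects ...
    if not (b.expects <= a.provides and a.expects <= b.provides):
        return False
    # ... and any "here's a checker I would like to use" hooks must accept the peer.
    return all(chk(b) for chk in a.checkers) and all(chk(a) for chk in b.checkers)

gps = Endpoint(provides={"GpsFix"}, expects=set())
model = Endpoint(provides=set(), expects={"GpsFix"})
print(can_connect(gps, model))   # True: GpsFix events flow from gps to model
```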

Reflection At runtime, we can generate an interface (B's interface just for A), substitute a new object (B' replaces B), or interpose an object (A+B becomes A+B'+B). Tremendously flexible and powerful, but it does raise some complicated security issues!

Overall architecture [Figure: user-visible application objects sit on the Live Objects platform, which runs over the QuickSilver Scalable Multicast, Ricochet Time-Critical Multicast, and Gossip Objects platforms]

So why will it scale? Many dimensions matter: lots of live objects on one machine, maybe using multicore; lots of machines using lots of objects. In the remainder of the talk we focus on multicast scaling.

Building QSM Given an enterprise (for now, LAN-based): build a map of the nodes in the system, annotated by the live objects running on each. Feed this into our cover set algorithm; it will output a set of covers. Each node instantiates QSM to build the needed communication infrastructure for those covers.

Building QSM Given a regular cover set, break it into regions of identical group membership and assign each region its own IP multicast address.

Building QSM To send to a group, multicast to the regions it spans; if possible, aggregate traffic into each region.
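A minimal sketch (made-up addresses and group layout; real QSM sends on IP multicast sockets) of the group-to-region mapping and per-region aggregation:

```python
# Minimal sketch (assumed names): map each region to its own multicast
# address, send a group's message to every region the group spans, and
# batch queued messages per region when possible.
from collections import defaultdict

REGION_ADDR = {"r1": "239.1.0.1", "r2": "239.1.0.2", "r3": "239.1.0.3"}
GROUP_REGIONS = {"radar": ["r1", "r2"], "weather": ["r2", "r3"]}

class RegionSender:
    def __init__(self):
        self.pending = defaultdict(list)   # region -> queued messages

    def send(self, group, message):
        for region in GROUP_REGIONS[group]:
            self.pending[region].append((group, message))

    def flush(self):
        # Aggregate everything queued for a region into one multicast packet.
        for region, batch in self.pending.items():
            print(f"multicast to {REGION_ADDR[region]}: {batch}")
        self.pending.clear()

s = RegionSender()
s.send("radar", "track update 17")
s.send("weather", "storm cell moving north")
s.flush()
```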

Building QSM A hierarchical recovery architecture recovers from message loss without overloading the sender.
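A very simplified sketch of the hierarchical idea (the talk mentions rings and hierarchy depth; this ignores the ring structure and just shows local-first recovery, escalating to the sender only as a last resort):

```python
# Minimal sketch (hypothetical, much simpler than QSM's ring-based scheme):
# a node first asks peers in its own region for a missing message, and only
# falls back to the sender if no local peer has it.
def recover(seq, region_peers, sender):
    for peer in region_peers:
        msg = peer.cache.get(seq)
        if msg is not None:
            return msg             # repaired locally; sender never involved
    return sender.retransmit(seq)  # rare case: escalate to the sender

class Peer:
    def __init__(self, cache):
        self.cache = cache

class Sender:
    def __init__(self, log):
        self.log = log
    def retransmit(self, seq):
        return self.log[seq]

peers = [Peer({1: "m1"}), Peer({2: "m2"})]
sender = Sender({1: "m1", 2: "m2", 3: "m3"})
print(recover(2, peers, sender))   # recovered from a peer in the region
print(recover(3, peers, sender))   # escalated to the sender
```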

Memory footprint: a key issue At high data rates, performance is dominated by the reliability protocol. Its latency turns out to be a function of 1. ring size and hierarchy depth, 2. CPU loads in QSM, and 3. the memory footprint of QSM (!!). This third factor was crucial; it turned out to determine the other two! QSM has a new memory-minimizing design.

Oscillatory behavior We also struggled with a form of thrashing. [Figure: throughput (messages/s) oscillating over time (s)]

Overcoming oscillatory behavior Essence of the problem: some message gets dropped, but the recovery packet is delayed by other data. By the time it arrives, a huge backlog forms. The repair event then triggers a surge of overload, causing more loss, and the system begins to oscillate. A form of priority inversion!

Overcoming oscillatory behavior The solution mimics emergency vehicles on a crowded roadway: pull over and let them past!
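One way to realize the "pull over" rule is to give repair traffic strict priority in the send queue; a minimal sketch (assumed queue design, not QSM's actual scheduler):

```python
# Minimal sketch: repair/recovery traffic gets strict priority over new data,
# so losses are fixed before a backlog builds up (the "emergency vehicle" rule).
import heapq
import itertools

REPAIR, DATA = 0, 1   # lower number = higher priority

class OutQueue:
    def __init__(self):
        self._heap, self._tie = [], itertools.count()

    def enqueue(self, kind, payload):
        heapq.heappush(self._heap, (kind, next(self._tie), payload))

    def dequeue(self):
        return heapq.heappop(self._heap)[2] if self._heap else None

q = OutQueue()
q.enqueue(DATA, "update 41")
q.enqueue(DATA, "update 42")
q.enqueue(REPAIR, "retransmit 37")   # arrives later, but jumps the queue
print(q.dequeue())   # retransmit 37
print(q.dequeue())   # update 41
```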

The bottom line? It works! QSM sustains high data rates (even under stress) and scales well. [Figure: throughput (messages/s) vs. number of nodes, for 1 sender and 2 senders]

The bottom line? It works! ...and scalability with the number of groups (topics) is limited by memory and CPU loads at the sender, as confirmed by artificially inflating the sender's per-group costs. [Figure: throughput (messages/s) vs. number of topics, normal vs. "heavyweight" (profiling on)]

What next? Live objects in WAN settings, with enriched language support for extensions. Configuration Management Service. Port to Linux. Gossip Objects Platform. PPLive/LO. Properties Framework.

Learning more http://liveobjects.cs.cornell.edu