Arvind Krishnamurthy Fall 2003

Similar documents
Overlay Networks for Multimedia Contents Distribution

Scalable Application Layer Multicast

Bayeux: An Architecture for Scalable and Fault Tolerant Wide area Data Dissemination

EECS 122: Introduction to Computer Networks Overlay Networks and P2P Networks. Overlay Networks: Motivations

Overlay Networks. Behnam Momeni Computer Engineering Department Sharif University of Technology

! Naive n-way unicast does not scale. ! IP multicast to the rescue. ! Extends IP architecture for efficient multi-point delivery. !

CS4700/CS5700 Fundamentals of Computer Networks

Goals. EECS 122: Introduction to Computer Networks Overlay Networks and P2P Networks. Solution. Overlay Networks: Motivations.

Overlay Networks: Motivations. EECS 122: Introduction to Computer Networks Overlay Networks and P2P Networks. Motivations (cont d) Goals.

Arvind Krishnamurthy Fall Collection of individual computing devices/processes that can communicate with each other

Challenges in the Wide-area. Tapestry: Decentralized Routing and Location. Global Computation Model. Cluster-based Applications

Challenges in the Wide-area. Tapestry: Decentralized Routing and Location. Key: Location and Routing. Driving Applications

Delaunay Triangulation Overlays

An Evaluation of Three Application-Layer Multicast Protocols

Internet Measurements. Motivation

CS555: Distributed Systems [Fall 2017] Dept. Of Computer Science, Colorado State University

Announcements. EECS 122: Introduction to Computer Networks Multicast and Overlay Networks. Motivational Example: Streaming Media

Overlay Multicast. Application Layer Multicast. Structured Overlays Unstructured Overlays. CAN Flooding Centralised. Scribe/SplitStream Distributed

Interdomain Routing Reading: Sections P&D 4.3.{3,4}

Lecture 6: Overlay Networks. CS 598: Advanced Internetworking Matthew Caesar February 15, 2011

Interdomain Routing Reading: Sections K&R EE122: Intro to Communication Networks Fall 2007 (WF 4:00-5:30 in Cory 277)

Overview. Overview. OTV Fundamentals. OTV Terms. This chapter provides an overview for Overlay Transport Virtualization (OTV) on Cisco NX-OS devices.

CS4450. Computer Networks: Architecture and Protocols. Lecture 15 BGP. Spring 2018 Rachit Agarwal

3. Evaluation of Selected Tree and Mesh based Routing Protocols

CSCE 463/612 Networks and Distributed Processing Spring 2018

Kapitel 5: Mobile Ad Hoc Networks. Characteristics. Applications of Ad Hoc Networks. Wireless Communication. Wireless communication networks types

Many-to-Many Communications in HyperCast

Data Communication. Guaranteed Delivery Based on Memorization

An overview of how packets are routed in the Internet

Multicast EECS 122: Lecture 16

Interdomain Routing Design for MobilityFirst

P2P Network Structured Networks: Distributed Hash Tables. Pedro García López Universitat Rovira I Virgili

Course Routing Classification Properties Routing Protocols 1/39

Overlay Multicast/Broadcast

Outline Computer Networking. Inter and Intra-Domain Routing. Internet s Area Hierarchy Routing hierarchy. Internet structure

A Case for End System Multicast

BGP. Daniel Zappala. CS 460 Computer Networking Brigham Young University

Searching for Shared Resources: DHT in General

Locality in Structured Peer-to-Peer Networks

6.852 Lecture 10. Minimum spanning tree. Reading: Chapter 15.5, Gallager-Humblet-Spira paper. Gallager-Humblet-Spira algorithm

GIAN Course on Distributed Network Algorithms. Spanning Tree Constructions

Searching for Shared Resources: DHT in General

IP Multicast Load Splitting across Equal-Cost Paths

EE122: Multicast. Kevin Lai October 7, 2002

Routing Overview for Firepower Threat Defense

EE122: Multicast. Internet Radio. Multicast Service Model 1. Motivation

Routing. Jens A Andersson Communication Systems

Configuring STP. Understanding Spanning-Tree Features CHAPTER

Distributed Systems Final Exam

CS 640: Introduction to Computer Networks. Intra-domain routing. Inter-domain Routing: Hierarchy. Aditya Akella

CRESCENDO GEORGE S. NOMIKOS. Advisor: Dr. George Xylomenos

Practical, Real-time Centralized Control for CDN-based Live Video Delivery

Internet Anycast: Performance, Problems and Potential

An Expresway over Chord in Peer-to-Peer Systems

Architectures for Distributed Systems

Computer Networks. Routing

Routing. Information Networks p.1/35

Routing Unicast routing protocols

Lecture 2: January 24

MULTICAST EXTENSIONS TO OSPF (MOSPF)

GIAN Course on Distributed Network Algorithms. Spanning Tree Constructions

Architecture and Implementation of a Content-based Data Dissemination System

Routing, Routing Algorithms & Protocols

Adaptive Routing of QoS-Constrained Media Streams over Scalable Overlay Topologies

Table of Contents 1 PIM Configuration 1-1

Overview 4.2: Routing

Configuring Spanning Tree Protocol

Overlay and P2P Networks. Introduction and unstructured networks. Prof. Sasu Tarkoma

A local area network that employs either a full mesh topology or partial mesh topology

IP Multicast Technology Overview

Study and Comparison of Mesh and Tree- Based Multicast Routing Protocols for MANETs

Distributed Systems (5DV020) Group communication. Fall Group communication. Fall Indirect communication

Implementation of Near Optimal Algorithm for Integrated Cellular and Ad-Hoc Multicast (ICAM)

Important Lessons From Last Lecture Computer Networking. Outline. Routing Review. Routing hierarchy. Internet structure. External BGP (E-BGP)

CS BGP v4. Fall 2014

Network-on-chip (NOC) Topologies

A Protocol for Scalable Application Layer Multicast. Suman Banerjee, Bobby Bhattacharjee, Srinivasan Parthasarathy

What is Multicasting? Multicasting Fundamentals. Unicast Transmission. Agenda. L70 - Multicasting Fundamentals. L70 - Multicasting Fundamentals

Internet Routing : Fundamentals of Computer Networks Bill Nace

CS514: Intermediate Course in Computer Systems

Lecture 15: Measurement Studies on Internet Routing

How does a router know where to send a packet next?

CS /15/16. Paul Krzyzanowski 1. Question 1. Distributed Systems 2016 Exam 2 Review. Question 3. Question 2. Question 5.

Contents. Overview Multicast = Send to a group of hosts. Overview. Overview. Implementation Issues. Motivation: ISPs charge by bandwidth

Distributed Routing. EECS 228 Abhay Parekh

Top-Down Network Design

Flooded Queries (Gnutella) Centralized Lookup (Napster) Routed Queries (Freenet, Chord, etc.) Overview N 2 N 1 N 3 N 4 N 8 N 9 N N 7 N 6 N 9

Peer-to-Peer Systems. Chapter General Characteristics

LECT-05, S-1 FP2P, Javed I.

CSEP 561 Routing. David Wetherall

Scalable overlay Networks

Introduction to Communication Networks Spring Unit 13 Network extensions Bridges.

CS 640: Introduction to Computer Networks

Communication Networks I December 4, 2001 Agenda Graph theory notation Trees Shortest path algorithms Distributed, asynchronous algorithms Page 1

Computation of Multiple Node Disjoint Paths

Configuring BGP. Cisco s BGP Implementation

Configuring Spanning Tree Protocol

Inter-Domain Routing: BGP

Lecture 13: Traffic Engineering

Turning Heterogeneity into an Advantage in Overlay Routing

Transcription:

Overlay Networks Arvind Krishnamurthy Fall 003 Internet Routing Internet routing is inefficient: Does not always pick the lowest latency paths Does not always pick paths with low drop rates Experimental evidence with 43 nodes: (Detour project at Washington) 50% of routes had a faster alternate one-hop route 80% of routes had an alternate route with lower loss rate 1

Reasons for path inflation #1 Using AS-hop-count as routing metric: Using actual latency or router-hop-count would be better Shortest AS path AS4 AS AS8 destination AS1 AS5 AS7 source Shortest router path AS9 Reasons for path inflation # Policy routing might result in picking a longer path No-valley routing policy: An AS does not provide transit between any two of its providers or peers. AS3 AS4 AS AS5 AS6 AS1 source Prefer Customer routing policy: Prefer the free of charge customer route over the peer or provider route. Early exit strategy: AS7 AS might try to get rid of a packet as soon as possible and minimize its intra-domain traffic AS8 destination

Internet Path Outages When a path fails try to find an alternate path if possible Internet: Path outages are reasonably common Recovery time could be substantial Paxson 95-97 Labovitz 97-00 3.3% of all routes had serious problems 10% of routes available < 95% of the time 65% of routes available < 99.9% of the time 3-min minimum detection+recovery time; often 15 mins 40% of outages took 30+ mins to repair Chandra 01 5% of faults last more than.75 hours Advanced Routing Mechanisms Internet routers rarely support advanced protocols such as IP multicast Ideally, routers should have intelligence to form multicast trees, maintain membership information, and split flows More advanced applications that could benefit from network-embedded intelligence include wide-area file systems Multicast source 3

Solution: Overlay Networks MIT Berkeley Yale Route around faults Use an overlay path that comprises of two or more physical connections through the internet Route around inefficiencies Intelligence (for multicasting, wide-area file-systems, etc.) is pushed to the edge of the internet Overlay Multicast Yale Berkeley Yale Berkeley UCLA Stanford UCLA Stanford Multicast performed through multiple unicasts Edge-hosts serve as forwarding agents (results in a distribution tree) Question: what constitutes a good distribution tree? How would you design such a overlay multicast system? 4

Narada Project at Step 0 Step 1 Maintain a complete overlay graph of all group members Links correspond to unicast paths Link costs maintained by polling Mesh : Subset of complete graph may have cycles and includes all group members Members have low degrees Shortest path delay between any pair of members along mesh is small Stan Stan1 Stan Stan1 Berk Berk1 Yale Berk Berk1 Yale Narada Design (contd.) Step Source rooted shortest delay spanning trees of mesh Constructed using distance vector routing Members have low degrees Small delay from source to receivers Stan Stan1 Berk Berk1 Yale 5

Narada Components Mesh Management: Ensures mesh remains connected in face of membership changes Mesh Optimization: Distributed heuristics for ensuring shortest path delay between members along the mesh is small Spanning tree construction: Routing algorithms for constructing data-delivery trees Distance vector routing, and reverse path forwarding Optimizing Mesh Quality Stan Stan1 A poor overlay topology Berk1 Yale1 Yale Members periodically probe other members at random New Link added if Utility Gain of adding link > Add Threshold Members periodically monitor existing links Existing Link dropped if Cost of dropping link < Drop Threshold 6

The terms defined Utility gain of adding a link based on The number of members to which routing delay improves How significant the improvement in delay to each member is Cost of dropping a link based on The number of members to which routing delay increases, for either neighbor Add/Drop Thresholds are functions of: Member s estimation of group size Current and maximum degree of member in the mesh Desirable properties of heuristics Stability: A dropped link will not be immediately added Partition Avoidance: A partition of the mesh is unlikely to be caused as a result of any single link being dropped Stan Berk1 Stan1 Delay improves to Stan1, but marginally. Do not add link! Probe Yale1 Yale Stan Berk1 Stan1 Yale1 Probe Yale Delay improves to, Yale1 and significantly. Add link! 7

Mesh Improvement Stan Stan1 Berk1 Yale1 Yale Used by Berk1 to reach only Yale and vice versa. Drop Stan Stan1 Berk1 Yale1 An improved mesh Yale Narada Mesh Construction Discussion on metrics: Option 1: use latency as the link weight metric Option : use bandwidth (or inverse-bandwidth) as link weight Option 3: use some function of latency and bandwidth (such as cost of sending 8K data) Hybrid metrics: For some applications, have bandwidth as the important metric and use latency to choose between links that have similar bandwidth Transform bandwidths into coarse levels For paths/links with bandwidth at a certain coarse level, use latency as the discriminating factor 8

Mesh Applications Narada is used for end-system multicast Resilient Overlay Networks (RON) uses a complete graph as the mesh Used for finding backup overlay paths when direct connections fail Routing protocol overheads dominate RON uses link state protocol similar overheads for distance vector Scales only to about 50 or so nodes Overlay network can also be used for multipath routing Can increase net bandwidth Multi-tree routing can be used to improve multicast performance Can send redundant data to improve loss-rates for real-time applications Mesh Construction Strategies Narada: Start with random graph Perform mesh improvement Another strategy: Start with k good links and k random links per node k random links ensures that initial mesh is connected Perform mesh improvement Can we do better? Would like a mesh that is: k connected (to support path redundancy) Contains the best physical connections Has low degree on each node Has low diameter for the entire mesh 9

Graph Theoretic Results Problem: Find a subgraph that is k -connected and minimizes total edge weight of edges included in the subgraph NP-Hard problem for k > 1 k = 1, solution is minimum spanning tree k =, becomes NP-Hard because you can reduce Hamiltonian path problem to this problem Best approximation algorithms: Approximate by a factor of -- an algorithm by Gabow that: Converts original graph into a directed graph Finds k directed trees of minimum cumulative weight Problems with the approach: Complex algorithm uses matroid graph theory Does not adapt to distributed implementation Approximate the Approximation Stick with the general idea of finding a subgraph comprising of k trees Find the trees in a greedy manner Find the minimum-spanning tree (MST) of the graph Remove the edges, find the second MST Repeat the process k times Mesh comprises of the k-msts 10

k-mst Properties For each node, it includes the k best links coming out of the node If a new link with lower weight is found : try to add the link creates a cycle delete the link with the highest weight along the cycle If an existing link becomes more expensive: Let S1 and S be the sets connected by the link Find the link connecting S1 and S in the next higher tree Replace existing link with the link from higher tree 1 5 4 3 Distributed algorithm Extend the Gallager-Humblet-Spira algorithm for finding Minimum Spanning tree Impose degree and diameter heuristics When two components merge in GHS, accept/reject the merge based on current degree bound Also reject merge if it happens at the outer ends of the components Setup a pipeline of k GHS computations Edge rejected in an earlier computation is passed to a later computation When a node detects a lower weight link, it tries to lock the links by sending a message along the associated fundamental cycle Once the links along the cycle are locked, it can swap in the new link for an old link 11

Mesh-based approaches Require a routing algorithm eventually To find alternate shortest paths Or find a distribution tree Overheads associated with routing algorithms limit scalability Fully connected graph (as in RON) scales to only 50-100 nodes Narada/k-MST approaches scale to about 1000 nodes Can we devise implicit overlay structures that do not require routing algorithms? NICE: Hierarchical Multicast Another scheme for overlay multicast Primary motivation: Scalability avoid routing overheads of Narada Make low-bandwidth applications efficient Large receiver-set applications like news and sports ticker Method: Tree based, use hierarchy Group nodes into clusters Nodes inside a cluster have all-to-all connectivity 1

NICE Clusters Nodes are added to the cluster which is closest Geographic center of cluster is chosen as a leader Cluster membership varies between k and 3k-1 Cluster is split if number exceeds 3k 1 NICE Hierarchy Group cluster leaders into higher-level cluster, elect its leader and process continues to form a hierarchy When a node joins a cluster, exceeds the cluster limit, and forms two clusters new leaders added to the higher level cluster 13

NICE Multicast Any node in the system can initiate a multicast Multicast data would travel up and down the tree links to all of the cluster leaders Cluster leaders propagate data to all of its cluster members NICE Protocol Operations Member join Find tree leader Walk down the tree, at each step find a cluster member which is closest to the joining node Eventually join the lowest level cluster Might be elected as a leader might have to replace existing leader in higher level cluster Cluster split: if joining node exceeds cluster size (3k-1) Member departure Cluster size might drop below threshold perform cluster merge with some other cluster Cluster merge Balance the number of nodes between two clusters Cluster refine Occasionally nodes seek to find better clusters 14

Overlay Multicast using DHTs Nodes in DHTs form multiple implicit tree structures Each node in a DHT has limited degree DHT maintenance costs are low: Typically O(log n) insertion and deletion costs Typically, there are no bottlenecks in the system 3 0x035E Incremental suffix-based routing 3 1 0x73FF Tapestry Mesh 0x79FE 0x44FE 0x555E 3 4 0x3FE 0xABFE 3 0x43E 0x73FE 4 4 0x13FE 0x43FE 4 3 0x39E 3 1 1 0x04FE 0x993E 1 0x190 3 1 4 0x9990 0xF990 15

Tapestry-based Multicast (Bayeux) Assume tapestry network routing tables are optimized: If multiple nodes can be used to fill in a routing table entry, use the best node Usually accomplished with limited information Create a key for each multicast Map the key to the node that hashes close to the key This node will serve as the multicast root Send data to this node Each node interested in the multicast: Routes a request to the root (similar to locating a key value) Requests from different nodes might intersect at different points in the system Nodes at intersection points remember who subscribes to a certain piece of data Announcements Final exam on Dec. 16 th Time: :00 5:00 Location: DL 0 Open book, open notes exam 16

Course Wrapup Theoretical Topics Basic distributed algorithms (message/time complexity, liveness/safety issues) Asynchronous algorithms (GHS-MST) Reasoning about distributed systems (clocks, consistent cuts, snapshots, global predicates) Consensus (synchronous/asynchronous, fail-stop/byzantine, impossibility results, Paxos) Distributed systems in practice PP file-sharing systems Distributed hash tables Routing algorithms in Internet Ad-hoc routing algorithms Security attacks and securing distributed computations Overlay networks Acknowledgements: Course material: Attiya-Welch, Lynch, Coulouris-Dollimore-Kindberg Lecture notes derived from graduate courses taught by: Jennifer Welch (Texas A&M), Srini Seshan (), Lorenzo Alvisi (UT-Austin), Hari Balakrishnan (MIT) Final Thoughts Field has matured substantially in the last few years: Previously, there were two communities that didn t quite interact with each other: Theoretical community designed complex algorithms Operating systems community designed distributed file systems and cluster operating systems Recently, focus has been on designing algorithms that were immediately put into practice Highly sophisticated execution: Most systems are analyzed, simulated, and implemented! Challenges: Issues of scale More complex services (not just object location and multicast!) Autonomous entities: not just fail-stop/byzantine node, but selfish agents Securing distributed systems 17