Module SDS: Scalable Distributed Systems. Gabriel Antoniu, KERDATA & Davide Frey, ASAP INRIA

Similar documents
Large-scale distributed systems. Anne-Marie Kermarrec

Overlay networks. Today. l Overlays networks l P2P evolution l Pastry as a routing overlay example

Overlay and P2P Networks. Structured Networks and DHTs. Prof. Sasu Tarkoma

08 Distributed Hash Tables

CS4700/CS5700 Fundamentals of Computer Networks

Peer- Peer to -peer Systems

Distributed Hash Tables

Distributed Hash Table

Telematics Chapter 9: Peer-to-Peer Networks

Architectures for Distributed Systems

Slides for Chapter 10: Peer-to-Peer Systems. From Coulouris, Dollimore, Kindberg and Blair Distributed Systems: Concepts and Design

GNUnet Distributed Data Storage

CS555: Distributed Systems [Fall 2017] Dept. Of Computer Science, Colorado State University

CS514: Intermediate Course in Computer Systems

Motivation for peer-to-peer

Overlay and P2P Networks. Structured Networks and DHTs. Prof. Sasu Tarkoma

Slides for Chapter 10: Peer-to-Peer Systems

A Scalable Content- Addressable Network

Reminder: Distributed Hash Table (DHT) CS514: Intermediate Course in Operating Systems. Several flavors, each with variants.

Chapter 10: Peer-to-Peer Systems

Symphony. Symphony. Acknowledgement. DHTs: : The Big Picture. Spectrum of DHT Protocols. Distributed Hashing in a Small World

CS5412: TIER 2 OVERLAYS

March 10, Distributed Hash-based Lookup. for Peer-to-Peer Systems. Sandeep Shelke Shrirang Shirodkar MTech I CSE

Overlay Networks. Behnam Momeni Computer Engineering Department Sharif University of Technology

Athens University of Economics and Business. Dept. of Informatics

Peer-to-peer systems and overlay networks

Making Gnutella-like P2P Systems Scalable

CompSci 356: Computer Network Architectures Lecture 21: Overlay Networks Chap 9.4. Xiaowei Yang

Content Overlays. Nick Feamster CS 7260 March 12, 2007

Peer-to-Peer Networks Pastry & Tapestry 4th Week

P2P Network Structured Networks (IV) Distributed Hash Tables. Pedro García López Universitat Rovira I Virgili

Analysis of Peer-to-Peer Protocols Performance for Establishing a Decentralized Desktop Grid Middleware

GLive: The Gradient overlay as a market maker for mesh based P2P live streaming

Distributed Hash Tables

Unit 8 Peer-to-Peer Networking

DISTRIBUTED SYSTEMS Principles and Paradigms Second Edition ANDREW S. TANENBAUM MAARTEN VAN STEEN. Chapter 1. Introduction

Kademlia: A P2P Informa2on System Based on the XOR Metric

Searching for Shared Resources: DHT in General

Telecommunication Services Engineering Lab. Roch H. Glitho

Definition of a DS A collection of independent computers that appear to its user as a single coherent system.

Should we build Gnutella on a structured overlay? We believe

PUB-2-SUB: A Content-Based Publish/Subscribe Framework for Cooperative P2P Networks

Introduction to Peer-to-Peer Systems

Searching for Shared Resources: DHT in General

12/5/16. Peer to Peer Systems. Peer-to-peer - definitions. Client-Server vs. Peer-to-peer. P2P use case file sharing. Topics

internet technologies and standards

Last Time. CSE 486/586 Distributed Systems Distributed Hash Tables. What We Want. Today s Question. What We Want. What We Don t Want C 1

Peer-to-Peer Systems. Network Science: Introduction. P2P History: P2P History: 1999 today

Peer-to-Peer Systems. Chapter General Characteristics

C 1. Last Time. CSE 486/586 Distributed Systems Distributed Hash Tables. Today s Question. What We Want. What We Want. What We Don t Want

A Directed-multicast Routing Approach with Path Replication in Content Addressable Network

Debunking some myths about structured and unstructured overlays

Distributed Information Processing

Performance and dependability of structured peer-to-peer overlays

P2P Network Structured Networks: Distributed Hash Tables. Pedro García López Universitat Rovira I Virgili

Distributed Hash Tables: Chord

INF5070 media storage and distribution systems. to-peer Systems 10/

Secure Distributed Storage in Peer-to-peer networks

CIS 700/005 Networking Meets Databases

INF5071 Performance in distributed systems: Distribution Part III

Peer-to-Peer Internet Applications: A Review

Distributed Systems. peer-to-peer Johan Montelius ID2201. Distributed Systems ID2201

Structured Peer-to-Peer Networks

Ken Birman. Cornell University. CS5410 Fall 2008.

Overlay Networks for Multimedia Contents Distribution

LECTURE 3: CONCURRENT & DISTRIBUTED ARCHITECTURES

DHT Overview. P2P: Advanced Topics Filesystems over DHTs and P2P research. How to build applications over DHTS. What we would like to have..

CRESCENDO GEORGE S. NOMIKOS. Advisor: Dr. George Xylomenos

Distributed Knowledge Organization and Peer-to-Peer Networks

6. Peer-to-peer (P2P) networks I.

Middleware and Distributed Systems. Peer-to-Peer Systems. Peter Tröger

Scalability In Peer-to-Peer Systems. Presented by Stavros Nikolaou

Scalable overlay Networks

Scalable overlay Networks

Winter CS454/ Assignment 2 Instructor: Bernard Wong Due date: March 12 15, 2012 Group size: 2

Pastry: Scalable, Decentralized Object Location, and Routing for Large-Scale Peer-to-Peer Systems

Distributed Systems. 17. Distributed Lookup. Paul Krzyzanowski. Rutgers University. Fall 2016

Today CSCI Coda. Naming: Volumes. Coda GFS PAST. Instructor: Abhishek Chandra. Main Goals: Volume is a subtree in the naming space

DISTRIBUTED COMPUTER SYSTEMS ARCHITECTURES

Distributed Systems: Architectural Issues

Priority Based Routing for Mobile Peer-To-Peer Communications

Challenges in the Wide-area. Tapestry: Decentralized Routing and Location. Global Computation Model. Cluster-based Applications

Overlay and P2P Networks. Unstructured networks: Freenet. Dr. Samu Varjonen

Peer-to-Peer (P2P) Systems

Fast Topology Management in Large Overlay Networks

Ossification of the Internet

Path Optimization in Stream-Based Overlay Networks

Department of Computer Science Institute for System Architecture, Chair for Computer Networks. File Sharing

CS 43: Computer Networks BitTorrent & Content Distribution. Kevin Webb Swarthmore College September 28, 2017

Overlay networks. To do. Overlay networks. P2P evolution DHTs in general, Chord and Kademlia. Turtles all the way down. q q q

Lecture 8: Application Layer P2P Applications and DHTs

Exploiting the Synergy between Peer-to-Peer and Mobile Ad Hoc Networks

Overlay Multicast. Application Layer Multicast. Structured Overlays Unstructured Overlays. CAN Flooding Centralised. Scribe/SplitStream Distributed

Dynamic Load Sharing in Peer-to-Peer Systems: When some Peers are more Equal than Others

A Decentralized Content-based Aggregation Service for Pervasive Environments

Hybrid Overlay Structure Based on Random Walks

Peer-to-Peer Systems and Distributed Hash Tables

Web caches (proxy server) Applications (part 3) Applications (part 3) Caching example (1) More about Web caching

Modern Technology of Internet

Distributed K-Ary System

Transcription:

Module SDS: Scalable Distributed Systems Gabriel Antoniu, KERDATA & Davide Frey, ASAP INRIA

Staff Gabriel Antoniu, DR INRIA, KERDATA Team gabriel.antoniu@inria.fr Davide Frey, CR INRIA, ASAP Team davide.frey@inria.fr

3- step Evaluation Step 1: select a research paper Step 2: Write a report on the paper Step 3: Give a talk presenting the paper Then we ll give you a grade ;)

The Course in one slide P2P for large-scale and dynamic distributed systems Fully decentralized Self-organizing Search, Load balancing Data dissemination, Local knowledge vs Global Publish-subscribe Convergencesystems (RSS) VoIP, Video streaming, Not only file-sharing Grid computing Archival systems RW file sharing applications Application-level multicast

Ok, two slides! Peer To Peer Peer-to-Peer Architectures Structured Overlays: DHT Unstructured Overlays: Gossip Applications Application-Level Multicast Video Streaming Recommendation

Module SDS The Peer-to-Peer Model

Distributed System Definition [Tan9] «A collection of independent computers that appears to its users as a single coherent system» Software: Unique image Distributed applications Distributed Systems Hardware: Autonomous computers Local OS network

Why should we decentralize? Economical reasons Performance Availability Resource aggregation Fleibility (load balancing) Privacy Growing need of working collaboratively, sharing and aggregating distributed (geographically) distributed.

How to decentralize? Tools (some) Client-server Model Peer-to-peer Model Grid Computing Cloud Goals Scalability Reliability Availability Security/Privacy

The Peer-To-Peer Model Name one peer-to-peer technology

Peer-to-Peer Systems

The Peer-to-Peer Model End-nodes become active components! previously they were just clients Nodes participate, interact, contribute to the services they use. Harness huge pools of resources accumulated in millions of end-nodes.

P2P Application Areas File sharing Information Mgmt Discover Collaboration Aggregate Filter Instant Messaging Shared whiteboard Co-review/edit/author Gaming CPU Internet/Intranet Distributed Computing Grid Computing Content Storage Network Storage Caching Replication Bandwidth Content Distribution Collaborative download Edge Services VoIP

Why are We Talking of P2P Use resources at the edge of the Internet Storage CPU cycles Bandwidth Content Collectively produce services Nodes share both benefits and duties Irregularities and dynamics become the norm Essential for Large Scale Distributed System

Main Advantages of P2P Scalable higher demand à higher contribution! Increased (massive) aggregate capacity Utilize otherwise wasted resources Fault Tolerant No single point of control Replication makes it possible to withstand failures Inherently handle dynamic conditions

Main Challenges in P2P Fairness and Load Balancing Dynamics and Adaptability Fault-Tolerance: Continuous Maintenance Self-Organization

Key Concept: Overlay Network B A C Overlay Network Physical Network

Overlay types Unstructured P2P Structured P2P Any two nodes can establish a link Topology evolves at random Topology strictly determined by node IDs Topology reflects desired properties of linked nodes

Distributed Hash Tables

Hash Table Keys Hash function Hash values Stored data Efficient information lookup

Distributed Hash Table (DHT) nodes Operations: insert(k,v) lookup(k,v) k1,v1 k4,v4 k2,v2 k3,v3 routing k,v k,v Store <key,value> pairs Efficient access to a value given a key Must route hash keys to nodes.

Distributed Hash Table Operations: insert(k,v) lookup(k,v) map to P2P overlay network k1,v1 k4,v4 k2,v2 nodes k3,v3 send(m,k) k,v k,v Insert and Lookup send messages keys P2P Overlay defines mapping between keys and physical nodes Decentralized routing implements this mapping

DHT Eamples

Pastry (MSR/RICE) node key Id space NodeId = 128 bits Nodes and key place in a linear space (ring) Mapping : a key is associated to the node with the numerically closest nodeid to the key

Pastry (MSR/Rice) Naming space : Ring of 128 bit integers nodeids chosen at random Identifiers are a set of digits in base 1 Key/node mapping key associated with the node with the numerically closest node id Routing table: Matri of 128/4 lines et 1 columns routetable(i,j): nodeid matching the current node identifier up to level I with the net digit is j Leaf set 8 or 1 closest numerical neighbors in the naming space Proimity Metric Bias selection of nodes

Pastry: Routing table(#a1fc) 0 1 2 3 4 7 8 9 a b c d e f 0 1 2 3 4 7 8 9 a b c d e f 0 1 2 3 4 7 8 9 b c d e f a 0 a 2 a 3 a 4 a a a 7 a 8 a 9 a a a b a c a d a e a f log 1 N liges Line 0 Line 1 Line 2 Line 3

Pastry: Routing Route(d4a1c) d4a1c d471f1 d47c4 d42ba d4213f d13da3 Properties log 1 N hops Size of the state maintained (routing table): O(log N) a1fc

Routing algorithm (on node A) (1) if ( ) (2) // is within range of our leaf namespace set set (3) forward to,s.th. is minimal; (4) else i Rl () // use the routing table line l, 0 l ë () Let ; Li (7) if ( ) Dl (8) forward to ; (9) (10) else (11) // rare case (12) forward to,s.th. (13), (14) (1) (1) :entry of the routing table R, 128/b :ith closest nodeid in the leafset : value of the l digits of û key D 0 i 2 SHL( A, B) :length of the shared prefi between A and B b,

Pastry Eample 023 232 333 b=4 Route to 311 333 023 100 113 122 132 102 N/A 103 321 132 313 310 282 232

Pastry Eample b=4 Route to 311 333 023 100 313 310 321 023 100 232 302 N/A 321 330 N/A 332 132 282 232

Pastry Eample b=4 Route to 311 333 023 100 313 310 321 023 100 232 302 313 333 320 322 N/A 132 282 232

Pastry Eample b=4 Route to 311 333 023 100 313 310 321 023 100 232 302 321 333 310 N/A N/A 132 282 232

Pastry Eample b=4 Route to 311 333 023 100 313 310 321 023 100 232 302 321 333 N/A N/A 313 132 282 232

Node departure Eplicit departure or failure Replacement of a node The leafset of the closest node in the leafset contains the closest new node, not yet in the leafset Update from the leafset information Update the application

Failure detection Detected when immediate neighbours in the name space (leafset) can no longer communicate Detected when a contact fails during the routing Routing uses an alternative route

Fiing the routing table of A Repair R d l A contacts another entry (at random) so that ( i from :entry of the routing table of A to repair R i l+ ¹ d) and asks for entry R d l 1 ( i ¹ d) if no node in line l R i l from the same line, otherwise another entry answers the request.

State maintenance Leaf set is aggressively monitored and fied Routing table are lazily repaired When a hole is detected during the routing Periodic gossip-based maintenance

as possible Reducing latency Random assignment of nodeid: Nodes d47f numerically close are geographically (topologically) distant Objective: fill the routing table with nodes so that routing hops are d47c4 fdacd as short (latency wise)

Eploiting locality in Pastry Neighbour selected based of a network proimity metric: Closest topological node Satisfying the constraints of the routing table routetable(i,j): nodeid corresponding to the current nodeid up to level i net digit = j nodes are close at the top level of the routing table Farther nodes at the bottom levels of the routing tables

Proimity routing in Pastry Leaf set d47c4 d4a1c Route(d4a1c) d471f1 d47c4 d42ba d4213f d13da3 Topological space a1fc Naming space d42ba d4213f a1fc d13da3

Locality 1. Joining node X routes asks A to route to X Path A,B, -> Z Z numerically closest to X X initializes line i of its routing table with the contents of line i of the routing table of the ith node encountered on the path 2. Improving the quality of the routing table X asks to each node of its routing table its own routing state and compare distances Gossip-based update for each line (20mn) Periodically, an entry is chosen at random in the routing table Corresponding line of this entry sent Evaluation of potential candidates Replacement of better candidates New nodes gradually integrated

Node insertion in Pastry d4a1c d471f1 d47c4 d42ba d4213f d47c4 Topological space Route(d4a1c) d13da3 a1fc New node: d4a1c Naming space d42ba d4213f a1fc d13da3

Node discovery: approimation of the closest node discover(seed) //destination nodes=getleafset(seed) //probes on the leaf set nearnode=pickclosest(nodes) //pick closest in the leafset depth=getmaroutingtablelevel(nearnode) //highest line in RT closest=nil while (closest!= nearnode) end closest=nearnode nodes = getroutingtable(nearnode, depth) nearnode =pickclosest(nodes) if (depth >0) depth = depth 1 return closest Routing table property used to move eponentially closer to the closest Pick the closest node at each level and get the the net level from it (bottom up) Constant number of probes at each level Probed nodes get eponentially closer at each level Fill routing table entries from the closest node

Performance 1.9 slower than IP on average

References Rowstron and P. Druschel, "Pastry: Scalable, distributed object location and routing for large-scale peer-topeer systems", Middleware'2001, Germany, November 2001.