Module SDS: Scalable Distributed Systems Gabriel Antoniu, KERDATA & Davide Frey, ASAP INRIA
Staff: Gabriel Antoniu, DR INRIA, KERDATA Team (gabriel.antoniu@inria.fr); Davide Frey, CR INRIA, ASAP Team (davide.frey@inria.fr)
3-Step Evaluation
Step 1: select a research paper. Step 2: write a report on the paper. Step 3: give a talk presenting the paper. Then we'll give you a grade ;)
The Course in One Slide
P2P for large-scale and dynamic distributed systems: fully decentralized, self-organizing, local knowledge vs. global convergence; search, load balancing, data dissemination.
Not only file sharing: publish-subscribe systems (RSS), VoIP, video streaming, grid computing, archival systems, read-write file sharing applications, application-level multicast.
Ok, two slides!
Peer-to-Peer Architectures: structured overlays (DHTs), unstructured overlays (gossip).
Applications: application-level multicast, video streaming, recommendation.
Module SDS The Peer-to-Peer Model
Distributed System Definition [Tan9]: "A collection of independent computers that appears to its users as a single coherent system."
Hardware: autonomous computers, each with a local OS, connected by a network.
Software: a single system image offered to distributed applications.
Why should we decentralize? Economical reasons, performance, availability, resource aggregation, flexibility (load balancing), privacy. Growing need to work collaboratively, sharing and aggregating geographically distributed resources.
How to decentralize?
Tools (some): client-server model, peer-to-peer model, grid computing, cloud.
Goals: scalability, reliability, availability, security/privacy.
The Peer-To-Peer Model Name one peer-to-peer technology
Peer-to-Peer Systems
The Peer-to-Peer Model: end-nodes become active components! Previously they were just clients. Nodes participate, interact, and contribute to the services they use. Harness huge pools of resources accumulated in millions of end-nodes.
P2P Application Areas
File sharing.
Information management: discover, aggregate, filter.
Collaboration: instant messaging, shared whiteboard, co-review/edit/author, gaming.
Distributed computing (CPU): Internet/intranet distributed computing, grid computing.
Content storage: network storage, caching, replication.
Content distribution (bandwidth): collaborative download, edge services, VoIP.
Why Are We Talking about P2P? Use resources at the edge of the Internet: storage, CPU cycles, bandwidth, content. Collectively produce services: nodes share both benefits and duties. Irregularities and dynamics become the norm. Essential for large-scale distributed systems.
Main Advantages of P2P
Scalable: higher demand means higher contribution! Increased (massive) aggregate capacity; utilizes otherwise wasted resources.
Fault-tolerant: no single point of control; replication makes it possible to withstand failures; inherently handles dynamic conditions.
Main Challenges in P2P Fairness and Load Balancing Dynamics and Adaptability Fault-Tolerance: Continuous Maintenance Self-Organization
Key Concept: Overlay Network. Nodes (A, B, C) are connected by logical links in the overlay network, mapped onto paths in the underlying physical network.
Overlay Types
Unstructured P2P: any two nodes can establish a link; the topology evolves at random.
Structured P2P: the topology is strictly determined by node IDs and reflects desired properties of linked nodes.
Distributed Hash Tables
Hash Table: a hash function maps keys to hash values, which index the stored data. Efficient information lookup.
Distributed Hash Table (DHT): stores <key, value> pairs across nodes. Operations: insert(k,v), lookup(k). Efficient access to a value given its key; hash keys must be routed to the responsible nodes.
Distributed Hash Table: insert(k,v) and lookup(k) map onto a P2P overlay network; insert and lookup send messages addressed by key. The P2P overlay defines the mapping between keys and physical nodes; decentralized routing implements this mapping.
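To make the key-to-node mapping concrete, here is a minimal, centralized simulation of a DHT in Python (all names are illustrative, not from any real DHT library): each <key, value> pair is stored on the node whose identifier is numerically closest, on the ring, to the key's hash.

```python
import hashlib

BITS = 32  # small identifier space for readability; Pastry uses 128 bits

def key_id(key: str) -> int:
    """Hash a key into the identifier space."""
    digest = hashlib.sha1(key.encode()).digest()
    return int.from_bytes(digest[:BITS // 8], "big")

def ring_distance(a: int, b: int) -> int:
    """Numerical distance on the circular identifier space."""
    d = abs(a - b)
    return min(d, (1 << BITS) - d)

class ToyDHT:
    """Centralized simulation: in a real DHT, `responsible` is computed
    by decentralized routing, not by scanning all nodes."""
    def __init__(self, node_ids):
        self.nodes = {nid: {} for nid in node_ids}

    def responsible(self, k: str) -> int:
        kid = key_id(k)
        return min(self.nodes, key=lambda nid: ring_distance(nid, kid))

    def insert(self, k, v):
        self.nodes[self.responsible(k)][k] = v

    def lookup(self, k):
        return self.nodes[self.responsible(k)].get(k)
```

A real DHT replaces the `responsible()` scan with overlay routing, which is exactly what the following slides on Pastry describe.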
DHT Examples
Pastry (MSR/Rice): NodeIds are 128 bits. Nodes and keys are placed in a circular Id space (ring). Mapping: a key is associated with the node whose nodeId is numerically closest to the key.
Pastry (MSR/Rice)
Naming space: ring of 128-bit integers; nodeIds chosen at random; identifiers are sequences of digits in base 16.
Key/node mapping: a key is associated with the node whose nodeId is numerically closest to it.
Routing table: a matrix of 128/4 = 32 rows and 16 columns; routetable(i,j) holds a nodeId matching the current node's identifier up to level i, with the next digit equal to j.
Leaf set: the 8 or 16 numerically closest neighbors in the naming space.
Proximity metric: biases the selection of nodes.
Pastry: Routing Table (of node 65a1fc)
Row 0: one entry per first digit 0,1,2,3,4,5,7,8,9,a,b,c,d,e,f (all but the node's own digit 6).
Row 1: entries with prefix 6, one per second digit except 5.
Row 2: entries with prefix 65, one per third digit except a.
Row 3: entries with prefix 65a, one per fourth digit except 1.
In practice, only log16 N rows are populated.
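The indexing rule behind this table can be sketched as follows (hypothetical helper names; identifiers are hex strings as on the slide): the row is the length of the prefix shared by the local nodeId and the key, and the column is the key's next digit.

```python
def shl(a: str, b: str) -> int:
    """Length of the shared hex-digit prefix between two identifiers."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n

def routing_slot(node_id: str, key: str):
    """(row, column) of the routing table entry used by node_id to
    route toward key: row = shared prefix length, column = next digit."""
    l = shl(node_id, key)
    return l, key[l]
```

For example, `routing_slot("65a1fc", "d46a1c")` gives `(0, "d")`: node 65a1fc forwards via its row-0 entry for first digit d, exactly as in the routing example on the next slide.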
Pastry: Routing. Route(d46a1c) from node 65a1fc: the message travels via d13da3, d4213f, d462ba toward the node numerically closest to the key (nearby nodes: d467c4, d471f1). Properties: log16 N hops; size of the state maintained (routing table): O(log N).
Routing algorithm (on node A, for key D):

(1)  if L_(-|L|/2) <= D <= L_(|L|/2) then
(2)      // D is within range of our leaf set L
(3)      forward to L_i such that |D - L_i| is minimal
(4)  else
(5)      // use the routing table, row l, 0 <= l < 128/b
(6)      let l = shl(D, A)
(7)      if routetable(l, D_l) != null then
(8)          forward to routetable(l, D_l)
(9)      else
(10)         // rare case
(11)         forward to T in L ∪ R ∪ M such that
(12)             shl(T, D) >= l and |T - D| < |A - D|

routetable(l, i): entry of the routing table R at row l, column i (0 <= i < 2^b)
L_i: i-th closest nodeId in the leaf set
D_l: value of the l-th digit of key D
shl(A, B): length of the prefix shared by A and B, in digits
M: neighborhood set
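The algorithm above can be sketched in Python (a simplified model with short hex ids, no wrap-around on the ring, and the neighborhood set M folded into the known nodes; all helper names are illustrative):

```python
def route(node_id, leaf_set, routing_table, key):
    """One Pastry routing step (simplified sketch).
    leaf_set: list of hex-string node ids.
    routing_table: dict mapping (row, next_digit) -> node id."""
    num = lambda s: int(s, 16)

    def shl(a, b):
        # length of the shared hex-digit prefix between two identifiers
        n = 0
        for x, y in zip(a, b):
            if x != y:
                break
            n += 1
        return n

    # 1. Key within the leaf-set range: deliver to the numerically
    #    closest node among the leaf set and ourselves.
    candidates = leaf_set + [node_id]
    if min(map(num, candidates)) <= num(key) <= max(map(num, candidates)):
        return min(candidates, key=lambda n: abs(num(n) - num(key)))

    # 2. Otherwise, use the routing-table entry that extends the
    #    shared prefix by one digit.
    l = shl(node_id, key)
    entry = routing_table.get((l, key[l]))
    if entry is not None:
        return entry

    # 3. Rare case: forward to any known node that shares a prefix at
    #    least as long and is numerically closer to the key.
    known = leaf_set + list(routing_table.values())
    closer = [t for t in known
              if shl(t, key) >= l
              and abs(num(t) - num(key)) < abs(num(node_id) - num(key))]
    return min(closer, key=lambda n: abs(num(n) - num(key))) if closer else node_id
```

For instance, `route("333", ["310", "313", "321", "332"], {}, "311")` returns `"310"`, the node numerically closest to the key, via the leaf-set rule.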
Pastry Example (base-4 digits). A sequence of slides steps through routing a message with key 311 over an overlay of nodes (023, 100, 102, 103, 113, 122, 132, 232, 282, 302, 310, 313, 320, 321, 322, 330, 332, 333). Each hop uses the routing-table row matching one more digit of the key, until the leaf set delivers the message to the node numerically closest to 311.
Node Departure: explicit departure or failure. Replacement of a node: the leaf set of the closest node in the leaf set contains the closest new node not yet in the leaf set; update from the leaf-set information; notify the application.
Failure Detection: detected when immediate neighbors in the naming space (leaf set) can no longer communicate, or when a contact fails during routing; routing then uses an alternative route.
Fixing the Routing Table of A: to repair entry routetable(l,d), A contacts another entry routetable(l,i) (i ≠ d) of the same row, chosen at random, and asks it for its own routetable(l,d) entry. If no node in row l can answer the request, A asks an entry routetable(l+1,i) from the next row.
State Maintenance: the leaf set is aggressively monitored and fixed. Routing tables are lazily repaired: when a hole is detected during routing, and by periodic gossip-based maintenance.
Reducing Latency: with random assignment of nodeIds, numerically close nodes are geographically (topologically) distant. Objective: fill the routing table with nodes so that routing hops are as short (latency-wise) as possible.
Exploiting Locality in Pastry: neighbors are selected based on a network proximity metric: the topologically closest node satisfying the constraints of the routing table (routetable(i,j): a nodeId matching the current nodeId up to level i, with next digit j). Nodes are topologically close in the top rows of the routing table; farther nodes appear in the bottom rows.
Proximity Routing in Pastry: the route Route(d46a1c) from 65a1fc through d13da3, d4213f, d462ba shown in both the topological space and the naming space; early hops are topologically short, while each hop makes large progress in the naming space.
Locality
1. Joining: node X asks a nearby node A to route a join message to X's own id; the path is A, B, ... -> Z, where Z is numerically closest to X. X initializes row i of its routing table with the contents of row i of the routing table of the i-th node encountered on the path.
2. Improving the quality of the routing table: X asks each node of its routing table for its own routing state and compares distances. Gossip-based update per row (every 20 minutes): periodically, an entry is chosen at random in the routing table, and the corresponding row of that entry is sent back; potential candidates are evaluated and better ones replace existing entries. New nodes are thus gradually integrated.
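The gossip-based refinement of step 2 can be sketched as a row merge (hypothetical names; `proximity` stands for a measured network distance from the local node): for each slot of a received row, keep whichever candidate is topologically closer.

```python
def merge_row(local_row, received_row, proximity):
    """Merge a routing-table row received from a peer into the local row.
    Rows map a next-digit to a node id (or None); proximity(node_id)
    returns the measured network distance from the local node."""
    merged = dict(local_row)
    for digit, candidate in received_row.items():
        if candidate is None:
            continue
        current = merged.get(digit)
        # Fill holes, and replace entries with topologically closer candidates.
        if current is None or proximity(candidate) < proximity(current):
            merged[digit] = candidate
    return merged
```

Run periodically against a row requested from a random routing-table entry (the slide suggests every 20 minutes), this gradually fills holes and integrates new, closer nodes.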
Node Insertion in Pastry: the new node d46a1c asks a nearby node (65a1fc) to issue Route(d46a1c); the message travels through d13da3, d4213f, d462ba in the naming space while staying short in the topological space.
Node Discovery: Approximating the Closest Node

discover(seed)
    nodes = getLeafSet(seed)                    // probe the seed's leaf set
    nearNode = pickClosest(nodes)               // topologically closest in the leaf set
    depth = getMaxRoutingTableLevel(nearNode)   // highest row in the routing table
    closest = nil
    while closest != nearNode do
        closest = nearNode
        nodes = getRoutingTable(nearNode, depth)
        nearNode = pickClosest(nodes)
        if depth > 0 then depth = depth - 1
    return closest

The routing-table property is used to move exponentially closer to the closest node: pick the closest node at each level and obtain the next level from it (bottom-up). A constant number of probes is made at each level, and probed nodes get exponentially closer. Routing table entries are then filled from the closest node.
Performance: routes are about 1.9 times slower than direct IP routing on average.
References
A. Rowstron and P. Druschel, "Pastry: Scalable, Distributed Object Location and Routing for Large-Scale Peer-to-Peer Systems", Middleware 2001, Germany, November 2001.